No one doubts the value of data, but inaccurate, low-quality, poorly organised data is a growing problem for organisations across multiple industries.

It’s neither new nor controversial to say that the world runs on data. Big data analytics are fundamental to maintaining agility and visibility, not to mention unlocking the valuable insights that let organisations stay competitive. Globally, the big data market is expected to grow to more than $401 billion by the end of 2028, up from $220 billion last year.

Business leaders universally agree that data is important. However, actually turning that data into impactful business outcomes remains a huge challenge for many companies. Increasingly, focusing on the volume and variety of data alone leaves organisations without the one thing they really need: data they can trust.

Data quality, not just quantity 

No matter how sophisticated the analytical tool, the quality of data that goes in determines the quality of insight that comes out. Good quality data is data that is suitable for its intended use. Poor quality data fails to meet this criterion. In other words, poor quality data cannot effectively support the outcomes it is being used to generate.

Raw data often falls into the category of poor quality data. For instance, data collected from social media platforms like Twitter is unstructured. In this raw form, it isn’t particularly useful for analysis or other valuable applications. Nonetheless, raw data can be transformed into good quality data through data cleaning and processing, though this typically takes time and effort.
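To illustrate what this kind of cleaning step can look like in practice, here is a minimal Python sketch that turns a raw social-media post into a structured record. The field names and example post are hypothetical, not any particular platform’s schema:

```python
import re

def structure_post(raw_text):
    """Turn one raw social-media post into a structured record.

    A minimal illustration of data cleaning: extract hashtags and
    mentions, strip URLs, and normalise whitespace.
    """
    hashtags = re.findall(r"#\w+", raw_text)
    mentions = re.findall(r"@\w+", raw_text)
    # Remove URLs and the extracted entities, then collapse whitespace
    text = re.sub(r"https?://\S+", "", raw_text)
    text = re.sub(r"[#@]\w+", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return {"text": text, "hashtags": hashtags, "mentions": mentions}

# Hypothetical raw post
record = structure_post("Great insights from @acme on #dataquality! https://example.com")
```

Real pipelines do far more (language detection, spam filtering, entity resolution), but the principle is the same: unstructured text goes in, analysable records come out.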

Some bad data, however, is simply inaccurate, misleading, or fundamentally flawed. It can’t be easily refined into anything useful, and its presence in a data set can spoil any results. Data that lacks structure or has issues such as inaccuracy, incompleteness, inconsistencies, and duplication is considered poor quality data.
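Two of those defects, duplication and incompleteness, can be screened for mechanically. The sketch below is a simplified Python illustration (the customer records and field names are invented for the example) that flags exact duplicates and records missing required fields:

```python
def audit_records(records, required_fields):
    """Flag two common data-quality problems in a list of records:
    exact duplicates and records with missing required fields."""
    seen = set()
    duplicates, incomplete = [], []
    for i, rec in enumerate(records):
        key = tuple(sorted(rec.items()))  # canonical form for comparison
        if key in seen:
            duplicates.append(i)
        seen.add(key)
        if any(rec.get(f) in (None, "") for f in required_fields):
            incomplete.append(i)
    return {"duplicates": duplicates, "incomplete": incomplete}

# Hypothetical customer records
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": "a@example.com"},   # exact duplicate
    {"id": 2, "email": ""},                # incomplete
]
report = audit_records(records, ["id", "email"])
```

Checks like these catch the mechanical problems; inaccurate or misleading values are harder, since they require domain knowledge to spot.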

Is AI solving the problem or creating it? 

Concerns over data quality are as old as spreadsheets, and maybe even the abacus. Managing, structuring, and creating insights from data only gets more complicated the more data you gather, and organisations today gather a frighteningly large amount of data as a matter of course. They might not be able to do anything with it, but everyone knows that data is valuable, so organisations take a “more is more” approach and hoover up as much as they can.

New tools like generative artificial intelligence (AI) promise to help companies capture the value present in their data. The technology exploded onto the scene, promising rapid and sophisticated data analysis. However, AI has also been criticised for muddying the waters and further degrading the quality of data available.

Questionable inputs are now being blamed for the hallucinations and other odd behaviours that have very publicly undermined the effectiveness of large language models (LLMs). The recent debacle with Google’s AI-assisted search being trained on Reddit posts is a perfect example.

“How can we trust all our data in the generative AI economy?” asks Tuna Yemisci, regional director of Middle East, Africa and East Med at Qlik, in a recent article. Nor is the trend going away: reports published earlier this year observe data quality getting worse. An April survey by dbt Labs of 456 analytics engineers, data engineers, data analysts, and other data professionals found that poor data quality was their number one concern.

The feedback loop 

Not only is AI undermining the quality of existing data, but bad existing data is hampering attempts to find applications for generative AI. The whole issue is in danger of creating a feedback loop that undermines the tech industry’s biggest bets on the future of digital economic activity.

“There’s a common assumption that the data (companies) have accumulated over the years is AI-ready, but that’s not the case,” Joseph Ours, a Partner at Centric Consulting wrote in a recent blog post. “The reality is that no one has truly AI-ready data, at least not yet… Rushing into AI projects with incomplete data can be a recipe for disappointment. The power of AI lies in its ability to find patterns and insights humans might overlook. But if the necessary data is unavailable, even the most sophisticated AI cannot generate the insights organisations want most.”
