Is poor data quality letting your AI down?

The most successful businesses in the future will be those that optimize their AI investment. As companies start their journey to AI readiness, they must develop robust data management strategies to handle growing data volume and complexity, and to ensure trusted data is available for business use. Poor-quality data is a burden for users trying to build reliable models that extract insights for revenue-generating activities and better business outcomes.

It’s not unusual for business users to prioritize access to the data they need over its quality or usability. The simple truth is that if an organization feeds poor-quality data into its AI tools, those tools will inevitably deliver poor-quality, untrustworthy results.

Why data quality matters

Data quality is critical because it acts as the bridge between technical and business teams, enabling effective collaboration and maximizing the value derived from data. Achieving it, however, can be time-consuming: depending on the data source and governance requirements, data scientists can spend up to 80 percent of their time just cleaning data before they can even begin to work with it.

Combining data sources is a huge task in itself. The work of merging and transforming multiple data sets, such as raw data from day-to-day business operations, legacy data in a variety of formats, or new data sets inherited through a merger or acquisition, should not be underestimated.

This work matters for business development. Data is critical for targeting marketing and sales more precisely, directing product innovation and market expansion, improving customer service, and even creating an AI chatbot or agent to enhance the brand experience. It’s also vital for complying with the latest regulations and preparing for likely future requirements in key areas such as data privacy and protection. Businesses need to know which data contains sensitive information so they can secure it and avoid leaks or breaches.

But not all data is equal: organizations need to be able to distinguish high-value, business-critical data from low-value, low-risk data that does not require governance or protection. The only way to do this is to ensure data is clean and of high quality.
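To make that classification step a little more concrete, here is a minimal Python sketch of one narrow piece of it: flagging columns that appear to hold sensitive information such as email addresses or phone numbers. The function name `flag_sensitive_columns`, the two regular expressions and the 50 percent match threshold are illustrative assumptions, not a method described by the author or by any specific product; real sensitive-data discovery would need to cover far more data types and formats.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    # Something@domain.tld with no spaces.
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    # 10 to 15 digits, optionally led by "+", with spaces, dots, dashes or brackets between them.
    "phone": re.compile(r"^\+?(?:[\s().-]*\d){10,15}[\s().-]*$"),
}


def flag_sensitive_columns(rows: List[Dict[str, str]], threshold: float = 0.5) -> Dict[str, str]:
    """Return {column: pattern_name} for each column where at least `threshold`
    of the non-empty values match one of the sensitive-data patterns."""
    flagged: Dict[str, str] = {}
    columns = {key for row in rows for key in row}
    for column in sorted(columns):
        # Ignore missing or empty cells when computing the match rate.
        values = [str(row[column]).strip() for row in rows if row.get(column)]
        if not values:
            continue
        for name, pattern in PATTERNS.items():
            hits = sum(1 for value in values if pattern.match(value))
            if hits / len(values) >= threshold:
                flagged[column] = name
                break  # first matching pattern wins
    return flagged


if __name__ == "__main__":
    customers = [
        {"customer_id": "1001", "email": "jane@example.com", "phone": "+44 20 7946 0958", "notes": "prefers email"},
        {"customer_id": "1002", "email": "raj@example.org", "phone": "555-123-4567", "notes": ""},
    ]
    print(flag_sensitive_columns(customers))  # {'email': 'email', 'phone': 'phone'}
```

Flagged columns could then be routed to stricter governance and protection, while the rest are treated as lower risk. Note that this only works if the values are reasonably clean: a phone number stored with stray text would slip through.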

Cultivating a data-driven culture

Being data-driven means developing an organization-wide culture that understands data and actively seeks to extract value from it to underpin all decision-making, leading to better business outcomes. It’s less about having the data and more about knowing how to make the most of it.

This requires a high level of maturity and a commitment to developing this capability over time. One of the primary challenges for organizations becoming more data-driven is connecting technical and business teams effectively. This is not a new issue, but many companies have not yet addressed it successfully, and it is hindering their ability to become data-driven.

Data teams are often focused on building data governance foundations and setting up tools and processes to help their organization. However, business teams may find the data they receive is too technical, of the wrong quality, in the wrong format, or simply not the data they need. The data team may not understand the business context of a request, and therefore what data is required. This unintentional misalignment is a huge challenge for organizations to overcome.

As a result, companies end up with data teams doing their best to build robust data governance systems, while business teams remain unsatisfied and underuse the data. This is where accelerating data transformation with AI-augmented data quality initiatives becomes mission-critical. Business users need solutions that let them work with data independently: changing formats, enriching it, and resolving issues automatically through smart algorithms. This provides the trustworthy data foundation required for successful AI projects.

Successful AI starts with data governance

Despite the current hype surrounding AI, Gartner has predicted that at least 30 percent of generative AI projects will be abandoned at the proof-of-concept stage by 2025, citing poor data quality as one of the main reasons for this loss of confidence.

Data quality stems from an organization-wide data governance strategy. Such a strategy keeps the business focused on the outcomes it wants from AI and generative AI, rather than rolling out AI regardless of the state of the data that will be used to train it. AI is, however, also a tool that can help get data into a state of AI readiness: by automating processes and rules, it reduces the manual oversight and labor traditionally needed to transform and cleanse data. It can also help profile and classify data and detect anomalies, contributing to the overall health of data sets.
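As a rough illustration of what automated profiling and anomaly detection involve, the Python sketch below profiles a single numeric column (row count, missing values, distinct values) and flags outliers using the modified z-score, which is based on the median absolute deviation (MAD). This is a simple statistical rule rather than an AI technique, and the function name `profile_column`, the sample order values and the 3.5 cutoff (a commonly cited threshold for this score) are assumptions for illustration. An AI-augmented tool would aim to automate and extend checks like this across many columns and data sets.

```python
import statistics
from typing import Dict, List, Optional


def profile_column(values: List[Optional[float]], threshold: float = 3.5) -> Dict[str, object]:
    """Profile one numeric column and flag outliers with the modified z-score:
    0.6745 * |value - median| / MAD, where MAD is the median absolute deviation."""
    present = [v for v in values if v is not None]
    profile: Dict[str, object] = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "outliers": [],
    }
    if len(present) < 3:
        return profile  # too few values to judge what counts as anomalous
    median = statistics.median(present)
    mad = statistics.median(abs(v - median) for v in present)
    if mad == 0:
        return profile  # no spread around the median, so the score is undefined
    profile["outliers"] = [
        v for v in present if 0.6745 * abs(v - median) / mad > threshold
    ]
    return profile


if __name__ == "__main__":
    # Hypothetical order values with one missing entry and one likely data-entry error.
    orders = [120.0, 95.5, 110.0, None, 102.0, 98.0, 9999.0, 105.0]
    print(profile_column(orders))
    # {'count': 8, 'missing': 1, 'distinct': 7, 'outliers': [9999.0]}
```

Using the median and MAD rather than the mean and standard deviation is a deliberate choice: in a small sample, one extreme value inflates the standard deviation enough to hide itself from an ordinary z-score test, whereas the median-based score still flags it.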

GenAI can capture data in non-standard formats, including tables, images and even audio, so that data quality rules can be applied universally. AI also enables non-technical users to self-serve and find the data insights they need by querying data in natural language, helping to create business value in every department of an organization. This process of data democratization is central to the success of any AI initiative, because confining its use and benefits to technical teams will severely limit its impact.

Ultimately, quality is more important than quantity when it comes to AI training data. Each poor-quality record adds confusion to an LLM, increasing the risk of hallucinations, and when poor-quality data is used consistently, the trustworthiness of the outputs declines. Today, the rapid advancement of AI toolsets, the exponential growth in data, and new digital and AI regulation have together created an inflection point, giving organizations a window of opportunity to put their data strategy in place. With competitive advantage, market expansion, customer experience and business growth all at stake, the winners will be those who prioritize this transformation now.


This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro