How does an organization go about determining data quality? What are the first steps in an overall process to achieve the organization’s data quality goals? Before we can answer either question, I think it’s important to agree on a standard definition of data quality.
Wikipedia gives the following definition: “Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is ‘fit for [its] intended uses in operations, decision making and planning.’ Moreover, data is deemed of high quality if it correctly represents the real-world construct to which it refers.”
This is where I want to touch on a related term that people often use interchangeably with data quality, without separating the two: Information Quality.
Information Quality is defined as follows: “Information quality (IQ) is the quality of the content of information systems. It is often pragmatically defined as: ‘The fitness for use of the information provided.’ IQ frameworks also provides a tangible approach to assess and measure DQ/IQ in a robust and rigorous manner.”
Information Quality is the value of your data in service to your organization’s goals and objectives. Before we attempt to build a set of processes for improving overall data quality, it’s important to establish what the expectations are for Information Quality. For our purposes, we’ll stick with the following understanding of Information Quality.
Information Quality should be one of the primary means of measuring the performance of business initiatives. When managed correctly within the goals and guidance of business objectives, Information and Data Quality will primarily serve as a trailing indicator of where the business is on its journey toward success.
This means that the Information System must be capable of delivering the information the business needs, when it needs it, where it needs it, and to the people who need it.
Therefore, it seems that the first step in determining where to focus is to understand the following:
- What portions of my existing data does the business need to see in relation to the stated goals and objectives? This will likely require business analysis to uncover the portions of the system that can inform each objective.
- What is the tempo of information delivery? What date ranges are under consideration? When is this information timely to the objectives of the business? For instance, if my business wants to build new homes in a particular area and I need to see property data for the latest home sales there, data from three years ago is likely to be of less value than data from three weeks ago. However, if I’m looking at how flood cycles change the soil on valley farmland, I’ll likely want the last five years of data, not the last five weeks.
- Where do decision makers and stakeholders need to go to view their data, and, to a certain extent, where exactly within that view do they need to look? Handing over a massive multi-page report when the stakeholder needs something that could fit in a short text message on a mobile phone isn’t helpful. So be sure to clarify where the data needs to reside so that decision makers have what they need when they need it.
- Who needs to see it? This is usually covered by Data Governance and Security, but it’s important to understand who needs to see this data and, especially, who should not see it.
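The four questions above can be captured as a small, explicit requirement per business objective, so the data team has something concrete to enforce. The sketch below is a minimal illustration, assuming a hypothetical `InformationRequirement` record (not from any particular tool) and records that carry a `recorded_on` date; the field names, delivery channel, and roles are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class InformationRequirement:
    """Hypothetical record answering: what, when (tempo), where, and who."""
    objective: str
    fields: list[str]          # what: the portions of the data the objective needs
    freshness: timedelta       # tempo: how far back the data stays useful
    delivery_channel: str      # where: how stakeholders consume the information
    allowed_roles: set[str] = field(default_factory=set)  # who may see it


def timely_records(records, requirement, today):
    """Keep records inside the freshness window, projected to the needed fields."""
    cutoff = today - requirement.freshness
    return [
        {name: rec[name] for name in requirement.fields}
        for rec in records
        if rec["recorded_on"] >= cutoff
    ]


def can_view(requirement, role):
    """Return True only for roles explicitly granted access."""
    return role in requirement.allowed_roles


# Illustrative requirements mirroring the two examples in the text
# (all names and values are hypothetical).
home_sales_req = InformationRequirement(
    objective="Build new homes in the area",
    fields=["parcel_id", "sale_price"],
    freshness=timedelta(weeks=3),
    delivery_channel="mobile_text",
    allowed_roles={"land_acquisition"},
)
soil_req = InformationRequirement(
    objective="Track flood-cycle changes to valley soil",
    fields=["plot_id", "silt_depth_cm"],
    freshness=timedelta(days=5 * 365),
    delivery_channel="dashboard",
    allowed_roles={"agronomy"},
)

sales = [
    {"parcel_id": "A-1", "sale_price": 410000, "recorded_on": date(2024, 5, 20)},
    {"parcel_id": "B-7", "sale_price": 298000, "recorded_on": date(2021, 6, 1)},
]
soil_samples = [
    {"plot_id": "V-3", "silt_depth_cm": 12, "recorded_on": date(2020, 3, 15)},
    {"plot_id": "V-9", "silt_depth_cm": 8, "recorded_on": date(2018, 4, 1)},
]
```

With this shape, the same three-year-old record can be out of scope for one objective while a five-year window keeps older soil samples in scope for another; the rigor applied to each dataset follows from the objective rather than being uniform.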
Does this mean that we don’t need to practice data quality standards across our entire system? Not necessarily. There should still be processes in place to ensure that a minimum level of data quality is met, and all data needs to remain available to pull from at any time. However, the degree of rigor you enforce should be based on the specific business objectives.
For instance, let’s say that our data team’s investment house has just acquired a smaller investment firm. Our data team is now responsible for importing the newly acquired data into our data system. The acquired business used an Oracle database and traditional ETL processes, while our company uses an AWS Data Lake architecture. Does it make sense to import the entire data set from Oracle into our DynamoDB? Probably not. We likely need only the elements from the Oracle database that fit our business goals and objectives. Does it make sense, however, to store the rest of that historical data where we can easily access it if it later proves valuable for future business objectives, for instance as flat files in the RAW area of our data lake? Probably.
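The selective-import idea can be sketched as a split step: project exported rows down to the columns tied to current objectives, and keep a full-fidelity flat-file copy for the RAW area. This is a minimal sketch, assuming the Oracle data has already been exported as a list of dictionaries; the column names and `NEEDED_COLUMNS` are hypothetical, and the actual loads into DynamoDB or the data lake are deliberately left out.

```python
import csv
import io


def split_acquired_rows(rows, needed_columns):
    """Split exported rows into a curated projection and a full CSV archive.

    curated: only the columns mapped to current business objectives.
    raw_csv: every column, as CSV text suitable for a RAW-zone flat file.
    """
    curated = [{col: row[col] for col in needed_columns if col in row} for row in rows]

    buffer = io.StringIO()
    if rows:
        # Preserve every source column so nothing is lost in the archive.
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return curated, buffer.getvalue()


# Hypothetical export from the acquired firm's Oracle database.
acquired_rows = [
    {"account_id": "1001", "client_name": "Ortiz", "balance": "52000.00", "legacy_branch_code": "NW-04"},
    {"account_id": "1002", "client_name": "Lee", "balance": "17350.50", "legacy_branch_code": "SE-11"},
]
# Hypothetical: the columns the business analysis tied to current objectives.
NEEDED_COLUMNS = ("account_id", "balance")

curated_rows, raw_csv = split_acquired_rows(acquired_rows, NEEDED_COLUMNS)
```

Keeping the archive as plain CSV means a future objective can promote another column (say, `legacy_branch_code`) without returning to the source system.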
When a data team has a large collection of data and no guidance from the business on how that data adds value to business decisions, it’s understandable that the team isn’t clear on where to focus its efforts. Data teams should work closely with the business to understand its goals and objectives; likewise, the business must set aside time to clarify with the data team where the value resides within the Information System. This will allow both teams to better understand where to focus their data and information quality efforts.