There is no single straightforward fix for data quality. There are various options, and identifying a workable approach for your organization and getting commitment to it is half the journey. As an analogy, picture a household with multiple taps, all drawing water from the main tank on the rooftop.
Existing Data Cleansing
This is the path of least resistance. Cleaning up existing data involves little change management and is closer to crisis management. There are various shades of existing data clean-up:
Complete Data Clean-up:
- This is typically driven by a crisis that acts as a sudden wake-up call, combined with fear of the unknown ("My general ledger has an issue and the regulators are complaining. What else could be wrong in the system? Let's check and fix the whole damn thing for once.")
- A major initiative is needed, and a lot of organizational energy is spent on scanning the data.
- Root causes are found; those that are easy to fix are fixed and the rest are deferred to subsequent ‘Data Quality Milestones’, while the focus stays mainly on cleaning up the data.
- If the root cause cannot be fixed, the complete data clean-up is repeated at a fixed frequency to keep the data sane.
This is equivalent to cleaning up your main tank: de-silt it, remove the rusty patches, filter all the water and put it back.
Select data clean-up
Once one starts working out the cost, one realizes that an entire clean-up is not feasible and narrows down to a select set of data that needs to be cleaned first; the rest is left to a ‘mid-way phase assessment’. For example, instead of cleaning up the entire set of telecom retail customers, let's do it only for the ones who have been active.
Limited clean-up in select data
This is the realistic and practical level. Even with a manageable chunk of data to clean, one may end up cleaning only the parts of it that are important and fixable. For example, within active customers it may not be possible to fix the date of birth, so let's focus on addresses and names.
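The narrowing described above, from all customers to active customers to a few fixable fields, can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the record layout (`active`, `name`, `address`, `date_of_birth`) and the helper names `clean_text` and `limited_cleanup` are hypothetical, and the only clean-up applied is whitespace normalization and title-casing.

```python
# Hypothetical sketch: field names and cleaning rules are assumptions for illustration.

FIELDS_TO_CLEAN = ("name", "address")  # date_of_birth is deliberately left out


def clean_text(value):
    """Collapse repeated whitespace and apply title case."""
    return " ".join(value.split()).title()


def limited_cleanup(customers):
    """Clean selected fields of active customers; leave everything else untouched."""
    result = []
    for record in customers:
        cleaned = dict(record)  # copy so the input record is not modified
        if record.get("active"):
            for field in FIELDS_TO_CLEAN:
                if isinstance(cleaned.get(field), str):
                    cleaned[field] = clean_text(cleaned[field])
        result.append(cleaned)
    return result


customers = [
    {"active": True, "name": "  john   SMITH ", "address": "12  main st", "date_of_birth": None},
    {"active": False, "name": " jane  DOE", "address": "5 elm rd ", "date_of_birth": "1980-01-01"},
]

cleaned = limited_cleanup(customers)
```

Inactive customers and the date of birth pass through unchanged, which mirrors the decision to leave them for a later phase.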
Filter the data inflow to ensure clean input
This is just like putting a ‘Reverse Osmosis’ system in front of the incoming water supply. It ensures that once the existing data is cleaned up, there is no further pollution. This also has different shades:
Filter all data inflow to ensure complete cleanliness:
This is equivalent to putting a water filtration and water distillation plant ahead of the main tank.
Filter select incoming data to ensure complete cleanliness:
This is equivalent to placing a water distillation plant in front of the kitchen tap from which you fill up drinking water.
Filter select data inflow on select parameters:
This is like placing an RO (reverse osmosis) water filter in front of the drinking water tap and a charcoal filter in front of the tap that provides bathing water.
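The idea of applying different filters to different inflows can be sketched as a set of per-source validation rules. Everything named here is an assumption made for illustration: the source names `billing_feed` and `marketing_feed`, the field names, and the checks themselves. The stricter source plays the role of the RO filter and the lighter one the charcoal filter.

```python
# Hypothetical sketch: source names, fields, and rules are illustrative assumptions.
import re


def not_blank(value):
    """True for a string that contains something other than whitespace."""
    return isinstance(value, str) and value.strip() != ""


def looks_like_email(value):
    # Deliberately loose pattern; not a full email validator.
    return isinstance(value, str) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


# Each source gets its own list of (field, check) pairs.
RULES = {
    "billing_feed": [("name", not_blank), ("address", not_blank), ("email", looks_like_email)],
    "marketing_feed": [("name", not_blank)],
}


def filter_inflow(source, records):
    """Split incoming records into accepted and rejected according to the source's rules.

    A source with no entry in RULES has no checks, so all of its records are accepted.
    """
    accepted, rejected = [], []
    for record in records:
        failed = [field for field, check in RULES.get(source, []) if not check(record.get(field))]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


incoming = [
    {"name": "John Smith", "address": "12 Main St", "email": "john@example.com"},
    {"name": "Jane Doe", "address": "", "email": "not-an-email"},
]

billing_ok, billing_rejected = filter_inflow("billing_feed", incoming)
marketing_ok, marketing_rejected = filter_inflow("marketing_feed", incoming)
```

Rejected records are returned together with the names of the fields that failed, so they can be routed back for correction rather than silently dropped.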
Issue Prevention for Data Quality Issues
This is typical quality-speak: prevention is better than cure. It is the most rewarding approach, but difficult to manage because of the change management effort and ‘out of control circle’ issues. Refer to Data Quality Challenges for a comprehensive list of root causes of data quality issues. Prevention is another word for Data Quality Assurance, which is covered in a separate topic. Which method to apply, out of a long list of measures, depends on the business case, the feasibility and the importance of the data.