Purpose and Usage
You may refer the chapter of Data Mapping and Assessment with in Data Quality Section . This will give you a fair degree of detail on what and why of DMA. In brief, DMA reverse engineers the state of data quality, data model and data flow chain by studying the databases, system programs, enterprise reports and business programs. The reason for DMA is that it provides the detail on 'what is happening in actual' vs.; what is documented and/or expected'. PLEASE GO TO THE END OF THE PAGE FOR DOWLLOADING THIS WORK-TOOL.
Typically DMA is done when you have inadequate system documentation, OR you are not confident about the quality of data OR you don't have an end-to-end view of how data flows end-to-end. You would do DMA, because, you might be looking for migrating the data, or doing a root-cause analysis on a systems/data issue, or simply creating a new system which has strong linkage with another existing system/database.
Given that there are many 'when' and 'whys' behind DMA, the Data Mapping and Assessment report template tries to be a super-set of what you may create at your end. In this template, we will provide inputs on what sections you may use or not use depending upon your context.
DMA Report is a sort of independent document, which can feed into a data quality program, a Data Warehouse ETL Analysis phase or a Master Data Management program. This is how it links to various initiatives:
DMA report can feed into the System Stability and Data Quality Assessment Template. The difference is that System stability and assessment document has a focus on data quality and stability. It also goes into root-cause analysis and impact assessment of data quality issues. DMA Report on the other hand is a simple 'matter of fact- What is happening' statement, without spending much effort on 'why is it happening'? (Though it will touch upon it).
Help Guide
This Help guide provides explanation for each heading and sub-heading within the work-tool (PLEASE GO TO END OF THE PAGE TO DOWNLOAD THE WORK-TOOL).The same explanation is also provided in the attached document. It will be better for you to refer to the online page as it changes and gets upgraded more frequently.
EXECUTIVE SUMMARY
- Back-Ground: Provide the back-ground on what triggered the DMA exercise, which requires reasonable funding. The trigger could involve data warehouse initiatives, a major data quality program, Data Migration initiative, Business Process Re-engineering initiative or a need to audit comment or a regulatory compliance issue
- The Scope of the Data Mapping & Assessment: Provide high level scope in terms of coverage- Systems, Processes, data entities etc.. AND the activities- Data Profiling, Data Model Assessment, Data Flow Chain analysis...
- Methods Used: State on how did you carried on the DMA- Interviews, Data profiling, using existing documentation, and stakeholders involved
- Key Gaps: State on the key gaps identified in terms of data quality, data model issues and data flow chain issues. The gaps are essentially around what is expected or a standard vs. Actual
- High Level Root Cause of High Priority Gaps: DMA is not focused on detailed root cause analysis for the gaps. This is because RCA is huge exercise in itself. However, to have some conclusive flavor, it can contain the high level root cause for high priority gaps. This sets the tone for system stability and data quality assessment exercise
- Proposed Next Steps: Proposed next steps and help items
- Assumptions: Key assumptions made for DMA. For example, 25% of the data sample is representative enough.
SCOPE OF DMA
DMA has some level of complexities. As DMA is an effort consuming exercise (as any reverse engineering project will be), one has to be careful on the scope to make the best of the funding available. Secondly you would not like to disrupt the normal Business as Usual. Therefore, one needs be careful on who all you involve.
- Functions: The functions and sub-functions, which have a stake-holding in the DMA, and the functions which will be deploying their resources to conduct the DMA
- Business Processes: The business processes run across the functions, manual world and automated world. You may not have an end-to-end business process under your DMA scope. Therefore, you would need to mention which sub-processes, within a business process which will be involved.
- Data-Groups Targeted: Typically the DMA, if triggered by a data quality issue, is generally targeted to resolve data quality issues with certain data groups. For example, Customer Data Quality Issue, or Reconciliation issues with Travel business accounting transactions... As you list down the Data-Groups, do mention the specific Data-Groups which are targeted to be fixed via this DMA. As a side not, DMA will not fix the data issues. It will primarily identify the current state of the data and contribute to the Root-Cause Analysis
- Geographies/Locations Involved: Within a system and process, you may like to focus on select geographies. This can be because you may want to go in phases, or the issues or initiatives which triggered DMA, could be planned for a limited set of geographies. Therefore, you may like to focus on those geographies first
- Systems and Data-Bases Involved: This is fairly straight forward. Depending upon the trigger of the DMA, you can identify the systems and databases which you will include in DMA. For example, if the DMA is being done for designing and scoping the ETL of a Data Mart, you will include all the ETL source systems as part of your DMA. However, with in these source systems, you may like to focus on the customer and related databases, as the Data Mart is for customer value and profitability analysis.
- DMA activities in Scope: All the previous points are on the entities involved in DMA. The other part of the scope is around the extent to which you will do the DMA. For example, ere are 3 main are areas of DMA- Column & Table Analysis including data value distribution, Data Model Analysis and Data Flow Chain analysis. You may like to do just column & table analysis, in case you are just looking for distribution of accurate vs. inaccurate data.
- DMA Scope Matrix: This is the appendix. It contains the detailed list of all the functions, processes, Systems, Geographies, Data-Groups plotted against the DMA activities, which you need to do
METHODS DEPLOYED
This is a free form, where the process of conducting the DMA is listed out. This includes some initial information gathering methods, like
- Interviews with managers, processors and technology staff
- Segregating the data as per logical groupings.
- Production issues review
- Analysis of Customer complaints
After developing broad level understanding, here are the methods which you would undertake to do the detailed DMA
- Running ad-hoc queries on the database.
- Application of Data Profiling tools
- Reviewing the database design in the system
- Reviewing the programming logic
- Reviewing the front-end screens
- Review Systems Documentation
- Reviewing business process documents
- Reviewing Enterprise reports. Etc...
The idea here is to share on 'what was the entity' (System, Business Process, Data-Groups etc.) and 'what method(s)' did you apply to come out with 'what information'. For example, you can say that you reviewed system documentation and applied data profiling tools to come out with the value distribution of the customer data. PLEASE GO TO THE END OF THE PAGE FOR DOWLLOADING THIS WORK-TOOL.
OBSERVATIONS AND GAP STATEMENT
This is a free form with no specific format. Purely depends on the data and type of assessment undergone. ONE important addition is the proposed priority and criticality rating of the gaps. The observations are typically Categorized as per the DMA Task, followed by a comment on the level of Gap (if any) associated with that observation.
This section will not have the details of your findings, as they may be running into scores of pages. The detailed observations are placed in an Appendix. This part will provide the list of all key observations linked to the objectives, under which you started the DMA. This section will also not be detailing on the areas, where your findings are aligned with your expectations, and/or standards.
One needs to define the gaps with both the expectations and the standards. Sometimes the expectations are below the standards, as the audiences are aware of the issues. For example you may expect only 60% of the customer records will be having PIN Codes populated. However, the standard could be 95%
As a principle, one may like define the top 20-30 observations and Gaps. The detailed observations, Gaps and Root-Cause Analysis can be kept for appendix. The main DMA report audiences are business owners, who may be interested in what matters most.
One may ask a question, that why we are not stating root-cause along with the Gaps and why we have a separate section for root-cause analysis? - The reason is that you do not have all the observations and Gaps in this section. The details are in the appendix. Secondly there is not one-to-one mapping between Gaps and Root-Cause. Thirdly, DMA report does not go into every root-cause, but tries to share the high level root cause or reasons for high priority Gaps. It is also possible (if determined by the scope) that it may not go into any root-cause analysis
For example-
Single Column Analysis
- 90% of PIN codes are populated in the customer master. They are as per expectations but 5% less than the bench-mark of 95%. Most of the unpopulated Pin-Codes belong to the Asia Market.
- 35% of Customer Location is not adhering to the domain value set defined for locations.
- 45% of the PIN codes are not adhering to the data standard set for the Pin code
Multiple Column Analysis
- 7% of the customers are not mapped to an active sales agent.
- 15% of the customer names are mis-matching across the customer masters maintained in CRM system and Order Processing System.
- 5% of the product codes in the sales processing systems are linked to the product records, which are inactive.
- 1.2% of the records in the Billing table correspond to the WIP orders in the order processing table.
Data Model Mapping
- Customer ID is named differently across 11 tables in the CRM system.
- Leads Table in Leads Management System and the Customer Master table in order fulfillment system have the field names Customer_ID, but they are having a different meaning. Customer_ID in Leads Table also has 74% potential or prospective customers, whereas the customers in the customer master table are actual customers.
- The Sales Management system allows more than one active agent per customer. Our defined data model has one-to-one relationship.
Data Flow Chain Analysis
- The courier receipts are entered into shipment management system, through an offline data capture system, using bar-code identification. The offline capture system sends the file to be uploaded in the main system. The rejects (due to upload validations), are kept in a temporary table. This temporary table is manually corrected and the items are posted after correction. Some inputs controls on the correction screen are missing (which are present in the batch upload validation programs). This is not as per the expected design)
KEY ROOT CAUSES BEHIND HIGH PRIORITY GAPS
This section provides RCA for select high-priority gaps. The root cause analysis stated out here will be typically fundamental areas needing improvement, and not a niche area. For example-
- A wrongly designed Account opening form is fundamental root cause, and
- A system downtime 2 months back, resulting in bad 100 records is transaction specific root-cause which may not be included in this section. However, if there have been many such incidents, it may move into 'fundamental' category. Appendix will be carrying the details of all the root-causes you have identified
The reason for placing only high level fundamental root-causes is that business owners are its main audience. As we have to be selective in terms of information overload, we need to be tactful in terms of the areas which need their biggest support.
The root causes need not be linked to any specific Gap or Observation (refer previous section). You may like to refer to the key gaps it is linked to. However, the key objective is to give the list of top 20-30 issues, which are contributing to the bad data quality.
For example
- Badly designed account opening form and lack of input controls is leading to more than one-third of all customer master information.
- Lack of universal data models around key entities, is leading to every systems applying their own cardinality rules, which is leading to data inconsistent with our business rules and processes.
- Due to lack of a universal interface between the feeding system and general ledge system, every feeding system is following its own mapping between business accounts and GL accounts, which are leading to huge reconciliation issues.
Some of the root-cause analysis can also point to solutions (like 2nd and 3rd point in the examples. One may argue that we should not go into the solution mode, as there can be many solutions to an issue. This is a matter of debate and we will not recommend any hard and fast rule.
NEXT STEPS
This section provides the list of next steps and help items. Following are the examples of the next steps:
- Stakeholder Sign-off on the report.
- Initiate Prioritization and solutions to the Gaps
NOTE- Sometime the prioritization is done as part of the DMA, and sometimes it is done after DMA. The reason to do it after DMA is because the prioritization could change depending upon the purpose for which DMA is going to be used. Data Quality program vs. Data-Warehouse vs. Data Migration can attach different priorities and solution to different issues.
For example, for a DQ program the solution could be to fix the front-end input whereas for DW, it could be to fix it through the Transformation design, and not to touch the front-end input.
- Proceed to detailed root-cause analysis and solution finding for to the Gaps.
- Proceed to next phase ETL analysis for the data warehouse (if the DMA was done for Data warehousing solution).
- Proceed to next phase of Data Quality program
APPENDIX
There is a long list of appendix you can have in the DMA, depending on its scope
- List of the people Interviewed
- List of the systems analyzed
- List of Tables Analyzed
- List of Data Entities analyzed
- List of Business processes, Sub-processes and process points analyzed.
- Observations, Gaps and Root Cause- (System Wise)
- Single Column
- Multiple Column Analysis
- Data Model Analysis
It is recommended to list your observations and Gaps as per the system. This is because there are assigned IT and Business Owners for a system. Therefore this kind of segregation would helps. The point to note is that it will contain all the findings, even if there is a gap or not. Against each finding, it will state the expected, standard, Gap or not, Root-cause Analysis
- Observation, Gaps and Root Cause- Overall
This part will provide the details around the gaps and observations which cut across the systems and processes. This also includes all the observations, Gaps and Root Causes for Data Flow Chain analysis
This is further appendix, which provides the documentation which was used for analysis and the documentation, which was produced during DMA. For example, it will provide the data model, which you generated.
|