Data Model Analysis through the data Definition Language:This becomes a bit easier, if you have a good database system and more importantly a good database design. The data definition language (script for data table creation) provides fairly good detail on the data model. It provides the primary keys, attributes to the table, the constraints attached to each field, constraints across the fields, mandatory/optional/null/default values. It also talks about the triggers linked to specific actions performed on the database. It also tells us on the foreign key (referential linkage) across the tables. In real world, following are the problems, which are faced and the methods applied to get around them: Same fields with different names across the tables (synonyms): The age-old problem - customer_id vs. cust_id vs. custid, with each actually meaning the same. The way to find synonyms is: - Use subject matter expertise- The analysts identify the similar sounding fields and discuss with SMEs.
- Find the fields having exactly the same structure (8 characters, with first two characters being alpha..) and analyze.
- Use conventional wisdom- PIN, PIN_Code, ZIP_code should mean the same. However, apply your conventions along with confirming them with domain experts.
Different fields having the same name To make the things little complex, it is not all that simple. it is possible that the field may be having the same name and structure, but they could be meaning different. For example, you may have an entity called customer (with customer ID of 8 characters), and within that entity, you have sub-entities called 'gold-customers’ and ‘classic_customers’. The entity as well as these sub-entities may carry the same field name ‘Customer_ID’, whereas the field names should be different, to uniquely identify the sub-entitites.
So whenever you are in process of finding synonyms, you may come across the faults both-ways. Therefore, a continuing input is needed from SMEs AND one needs complete details of the system specs and business rules.
Entity Relationship (ER) analysis through data in the data-base
If the data definition language does not help fully, one has to run the queries on the database to draw conclusions on the data model. For example- If one finds that all records in policy_master have an agent-ID. This means that there is a mandatory relationship between a policy and agent ID. Similarly, lets say that one finds that a OPD Transaction Master has multiple patient Ids against a doctor and have multiple doctor Ids against a patient. This shows that there is many to many relationships.
However, the limitation is that data does not confirm all the rules. If there is at least one policy sold by every agent in a life insurance company database, it does not mean that there is a mandatory relationship. This could be incidental also. Through the Programs Sitting over the data-baseSome of the data model related rules are built in the programs instead of being in Database. This is purely investigative method, AND is typically done for high-importance targeted entities and attributes. Some tips are: - Go through the front-end screens and check on the form level controls. For example – A form does not allow the ‘submit’ for a customer record, if the date of birth is not filled-up. This means that the ‘DOB’ is the mandatory attribute to the ‘customer’ entity.
- Check with the process owners, look at the data entry forms etc. to understand the business rules being followed.
NET ANALYSIS IS - DATA PROFILING tools, in reality solve only a limited proportion of the problem in terms of identifying the underlying data-models. Rest has to be done by self-discovery. PLEASE REFER Execution-MiHPractice Tool Data Mapping & Assessment Report |