Sales Management Customer Relationship Human Resources Business Performance BI & Data Quality IT Tools & Vendors

Sign-in   Register
Establishing 'Making it Happen' as a 'Formal & Predictable' Discipline
   OLAP + Data Warehouse Design Phase  

ENCYCLOPEDIA→   Enterprise Intelligence  →   -  Data-Warehouse/Mart  →   -  Data Warehouse Project Plan- Work Break-Down Structure  → 

Physical Database Design and Implementation

The physical data-warehouse database design and implementation includes setting naming standards, designing database tables & columns, setting aggregation plan and resource handling parameters.

Define Naming Standards for Data Warehouse

The naming standards of the Data Warehouse will be essentially on how to name the Data-Base objects, File Names. We will not go into the naming standards for routines and processes, as it is not linked to database design. Here are some methods:

Naming of Dimension Tables

The dimension tables should be named as per the standard dimensions list which you have created during the analyze phase. The assumption here is that you are using the foundations dimensions and measures. You can also add a prefix or suffix in terms of that:

  • It is a dimension table
  • It is a detailed table or an aggregate table
  • It is current table or historical snapshot table

Naming of Fact Tables

The fact tables will vary with each schema unlike the dimension tables (assuming you are using foundation dimensional tables). There can be two suggested ways to name:

Name it as the business themes of the schema. For example, if a star schema is doing the sales Revenue Analysis, you can provide similar names. The reason is that you will not find two different schemas addressing the same analysis objective. So the distinctness in naming is maintained.

The other way to name a fact table is to name a fact table as per the fact-table grain you are using. For example in sales revenue analysis schema, you may be using a sales transaction as the fact table grain. Therefore, you can name the fact table accordingly. In the fact table, you can add a prefix or suffix on:

  • If it is a detailed or aggregate fact table
  • If it is current fact table or historical snapshot table

Naming of the Database Fields:

A database field name can have the following components:

  • The attribute Word: The field will represent an attribute of some entity. For example pin code, telephone no, date of birth
  • The attribute type word: This part of the name suggests on what is the kind of attribute it is. Some examples are
    • Summary Types: Average, sum, total, count
    • Functional Reason: Flag, ID, category, class...
    • Others: Description, Title, Serial_number
  • Additional Identifier: This is optional if needed. Some examples are end, start, initial, final, previous, later, Ist, 2nd, primary, secondary, main... (like start_date, 1st_address, primary_email...)

Few Points to keep in mind, as you are setting the naming standards:

  • Avoid overuse of abbreviations: If you have to use abbreviations, use them in a standard way (like use 'descr' universally and not 'desc' and 'descr' being used at different instances...). Ideally you should have a standard reference list for abbreviations. However avoid using unnecessary abbreviations like 'cust' for 'customer'.
  • Avoid using too long a name: Avoid using examples like 'Purchase_order_header_parts_total_amount_dollars'. The best way to handle it is that one can look at the field and table together. The name of the table may not be needed to be fully repeated in the field name, though it need not be totally eliminated. For example I will use the name customer_ID in the table customer_master (where the customer part is both in the table name and field name). However, I will not need to have field name called customer_master_ID.
  • Use underscores instead of hyphens: This is the standard best practice for any kind of naming. Underscores are always preferred.
  • Don't and never use uppercase: To make sure that you do not get caught into any case sensitive platform or database handling function, you should always use lower-case.
  • Document your naming standards: Your naming standards should be universally available, and they should be a living document.

Naming standards of the Data Warehouse code files, data files and data definition files

Apart from the naming standards for the database, one will need to define the naming standards for the directory paths, code files, the dbf files and database creation scripts. It’s a fairly simple and doable concept. The release manager or version control manager can easily create some rules for naming. Some examples are-

  • Name the ETL script names for the customer entity to go as per the customer name. You may also add the source system name in the script name.
  • Name the dbf file as that of the business theme (this is more applicable for data-mart...)
  • Name the table creation script as per the name of the entity or fact table grain.

Design Physical Tables & Columns

The physical design or columns need to be same as the logical model. The changes will be typically will be driven by the following consideration:

  • Determine the Null Fields
  • Create the surrogate keys
  • Change the structure, if it will make the access by the access tools more productive.
  • Add the indexing structure
  • Making the changes in the staging, if your defined logical staging model is not turning out to be efficient.

Estimate Database Size

The database size estimates tricks are similar to that of an OLTP. While one can go into a complex calculation, keep your database size estimate as five times that of the base size of your dimensional tables (including historical snap-shots). The components are:

  • Aggregate tables
  • Index Tables
  • Metadata tables
  • Temp Space
  • Future expansion

Develop Index Plan

Indexing plan will keep on changing with time as you learn more about the usage. We will not spend too much time on this aspect as most of the principles are same as that of OLTP. There are however few differences:

  • Special optimization of dimensional model schema: This includes creation of special indexing, so that you can actually search for the combination of dimensional table fields. By this optimization, you can create index on a combination of keys not from a single table (as you typically do in OLTP), but across various tables. This means that when a query is searching across these dimensional tables, one can find the records across these dimensional tables and then link up to fact table for getting the data. This kind of multi-table indexing needs to be constantly updated as the query types change over time. To further explain this point, if you feel that there is a heavy query load across customer, time, location and product dimension table (people want to find which customer went to which branch at what time and bought what product). You may create a multi-key index for the combination of primary keys of these four dimension tables.
  • High fact Table indexing: Fact table is a kind of table which is unique and a parallel is not found in OLTP environment. A fact table will have more indexes than a typical OLTP table. This is because it is the centre point of all queries done on a dimensional model schema. Depending upon query usage, you can define multiple multi-key indexes. Taking the examples from the previous point, you can create an index on the four keys which are linked to the customer, product, location and time dimension tables.

Develop Aggregation Plan

Aggregation is used for the purpose of increasing the response time to the queries. The aggregation is done to handle few predictable and frequent queries which need to look at the summary data. This is straight as one has to look at the expected queries. Now two things to consider:

  • If you are using an OLAP layer, you may need to create too many aggregates as OLAP itself is an aggregator.
  • You have to ruthlessly question the need of aggregates, as they do increase the load on the whole loading process.

Resource Handling

  • Determine the partition plan: There are innumerable partitioning plans. This is nothing but the same principles you apply on OLTP. Typical partitioning happens with partitioning of the fact table and the associated slices of the dimensional tables in a partition.
  • Determine DBMS Memory handling: You can refer in-memory analytics for more details on the benefit of placing the reference data in Random Access Memory. There are various ways in which you can cut it. You can place entire set of dimensional model schema in the memory (or OLAP cube) to speed up the access OR If you are not able to do so, you can have most accessed data set in the memory. The suggestion is to maximize the use of memory.
  • Determine the Cache handling: Cache is used when you want to maximize the response time by using the storage which is local or closer. You can decide about the cache distribution across the local memory, local disc, and central database memory or in the database itself. You can also decide to maintain the cache in the memory or disk of the application server.

Installation and Implementation of DBMS design

There are some other aspects of physical database design. However, they are same as that in OLTP. The design and implementation will go through some iterations as you run your testing around volume and stress factors to ensure the meeting of expectations.

 

   OLAP + Data Warehouse Design Phase  
 
 

Was this page helpful?
 
 
More on Data Warehouse Project Plan- WBS
Data Warehouse Project Definition
Data Warehouse Project Initiation Phase
DW Business Requirements Gathering Phase
Data Warehouse Modeling and Analyze Phase
Data Warehouse Design Phase
OLAP + Data Warehouse Design Phase
BUY BI & Data Management Vendors & Tools Evaluation Kit
Read more...
BUY largest on-line Data-Quality Management Kit
Read more...
Additional Channels
Principles & Rules
Free Templates
Glossary
Key Performance Indicators



Most Popular Zones with list of pages crossing 25000 hits  →→→ 
Maximising Sales Performance
Sales strike rate
Sales Compensation Analysis
Sales Revenue Management
Sales Leads Generation through Events
Sales Leads Management SWOT
Read more...
  Customer Relationship Management
Customer Satisfaction and Retention- Overview
Customer Value and Profitability Data Management
Customer Service and Support Overview
Drivers for Customer Satisfaction & Retention
What is Customer Segmentation?
Read more...
  Human Resources & Leadership
Setting Strategic Intent and Alignment
Act with Decisiveness
Lead diverse and collaborative teams
Deliver Results
Be straight and blunt, till you team gets used to it
Read more...
 
 
Business Performance & Planning
Strategy Blueprint Information Gathering
SWOT Analysis in Strategic blueprint Planning
Strategic Planning Business Themes
SWOT Assessment Report
Business Objectives Drill Down
Read more...
  Business Intelligence & Data Quality
Checksum for changed records
Data Warehouse ETL Transformation
Data Quality Policy
Ask for dates instead of years
Documenting data-integration system
Read more...
  IT Vendors & Tools Management
Collaboration and Administration Support
Technical Evaluation- Interoperability
Vendor Commercial Evaluation- Licensing IPR
Vendor Quality Evaluation
Technical Architecture Evaluation
Read more...