This is the next evolution state for an enterprise. This involves a Data Mart OR set of Independent Data Marts sitting on top of a single OR multiple staging databases. Data Mart is evolved as a quick solution to a business need. This results in Individual businesses and system owners creating their own data mart to meet their specific needs. Data-Mart does not work in isolation and it needs a staging database to get the data and OLAP tools to analyze the data contained in the data-mart. This leads to various variants of this topology.
NOTE- ‘Staging Database” for production reporting and transaction level queries can be different from the ‘Staging Database’ used for Data Warehouse/Data Marts .
The variants for Two Tier BI architecture are as follows:
The Single Staging and Single Data Mart
This is most basic variant. A single staging is created, which undergoes the standard Extraction and Transformation process. The data from this staging is loaded into a data mart. The Extraction and Transformation is fairly limited as we are looking at the select tables and ETL operations relevant only for the given data mart and it's purpose. Therefore, this doesn't have the concept of standard Dimensions and Measures. It may not always have Dimensional Model, IF the purpose of the Data Mart does not require too much of ad-hoc querying and the volume is also limited. For example a Data Mart made to analyze the reserves to fine-tune the reserve allocations at the end of every month, may not use Dimensional Model. This is because the use of this data mart is low volume, number of users is fairly limited and the type of analysis is predictable to certain extent.
Multiple Staging and Multiple Data Marts
This typically is the next stage of evolution, and most common one in the large enterprises. Once a business owner showcases the advantage of the a Data Mart, there is a lot of demand for creating new Data-Marts. Technology mostly does not get time to work out an overall strategy to have a common staging database to service all these Data Marts. This leads to individual Staging Areas getting created, which mostly serves one (sometimes more than one) Data Mart.
While the upside of this set-up is the quickest spread of Data-Marts and their business relevance, there are many risky downsides
- Individual staging databases may end-up extracting the same information multiple times from the same source system. This leads to increase in the end of day Extraction load and time window.
- If there is not a common control, the Transformation/Cleansing process may end-up having inconsistent data across the data-marts. This leads to lack of confidence.
- Loss of control and inefficient use of resources.
Single Staging and Multiple Data Marts
This is the most ideal set-up in the given topology. This includes a common staging database, which provides loaded data into multiple independent Data-Marts. Lets see at how we reach this situation:
- The start-point is the previous scenario (multiple Staging Areas and multiple Data Marts). As the downsides of this scenario come into play, a decision is made to have a common Staging Area.
A common Staging Area is created quickly. All the information from the source systems is picked up as is.
- The rest of the processes (which used to happen in the individual Staging Areas) are replicated in the common Staging Area as is. Thereafter, the Transformation/Loading processes and data sets follow the same isolated path as they used to do in Individual Staging Areas. Till this point, the Extraction load on sourcing systems is taken care of.
- As the time passes, the Transformation processes happening in isolated fashion for individual data marts (within the common Staging Area) are rationalized. Therefore one may end-up with a common table of transformed & cleansed Customer data.
- However, the output Data Sets for Loading into the data-marts may still be different. Therefore you will have two different Customer datasets created to be loaded into different data marts.