Please refer store most granular data in Data-warehouse, where we are recommending the detailed/transaction level data to be stored in Data Warehouse platform. This makes Data Warehouse as a universal BI resource, whereby it can serve wide variety of end-user tools (many of which required detailed data- like enterprise reporting) and also allow you to drill down to the lowest level of details for investigative and cause analysis actions.
So does it mean that we don’t need to store summary data (or what you call as aggregate fact tables in Dimensional modeling lingo) in Data Warehouse. Answer is mostly ‘no- we need summary data’. The reasons is the Summary data, enhanced the response time to the select queries. Detailed data may be in position to respond to more un-predictable queries. However, for predictable, high load queries, one may need to have aggregate fact tables to facilitate faster response time. While we say this, one needs to be cautious to go for summary data, and use it only when critically needed.
Therefore, you can have situations, where you will need the detail as well as the summary data for a given set of dimensions. In this scenario you should have separate fact table grain for detail vs. summary data. Therefore, you will have a different fact table grain for different level of aggregations. The reason for this is:
- A fact table grain should be operating at a single level (refer business hierarchy). The entire querying and population of OLAP from Data Warehouse, is typically based on this assumption of single level fact grain table. The querying on the data warehouse will become complex in case you club detail and aggregation in the same table. You need to apply filters on the same schema, depending upon the level of summary information you are looking for.
- The dimensional table linked with the fact table grain will also become confused. The primary key of the dimensional table, will be pointing to different entities. For example, one record of the dimensional table will have the primary key related to a sales agent, and the other record will have the primary key related to the sales manager (higher level in the same channel dimension).
- If you look the database tables in the OLTP or transaction systems, you will see that all the tables will have data at the same level. The same principle is also applied in the data warehouse, to maintain consistency and simplicity.
|