We have shared the tips on how to make your dimensional model scalable and extensible, a data warehouse cannot be super flexible, in terms of handling any kind of query.
When you present a business-case on BI ( or in particular Data Warehouse and OLAP
- the largest piece in terns of efforts), avoid over-selling and positioning a DW as a black-box, which can provide you the answers to all the questions. Having gone through the brochures of many platforms, the over-selling is much more acute in BI platforms.
The reasons that you cannot have a super-flexible DW are:
No platform can realistically handle beyond certain number of dimensions
One can think that if you pick all possible data at detailed level and place it in distinct dimensional model schemas, you can process any query. This is not the case, because a super flexible data warehouse will have to have a single schema and large number of dimensions. While many BI platforms claim to be able to handle hundreds of dimensions, in reality going beyond 30 odd dimensions can lead to huge data explosion.
Summary and aggregates
You need to have summary or aggregate schemas, where you have data at summary level (like agent level sales figures, instead of individual invoice level data), to increase the response time to the queries. If you only have detailed data, multiple queries running on a schema can de-mobilize the system. These summary schemas are drive by the specific business requirements. You can have million different options to summarize the detailed data. You have to be selective on which aggregate schemas to be made for which level of summarization.
Derived attributes and facts
To increase the response time, one also creates derived facts. Here also you need to be selective.
Processing and population time
Even if you have unlimited storage and processing infrastructure, a large dimension set, and overly large number of aggregations/derived facts will expand the end of day extraction, transformation and loading process, which will reduce the windows available to the users.
Conclusion
In other words, one can have a super flexible data warehouse, if one has infinite storage and processing power and one is ready to spend half a day to populate the DW and ready to wait for many days to run a complex query. As you will agree, none of these conditions are feasible. |