We have kept this as a separate topic, as this is beyond just Data Warehouse design. Please refer on How OLAP fits into BI framework for more details. In-brief, OLAP provide analysis services to the end-user tools (like enterprise reporting, query and analytics tools). OLAP sits between the Data Warehouse and end-user tools. OLAP provides multi-dimensional view of the data, and provides host of analytical functions and services, which can be used by the end user tools.
As you plan your data warehouse, you have to decide, if you want to have OLAP server or you want to go for end-user tools directly sitting over the data-warehouse. 99% of the case your answer should be in favor of OLAP. We feel that a BI platform having Data Warehouse and not an OLAP server is theoretical in today’s world, as OLAP enables true potential of data warehouse.
As you do the design of your BI platform, it should be done holistically including the OLAP component. An OLAP layer reduces the need for creating the aggregate schemas in Data Warehouse significantly. Readers are aware that aggregate/summary level schemas are created in the data warehouse to enable faster response time to the queries. However, as OLAP summarizes the detailed data and stores it in its multi dimensional database, there is no or least need for having the aggregated schemas in the DW database.
Identify the OLAP server Product
This is part of identifying the overall BI platform. As mentioned before, it is advisable to have the core BI platforms from a single vendor. You can refer OLAP server evaluation criteria for more details.
Selecting the Data Warehouse + OLAP storage strategy
Readers can refer OLAP architectures in the BI end-to-end section. In brief, there are three major storage options-
- MOLAP- Having only multi-dimensional storage. This limits the level of detailed data it can store. Therefore in MOLAP you will mostly end-up having query performed at summary level
- ROLAP- The entire OLAP and Data Warehouse are in the relational form of storage.
- HOLAP- This is a combination of ROLAP and MOLAP, whereby the summary data is in the multi-dimensional form and the detailed data is in the relational form.
Design for the aggregate strategy
As you use OLAP, you need to have few aggregate schemas in the data-warehouse. However, sometimes, you may still need aggregates, if you are accessing the data warehouse database directly (and not through OLAP). This happens mostly in the case of enterprise reporting.
Design for style of OLAP calculation
You will need to refer OLAP server tool features to get a greater understanding on how you manage the architecture of OLAP. There can be three ways for calculations in OLAP-
- Pre-stored calculations- You can have different level of summary data in OLAP and also the derived fields (for example 12 month moving averages based upon month-end figures..) pre calculated (or aggregated).
- You can have on the fly calculations, where the calculation happens at the time of query.
- Pre-calculated calculations- These calculations are not stored, but done in anticipation of a query and are stored either in RAM or in temporary tables.
Depending upon the expected query volume, data explosion issue and type of queries, one can decide on the strategy of calculations.
Design for multi-cube architecture
Multi-cube architecture essentially means when you can run your query across more than one cube. This is made possible when there are identical dimensions and its instances across two cubes. The beauty of a good multi-cube architecture is that if you have the identical dimensions, the OLAP server can automatically create the link across the cubes. You do not have to identify each and every join.
Design for Data Explosion Management
Data Explosion is a key issue in most of the OLAP server environments. The data explosion happens when you have sparse field (or blank fields). For example you may have 5000 products and 20000 sales outlets. In an OLAP you may end up with 100 million records of daily sales. However in actual, not all products will be sold by every sale out-let. This results in large number of blank fields related to the product + sales outlet combination, where no sales happened. The Data Explosion Strategy can include the following approaches (much of it depends upon the capability of your OLAP server);
- Configure your server in a way so that there is no storage space assigned to a blank field, and there is only a pointer. The blank fields are still used for many aggregation queries (for example number of sales offices which did not sell any home furnishing line of products...).
- Configure your server, so that a very small storage space is provided for the blank fields- Sometimes your OLAP server may not allow for zero storage space for the blank fields.
- Compress blank fields- If you have to allocate storage space to the blank fields, you can compress the same using compression tools available with OLAP server.
|