Building Making It Happen
Establishing Making-it-Happen as ‘Formal & Measurable’ Business Discipline
  Sign-in         Register
    
   Imperatives of good KPI/Metrics Prioritizing & selecting KPIs-Metrics  

Execution-MiH Encyclopedia  →   Vendor- IT Tool Domain  →  SECTION -  BI platform Tools  →  CHAPTER -  OLAP Server- scalability and Performance  → 

Data explosion OLAP Server

A common problem with OLAP databases is data explosion- which essentially means that data size multiplies, when it is loaded from the source Data into OLAP. This is because of aggregations, indexes and sparse cubes. This page covers on the tricks an OLAP tool may use to manage data explosion.

A common problem with OLAP databases is data explosion. In all OLAP databases, the following circumstances lead to data explosion:

Storage required by data and indexes

When you store data in OLAP, you are storing it in the form multi-dimensional arrays, with each array carrying one combination of all dimension instance and the associated measures. This is similar to the schema tables in the dimensional model. To enable faster response, indexing is done on these tables.

Storage devoted to empty cells:

Not all the combinations have the values attached. For example as company may have 10000 different product models. However, an office may have sold only 500 of them in a month. Therefore the combination of rest of 9500 models+Office+month will have zero values for the sales measure. There is also a question of empty cells, where the combination simply does not exist, as office is not expected to sell those models. This is called sparsity of data. If you assign the full-length storage for these empty/zero fields, you will have data explosion.

Storage devoted to aggregations

OLAP server is expected to provide fast response time, and summarization and aggregation is the key. Designers build pre-aggregated values, given their understanding of expected queries on the system. However, each new aggregation adds to the data requirements.

Data explosion is not an issue for small data-bases, but can be a serious problems with large databases. The data explosion can go up to 1:1000 times kind of range (Source data: OLAP data)

An OLAP sever applies various methods to reduce the data-Explosion. It may:

  • Have smart indexing in term of usage. If an index is not used much, it may highlight it to the designers OR internally handle indexing and de-indexing.
  • Provide least storage space for the empty cells: An OLAP should be able to provide minimum storage for empty cells, whereby it can have a pointer which states that 'this is an empty cell', with no storage allocated.
  • It can have effective compression technologies.
  • Reducing the number of aggregated data and do more calculations on real-time basis. This is achieved by more efficient query and calculation engines. Also the OLAP server can smartly keep the data in RAM for more frequent querying, so that the disk I/O activities are not an over-head.
  • Running aggregations in the back-ground: OLAP server can run aggregations calculations in the back-end and therefore the calculated aggregations are ready by the time queries are fired. For example, once your overnight refresh of the data warehouse and OLAP server is done, the server can start the aggregation calculations, and gain some time before the users get on the system.
  • Smart aggregations: For every query, the server has a choice on which data it wants to refer. For example, if your typical query is on yearly summarization, one may like to store the quarter level data (4 instances) instead of month-level data (12 instances) OR week level data (52 instances)

Handling OLAP Data Storage option

Ideally, an OLAP server should provide all the three options for storage - multidimensional OLAP (MOLAP), relational OLAP (ROLAP), and hybrid OLAP (HOLAP) storage. You are aware, that each of the above has got its own strong points. For details, you can refer to OLAP architecture scenarios in OLAP Architecture. Here is the brief:

  • MOLAP stores aggregations in multidimensional form on the OLAP server.
  • ROLAP stores detailed data and aggregations in a relational database.
  • HOLAP is a hybrid of the two: aggregations are stored on the OLAP server, but detailed data is stored in an RDBMS.

The flexibility can go on the following lines:

  • Ability to choose different storage options for different cubes.
  • Ability to choose different storage option within the same cube.
  • The OLAP server tool suggesting the option, given the kind of application.

The flexibility in the data-storage options is not the key-most parameter, as it also depends on the size and scale of your operations and queries. I would say that in most cases, even the pure HOLAP works well. However, this flexibility can be having an added advantage. A smart HOLAP with an ability to draw the line on what will be in OLAP server and what will be in the data warehouse server can do the trick.

 

   Imperatives of good KPI/Metrics Prioritizing & selecting KPIs-Metrics  
 
All Topics in: "OLAP Server- scalability and Performance" Chapter
 OLAP Scalability →  Data explosion OLAP Server →  OLAP Server Reliability →  OLAP Performance Management → 
 

Was this page helpful?
If you like it ? share it !
Digg
Digg
Reddit
Reddit
Del.icio.us
Delicious
Google
Google
Live
Live
Facebook
Facebook
Slashdot
Slashdot
Netscape
Netscape
Technorati
Technorati
Stumbleupon
Stumbleupon
Spurl
Spurl
Furl
Furl
Blogmarks
Blogmarks
Yahoo
Yahoo
Plugim
Plugim
Squidoo
Squidoo
BlinkBits
BlinkBits
 
CONTENT ZONE
BI platform Tools

Featured Pages
Data Integration Metadata Management
Data Profiling and Monitoring
Metadata Tool Architecture Features
Data Quality Tools Integration

Make 'Executable' Strategy
Maximize Results
Maximize People
Manage Execution

Featured Pages
Metadata Tool Change Management
Vendor Partnership and alliance Evaluation
Multi Layer Architecture
OLAP Server write backs