Sales Management Customer Relationship Human Resources Business Performance BI & Data Quality IT Tools & Vendors

Sign-in   Register
Establishing 'Making it Happen' as a 'Formal & Predictable' Discipline
   Data Warehouse sharing and browsing  

ENCYCLOPEDIA→   Enterprise Intelligence  →   -  Data-Warehouse/Mart  →   -  DW Design & Architecture  → 

Data Warehouse Infrastructure

This page provides the Data Warehouse Infrastructure considerations, which are unique to a Data Warehouse. Otherwise most of the considerations are same as that of any OLTP systems. The unique considerations are mainly linked to the ad-hoc and unpredictable nature of the use a Data Warehouse may be put to.

This is generic chapter on what you need to consider as you set-up the infrastructure for a Data Warehouse. These are the considerations for Data Warehouse Platform alone. If you are placing OLAP Server the end-user tools (like data mining, enterprise reporting, analytics), they will be having their own considerations.

A Data Warehouse is a 'business infrastructure'. In a practical world, it does not do anything on its own, but provides sanitized, consistent and integrated information for host of applications and end-user tools. Therefore, the stability, availability and response time of this platform is critical. Just like a foundation pillar, its strength is core to your information management success.

Here are the various factors you will consider for building your Data Warehouse infrastructure

Data Warehouse Data Size

Data Warehouses grow fast in terms of size. This is not only the increment to the data as per the current design you have. A data warehouse will have frequent additions of new dimensions, attributes and measures. With each such addition the data could take quantum jump, as you may bring in the entire historical data related to that additional dimensional model element. Therefore as you estimate your data size, be on conservative side.

TIP- While being conservativie, do not over invest into your hard disk space. Be liberal in your estimates, but don't overdo.

TIP- You can also use the data compression method to reduce the hard-disk, but plan it well

Data Dynamics for Data Warehouse:

The volume and frequency of increment of data determines the processing speed and memory of the hardware platform. The increment of data should be typically on the daily basis. However, the level of increment could be different depending upon which data you are pulling in. Many a times, you may pull in huge amount of data from the source system into staging area, but load much smaller size summary data in Data Warehouse.

TIP- Even if you are placing only the summary data in the data warehouse, I would advise that you should assume that soon you will have the granular data in the data warehouse. Sometimes, the demand for more detailed data comes within months of implementing the summary data mode. Refer Data Warehouse can have broader applications and keep granular data in data warehouse.

Number of Users of Data Warehouse:

The number of users are essentially the number of concurrent logins which are on a data warehouse platform. Guessing he number of users of a data warehouse has the following complication:

Sometimes the user can be an end user tool, which may result in the actual number of users. For example an enterprise reporting server can access the data warehouse in form of few users to generate all the enterprise reports Post that, the actual users are accessing the database and reports repository of the enterprise reporting system and not that of the data warehouse. Similarly, you might be using an analytics system, which creates its own local cube from a data warehouse. The actual users may be accessing that cube without logging into the data warehouse. Some times the users could be referring to the cache of the data warehouse distributed database and not referring to the main data warehouse.

There is not fixed formulae for calculating (and then linking) the number of users for the purpose of estimating the infrastructure needed. We assume that the data warehouse will be able to support large number of simultaneous login threads.

Number of Star-Schemas in Data Warehouse

We are assuming that you may have multiple star-schemas (which finally get translated into cubes as per OLAP).

TIP- Higher is the number star-schemas; higher will be the chances that you would need a larger and more powerful platform. This applies even if these star-schemas contain few dimensions and a moderate level of data. Every star-schema is like a seed which will grow fast into a massive tree. Once user tastes the blood, his requirements are going to grow. This is what we call 'crises of success'.

Nature of Use-

There are expert opinions on considering the nature of use (ad-hoc vs. scheduled, simple vs. complex queries, data mining vs. reporting) to plan out the hardware. Our view is that it is a non-relevant factor. Once you have placed a DW platform, you would not be able to control the kind of its applications and the tools which one can make sit over the DW.

Financial Readiness:

It depends on the monies which one can spend. If you have a monetary constraint, instead of DW platform with all the attendant infrastructure elements, we would suggest you go for few data marts first. Building an enterprise data warehouse, which has a risk of running out of memory or disk-space, will be risky.

Skills Readiness:

We feel that it should not be a major issue as skills can be bought, especially in today's world where more and more companies are outsourcing their IT services.

TIP- Go for data warehouse platform, which is not proprietary and it has an open architecture.

Processing Architecture for Data Warehouse

Multiprocessors with in the same machine sharing same disk and memory:

This is good approach for small to small-medium size data warehouse. The problem with the DW (which is not in OLTP) is that the kind of load and queries are not certain. Therefore, sometimes the allocation of processes across the processors, itself runs out of breath. Even if you are having a kind of cluster, (with load-balancing and automatic fail-over), it will become complex once you go beyond a certain size.

Parallel Processing Servers

Here the processing is done across multiple servers with each having its own memory and disk space. This way they get their own playing field, instead of fighting for common resources (as in the multiprocessor architecture). This way one can add hundreds of servers to share the load through messaging or other mode of EAI (enterprise application integration). As you design this processing architecture, one needs to ensure that there is not too many cross connections. For example if you want to join two star-schemas (refer multi-cube in OLAP server), you have to ensure that relevant data for the two cubes is in same or few servers.

Combining the above two

Depending upon the kind of business you want to run through your data warehouse, the best is to get the combination of the above two.

TIP- Ask your vendor, if the data warehouse can support all the above three processing architecture styles. Also ask the following questions:

  • How many multiprocessor machines it can support?
  • Can it support cluster architecture?
  • Does it have load-balancing and fail-over capability?
  • What has been the field experience of the number of parallel processing servers that this platform has achieved?

For the top quadrant like Oracle, Teradata..., the answer for all of this is going to be positive.

TIP- If you have enterprise strength database from Oracle, IBM DB2, SQL server (2008 is preferable), one needs to ride over them unless there are strong reasons for you to go for some other database for your data warehouse.

Database Considerations for Data Warehouse:

We would not spend too much time on this. Most of the key database vendors like Oracle, Microsoft, IBM, and Sybase have invest a lot of effort to support data warehouse application. Therefore from our point of view, it’s relatively a no-brainer for most of the clients.

 

   Data Warehouse sharing and browsing  
 
 

Was this page helpful?
 
 
More on DW Design & Architecture
DW Design and Architecture Overview
Data Warehouse Source Systems
Data Warehouse ETL Extraction
Data Warehouse ETL Transformation
Data Warehouse ETL Loading
Data Warehouse Metadata
Back-Room Data Warehouse Metadata
Data Warehouse Data Quality assurance
Data Warehouse job control and audit
Data Warehouse sharing and browsing
BUY BI & Data Management Vendors & Tools Evaluation Kit
Read more...
BUY largest on-line Data-Quality Management Kit
Read more...
Additional Channels
Principles & Rules
Free Templates
Glossary
Key Performance Indicators



Most Popular Zones with list of pages crossing 25000 hits  →→→ 
Maximising Sales Performance
Sales Compensation Structure Decision
Enhancing Sales Channel productivity
Sales Leads Classification and prioritization
Sales force density
Sales Channel Data Management
Read more...
  Customer Relationship Management
Customer Satisfaction and Retention- Overview
Customer Value and Profitability Tips and Actions
Customer Service and Support - Strategic Role
Customer Segmentation Parameters
Supply Chain for Customer Service and Support
Read more...
  Human Resources & Leadership
Feedback does not mean only negative feedback
Be straight and blunt, till you team gets used to it
Developing Leaders- Few Leadership Traits
Fostering Innovation
Competencies Definitions
Read more...
 
 
Business Performance & Planning
Dashboard Health Checklist
Performance Review should have no surprises
Review Session should stay focused
Strategic Business Plan
External Info Assessment Report
Read more...
  Business Intelligence & Data Quality
Domain and Data Related Services
Data Warehouse Project Definition
Dimensional Model Simple Hierarchy
Two tier Data Warehouse Architecture
Data Quality Approach Finalization
Read more...
  IT Vendors & Tools Management
Load, Log and Cache Management for Reports
End User Reporting Features
OLAP Server administration
Design & Analysis support and Wizards
Technical Customization Evaluation
Read more...