Once the Data is loaded in the presentation server, it can now be considered as cooked, and ready to be served in various dishes as per the choice of the users. Coming back to the original example, if the Data Transformation is to convert the raw iron ore into the Stainless steel billets, the Loading is to further process it into the stainless steel rolled sheets. The next step is to now providing different ways to the users to access the same data for their needs. The first part of this objective is met through Data Warehouse Data Access and Delivery services, which interface between end user tools and the Data Warehouse.
Data warehouse Access Query Management Services
These services are important from the user point of View. While the efficiency of the ETL is to make available the 'refreshed Data in the morning' timing, the Query Management Services has a challenge to meet user expectation every time a request is submitted to the user. Query management services are the 'Concierge' for the user, which ensure to get the data from the various sources in the Data Warehouse in the shortest possible time.
This is achieved by: - Threading the query, which means splitting the query into multiple parallel tasks.
- One time retrieval for the same query (OR part of a query) submitted by multiple users. This is used extensively in the batch population of the views.
Query Management service will have following capabilities:
Using semantic or simplification layer
Using the semantic layer metadata, one can shield users from the complexities of tables in the data warehouse. User can directly (or through an end user tool like enterprise reporting), can create user-friendly queries and the semantic layer will translate into the technical language, which can be understood by the query engine.
Query Efficiency and optimization
This is a typical capability of any query service engine, but is needed most in the Business Intelligence environment. As mentioned before (Refer why data warehouse is needed), even the basic queries on a data warehouse are complex. As users create (directly or through an end user tool like Analytics tool) queries, the query engine should be able to find the most optimum way to service that query. This means that it can change the sequence and type of joins, and also the sequence of nested queries.
Query splitting and multi-pass operation
This will allow you to run queries on the results from other queries. This is one way to simplify or split complex queries into more simplistic chunk. This capability also allows you to query two different sources.
Intelligent aggregate usage
A data warehouse can have data at granular level as well as at an aggregate level. Its mentioned before that aggregate level data and granular level data should not be combined in a single dimensional model schema. The query services should be able to decide on if it has to use the fact-table carrying detailed or the fact-table carrying aggregate data. A good 'aggregate aware' query service will be able to keep the details on:
- The queries which used aggregate schemas vs granular schemas.
- The level of usage for the aggregate schemas vs granular schemas.
- Additional aggregate schemas needed to boost query performance.
Creating automated data range
A good query service should be able to have defined data range, based upon the query. For example, when you say end of the month or month to date etc., the query should be able to generate the dates accordingly. For example, when you say month to date for month of july , it should be able to say from date July 1st to current date.
Query Load Anticipation
In spite of all the intelligent modeling and optimization, there will always be queries which will saturate the DW resources. Its easy to create the queries with full table scans and get into recursive or nested query operation. An intelligent query management is able to anticipate the load a query will be able to put on the system. This can be done by analyzing the history of high-load queries or having intelligent rules to anticipate the load.
Query Services Location
Query services can be located in different layers in the data warehouse architecture
- Desk-top level : Most of the desktop level query services are part of the end-user tools, which are proprietary and impose a client level expense.
- Middleware Level: This is a better location of query services, as it allows you to utilize the services across the clients and data sources. Apart from Data-warehouse, some end-user tools have their own middle-ware layers for query management services.
- Database level: This turns out to be most efficient and it fully understand the database environment, however it will be limited to the given database environment.
Data Warehouse Browsing Services
While a typical user would like to have Data Warehouse data groomed and provided through end user applications, a power user can do more by being able to go through the Data in Data Warehouse. Tools providing good navigation, drop down, expand/collapse capabilities and creative labeling capabilities enable these services. This browsing capability draws heavily on to the Meta Data which provides all the detailed information on the Data, its meaning, its source, its recency etc. |