Data Federation is essentially the ability to pick data from various source systems and feed it directly into your OLAP and end-user tools. It means that one does not have to go through the painful process of Extraction, Transformation and Loading. Data federation also provides a "virtual" data warehouse, in which a query can run across multiple systems as if the data were coming from a single source. However, that does not make Data Federation suitable as your core integration solution.
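To make the "virtual" data warehouse idea concrete, here is a minimal sketch in Python. It assumes two hypothetical source systems (a CRM and a billing system), each modeled as a separate in-memory SQLite database. SQLite's ATTACH DATABASE only stands in for a federation layer here; real federation tools work differently, and the table and column names are invented for illustration.

```python
import sqlite3

# Hypothetical source system 1 (CRM): the connection's "main" database.
conn = sqlite3.connect(":memory:")
# Hypothetical source system 2 (billing): a second, separate in-memory database.
conn.execute("ATTACH DATABASE ':memory:' AS billing")

# Illustrative tables and data; none of these names come from a real system.
conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)


def customer_totals(connection):
    """Join across both databases in one query, with no ETL step in between."""
    return connection.execute(
        """
        SELECT c.name, SUM(i.amount)
        FROM main.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    ).fetchall()


totals = customer_totals(conn)
print(totals)
```

The query reads as if both tables lived in one place, but every run goes back to the underlying sources. That is the convenience of federation and also the root of the problems discussed in the rest of this post.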
It goes without saying that there are many tools that are a hybrid between pure ETL and Data Federation. Major solution providers also offer it as an add-on to their core ETL solution.
My recommendation would be to avoid Data Federation if you have any complexity or dynamism in your source systems or reporting requirements. In other words, you can try Data Federation if you have stable systems that are not going to change frequently and your reporting requirements are also stable. Some situations where you may cautiously consider data federation are:
- When the source is a shrink-wrapped legacy system
- When the systems serve secondary or tertiary information needs
- When the information needs are low in volume and intensity
- When the data sources providing the data are limited in number (say 2-3)
- When you are picking up source data with very simple transformations
- When you want to run simple queries for operational purposes (for example, a customer profile)
Another area where you can use data federation is to establish an initial showcase of what a business intelligence solution can do ("widening the horizon" for the end users), if it does not cost you much.
The reasons for avoiding data federation as the core ETL solution are:
- If you are doing online data federation and expecting high-volume queries or analysis, you will overload the source systems.
- You have to be careful about when you run an online query across the source systems, as the systems may not all be fully consistent at all times. Even with all the talk of "online processing", I have not seen many system landscapes that do not bring their systems into sync (by synchronizing masters, accounting data, transactional data, etc.) through overnight batch runs.
- Source systems change their database structures/design frequently, and your queries could fail. You may keep track of source system changes and change your source system mappings accordingly, but it may not always be simple.
- No historical snapshots. A federated query reads the sources as they are at the moment it runs, so point-in-time comparisons are not available unless the source systems themselves keep history.
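The point above about source systems changing their structures can be illustrated with a small check that runs before the federated queries do. This is only a sketch under stated assumptions: the source is modeled as an in-memory SQLite database, and the expected mapping (`EXPECTED_COLUMNS`) and the helper `missing_columns` are hypothetical names invented for this example, not part of any federation product.

```python
import sqlite3

# Hypothetical mapping: the columns our federated queries rely on, per table.
EXPECTED_COLUMNS = {"customers": {"id", "name", "segment"}}


def missing_columns(connection, expected):
    """Return {table: [missing column names]} for every table that has drifted."""
    problems = {}
    for table, columns in expected.items():
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        # The table name comes from our own trusted mapping, not from user input.
        actual = {row[1] for row in connection.execute(f"PRAGMA table_info({table})")}
        missing = columns - actual
        if missing:
            problems[table] = sorted(missing)
    return problems


# Simulated source system whose owners have dropped the 'segment' column.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

drift = missing_columns(source, EXPECTED_COLUMNS)
print(drift)
```

A check like this only tells you that a mapping is stale. Updating the mapping and every query that depends on it is still manual work, which is why frequently changing sources are a poor fit for federation.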