Migration and Synchronization
Data Integration is not only ETL. It also includes Migration (the permanent, physical transfer of data from one system to another) and Synchronization. For example, you could create a central Master Data Management** hub by permanently moving customer data from several source systems into a central repository. Synchronization keeps databases (or parts of them) congruent through one-way or two-way transfer of data.
Migration is sometimes more complex than ETL. ETL is generally used for reporting and analysis, or for non-financial data, whereas migration is generally operational in nature. This means that when you migrate, you have to ensure that cycle cut-offs are handled, billing letters are not sent twice, batch processing is not rerun on the migrated data, and differences between the pre-migration and post-migration data are reconciled.
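One way to keep a rerun from repeating operational side effects is to make the migration idempotent. The Python sketch below is a minimal illustration, not any particular tool's method. It assumes hypothetical `customer` tables in two SQLite databases and a hypothetical `migration_log` table in the target that records which source IDs have already been moved; the `last_billed` column is a hypothetical stand-in for whatever cycle cut-off data the target's batch jobs check.

```python
import sqlite3


def migrate_customers(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Move customers from source to target exactly once; return how many rows moved.

    The hypothetical migration_log table records every source ID already moved,
    so rerunning the job skips those rows instead of duplicating them.
    """
    target.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, last_billed TEXT)"
    )
    target.execute("CREATE TABLE IF NOT EXISTS migration_log (source_id INTEGER PRIMARY KEY)")

    rows = source.execute("SELECT id, name, last_billed FROM customer").fetchall()
    moved = 0
    with target:  # single transaction: commit on success, roll back on error
        for cid, name, last_billed in rows:
            if target.execute(
                "SELECT 1 FROM migration_log WHERE source_id = ?", (cid,)
            ).fetchone():
                continue  # already migrated on an earlier run
            # Carry the last billed cycle across so the target's billing batch
            # does not treat this customer as unbilled and bill it again.
            target.execute(
                "INSERT INTO customer (id, name, last_billed) VALUES (?, ?, ?)",
                (cid, name, last_billed),
            )
            target.execute("INSERT INTO migration_log (source_id) VALUES (?)", (cid,))
            moved += 1
    return moved


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, last_billed TEXT)")
    src.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Acme", "2024-05"), (2, "Globex", "2024-05")],
    )
    tgt = sqlite3.connect(":memory:")
    print(migrate_customers(src, tgt))  # 2
    print(migrate_customers(src, tgt))  # 0: nothing is moved twice
```

Keeping the log in the target, inside the same transaction as the inserts, means a crash mid-run leaves either both the rows and their log entries or neither.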
- Metadata-driven access to sources and targets: This is a generic feature. However, you should also be able to access sources and targets outside the metadata layer.
- Ability to synchronize data with real-time transformations: As migration and synchronization generally serve operational purposes, performing them across different database platforms requires a high degree of robustness.
- Ability to migrate or synchronize data across business applications: Data migration (or data conversion) may include transformation, and the same holds true for synchronization. The difference between them is that migration is a one-time activity, whereas synchronization is periodic and often near-real-time or real-time. When synchronizing data across different applications, you will encounter situations that require real-time transformation.
- Ability to migrate or synchronize data across data structures: This is a subset of the previous point. Migration and synchronization can involve some degree of data structure change, and the DI tool should be able to carry it out.
- Ability to incorporate data quality assurance controls**: Data quality becomes critical whether you are doing a one-time migration or a real-time synchronization. You should be able to design data quality controls and embed them in the migration and synchronization routines. A good DI tool should prompt you to define DQ controls.
- Ability to synchronize through enterprise application integration (EAI) tools such as MQ and Tibco: A Data Integration tool should be able to use the enterprise's existing EAI tools as the transport for synchronization, and it should work with a wide range of them.
- Support for data exchange standards such as XML: XML is a widely used standard for data exchange. A DI tool should be able to create XML files for exporting and importing both data and data structures.
- Real-time data cleansing: Data cleansing is a type of transformation, and a DI tool should support it in any real-time DI task.
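The synchronization, real-time cleansing and data quality points above can be sketched together as a one-way sync loop: each changed record is cleansed in flight, checked against DQ rules, and either applied to the target or quarantined. This is a minimal Python sketch under stated assumptions; the record fields, the two rules and the `sync_changes` function are hypothetical, and a plain dict stands in for the target database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]  # hypothetical shape: {"id": int, "email": str, "country": str}


@dataclass
class SyncResult:
    applied: int = 0
    rejected: List[Tuple[object, List[str]]] = field(default_factory=list)


def cleanse(rec: Record) -> Record:
    """Real-time cleansing: normalise fields in flight, before the target sees them."""
    return {
        "id": rec["id"],
        "email": str(rec.get("email", "")).strip().lower(),
        "country": str(rec.get("country", "")).strip().upper(),
    }


# Data quality controls: each rule returns an error message, or None if the record passes.
DQ_RULES: List[Callable[[Record], Optional[str]]] = [
    lambda r: None if "@" in r["email"] else "invalid email",
    lambda r: None if len(r["country"]) == 2 else "country must be a 2-letter code",
]


def sync_changes(changes: List[Record], target: Dict[object, Record]) -> SyncResult:
    """One-way sync: upsert each clean, valid change into target keyed by id."""
    result = SyncResult()
    for raw in changes:
        rec = cleanse(raw)
        errors = [msg for msg in (rule(rec) for rule in DQ_RULES) if msg]
        if errors:
            result.rejected.append((rec["id"], errors))  # quarantined, not applied
            continue
        target[rec["id"]] = rec
        result.applied += 1
    return result


if __name__ == "__main__":
    target: Dict[object, Record] = {}
    res = sync_changes(
        [
            {"id": 1, "email": " Ann@Example.COM ", "country": "gb"},
            {"id": 2, "email": "no-at-sign", "country": "USA"},
        ],
        target,
    )
    print(res.applied, res.rejected)
    # 1 [(2, ['invalid email', 'country must be a 2-letter code'])]
```

Cleansing runs before validation here, so a value that only needs normalising (such as a lower-case country code) is fixed rather than rejected.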
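As an illustration of XML-based exchange of both data and data structure, the following minimal Python sketch uses the standard library's `xml.etree.ElementTree`. The XML layout (`table`, `schema`, `column`, `rows` and `row` elements) is a hypothetical format chosen for the example, not a standard schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

Column = Tuple[str, str]  # (column name, SQL type)


def export_xml(table: str, columns: Sequence[Column], rows: Sequence[Sequence]) -> str:
    """Serialise a table's structure and its rows into one XML document."""
    root = ET.Element("table", name=table)
    schema = ET.SubElement(root, "schema")
    for name, sql_type in columns:
        ET.SubElement(schema, "column", name=name, type=sql_type)
    data = ET.SubElement(root, "rows")
    for row in rows:
        row_el = ET.SubElement(data, "row")
        # Column names become element tags, so they must be valid XML names.
        for (name, _), value in zip(columns, row):
            ET.SubElement(row_el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def import_xml(xml_text: str) -> Tuple[str, List[Column], List[Tuple[str, ...]]]:
    """Read the structure and rows back; values come back as text, not typed."""
    root = ET.fromstring(xml_text)
    columns = [(col.get("name"), col.get("type")) for col in root.find("schema")]
    rows = [tuple(cell.text for cell in row) for row in root.find("rows")]
    return root.get("name"), columns, rows


if __name__ == "__main__":
    doc = export_xml("customer", [("id", "INTEGER"), ("name", "TEXT")], [(1, "Acme")])
    print(doc)
    print(import_xml(doc))
```

Because the schema travels with the rows, the importing side can use the declared types to convert the text values back.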
Data Federation
So what is Data Federation? Simply put, it is the combination of data from various sources into a single virtual data source, or Data Service. The data can then be accessed, managed and viewed as if it were part of a single system. A Data Service can combine files, HTTP requests, Web Services, SQL queries (including stored procedures), other Data Services, and so on. Two components are typical: a Database Connector and a Database Operation. A Database Connector enables one or more database operations to connect to a relational database; a Database Operation extracts data from, or modifies data in, a relational database and delivers the result to a Data Service.
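The connector/operation split described above can be sketched in Python as follows. The class names mirror the terms in the text, but the code is a hypothetical illustration: SQLite stands in for the relational database, an in-memory CSV string stands in for a file source, and `DataService` simply concatenates rows tagged with their origin on every call instead of storing them.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class DatabaseConnector:
    """Owns a connection that one or more DatabaseOperations can share."""

    def __init__(self, database: str):
        self.conn = sqlite3.connect(database)


class DatabaseOperation:
    """Runs one SQL statement through a connector and returns rows as dicts."""

    def __init__(self, connector: DatabaseConnector, sql: str):
        self.connector = connector
        self.sql = sql

    def __call__(self) -> List[Row]:
        cur = self.connector.conn.execute(self.sql)
        names = [d[0] for d in cur.description]
        return [dict(zip(names, values)) for values in cur.fetchall()]


class DataService:
    """A single virtual source: every fetch() pulls fresh rows from each member source."""

    def __init__(self, sources: Dict[str, Callable[[], Iterable[Row]]]):
        self.sources = sources

    def fetch(self) -> List[Row]:
        combined: List[Row] = []
        for source_name, pull in self.sources.items():
            for row in pull():
                combined.append({"source": source_name, **row})  # tag each row with its origin
        return combined


if __name__ == "__main__":
    crm = DatabaseConnector(":memory:")
    crm.conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    crm.conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
    legacy_csv = "id,name\n2,Globex\n"  # stands in for a file source

    service = DataService({
        "crm": DatabaseOperation(crm, "SELECT id, name FROM customer"),
        "legacy_file": lambda: csv.DictReader(io.StringIO(legacy_csv)),
    })
    for row in service.fetch():
        print(row)
```

Values read from the CSV arrive as strings while the SQLite values keep their types, so a real data service would also need to reconcile types across sources.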
As mentioned before, migration is a one-time transfer of data from one state to another, while synchronization keeps databases (or parts of them) congruent, generally in near-real time. Once data is synchronized, accessing it is straightforward, because the synchronized copy is simply a database that can be queried. In data federation, by contrast, data is pulled out of the source systems in real time, transformed, combined, and presented as a data service or virtual data source that can be used for a host of activities. Data federation is not only a BI concept; it can also be used in Operational Data Stores and Master Data Management**.
- Broad-based access and connectivity: Refer to the connectivity worksheet. Connectivity features have their own separate criteria sheet, as they apply to many different categories of tools.
- Ability to federate data from multiple disparate platforms: A Data Integration platform should be able to pull data in real time from various DBMSs. Because the operation is real-time, the connectors have to be very efficient, and for that the Data Integration tool vendor needs strong partnerships with the popular database platform vendors.
- Ability to federate data from multiple disparate data structures and formats: This is a fairly obvious requirement; again, the connectors need to be highly efficient.
- Ability to integrate with various OLAP platforms: In a BI scenario, the federated data service may supply data to an OLAP layer**. As speed is key, the link between the data service and the OLAP layer needs to be highly efficient.
- Ability to integrate with various Business Intelligence end-user tools: In a BI scenario, one need not go through an OLAP layer to use the federated virtual data service. End-user tools such as reporting, query and analytics tools can also connect to this service directly. The DI tool should have strong partnerships with popular end-user tool vendors to serve demanding production needs.
- Ability to source and join data across multiple sources: Data federation delivers its true value only if it can source data in real time across multiple sources. Efficient connectors and efficient query optimization are the key here.
- Virtual view creator to provide a view of data dynamically sourced from multiple sources: This is core to data federation. A DI tool's data federation capability should be able to create a virtual view as a data service. To OLAP and end-user tools, the virtual view will appear as a single data source.
- Query optimization assistance while designing federation queries: This should include:
  - Suggestions from the tool on the sequence of queries used to build the virtual view.
  - Automatic query generation once you have defined the data sources and data elements.
  - The ability to run the query, check its execution time, and make on-the-spot improvements.
- Manual query optimization through a query editor: This is standard for any query management tool.
- Query optimization for both heterogeneous and homogeneous sources: DI tools can often optimize queries well when the sources are homogeneous (for example, when all sources are Oracle databases). The challenge is optimizing across Oracle, IBM, Teradata and other sources. A tool's query optimizer should take into account the query performance of each of these sources as well as the efficiency of the connectors.
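To make the federation and query optimization points concrete, here is a minimal Python sketch of a virtual view built at query time across two separate databases. Two SQLite connections stand in for two different platforms, and the table, column and function names are hypothetical. It applies two common optimizations by hand: pushing the filter down to the first source, and restricting the second query to the keys returned by the first (a semi-join) before joining the results in memory. Timing the call mirrors the "run and check the query time" step.

```python
import sqlite3
import time
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fetch(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[Row]:
    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, values)) for values in cur.fetchall()]


def customer_orders_view(crm: sqlite3.Connection, billing: sqlite3.Connection, country: str) -> List[Row]:
    """Hypothetical virtual view: CRM customers joined to billing orders at query time."""
    # 1. Push the filter down: the CRM source returns only the matching customers.
    customers = fetch(crm, "SELECT id, name FROM customer WHERE country = ?", (country,))
    if not customers:
        return []
    # 2. Semi-join: ask billing only for those customers' orders, not the whole table.
    #    (Very long key lists would need batching to stay under SQLite's parameter limit.)
    ids = [c["id"] for c in customers]
    placeholders = ",".join("?" * len(ids))
    orders = fetch(
        billing,
        f"SELECT customer_id, amount FROM orders WHERE customer_id IN ({placeholders})",
        ids,
    )
    # 3. Hash join in the integration layer: index customers by id, probe with each order.
    by_id = {c["id"]: c for c in customers}
    return [{"name": by_id[o["customer_id"]]["name"], "amount": o["amount"]} for o in orders]


if __name__ == "__main__":
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customer (id INTEGER, name TEXT, country TEXT)")
    crm.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(1, "Acme", "UK"), (2, "Globex", "US")])
    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    billing.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])

    start = time.perf_counter()
    rows = customer_orders_view(crm, billing, "UK")
    print(rows, f"{time.perf_counter() - start:.4f}s")
```

The query sequence is fixed here (CRM first, then billing); a tool that suggests query sequences would instead estimate each source's selectivity and connector cost before choosing which source to query first.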