Data Virtualization: Essential but Approach with Caution - The Next Big Thing

Data Virtualization is the current marketing banner for Enterprise Information Integration (EII). It is an important adjunct to SOA, but must be undertaken with caution.

David Linthicum, Linthicum Group, and Bradley Wright, Progress DataDirect, recently presented an ebizQ webinar on "Putting Your Data to Work for Your Cloud, BPM, MDM and SOA Project."  The thrust of this presentation was that data virtualization will provide consistent, cross-enterprise access to data in heterogeneous data stores.  This is not a new capability; it was introduced as EII a number of years ago.

When asked the difference between data virtualization and EII, Bradley Wright indicated that data virtualization includes the capability to perform updates.  While not all EII products supported updates, some did, so it appears the primary difference is marketing.  At the same time, as I will discuss below, data virtualization should not be used for updates.

The fundamental concept of data virtualization and EII is that data is accessed from multiple, heterogeneous databases through a virtual database that provides an integrated, consistent view of data from these multiple sources.  Queries are expressed in terms of the virtual database schema and translated as required, and data from multiple sources is transformed and integrated to provide a response that is consistent with the virtual database schema.
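To make the idea concrete, here is a minimal sketch in Python, not a description of any vendor's product.  It assumes two hypothetical sources, plant_a and plant_b, represented by in-memory SQLite databases with different column names and units, and a hypothetical virtual schema of (part_id, length_m, source).  The function query_virtual_parts is invented for illustration; it translates a query against the virtual schema into each source's terms and transforms the results back.

```python
import sqlite3

FEET_TO_METERS = 0.3048

def make_source(schema_sql, insert_sql, rows):
    """Create an in-memory database standing in for one production data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.executemany(insert_sql, rows)
    return conn

# Hypothetical source A: lengths in feet, with its own table and column names.
plant_a = make_source(
    "CREATE TABLE parts (part_no TEXT, len_ft REAL)",
    "INSERT INTO parts VALUES (?, ?)",
    [("A-100", 10.0), ("A-200", 2.5)],
)

# Hypothetical source B: lengths already in meters, different naming.
plant_b = make_source(
    "CREATE TABLE component (id TEXT, length_m REAL)",
    "INSERT INTO component VALUES (?, ?)",
    [("B-7", 1.0)],
)

def query_virtual_parts(min_length_m=0.0):
    """Answer a query against the virtual schema (part_id, length_m, source)."""
    results = []
    # Translate the virtual predicate into source A's units and column names.
    for part_no, len_ft in plant_a.execute(
        "SELECT part_no, len_ft FROM parts WHERE len_ft >= ?",
        (min_length_m / FEET_TO_METERS,),
    ):
        results.append((part_no, round(len_ft * FEET_TO_METERS, 4), "plant_a"))
    # Source B already uses the virtual units; only the names differ.
    for comp_id, length_m in plant_b.execute(
        "SELECT id, length_m FROM component WHERE length_m >= ?",
        (min_length_m,),
    ):
        results.append((comp_id, length_m, "plant_b"))
    return sorted(results)
```

Calling query_virtual_parts(1.0) returns one integrated result set in meters, even though neither source stores its data in exactly that form.  A real product would also handle query planning, security and many more data types; this only shows the translate-and-transform step.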

SOA increases the importance of data virtualization because SOA is likely to increase the number and diversity of databases.  I discussed this last year in my blog post entitled "Data Management for SOA."  A service should be loosely coupled and its data stores should be hidden from the service users to maintain flexibility in the implementation of the service.  This conflicts with the need for cross-enterprise views of data for planning and decision-making.  Data virtualization can provide such visibility; however, there are certain realities that must be understood when using data virtualization.  Loraine Lawson touched on some limitations two years ago in an interview with Peter Tran and Bob Reary of Composite Software entitled "When Data Virtualization Works - And When It Doesn't," but there are additional concerns.

The following paragraphs outline key limitations of data virtualization that must be considered when setting expectations and when using data virtualization to obtain composite views.

Data inconsistencies.  A data virtualization product can perform data conversions (e.g., feet to meters), but it can't create data that isn't stored.  For example, if one organization maintains weekly production figures and another maintains monthly figures, these two different measures cannot be reconciled.  If one organization tracks numbers of defects in one set of categories, and another uses a different set of categories, the figures cannot be compared or added.

Such problems are fundamental to the business, and if it is important to examine such data across the enterprise, then there must be a transformation initiative to make the data collection and storage consistent with a common scheme.
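The weekly-versus-monthly case can be illustrated with a short Python sketch.  The function split_weekly_by_month and the production figures are invented for illustration.  It rolls weekly totals up to months where that is possible and sets aside the weeks that span a month boundary, because the split between the two months was never recorded.

```python
from collections import defaultdict
from datetime import date, timedelta

def split_weekly_by_month(weekly_totals):
    """Roll weekly totals up to months where possible.

    weekly_totals maps a week's start date to the units produced in that
    7-day week.  Returns (monthly, ambiguous): monthly holds totals for weeks
    that lie wholly inside one month; ambiguous lists weeks that span two
    months, whose split between those months is not stored anywhere.
    """
    monthly = defaultdict(int)
    ambiguous = []
    for start, units in sorted(weekly_totals.items()):
        end = start + timedelta(days=6)
        if (start.year, start.month) == (end.year, end.month):
            monthly[(start.year, start.month)] += units
        else:
            ambiguous.append((start, units))
    return dict(monthly), ambiguous

# Illustrative figures only (not from any real system).
weekly = {
    date(2009, 10, 19): 120,  # Oct 19-25, entirely in October
    date(2009, 10, 26): 140,  # Oct 26-Nov 1, straddles the month boundary
    date(2009, 11, 2): 130,   # Nov 2-8, entirely in November
}
monthly, ambiguous = split_weekly_by_month(weekly)
```

Any rule for dividing the 140 units of the straddling week between October and November (by days, for instance) would be an estimate, not a conversion, which is why this kind of gap has to be closed in how the data is collected.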

Process inconsistencies.  Some enterprises will have similar business operations that are in different geographies or produce different categories of products or services.  What they do may be similar, but they may have business processes that cannot be compared.  There may be different stages of production or service delivery that are of interest to top management.  The different operations may use the same terminology for phases, but the terms are not applied consistently to the business processes.  This may lead to top management comparing apples to oranges.  Such discrepancies might extend to inconsistent metrics such as the definition of rework, and inconsistencies between sales and the cost of goods sold.

Timing inconsistencies.  An enterprise does not operate instantaneously and in lock step.  The orders being received are not the same as the orders being filled and the orders being shipped.  The engineering change issued by the engineering department may be delayed until current inventories are consumed.  Payment is not due on orders shipped but not yet delivered.  A query that combines data from different operations will not represent a consistent view of the enterprise.  A consistent view requires the definition of cut-off points and time for the various activities and transactions to reach and record consistent points in their operations.  This is why financial information is not immediately available at the end of a period.

It is not practical to eliminate all such inconsistencies or wait to accumulate consistent results.  Users of data virtualization must understand such limitations when using the composite data.

Resource overload.  A data virtualization service will access data from various production databases.  These databases are not necessarily configured to handle an increased volume of queries.  Some queries may add unexpected workload to a database, or the workload from many potential users may be quite unpredictable.  This ad hoc resource demand could interfere with mainstream business application performance.  In cloud computing, the resource may be available on demand, but there could be unacceptable increases in costs.

Update errors.  If data virtualization is used to update databases, it will bypass the applications designed to validate, control and coordinate the updates.  The updates may also be inconsistent with the current state of the production operations.  Furthermore, updates normally performed by associated applications may require coordination and propagation to related operations and applications.  It is very dangerous to bypass the responsible organizations and their applications to update their databases; it should not happen.  Any update should go through the appropriate processes for validation, authorization, control and coordination that are the responsibility of those business operations and their applications.
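One way to hold a virtualization layer to queries only is to give it read-only connections to the source databases.  The following Python sketch assumes a SQLite file standing in for a production database; the helper open_read_only and the orders table are hypothetical.  It uses SQLite's mode=ro URI parameter, so the database itself rejects any write attempted through that connection.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

def open_read_only(db_path):
    """Open a source database so the virtualization layer can query but not write."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)

# Set up a stand-in for a production database owned by some application.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "orders.db")
owner = sqlite3.connect(db_path)
owner.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
owner.execute("INSERT INTO orders (status) VALUES ('received')")
owner.commit()
owner.close()

# The virtualization layer gets a read-only connection.
reader = open_read_only(db_path)
rows = reader.execute("SELECT id, status FROM orders").fetchall()

try:
    reader.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
    update_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses the write because the connection was opened with mode=ro.
    update_blocked = True
finally:
    reader.close()
```

Other database products offer comparable controls, such as accounts granted only read privileges.  The point is that the restriction is enforced at the source and does not rely on users of the virtual database to behave.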

I think data virtualization (a.k.a. enterprise information integration) is an important technology that should be part of a SOA strategy, but users must adopt it with their eyes wide open.  It's a long-term investment, and it is likely there will always be a need to understand and allow for inconsistencies in the data.


Posted 11-02-2009 7:34 AM by Fred Cummins