Data Virtualization is neither new nor unknown. It has been part of the IT industry for more than two decades, with several major vendors claiming to have been the first to introduce the technology.
Although the technology has been around for quite some time, it is only now that more and more corporations have started using it, mostly as an alternative to traditional Extract-Transform-Load (ETL) systems and data warehousing in areas such as Business Intelligence and Analytics (BIA), application development, and Big Data. Depending on its nature, data is stored in sources ranging from traditional central servers, including mainframes, to newer solutions such as cloud storage applications and Big Data systems. Adding to the mix, the type of stored data within these solutions also differs, ranging from traditional relational database systems such as Oracle and Microsoft SQL Server to CSV, XML, and JSON files. The result is a complex digital ecosystem comprising heterogeneous data.
A Data Management solution is needed to make this data usable. It must not only connect the various storage types but also handle the various data types, transforming raw data into usable data. Data Virtualization is a broad term for a Data Management approach that allows data to be retrieved and manipulated at its source without requiring any technical details about it, such as how it is formatted or where it is physically located, and without transforming it with a separate application. The goal of Data Virtualization is to create a single, homogeneous representation of data originating from several unlike sources. By introducing a semantic layer, this can be done without ever having to copy or move the data from its original source.
-Data Virtualization at a Glance-
The success of Data Virtualization lies in its flexibility in handling and processing both structured and semi-structured data. The extracted data can be viewed virtually through a dashboard or visualized using a BI tool. This virtualization renders usable data instantly, without the user having to deal with its complexities, such as its varying nature, location, and type. As mentioned earlier, Data Virtualization does not duplicate data from the source systems in any manner. Instead, it works only with the derived metadata and integration logic for visual presentation.
Common data sources virtualized using Data Virtualization software:
-Excel & flat files
Data Virtualization: How it Works
In simple words, Data Virtualization software acts as a middle layer enabling the virtual integration of data. Drawing on various types of data models, it gives authorized data consumers access to the entirety of an organization's data from a single point of access. The data consumer need not be concerned with whether the retrieved data is housed on a mainframe (central server) or in a data lake in the cloud. Data Virtualization software also enables access management, since it allows the data owner to define who may access which data and who may not. One of the biggest reasons Data Virtualization software is deployed is that it establishes a Single Point of Truth for all stakeholders, each of whom needs access to up-to-date data in the most cost-efficient manner possible to accomplish their business objectives.
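The two ideas above, a single point of access and owner-defined access rules, can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; all source, consumer, and method names are hypothetical.

```python
# Minimal sketch of a virtual access layer: one entry point for all data,
# with per-consumer access rules enforced by the layer itself.
# All names (sources, consumers, methods) are hypothetical illustrations.

class VirtualLayer:
    def __init__(self):
        self.sources = {}   # logical name -> callable returning rows
        self.grants = {}    # consumer -> set of logical names they may read

    def register_source(self, name, fetch):
        self.sources[name] = fetch

    def grant(self, consumer, name):
        self.grants.setdefault(consumer, set()).add(name)

    def query(self, consumer, name):
        # Single point of access: consumers never talk to sources directly,
        # and the layer decides who may read which data set.
        if name not in self.grants.get(consumer, set()):
            raise PermissionError(f"{consumer} may not read {name}")
        return self.sources[name]()

layer = VirtualLayer()
layer.register_source("sales", lambda: [{"region": "EU", "total": 1200}])
layer.grant("analyst", "sales")

print(layer.query("analyst", "sales"))  # the granted consumer sees the data
```

An ungranted consumer calling `layer.query("intruder", "sales")` would get a `PermissionError` instead of data, which is the access-management role the text describes.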
Data Virtualization tools usually combine two distinct procedures: the Data Virtualization presentation layer and data federation. Data Virtualization and data federation are different processes, brought to life by software tools that create a virtual, semantic presentation of data and enable applications to transparently request data distributed across multiple storage platforms. Data Virtualization lets the consumer simply blend data coming from various sources with the help of consumer-oriented data materialization rules, which give consumers the ability to define what the data should look like when it is returned as the result of their query.
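Consumer-oriented materialization rules can be pictured as a mapping that each consumer declares over raw source fields. The sketch below is an assumption about how such rules might look in practice; the field names and transforms are invented for illustration.

```python
# Sketch of consumer-oriented materialization rules: the consumer declares
# how raw source fields should be renamed and shaped in the returned view.
# Field names and rules are illustrative, not from any specific product.

def materialize(rows, rules):
    """Apply a consumer's view rules to raw source rows."""
    out = []
    for row in rows:
        shaped = {}
        for target, (source_field, transform) in rules.items():
            shaped[target] = transform(row[source_field])
        out.append(shaped)
    return out

# Raw rows as they might arrive from a source system:
raw = [{"cust_nm": "acme corp", "rev_eur": "1200.50"}]

# The consumer decides what the result should look like:
rules = {
    "customer": ("cust_nm", str.title),
    "revenue":  ("rev_eur", float),
}

print(materialize(raw, rules))
# [{'customer': 'Acme Corp', 'revenue': 1200.5}]
```

The source keeps its own naming and string-typed values; only the returned view is shaped to the consumer's definition, which matches the "no copying or moving" principle stated earlier.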
If an organization uses several databases to store data and someone needs to access one data set stored in MySQL and another stored in Microsoft Access, a data federation mechanism allows that query to be executed against a virtual, or semantic, layer. To the user this looks like a single data model, while in the background the tool breaks the query down into its individual elements: the part that goes against the MySQL database and the part that goes against the Microsoft Access database. This masks details such as how the data is formatted in the original source and allows insights to be extracted regardless of inconsistencies.
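The split-and-recombine step can be sketched with two separate databases. For a self-contained example, two in-memory SQLite databases stand in for the MySQL and Microsoft Access sources; the table and column names are invented for illustration.

```python
# Sketch of query federation: two separate databases each answer their part
# of a query, and the federation layer joins the partial results locally.
# SQLite in-memory databases stand in for MySQL and Microsoft Access here;
# table and column names are illustrative.
import sqlite3

# First source (standing in for MySQL) holds customers.
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db_a.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

# Second source (standing in for Access) holds orders.
db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE orders (cust_id INTEGER, amount REAL)")
db_b.execute("INSERT INTO orders VALUES (1, 250.0), (1, 100.0), (2, 80.0)")

# The consumer asks one logical question: total order amount per customer
# name. The federation layer splits it into one sub-query per source...
names = dict(db_a.execute("SELECT id, name FROM customers"))
totals = dict(db_b.execute(
    "SELECT cust_id, SUM(amount) FROM orders GROUP BY cust_id "
    "ORDER BY cust_id"))

# ...and recombines the partial results into a single answer.
result = {names[cid]: total for cid, total in totals.items()}
print(result)  # {'Acme': 350.0, 'Globex': 80.0}
```

Neither source knows about the other, and neither data set is copied or moved; only the small partial results travel to the layer that merges them, which is what lets the whole thing appear as a single data model to the consumer.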