Our digital specialist, Adam Rahnejat, outlines the role of data virtualisation in application architecture and the technology’s key benefits and limitations.
The pivotal role that data plays in financial institutions cannot be overstated. It is the driving force behind business intelligence, operational efficiency and improved customer insight. However, for this potential to be fully realised, financial institutions must improve their data management capabilities. Data virtualisation (DV) is a powerful data management solution that delivers the benefits of data federation and data warehousing1 while avoiding their drawbacks, e.g. prohibitively strict data models, performance issues and costly infrastructure. Moreover, the benefits of DV can be realised quickly, owing to its fast deployment time. So, what is DV and what does it have to offer?
Data virtualisation enables data consumers (both users and systems) to retrieve and manipulate heterogeneous data2 stored across disparate sources as though it were consolidated in a single location, without requiring any technical knowledge of its format. In application architecture, a data virtualisation layer (DVL) handles the technical demands of diverse storage structures, access languages and APIs.3
The layer performs data aggregation, cleansing and enrichment operations4 to serve multiple consumers without replicating the original data. Applications interface directly with the layer and receive a unified, normalised virtual view of the data held by the sources that the layer fronts. A DV solution can be employed as the core element of a virtualised application architecture or as a specialised feature catering to specific business requirements, e.g. an abstraction5 layer providing specific, real-time data analytics.
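To make the idea concrete, the following Python sketch shows a toy virtualisation layer fronting two heterogeneous sources: a CSV export and a SQLite database that use different field names. It is a minimal illustration under stated assumptions, not a DV product: the class names (`VirtualLayer`, `CsvSource`, `SqliteSource`), the `Customer` schema and the sample data are all hypothetical. Each adapter translates its source's format into one normalised record type, and the layer aggregates, cleanses (de-duplicating by email) and enriches (tagging the originating system) at query time, without copying data into a central store.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass(frozen=True)
class Customer:
    """Normalised record returned to consumers (hypothetical schema)."""
    customer_id: str
    name: str
    email: str
    source: str  # enrichment: tags each record with its originating system


class Source(Protocol):
    def fetch(self) -> Iterator[Customer]: ...


class CsvSource:
    """Adapter for a flat-file export with its own column names."""

    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self) -> Iterator[Customer]:
        for row in csv.DictReader(io.StringIO(self.text)):
            yield Customer(row["CustID"].strip(), row["FullName"].strip(),
                           row["Email"].strip().lower(), "crm_csv")


class SqliteSource:
    """Adapter for a relational database queried in SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def fetch(self) -> Iterator[Customer]:
        query = "SELECT id, first_name, last_name, email FROM clients"
        for cid, first, last, email in self.conn.execute(query):
            yield Customer(f"C{cid}", f"{first} {last}", email.strip().lower(), "core_db")


class VirtualLayer:
    """Queries every source on demand; nothing is copied into a central store."""

    def __init__(self, *sources: Source) -> None:
        self.sources = sources

    def customers(self) -> list[Customer]:
        unique: dict[str, Customer] = {}
        for source in self.sources:
            for record in source.fetch():
                # Cleansing: drop duplicates, keeping the first record per email.
                unique.setdefault(record.email, record)
        return sorted(unique.values(), key=lambda c: c.customer_id)


csv_text = "CustID,FullName,Email\nC1, Ada Lovelace ,ADA@example.com\nC7,Grace Hopper,grace@example.com\n"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER, first_name TEXT, last_name TEXT, email TEXT)")
conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)",
                 [(1, "Ada", "Lovelace", "ada@example.com"),
                  (3, "Alan", "Turing", "alan@example.com")])

layer = VirtualLayer(CsvSource(csv_text), SqliteSource(conn))
for c in layer.customers():
    print(c.customer_id, c.name, c.email, c.source)
```

A production layer would also push filters down to each source rather than fetching every record; the sketch omits that for brevity.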
FIG 1. VIRTUALISED APPLICATION ARCHITECTURE:
A data virtualisation layer abstracts data from disparate sources, presenting consumers with a unified and normalised view of aggregated data on demand, while handling the technical challenges of orchestrating access to the underlying data.
The inclusion of DV technology in application architecture offers a unique combination of benefits:
On-demand aggregation and rationalisation
Data served by a DVL is curated so that only data relevant to the recipient is transferred. Data can be aggregated from several sources and rationalised6 in real time, so that recipients receive a consistent, unified view of the data in a normalised standard, irrespective of its original condition. Moreover, DV delivers most data warehouse functionality without the need for ETL,7 meaning that MI and BI8 can be produced on demand, without data replication or processing that requires complex and expensive infrastructure. Consequently, DV technology can deliver significant cost reductions and can be implemented much faster than conventional solutions.
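The on-demand behaviour can be sketched in a few lines of Python. In this hypothetical example, two in-memory lists stand in for live systems (a trading book and a loan book) that record region and amount under different field names and casing. The MI figure, exposure by region, is computed only when requested; each source contributes just the two fields the consumer needs, and an update to a source is visible on the next query with no ETL run. Source names, fields and figures are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical live sources; in practice these would be queries against real systems.
trading_positions = [
    {"desk": "FX", "region": "EMEA", "exposure_gbp": 120.0},
    {"desk": "Rates", "region": "APAC", "exposure_gbp": 80.0},
]
loan_book = [
    {"loan_id": "L-1", "booking_region": "emea", "outstanding": 200.0},
]


def rationalise_trading(rows: Iterable[dict]) -> Iterable[tuple[str, float]]:
    # Keep only the fields the consumer needs: region and amount.
    for r in rows:
        yield r["region"].upper(), r["exposure_gbp"]


def rationalise_loans(rows: Iterable[dict]) -> Iterable[tuple[str, float]]:
    # Different field names and casing are mapped to the same standard.
    for r in rows:
        yield r["booking_region"].upper(), r["outstanding"]


SOURCES: list[tuple[Callable[[], Iterable[dict]], Callable]] = [
    (lambda: trading_positions, rationalise_trading),
    (lambda: loan_book, rationalise_loans),
]


def exposure_by_region() -> dict[str, float]:
    """Aggregate at query time: no ETL job and no copy of the source data."""
    totals: dict[str, float] = defaultdict(float)
    for fetch, rationalise in SOURCES:
        for region, amount in rationalise(fetch()):
            totals[region] += amount
    return dict(totals)


print(exposure_by_region())  # {'EMEA': 320.0, 'APAC': 80.0}
loan_book.append({"loan_id": "L-2", "booking_region": "Apac", "outstanding": 50.0})
print(exposure_by_region())  # {'EMEA': 320.0, 'APAC': 130.0}
```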
Data discovery and security
A DVL reinforces data integrity by exposing the entire data supply chain fronted by the layer to authorised consumers, revealing duplicated and erroneous data. The layer can also provide access to third-party and unstructured data as if it were consolidated within a single virtual schema.9 For these reasons, DV supports more powerful MI and BI, drawing on data that would otherwise be unusable. Furthermore, a DVL acts as gatekeeper to the data it fronts, offering an additional layer of centralised security. Consequently, access privileges and priorities can be governed from a single location, and new governance models, regulations and schemas can be readily adopted across the entire architecture.
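A minimal Python sketch of the gatekeeper role follows, assuming a simple role-to-fields policy. The roles, field names and the `SecureLayer` class are hypothetical. Because every query passes through the same check, changing the `POLICY` mapping in one place changes what every consumer can see, and roles without an entry are refused outright.

```python
from typing import Mapping

# Hypothetical central policy: the single place where field-level access is defined.
POLICY: dict[str, frozenset[str]] = {
    "analyst": frozenset({"customer_id", "region", "balance"}),
    "relationship_manager": frozenset({"customer_id", "name", "email", "region", "balance"}),
}


class SecureLayer:
    """Gatekeeper in front of the sources: every query passes the same policy check."""

    def __init__(self, records: list[dict[str, object]],
                 policy: Mapping[str, frozenset[str]]) -> None:
        self._records = records
        self._policy = policy

    def query(self, role: str) -> list[dict[str, object]]:
        if role not in self._policy:
            raise PermissionError(f"role {role!r} has no access")
        allowed = self._policy[role]
        # Project each record onto the fields this role may see.
        return [{k: v for k, v in r.items() if k in allowed} for r in self._records]


records = [{"customer_id": "C1", "name": "Ada Lovelace", "email": "ada@example.com",
            "region": "EMEA", "balance": 1500.0}]
layer = SecureLayer(records, POLICY)
print(layer.query("analyst"))
# [{'customer_id': 'C1', 'region': 'EMEA', 'balance': 1500.0}]
```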
Modularity and integration
Because the data underlying a DVL is abstracted, application code is effectively decoupled from the data sources. This decoupling is essential in developing a modular environment,10 a popular, modern architectural approach. It significantly reduces the architectural complexity typically associated with legacy architectures, because data communication is orchestrated through a single control point.
Consequently, individual modules can be introduced or modified with minimal disruption to the system as a whole. Moreover, as the DVL mediates all data communication, integrating new applications and data sources into an existing virtualised architecture is greatly simplified and expedited. As the layer is technology-neutral and non-proprietary by design, vendor lock-in11 and technology dependency are also effectively mitigated.
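The decoupling can be illustrated with a hypothetical view catalogue in Python. Consumer code refers only to a logical view name (`trades`), and the layer maps that name to a source adapter. Replacing a legacy source with a new system is then a change to the registration, while the consuming function is untouched. The names `ViewRegistry`, `legacy_mainframe` and `new_trade_store`, and the pence-to-pounds conversion, are illustrative assumptions rather than features of any DV product.

```python
from typing import Callable, Iterable

Row = dict[str, object]


class ViewRegistry:
    """Hypothetical DVL catalogue mapping logical view names to source adapters."""

    def __init__(self) -> None:
        self._views: dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, adapter: Callable[[], Iterable[Row]]) -> None:
        self._views[name] = adapter  # re-registering replaces the backing source

    def read(self, name: str) -> list[Row]:
        return list(self._views[name]())


def legacy_mainframe() -> Iterable[Row]:
    raw = [("T1", "125050")]  # legacy format: trade id and amount in pence, as text
    for trade_id, pence in raw:
        yield {"trade_id": trade_id, "amount_gbp": int(pence) / 100}


def new_trade_store() -> Iterable[Row]:
    # Replacement system already stores amounts in pounds.
    yield {"trade_id": "T1", "amount_gbp": 1250.50}


def total_traded(dvl: ViewRegistry) -> float:
    """Consumer code: knows only the logical view 'trades', not the source behind it."""
    return sum(float(r["amount_gbp"]) for r in dvl.read("trades"))


dvl = ViewRegistry()
dvl.register("trades", legacy_mainframe)
print(total_traded(dvl))  # 1250.5
dvl.register("trades", new_trade_store)  # migrate the source
print(total_traded(dvl))  # 1250.5, with the consumer unchanged
```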
Despite these benefits, DV has a number of limitations:
- Incorporating DV into an existing architecture requires existing applications to adopt new communication paradigms. This causes a degree of disruption, with associated development time and costs.
- Furthermore, a DVL can become a single point of failure, as the client relies on the provider’s ability to avoid downtime and keep the layer up to date and secure. If the layer has insufficient capacity to handle demand, response times may suffer.
- Most importantly, DV technology provides an on-demand service: data is transferred through the layer, not stored. Thus, historical data cannot be retained in its virtualised form without purpose-built storage facilities. This means that virtual data is not directly auditable, whereas auditability is a key benefit of a data warehouse. Consequently, DV is promoted as a means to augment, rather than replace, data warehousing technology where this functionality is required.
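The auditability limitation can be illustrated with a short, hypothetical Python sketch. The virtual view always reflects the live source, so once a value is overwritten the layer can no longer reproduce it. Answering an "as of" question requires a separate, purpose-built store that captures results over time; here a simple in-memory `SnapshotStore` stands in for a data warehouse or archive. All names, dates and values are assumptions for illustration.

```python
import bisect
from datetime import datetime

live_balances = {"C1": 100.0}  # hypothetical live source behind the layer


def virtual_view() -> dict[str, float]:
    """The DVL answers from the live source; overwritten values are gone."""
    return dict(live_balances)


class SnapshotStore:
    """Hypothetical purpose-built store that persists query results for audit."""

    def __init__(self) -> None:
        self._times: list[datetime] = []
        self._snapshots: list[dict[str, float]] = []

    def capture(self, at: datetime, view: dict[str, float]) -> None:
        # Assumes captures arrive in chronological order.
        self._times.append(at)
        self._snapshots.append(dict(view))

    def as_of(self, at: datetime) -> dict[str, float]:
        # Latest snapshot taken at or before the requested time.
        i = bisect.bisect_right(self._times, at) - 1
        if i < 0:
            raise LookupError("no snapshot at or before that time")
        return self._snapshots[i]


store = SnapshotStore()
store.capture(datetime(2018, 1, 31), virtual_view())
live_balances["C1"] = 250.0  # the source is updated in place
store.capture(datetime(2018, 2, 28), virtual_view())

print(virtual_view())                      # {'C1': 250.0}
print(store.as_of(datetime(2018, 2, 15)))  # {'C1': 100.0}
```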
The real power of DV lies in its ability to expose the entirety of a pre-existing data supply chain in business terms while masking the underlying technical challenges, and to do so without significant architectural or operational upheaval. As financial institutions continue to feel the growing burden of effective data management, DV is proving to be an important tool in resolving the key data management challenges commonly faced in the industry.
1 Data warehouse: A central repository for integrated data from one or more sources for analysis (i.e. business intelligence and management information).
2 Heterogeneous data: A collection of data of dissimilar types, formats or structures.
3 Application programming interface (API): A complete set of clearly defined communication methods for accessing data or functions within an application.
4 Data aggregation: Compilation or summarisation of data from dispersed, disparate sources.
Data cleansing: Identification and correction of corrupt, inaccurate, incomplete or conflicting data.
Data enrichment: Enhancement or refinement of raw data.
5 Abstraction: The separation of an interface from its implementation, so that underlying functionality can be used without knowledge of how it is implemented.
6 Rationalised data: Data that has been standardised by aggregation, cleansing and enrichment.
7 Extract, transform, load (ETL): The process of extracting data from source systems, transforming it into a consistent format and loading it en masse into a target store (typically a data warehouse) for consumption.
8 Management information (MI): Data converted into meaningful information for monitoring business performance.
Business intelligence (BI): Techniques for analysing management information to inform strategic decisions.
9 Schema: A description of a formally defined data storage structure.
10 Modular architecture: A design that emphasises the separation of functionality into independent modules, each comprising applications and their respective databases.
11 Vendor lock-in: A tendency to overcommit to a single vendor and become excessively reliant on their platform or technology.