How NEOS’ Data Lake Implementation Improves Operational Agility While Reducing Costs  

INTRODUCTION

This engagement was a multi-year effort to increase operational excellence, reduce overall costs, and improve operational agility for the investment division of a global insurance company. 

A key part of this vision was to enable the strategic use of data as an asset by building a robust aggregation, analytics, and reporting capability, often referred to as a data lake. NEOS worked with the client to define a data lake solution that serves the needs of not only the Investment Division but the rest of the firm as well – an enterprise data lake (EDL). Beyond the initial strategy work, NEOS provided the expertise to define the solution architecture and implementation plan for the data lake within the client's reporting and analytics workstream. NEOS designed and implemented the entire data intake and curation process, as well as the orchestration between platforms, such as BI tools, required to ensure it followed a repeatable nightly process.

WHY NOW?  

The client had started their global optimization data workstream to address four key pain points:  

  • a lack of centralized authoritative sources and access to appropriate data for decision-making
  • untrusted data sources or data not fit for purpose
  • a lack of appropriate data oversight
  • management of business incidents caused by not handling data appropriately

The program had started to identify an overall architecture but needed help in architecting a data lake, identifying key use cases, and building a roadmap to implement the lake while progressively gaining value from it.

HOW WE DID IT:

Strategy  

  • Worked with stakeholders to define their business and technical requirements and create use cases
  • Designed a solution architecture focused on four data layers: Ingestion, Curation Zone, Access, and Business Intelligence
  • Collaborated with client technical resources to define the target architecture and environments
  • Defined a repeatable approach for rapid delivery of business-driven solutions, consisting of technical components as well as methodologies
  • Data Lake Architecture – identified and designed the strategic capabilities of the Enterprise Data Lake
  • Roadmap – created and executed a multi-faceted implementation roadmap that added capabilities and complexity to the data lake while meeting the program goal and the 7-month implementation target
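The four-layer flow named above (Ingestion, Curation Zone, Access, Business Intelligence) can be sketched as a simple pipeline. This is a minimal illustration only; all function and field names here are hypothetical and do not reflect the client's actual platform components.

```python
# Illustrative four-layer flow: Ingestion -> Curation Zone -> Access -> BI.
# All names and the quality rule are assumptions for demonstration.

def ingest(raw_rows):
    """Ingestion layer: land raw records as-is, tagging each with its source."""
    return [dict(row, _source="vendor_feed") for row in raw_rows]

def curate(landed_rows):
    """Curation Zone: conform records to a shared (canonical) shape,
    dropping rows that fail a basic quality check."""
    return [
        {"security_id": r["id"], "price": float(r["px"]), "_source": r["_source"]}
        for r in landed_rows
        if r.get("px") is not None
    ]

def build_access_view(curated_rows):
    """Access layer: expose curated data as a queryable structure (here, keyed by id)."""
    return {r["security_id"]: r for r in curated_rows}

def bi_report(access_view):
    """Business Intelligence layer: a consumer-facing aggregate over the access view."""
    prices = [r["price"] for r in access_view.values()]
    return {"count": len(prices), "avg_price": sum(prices) / len(prices)}

raw = [{"id": "A1", "px": "101.5"}, {"id": "B2", "px": None}, {"id": "C3", "px": "99.0"}]
report = bi_report(build_access_view(curate(ingest(raw))))
```

The value of separating the layers this way is that each can evolve independently: new sources change only ingestion, new canonical fields change only curation, and consumers see only the access and BI layers.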

Implementation  

  • Created templatized ingestion processes to quickly ingest new data sources into the lake and expose them as structured tables
  • Implemented specific reporting and data analytics opportunities and their associated business cases
  • Added 20+ data sources of varying complexity to the lake, including integrations with source systems such as SimCorp and Bloomberg Polar Lake
  • Curated all sources into a canonical model
  • Created multiple data marts so consumers could easily digest the data they wanted in a convenient format
  • Continued to enhance and performance-tune ingestion and curation processes to accommodate additional data sources and reduce overall processing time
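The "templatized ingestion" idea above can be sketched as one generic routine parameterized per source by a small config, so onboarding a new feed means adding a config entry rather than writing new code. The source names, delimiters, and column mappings below are illustrative assumptions, not the client's actual configuration.

```python
import csv
import io

# Hypothetical per-source templates: delimiter plus a mapping from the
# feed's raw column names to canonical column names.
SOURCE_CONFIGS = {
    "simcorp_positions": {"delimiter": ";", "columns": {"acct": "account_id", "qty": "quantity"}},
    "polarlake_prices":  {"delimiter": ",", "columns": {"sec": "security_id", "px": "price"}},
}

def ingest_source(name, raw_text):
    """Apply the source's template: parse the feed and rename columns
    to the canonical names, dropping unmapped columns."""
    cfg = SOURCE_CONFIGS[name]
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=cfg["delimiter"])
    return [
        {cfg["columns"][k]: v for k, v in row.items() if k in cfg["columns"]}
        for row in reader
    ]

rows = ingest_source("polarlake_prices", "sec,px\nA1,101.5\nC3,99.0")
```

In a real lake the parsed rows would land as structured tables; the point of the template is that the per-source logic is data, not code.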

WHAT WE DELIVERED:  

In the first phase of the project, nine data sources (including SimCorp and Bloomberg Polar Lake data) were ingested into the data lake and curated, and three data marts were created. Processes were developed to ingest data and update the data marts daily, including quality checks on the feeds and on the curation.
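A daily refresh with feed quality checks, in the spirit of the process described above, might look like the following. The checks, thresholds, and field names are illustrative assumptions, not the client's actual rules.

```python
# Sketch of a repeatable daily data mart refresh gated by feed quality checks.
# All names and rules are hypothetical.

def check_feed(rows, required_fields, min_rows=1):
    """Basic quality gates: the feed is non-empty and every row carries
    the required fields with non-empty values."""
    if len(rows) < min_rows:
        raise ValueError("feed below minimum row count")
    for r in rows:
        missing = [f for f in required_fields if f not in r or r[f] in (None, "")]
        if missing:
            raise ValueError(f"row missing fields: {missing}")
    return rows

def refresh_data_mart(mart, feed_rows):
    """Upsert checked feed rows into a mart keyed by security_id,
    so the daily run replaces stale values and adds new ones."""
    for r in check_feed(feed_rows, required_fields=("security_id", "price")):
        mart[r["security_id"]] = r
    return mart

mart = {}
refresh_data_mart(mart, [{"security_id": "A1", "price": 101.5}])
refresh_data_mart(mart, [{"security_id": "A1", "price": 102.0},
                         {"security_id": "C3", "price": 99.0}])
```

Failing the quality gate before the upsert is what keeps a bad feed from ever reaching consumers of the marts.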

This was accomplished in the first six months, and it was the only part of the program to be production-ready on time. Over the next three years, the sources grew to 25+, including data from FactSet and Sylvan, and ingestion now happens multiple times daily. The consumer side grew to 40 data marts, with eight teams of consumers using them daily and others accessing them on an ad hoc basis.

The client now has a five-zone data lake that delivers updated, consumable data marts for reporting based on several disparate data sources, including Bloomberg Polar Lake data, SimCorp general ledger data, Sylvan performance data, FactSet data, and many custom internal sources.

Thanks to the data lake's streamlined ingestion processes and ongoing enhancements, reporting changes can be delivered in hours instead of months. The project was delivered on time and within budget even as new data sources were being introduced and changed. The client can now manage data intake from North America and Asia within SLAs, enriching the data lake with significantly more global data and delivering enhanced business insights to consumers and data scientists.

For more information on Delivering Data at Scale please contact Robert Nocera.