A large global trade organization (GTO) has a mission to promote the sciences in order to improve the quality of life for all. The organization focuses on publications and membership, with a membership exceeding 150,000 in more than 140 countries. They publish over 70 peer-reviewed journals that are considered cutting edge in research within their domain. Additionally, they organize national and regional conferences to promote the sciences.
Keyrus was brought in to work with the GTO on specific metadata management and data cataloging concerns that their business stakeholders had identified. The existing data platform was a 15+-year-old Oracle system that had supported the business well over the years but contained complex, undocumented logic.
Whenever there were questions about how KPIs were being added into the system, figuring out what the existing data transformations were doing was very time-consuming. IT had to reverse engineer layers upon layers of code and then document the findings for the business.
This ultimately created bottlenecks for their lean development team and, because of the black-box logic, undermined confidence in some of the metrics.
Keyrus executed a data discovery to assess their entire data platform and recommend how to resolve their metadata management needs. During this 5-week discovery, our team interviewed key stakeholders across departments to understand existing data processes, how they use the data, and what capabilities they would like in the future.
We determined that metadata management was only part of the challenge, and that recommending a data cataloging tool alone wouldn’t solve all the core issues. While the 15+-year-old system met existing business requirements, the platform needed to be modernized to accommodate future requirements.
The core future requirements we discovered included:
Scale the data platform for growing data volumes and connections without significant overhead on IT time
Modernize their BI consumption layer and provide self-service capabilities for business users to generate their own insights
Remove the need to provide manual reports and dashboards to external stakeholders
Provide a more transparent data lineage capability to identify issues
At the end of the discovery, we presented our findings to the main stakeholders at the GTO and proposed a data platform modernization to address core challenges that extended beyond data lineage.
We provided the GTO leadership team with recommendations on future technology stack options, based on the requirements we gathered from business and IT. To supplement the tool comparison, we also conducted a two-week proof-of-concept to demonstrate that the tools could handle the GTO’s current and future state requirements. We chose Snowflake, Tableau, and Talend and agreed on a plan to execute the modernization effort.
At the implementation stage, our team collaborated with their stakeholders to design and build the future state data platform. First, we reverse engineered the 15+-year-old system, working through a large library of Java classes and Oracle stored procedures to translate complex, layered logic into readable Talend data pipelines. These flows are easier to follow and maintain than the existing Java logic. Because Talend uses a graphical user interface, a larger pool of their developers can build data pipelines, rather than that work being reserved for Java developers.
Lastly, Talend’s ETL tool offers great native connectivity to all of the connections that the GTO would like to leverage in the future, which reduces the time developers would have to spend building a custom connection in the existing solution.
To address their scalability requirement, we chose a fully cloud-based solution, using Amazon Web Services (AWS) to host the Talend ETL engine and Snowflake for their data warehouse. These tools provided a highly elastic solution that enables the team to scale resources up and down as needed.
A cloud-based, managed solution allows the GTO to focus on generating business value rather than on upkeep and patching of on-premises infrastructure. On top of the natural scalability of the tools, Keyrus also used AWS CloudFormation scripts to help automate standing up the infrastructure.
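To illustrate what automating infrastructure with CloudFormation can look like, here is a minimal sketch that builds a template in Python and renders it as JSON. It is not the GTO's actual script: the resource name `EtlHost`, the parameter defaults, and the function name are hypothetical, and a real deployment would also define networking, security, and storage.

```python
import json


def build_etl_host_template():
    """Return a CloudFormation template (as a dict) for one EC2 host.

    Everything here is illustrative: 'EtlHost' and the default
    instance type are assumptions, not details from the project.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical host for an ETL engine (illustrative only)",
        "Parameters": {
            # Exposing the instance type as a parameter lets the same
            # template stand up smaller or larger environments.
            "InstanceType": {"Type": "String", "Default": "m5.large"},
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "EtlHost": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": {"Ref": "InstanceType"},
                    "ImageId": {"Ref": "ImageId"},
                },
            }
        },
        "Outputs": {
            "EtlHostId": {"Value": {"Ref": "EtlHost"}},
        },
    }


# Render the template as JSON text, ready to hand to CloudFormation.
template_json = json.dumps(build_etl_host_template(), indent=2)
```

Because the template is plain data, the same definition can be reused to stand up development, test, and production environments by changing only the parameter values.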
To create a modern BI consumption layer, we used Tableau as the enterprise reporting tool. Our first effort on the consumption layer was to refactor the redundant reports and dashboards that existed within their legacy BusinessObjects stack.
Our team worked closely with key stakeholders to consolidate 30 dashboards and reports into six Tableau dashboards. This immediately created a better user experience, because users can now go to one dashboard instead of several to get the data sets they are looking for.
The consolidation also reduces the maintenance burden on IT. We also collaborated with the GTO’s infrastructure staff to install and deploy an external Tableau server, which allows end users outside the organization’s network to access dashboards and reports. With this external server, their business intelligence team can now deploy an automated dashboard that external users can access at any time, instead of building manual reports every month.
To tie this all together, we deployed a Talend Data Catalog solution to address the data lineage requirements. This tool provides great native integration with our entire recommended stack (Talend ETL, Snowflake, and Tableau).
Keyrus configured the data catalog to automatically harvest and read the metadata across these different technologies, which automatically creates end-to-end data lineage. This allows business users to see the entire journey of their data assets: how a data element moves from source to data warehouse to dashboard, and how it is transformed along the way. This gives the GTO stakeholders visibility they never had before, allowing them to better validate and trust their data.
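Conceptually, end-to-end lineage is a directed graph of data assets, and tracing a dashboard back to its sources is a graph traversal. The sketch below illustrates that idea only; it is not Talend Data Catalog's API, and every asset and job name in it is hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream asset, downstream asset, how it moves).
EDGES = [
    ("oracle.members", "snowflake.dim_member", "ETL job: load_members"),
    ("oracle.subscriptions", "snowflake.fact_subscription", "ETL job: load_subscriptions"),
    ("snowflake.dim_member", "tableau.membership_dashboard", "dashboard data source"),
    ("snowflake.fact_subscription", "tableau.membership_dashboard", "dashboard data source"),
]


def build_upstream_index(edges):
    """Map each asset to the list of (upstream asset, note) pairs feeding it."""
    index = defaultdict(list)
    for source, target, note in edges:
        index[target].append((source, note))
    return index


def trace_upstream(asset, index):
    """Return every (source, target, note) hop feeding `asset`, nearest first."""
    hops = []
    seen = {asset}
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        # .get avoids creating empty entries for assets with no upstream.
        for source, note in index.get(current, []):
            hops.append((source, current, note))
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return hops


lineage = trace_upstream("tableau.membership_dashboard", build_upstream_index(EDGES))
for source, target, note in lineage:
    print(f"{source} -> {target} ({note})")
```

A breadth-first walk lists the warehouse tables closest to the dashboard first and the original source tables last, which mirrors how a business user would read the journey backward from a metric to its origin.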
Our team layered best-in-class technologies together to provide the GTO with a custom solution that met a wide range of business needs.
Drawing on our experience implementing data platform modernizations, we used accelerators (e.g., CloudFormation scripts) to reduce development time.
Managing different software vendors with different license and pricing structures can be complex. Keyrus operated as a reseller thanks to our relationships with different software partners. This created a one-stop shop where our customer could get all the software licenses they needed across different vendors.
Comprehensive assessment of existing infrastructure (ERD diagrams, workflows, interview notes)
Vendor technology comparisons for their entire data infrastructure stack (i.e., data visualization and analytics, data warehousing, ETL, and infrastructure), including executing a proof-of-concept
Detailed design of the new solution inclusive of new server/network infrastructure, data warehouse structure and models, error handling, and reporting principles
Reverse engineered a 15+-year-old system built on legacy Oracle and Java code into a modernized data platform
Stood up an external Tableau server to enable the new data platform to be accessed by their end users outside the company’s network
Conducted training on the new platform for a wide range of audiences, including a larger Tableau training session