Expert's opinion

Cloud turns data transformation on its head

The traditional data transformation procedure of extract, transform and load (ETL) is rapidly being turned on its head in a modern twist enabled by cloud technologies.

Cloud’s lower costs, its flexibility and scalability, and the huge processing capability of cloud data warehouses, have driven a major change: the ability to load all data into the cloud, before transforming it. This trend means that ETL itself has been transformed—into extract, load and transform, or ELT.

ELT offers several advantages, including retention of data granularity, reduced need for expensive software engineers and significantly reduced project turnaround times.

Data transformation

Data is vital for organisations, who use it to understand their customers, identify new opportunities and support decision makers with mission-critical and up-to-date information. However, to analyse data, it must first be structured. It needs to be understood so that it can be pulled into dashboards, reports and predictive models.

The problem is that raw data doesn’t present as beautifully formatted, useable information. That is where data transformation comes in. Messy raw data needs to be transformed into representations of reality that help users accomplish specific goals.

This transformation can take place either before the data is loaded to its destination, usually a data warehouse, or afterwards.

Traditional ETL

In traditional ETL, data is transformed into analysis-ready data models before it is loaded. As Charles Wang of Fivetran notes, “combining transformation with loading into the same step can preserve storage and compute resources, but introduces a great deal of brittleness into the data engineering workflow. This also means that the software used for transformations is typically written using scripting languages such as Python and Java. In addition, transformations in ETL may require a great deal of complex orchestration using tools such as Airflow.”

ETL often also entails a great deal of customised code. One of the main challenges of traditional ETL is therefore accessibility. Scarce, costly resources such as engineers and data scientists need to be involved.

Another issue concerns turnaround times. Traditional ETL procedures associated with on-premise data warehouses are usually extremely time-consuming. Using ETL also involves constant maintenance and can introduce complexity.

Modern approaches to transformation

Storage has traditionally been prohibitively expensive. The benefit of ETL for organisations was that they did not have to load all their data to the final destination. That has now been changed by cloud. We are seeing a massive increase in cloud adoption in South Africa and the costs of technology are decreasing significantly. Lower costs make it possible for organisations to load all their data to the cloud, without having to be as mindful of storage costs.

This means that in the modern ELT workflow, raw data is transformed into analysis-ready data models after it has been loaded. Once in the warehouse, data can be transformed using SQL, which, thanks to its intuitive English-based syntax, can be used by a far wider range of users. Transformation can therefore be done by SQL-literate members of the organisation and not only by those with coding knowledge.

Data transformation today thus leverages cloud-based tools and technologies. These collectively make up what is referred to as the modern data stack (MDS).

Central to this MDS is a powerful cloud data platform, usually a cloud warehouse which can also include data lakes. Data is loaded into it from a variety of source systems including databases, web applications and APIs. To do this, a reliable transformation layer is used to convert raw data into query-ready datasets. Lastly, a collaborative business intelligence and visualisation solution enables the business to interact with the data and draw actionable insights to guide business decisions.

In his article called Data Transformation Explained, Wang points out that the MDS funnels data through the following stages:

  • Sources – data from operational databases, SaaS apps, event tracking

  • Data pipeline – extracts data from sources and loads it into the data warehouse, sometimes normalising it

  • Data warehouse – stores data in a relational database optimised for analytics

  • Data transformation tool – an SQL-based tool that uses data from the source to create new data models within the data warehouse

  • Analytics tool – tools for generating reports and visualisations, such as business intelligence platforms

Transformation within the data warehouse

Transformations are tailored to produce the specific data models organisations need for analytics. Modern ELT separates extraction and loading from transformation. This makes it possible for companies to automate and outsource the extraction and loading stages of the data integration process. They can then use a dedicated SQL-based transformation tool once the data is already in the warehouse.

A major advantage of ELT is that data essentially remains in granular form because it has not undergone major transformation prior to being loaded. With traditional ETL, an organisation may have aggregated specific data before loading, thus losing its original granularity completely.

The new ELT architecture also offers significant performance, flexibility, and cost advantages. Loading is fast, and organisations can save all their data in the data warehouse, even that which they may not currently need.

“Roughly speaking, transformed data models within the data warehouse can be views or materialised views,” notes Wang. He goes on to explain that every time someone accesses a view, the data warehouse runs a query to return the relevant data. These views are not stored. “In an ideal world with zero latency and unlimited computational resources, all transformations would simply be views,” he adds.

By contrast, materialised views are stored on disk because views generated on the fly from a large table or complex query can cause data warehouses to choke.

EtLT

ELT should probably be referred to as EtLT in most cases, as some light-duty transformation, or normalisation, is often carried out before the data is loaded. This removes redundancies, duplicates and derived values. It also organises tables from the data into the clearest possible set of interrelations so that analysts can easily interpret the underlying data model of the source app, and construct new analysis-ready data models accordingly.

“The outputs of the extraction and loading pipelines must be standardised if outsourced, automated ELT is to work,” says Wang. “To correctly normalise the data from a source, you need a keen understanding of the source application’s underlying functionality and data model. The best way to circumvent this challenge is to outsource extraction and loading to a team that has extensive experience with data engineering for that particular source.”

Shaped by the real-world needs of data analysts, Fivetran technology supports agile analytics, enabling data-backed decisions across organisations. Keyrus in partnership with Fivetran provides advisory to a growing number of organisations within South Africa on data migration to the cloud and effective deployment of ELT.

whatsapptwitter
linkedinfacebookworkplace
newsletter.svg

Never miss an insight

Stay updated on the latest articles, events, and more

Your email address is only used to send you the Keyrus newsletter and for commercial prospecting purposes. You can use the link in our emails to opt-out at any time. Learn more about the management of your data and your rights.

Continue reading

Success story

Vector Logistics embraces cloud-powered modern analytic solutions

April 20, 2022

A leading national supply chain specialist now benefits from vastly improved visibility and immediate access to information via self-service following a sophisticated cloud-powered analytics solution delivered by Keyrus.

Success story

Robust cost transparency model for The Foschini Group

April 6, 2022

Following this implementation, TFG is benefitting from insights into the costs of delivering IT and services to the group, as well as the ability to accurately assign costs to users of IT services. For the first time, IT is able to properly explain these chargebacks with accurate, detailed reports.

Success story

Company-wide BI enablement and self-service at a large retailer

March 31, 2022

The leadership of Mr. Price Group, JSE listed retail company and one of the largest retailers in South Africa, had the foresight to invest in the new generation of BI solution and to empower wide business communities to utilise the tool. Keyrus has developed a unique methodology to train business users to extract relevant information for fast, competent decision-making through a self-service.

Press release

Keyrus wins a prestigious Tableau partner award

March 14, 2022

Keyrus, a leading business intelligence and analytics consultancy in South Africa announced that it has been recognised as a 2021 Tableau Partner Award winner for the following category: EMEA Rising Star. This award is a testament to Keyrus’s achievements as a long-term Tableau Partner and dedication to their customer success.

Expert's opinion

How to maximise your BI investment through business-wide enablement

March 3, 2022

Modern business intelligence (BI) technologies have propelled a surge in demand for a company-wide BI enablement as, for the first time, entire business communities are able to obtain answers from company data via self-service. That empowers them to have meaningful conversations about data, leading to more data-driven business culture. Once people throughout the business have visibility into, and take ownership of, the data, the organisation benefits from speed to insight as well as the ability to make rapid, informed decisions.

Event

A proud Industry Insight Sponsor of ITWeb BI & Analytics Summit 2022

February 22, 2022

Enabling a data-driven business in a post-pandemic world is a topic that we at Keyrus feel very strongly about. Our experts will be taking the stage to contribute to this years' BI & Analytics Summit thought leadership and discussion on the most effective ways of driving business-wide enablement and maximizing the value of your data.

Press release

Namibia Breweries appoints Keyrus to implement their Big Data strategy

January 27, 2022

A leading southern African beverage manufacturer has selected Keyrus, a leader in the Data & Digital consultancy space, to partner on a journey of platform and application modernisation.

Expert's opinion

Delivering true value through IT financial transparency

January 25, 2022

In a services focused world, establishing the actual costs of services remains the key focus; greater levels of granularity are often needed to establish these and to deliver true cost transparency. This is especially true in a shared services environment where a range of administrative and support services, such as IT, HR and financial services, are provided to the entire organisation.

Expert's opinion

The future of business intelligence: 6 key trends

November 18, 2021

Business Intelligence (BI) - has undergone significant transformation in recent years and certainly is not being superseded by artificial intelligence. In fact, it has become more efficient and user friendly since moving to the cloud, utilising AI and machine learning, and its embedded implementation. Today's business leaders know that data is a valuable asset that needs to be used effectively to ensure efficient company processes. They require powerful data analysis tools to help them in the decision-making process. Businesses are embracing sophisticated analytics and data science to gain insights and make more informed decisions, the result; turning your BI department into a profit centre.

Event

Live Webinar - Get to Know Tableau with Keyrus

November 10, 2021

Keyrus, in partnership with Tableau, have assisted organisations to transform how they use data to solve complex business problems, and become more data driven organisations, enabling more informed decision making—do you want to do the same? Register for this live webinar to discover the secrets of a self-service approach to visual analytics.