The traditional data transformation procedure of extract, transform and load (ETL) is rapidly being turned on its head in a modern twist enabled by cloud technologies.
Cloud’s lower costs, its flexibility and scalability, and the huge processing capability of cloud data warehouses, have driven a major change: the ability to load all data into the cloud, before transforming it. This trend means that ETL itself has been transformed—into extract, load and transform, or ELT.
ELT offers several advantages, including retention of data granularity, reduced need for expensive software engineers and significantly reduced project turnaround times.
Data is vital for organisations, who use it to understand their customers, identify new opportunities and support decision makers with mission-critical and up-to-date information. However, to analyse data, it must first be structured. It needs to be understood so that it can be pulled into dashboards, reports and predictive models.
The problem is that raw data doesn’t present as beautifully formatted, useable information. That is where data transformation comes in. Messy raw data needs to be transformed into representations of reality that help users accomplish specific goals.
This transformation can take place either before the data is loaded to its destination, usually a data warehouse, or afterwards.
In traditional ETL, data is transformed into analysis-ready data models before it is loaded. As Charles Wang of Fivetran notes, “combining transformation with loading into the same step can preserve storage and compute resources, but introduces a great deal of brittleness into the data engineering workflow. This also means that the software used for transformations is typically written using scripting languages such as Python and Java. In addition, transformations in ETL may require a great deal of complex orchestration using tools such as Airflow.”
ETL often also entails a great deal of customised code. One of the main challenges of traditional ETL is therefore accessibility. Scarce, costly resources such as engineers and data scientists need to be involved.
Another issue concerns turnaround times. Traditional ETL procedures associated with on-premise data warehouses are usually extremely time-consuming. Using ETL also involves constant maintenance and can introduce complexity.
Storage has traditionally been prohibitively expensive. The benefit of ETL for organisations was that they did not have to load all their data to the final destination. That has now been changed by cloud. We are seeing a massive increase in cloud adoption in South Africa and the costs of technology are decreasing significantly. Lower costs make it possible for organisations to load all their data to the cloud, without having to be as mindful of storage costs.
This means that in the modern ELT workflow, raw data is transformed into analysis-ready data models after it has been loaded. Once in the warehouse, data can be transformed using SQL, which, thanks to its intuitive English-based syntax, can be used by a far wider range of users. Transformation can therefore be done by SQL-literate members of the organisation and not only by those with coding knowledge.
Data transformation today thus leverages cloud-based tools and technologies. These collectively make up what is referred to as the modern data stack (MDS).
Central to this MDS is a powerful cloud data platform, usually a cloud warehouse which can also include data lakes. Data is loaded into it from a variety of source systems including databases, web applications and APIs. To do this, a reliable transformation layer is used to convert raw data into query-ready datasets. Lastly, a collaborative business intelligence and visualisation solution enables the business to interact with the data and draw actionable insights to guide business decisions.
In his article called Data Transformation Explained, Wang points out that the MDS funnels data through the following stages:
Sources – data from operational databases, SaaS apps, event tracking
Data pipeline – extracts data from sources and loads it into the data warehouse, sometimes normalising it
Data warehouse – stores data in a relational database optimised for analytics
Data transformation tool – an SQL-based tool that uses data from the source to create new data models within the data warehouse
Analytics tool – tools for generating reports and visualisations, such as business intelligence platforms
Transformations are tailored to produce the specific data models organisations need for analytics. Modern ELT separates extraction and loading from transformation. This makes it possible for companies to automate and outsource the extraction and loading stages of the data integration process. They can then use a dedicated SQL-based transformation tool once the data is already in the warehouse.
A major advantage of ELT is that data essentially remains in granular form because it has not undergone major transformation prior to being loaded. With traditional ETL, an organisation may have aggregated specific data before loading, thus losing its original granularity completely.
The new ELT architecture also offers significant performance, flexibility, and cost advantages. Loading is fast, and organisations can save all their data in the data warehouse, even that which they may not currently need.
“Roughly speaking, transformed data models within the data warehouse can be views or materialised views,” notes Wang. He goes on to explain that every time someone accesses a view, the data warehouse runs a query to return the relevant data. These views are not stored. “In an ideal world with zero latency and unlimited computational resources, all transformations would simply be views,” he adds.
By contrast, materialised views are stored on disk because views generated on the fly from a large table or complex query can cause data warehouses to choke.
ELT should probably be referred to as EtLT in most cases, as some light-duty transformation, or normalisation, is often carried out before the data is loaded. This removes redundancies, duplicates and derived values. It also organises tables from the data into the clearest possible set of interrelations so that analysts can easily interpret the underlying data model of the source app, and construct new analysis-ready data models accordingly.
“The outputs of the extraction and loading pipelines must be standardised if outsourced, automated ELT is to work,” says Wang. “To correctly normalise the data from a source, you need a keen understanding of the source application’s underlying functionality and data model. The best way to circumvent this challenge is to outsource extraction and loading to a team that has extensive experience with data engineering for that particular source.”
Shaped by the real-world needs of data analysts, Fivetran technology supports agile analytics, enabling data-backed decisions across organisations. Keyrus in partnership with Fivetran provides advisory to a growing number of organisations within South Africa on data migration to the cloud and effective deployment of ELT.
Stay updated on the latest articles, events, and more
A leading national supply chain specialist now benefits from vastly improved visibility and immediate access to information via self-service following a sophisticated cloud-powered analytics solution delivered by Keyrus.
Following this implementation, TFG is benefitting from insights into the costs of delivering IT and services to the group, as well as the ability to accurately assign costs to users of IT services. For the first time, IT is able to properly explain these chargebacks with accurate, detailed reports.
The leadership of Mr. Price Group, JSE listed retail company and one of the largest retailers in South Africa, had the foresight to invest in the new generation of BI solution and to empower wide business communities to utilise the tool. Keyrus has developed a unique methodology to train business users to extract relevant information for fast, competent decision-making through a self-service.
Keyrus, a leading business intelligence and analytics consultancy in South Africa announced that it has been recognised as a 2021 Tableau Partner Award winner for the following category: EMEA Rising Star. This award is a testament to Keyrus’s achievements as a long-term Tableau Partner and dedication to their customer success.
Modern business intelligence (BI) technologies have propelled a surge in demand for a company-wide BI enablement as, for the first time, entire business communities are able to obtain answers from company data via self-service. That empowers them to have meaningful conversations about data, leading to more data-driven business culture. Once people throughout the business have visibility into, and take ownership of, the data, the organisation benefits from speed to insight as well as the ability to make rapid, informed decisions.
Enabling a data-driven business in a post-pandemic world is a topic that we at Keyrus feel very strongly about. Our experts will be taking the stage to contribute to this years' BI & Analytics Summit thought leadership and discussion on the most effective ways of driving business-wide enablement and maximizing the value of your data.
A leading southern African beverage manufacturer has selected Keyrus, a leader in the Data & Digital consultancy space, to partner on a journey of platform and application modernisation.
In a services focused world, establishing the actual costs of services remains the key focus; greater levels of granularity are often needed to establish these and to deliver true cost transparency. This is especially true in a shared services environment where a range of administrative and support services, such as IT, HR and financial services, are provided to the entire organisation.
Business Intelligence (BI) - has undergone significant transformation in recent years and certainly is not being superseded by artificial intelligence. In fact, it has become more efficient and user friendly since moving to the cloud, utilising AI and machine learning, and its embedded implementation. Today's business leaders know that data is a valuable asset that needs to be used effectively to ensure efficient company processes. They require powerful data analysis tools to help them in the decision-making process. Businesses are embracing sophisticated analytics and data science to gain insights and make more informed decisions, the result; turning your BI department into a profit centre.
Keyrus, in partnership with Tableau, have assisted organisations to transform how they use data to solve complex business problems, and become more data driven organisations, enabling more informed decision making—do you want to do the same? Register for this live webinar to discover the secrets of a self-service approach to visual analytics.