The traditional data transformation procedure of extract, transform and load (ETL) is rapidly being turned on its head in a modern twist enabled by cloud technologies.
The cloud’s lower costs, its flexibility and scalability, and the huge processing capability of cloud data warehouses have driven a major change: the ability to load all data into the cloud before transforming it. This trend means that ETL itself has been transformed into extract, load and transform, or ELT.
ELT offers several advantages, including retention of data granularity, reduced need for expensive software engineers and significantly reduced project turnaround times.
Data is vital for organisations, which use it to understand their customers, identify new opportunities and support decision makers with mission-critical and up-to-date information. However, to analyse data, it must first be structured. It needs to be understood so that it can be pulled into dashboards, reports and predictive models.
The problem is that raw data doesn’t present as beautifully formatted, usable information. That is where data transformation comes in. Messy raw data needs to be transformed into representations of reality that help users accomplish specific goals.
This transformation can take place either before the data is loaded to its destination, usually a data warehouse, or afterwards.
In traditional ETL, data is transformed into analysis-ready data models before it is loaded. As Charles Wang of Fivetran notes, “combining transformation with loading into the same step can preserve storage and compute resources, but introduces a great deal of brittleness into the data engineering workflow. This also means that the software used for transformations is typically written using scripting languages such as Python and Java. In addition, transformations in ETL may require a great deal of complex orchestration using tools such as Airflow.”
ETL often also entails a great deal of customised code. One of the main challenges of traditional ETL is therefore accessibility. Scarce, costly resources such as engineers and data scientists need to be involved.
Another issue concerns turnaround times. Traditional ETL procedures associated with on-premises data warehouses are usually extremely time-consuming. Using ETL also involves constant maintenance and can introduce complexity.
Storage has traditionally been prohibitively expensive. The benefit of ETL for organisations was that they did not have to load all their data to the final destination. That has now changed with the cloud. We are seeing a massive increase in cloud adoption in South Africa, and the costs of technology are decreasing significantly. Lower costs make it possible for organisations to load all their data to the cloud without having to be as mindful of storage costs.
This means that in the modern ELT workflow, raw data is transformed into analysis-ready data models after it has been loaded. Once in the warehouse, data can be transformed using SQL, which, thanks to its intuitive English-based syntax, can be used by a far wider range of users. Transformation can therefore be done by SQL-literate members of the organisation and not only by those with coding knowledge.
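The post-load, SQL-based transformation described above can be sketched in a few lines. This is a minimal illustration using SQLite as a stand-in for a cloud warehouse; the table and column names (raw_orders, customer_revenue) are hypothetical, not from the article.

```python
import sqlite3

# Stand-in for a cloud warehouse; in practice this would be a
# cloud data platform such as Snowflake or BigQuery.
conn = sqlite3.connect(":memory:")

# ELT: raw data has been loaded as-is, without prior transformation.
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        (1, 'acme', 120.0),
        (2, 'acme', 80.0),
        (3, 'globex', 50.0);
""")

# The transformation step happens inside the warehouse, in plain SQL,
# producing an analysis-ready data model from the raw rows.
conn.execute("""
    CREATE TABLE customer_revenue AS
    SELECT customer, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM raw_orders
    GROUP BY customer
""")
```

Because the modelling is expressed entirely in SQL, any SQL-literate analyst can maintain it, which is the accessibility point the article makes.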
Data transformation today thus leverages cloud-based tools and technologies. These collectively make up what is referred to as the modern data stack (MDS).
Central to this MDS is a powerful cloud data platform, usually a cloud warehouse which can also include data lakes. Data is loaded into it from a variety of source systems including databases, web applications and APIs. To do this, a reliable transformation layer is used to convert raw data into query-ready datasets. Lastly, a collaborative business intelligence and visualisation solution enables the business to interact with the data and draw actionable insights to guide business decisions.
In his article “Data Transformation Explained”, Wang points out that the MDS funnels data through the following stages:
Sources – data from operational databases, SaaS apps, event tracking
Data pipeline – extracts data from sources and loads it into the data warehouse, sometimes normalising it
Data warehouse – stores data in a relational database optimised for analytics
Data transformation tool – an SQL-based tool that uses data from the source to create new data models within the data warehouse
Analytics tool – tools for generating reports and visualisations, such as business intelligence platforms
Transformations are tailored to produce the specific data models organisations need for analytics. Modern ELT separates extraction and loading from transformation. This makes it possible for companies to automate and outsource the extraction and loading stages of the data integration process. They can then use a dedicated SQL-based transformation tool once the data is already in the warehouse.
A major advantage of ELT is that data essentially remains in granular form because it has not undergone major transformation prior to being loaded. With traditional ETL, an organisation may have aggregated specific data before loading, thus losing its original granularity completely.
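The granularity point can be made concrete with a small, hypothetical example: once data has been aggregated before loading, questions the aggregation did not anticipate can no longer be answered, whereas raw rows in the warehouse can serve new questions later.

```python
# Hypothetical raw sales events, as they would be loaded under ELT.
raw_sales = [
    {"day": "2023-01-01", "product": "A", "qty": 2},
    {"day": "2023-01-01", "product": "B", "qty": 1},
    {"day": "2023-01-02", "product": "A", "qty": 5},
]

# ETL-style pre-load aggregation: only daily totals survive,
# so per-product detail is permanently lost.
daily_totals = {}
for row in raw_sales:
    daily_totals[row["day"]] = daily_totals.get(row["day"], 0) + row["qty"]

# ELT-style: the raw rows are still available, so a new question
# (totals per product) can be answered without re-extracting the data.
product_totals = {}
for row in raw_sales:
    product_totals[row["product"]] = product_totals.get(row["product"], 0) + row["qty"]
```

If only `daily_totals` had been loaded, the per-product breakdown could never be reconstructed from the warehouse.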
The new ELT architecture also offers significant performance, flexibility, and cost advantages. Loading is fast, and organisations can save all their data in the data warehouse, even that which they may not currently need.
“Roughly speaking, transformed data models within the data warehouse can be views or materialised views,” notes Wang. He goes on to explain that every time someone accesses a view, the data warehouse runs a query to return the relevant data. These views are not stored. “In an ideal world with zero latency and unlimited computational resources, all transformations would simply be views,” he adds.
By contrast, materialised views are stored on disk because views generated on the fly from a large table or complex query can cause data warehouses to choke.
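The view/materialised-view distinction can be sketched as follows. SQLite (used here purely for illustration) has no native materialised views, so a table built with `CREATE TABLE ... AS` stands in for one; the table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, action TEXT);
    INSERT INTO events VALUES (1, 'click'), (1, 'view'), (2, 'click');
""")

# A view: nothing is stored; the warehouse re-runs this query
# every time the view is accessed.
conn.execute("""
    CREATE VIEW clicks_per_user AS
    SELECT user_id, COUNT(*) AS clicks
    FROM events
    WHERE action = 'click'
    GROUP BY user_id
""")

# Stand-in for a materialised view: the results are persisted,
# trading storage and an explicit refresh step for faster reads.
conn.execute("CREATE TABLE clicks_per_user_mat AS SELECT * FROM clicks_per_user")
```

The trade-off is exactly the one Wang describes: the plain view always reflects the latest `events` rows but pays the query cost on every access, while the persisted copy reads instantly but must be rebuilt when the underlying data changes.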
ELT should probably be referred to as EtLT in most cases, as some light-duty transformation, or normalisation, is often carried out before the data is loaded. This removes redundancies, duplicates and derived values. It also organises tables from the data into the clearest possible set of interrelations so that analysts can easily interpret the underlying data model of the source app, and construct new analysis-ready data models accordingly.
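The light-duty normalisation described above might look something like this. It is a simplified sketch with hypothetical field names; real pipelines would also handle schema mapping and type coercion.

```python
# Hypothetical raw extract: contains a duplicate record and a
# derived column (line_total = price * qty).
raw_records = [
    {"id": 1, "price": 10.0, "qty": 3, "line_total": 30.0},
    {"id": 1, "price": 10.0, "qty": 3, "line_total": 30.0},  # duplicate
    {"id": 2, "price": 4.0, "qty": 2, "line_total": 8.0},
]

seen = set()
normalised = []
for rec in raw_records:
    if rec["id"] in seen:
        continue  # drop duplicate records before loading
    seen.add(rec["id"])
    # Drop the derived value: it is redundant and can always be
    # recomputed in SQL once the data is in the warehouse.
    normalised.append({k: v for k, v in rec.items() if k != "line_total"})
```

Heavier modelling work is deliberately left for the warehouse, which is what puts the small "t" in EtLT.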
“The outputs of the extraction and loading pipelines must be standardised if outsourced, automated ELT is to work,” says Wang. “To correctly normalise the data from a source, you need a keen understanding of the source application’s underlying functionality and data model. The best way to circumvent this challenge is to outsource extraction and loading to a team that has extensive experience with data engineering for that particular source.”
Shaped by the real-world needs of data analysts, Fivetran technology supports agile analytics, enabling data-backed decisions across organisations. Keyrus, in partnership with Fivetran, advises a growing number of organisations in South Africa on data migration to the cloud and the effective deployment of ELT.