Data Glossary 🧠

Search

Search IconIcon to open search

What is data transformation?

Last updated Dec 14, 2022 - Edit Source

Data transformation is the process of converting data from one format into a different format. Reasons for doing this could be to optimize the data for a different use case than it was originally intended for, or to meet the requirements for storing data in a different system. Data transformation may involve steps such as cleansing, normalizing, structuring, validation, sorting, joining, or enriching data.

# How is data transformation done

Data is often transformed as part of an ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) approach to data integration.

See ETL vs. ELT for a comparison of these two approaches.

Additionally, a hybrid approach has recently emerged which is known as EtLT (Extract, “tweak”, Load, Transform). This combines aspects of both ETL and ELT.

# Benefits of data transformation

When used correctly, data transformation can provide the following benefits:

# Examples of data transformation

Below are some examples of how data may be transformed to achieve some of the benefits mentioned above.

# Improved efficiency and speed

One kind of transformation could be the extraction of structured data from data that is stored in a string. Imagine data that looks as follows:

1
input_string: "Bob is 29"

In order to efficiently process this data in the future, it may preferable to transform this data into additional/new fields, and store it as:

1
2
name: "Bob"
age: 29

Storing the data in this manner makes it much more efficient to analyze with operations such as:

1
SELECT * FROM X where Age=29

# Enriching data

Data enrichment is a data transformation that adds additional information to the data that makes new kinds of queries possible.