Data Glossary 🧠
What is data transformation?
Data transformation is the process of converting data from one format into a different format. Reasons for doing this could be to optimize the data for a different use case than it was originally intended for, or to meet the requirements for storing data in a different system. Data transformation may involve steps such as cleansing, normalizing, structuring, validation, sorting, joining, or enriching data.
# How is data transformation done
Data is often transformed as part of an ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) approach to data integration.
See ETL vs. ELT for a comparison of these two approaches.
Additionally, a hybrid approach has recently emerged which is known as EtLT (Extract, “tweak”, Load, Transform). This combines aspects of both ETL and ELT.
# Benefits of data transformation
When used correctly, data transformation can provide the following benefits:
- Improved query-time efficiency and speed.
- Conversion of data into a format that is required by a target system.
- Enrichment of data with additional information that allows insights to be more easily extracted.
- Improved data quality by validating and fixing data, and removal of duplicates.
# Examples of data transformation
Below are some examples of how data may be transformed to achieve some of the benefits mentioned above.
# Improved efficiency and speed
One kind of transformation could be the extraction of structured data from data that is stored in a string. Imagine data that looks as follows:
In order to efficiently process this data in the future, it may preferable to transform this data into additional/new fields, and store it as:
Storing the data in this manner makes it much more efficient to analyze with operations such as:
# Enriching data
Data enrichment is a data transformation that adds additional information to the data that makes new kinds of queries possible.