What is data transformation?
Data transformation is the process of converting data from one format into a different format. Reasons for doing this could be to optimize the data for a different use case than it was originally intended for, or to meet the requirements for storing data in a different system. Data transformation may involve steps such as cleansing, normalizing, structuring, validation, sorting, joining, or enriching data.
# How is data transformation done
See ETL vs. ELT for a comparison of these two approaches.
Additionally, a hybrid approach has recently emerged which is known as EtLT (Extract, “tweak”, Load, Transform). This combines aspects of both ETL and ELT.
# Benefits of data transformation
When used correctly, data transformation can provide the following benefits:
- Improved query-time efficiency and speed.
- Conversion of data into a format that is required by a target system.
- Enrichment of data with additional information that allows insights to be more easily extracted.
- Improved data quality by validating and fixing data, and removal of duplicates.
# Examples of data transformation
Below are some examples of how data may be transformed to achieve some of the benefits mentioned above.
# Improved efficiency and speed
One kind of transformation could be the extraction of structured data from data that is stored in a string. Imagine data that looks as follows:
In order to efficiently process this data in the future, it may preferable to transform this data into additional/new fields, and store it as:
Storing the data in this manner makes it much more efficient to analyze with operations such as:
# Enriching data
Data enrichment is a data transformation that adds additional information to the data that makes new kinds of queries possible.