Data Glossary 🧠

What is a Data Lake File Format?

Last updated Sep 8, 2022

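Data lake file formats are the new CSVs on the cloud. Compared with CSV, they are mostly column-oriented and compress large files well, with added features such as an embedded schema. The main players here are Apache Parquet, Apache Avro, and Apache Arrow. They are not identical in layout: Parquet is columnar, Avro is row-oriented, and Arrow is primarily a columnar in-memory format. Together they form the physical store: the actual files distributed across different buckets on your Object Store.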
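To make the difference between row-oriented and column-oriented layouts concrete, here is a minimal sketch using only the Python standard library. It is a toy illustration, not how Parquet, Avro, or Arrow actually encode data: the sample records, the field names, and the helper functions `to_row_layout`, `to_column_layout`, and `read_column` are all made up for this example.

```python
import json
import zlib

# Hypothetical sample records; the field names are invented for illustration.
rows = [
    {"user_id": i, "country": "CH" if i % 2 else "DE", "amount": round(i * 1.5, 2)}
    for i in range(1000)
]


def to_row_layout(records):
    """Row-oriented: the values of each record are stored together, as in CSV."""
    payload = json.dumps([list(r.values()) for r in records])
    return zlib.compress(payload.encode("utf-8"))


def to_column_layout(records):
    """Column-oriented: the values of each column are stored and compressed together."""
    columns = {name: [r[name] for r in records] for name in records[0]}
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(column_chunks, name):
    """Decompress only the requested column; the other chunks are never touched."""
    return json.loads(zlib.decompress(column_chunks[name]).decode("utf-8"))


row_blob = to_row_layout(rows)
column_chunks = to_column_layout(rows)

print("row layout bytes:   ", len(row_blob))
print("column layout bytes:", sum(len(chunk) for chunk in column_chunks.values()))
print("first countries:    ", read_column(column_chunks, "country")[:4])
```

In the column layout, a query that needs only `country` decompresses a single chunk, whereas the row layout has to be decompressed and scanned in full. Real columnar formats build on the same idea and add typed encodings, statistics, and metadata.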

You can build more features on top with a Data Lake Table Format. Read more in our Data Lake and Lakehouse Guide.