Data Glossary 🧠



What is a Data Lake?

Last updated Oct 25, 2022

A Data Lake is a storage system that holds vast amounts of structured and unstructured data as-is, without a specific purpose in mind. It can be built on multiple technologies, such as Hadoop, NoSQL, Amazon Simple Storage Service (S3), a relational database, or combinations of these, and it can hold data in many different formats (e.g. Excel, CSV, text, logs).
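To make "stored as-is" concrete, here is a minimal sketch that uses a local temporary directory as a stand-in for a real storage layer such as S3 or HDFS. The `raw/<source>/` layout and the helper names (`land_raw`, `read_csv_rows`, `demo`) are assumptions made for illustration, not part of any particular data lake product.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Store the payload byte-for-byte under raw/<source>/ with no parsing or schema."""
    target = lake_root / "raw" / source / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_csv_rows(path: Path) -> list[dict]:
    """Apply structure only at read time, when a consumer actually needs it."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def demo() -> dict:
    # A temporary directory plays the role of the lake (hypothetical layout).
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        # Three differently shaped inputs land side by side, unchanged.
        orders = land_raw(lake, "shop", "orders.csv", b"id,amount\n1,9.50\n2,12.00\n")
        land_raw(lake, "app", "events.log", b"2022-10-25 INFO user logged in\n")
        land_raw(lake, "crm", "contact.json", json.dumps({"name": "Ada"}).encode("utf-8"))
        stored = sorted(
            p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file()
        )
        return {"files": stored, "orders": read_csv_rows(orders)}


if __name__ == "__main__":
    print(demo())
```

Nothing is validated or converted on the way in; the CSV is only interpreted as rows when `read_csv_rows` is called, and every value it returns is still a plain string.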

According to the Hortonworks Data Lake Whitepaper, the data lake arose because enterprises needed to capture and exploit new types of data. As this data became increasingly available, early adopters discovered that they could extract insight through new applications built to serve the business. The data lake supports the following capabilities:

The related lakehouse concept, which builds on the data lake, was introduced by Databricks in its CIDR paper in 2021. Read more in our Data Lake and Lakehouse Guide.