What is a Data Catalog?
A Data Catalog is a centralized store where all the metadata about your data is made searchable.
Think of it as a Google Search for your internal metadata. This is vital because, with a Data Lake and other data stores, you need to be able to search across your data. Data is growing exponentially, with 90% of the world’s data reportedly generated in the last two years alone. It’s hard to keep track of that volume over time. A data catalog addresses the problem of handling fast-growing data internally.
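To make the idea concrete, here is a minimal sketch of a catalog as a searchable metadata store. It is a toy in-memory illustration, not how any particular catalog product works. The names `DatasetEntry` and `DataCatalog`, the fields, and the example locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata about one dataset; the data itself stays in its own store."""
    name: str
    location: str          # hypothetical path or table name
    owner: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


class DataCatalog:
    """Toy central registry: holds metadata only and makes it searchable."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Re-registering the same name overwrites the previous metadata.
        self._entries[entry.name] = entry

    def search(self, query: str) -> list[DatasetEntry]:
        # Naive case-insensitive substring match: every query term must
        # appear somewhere in the entry's name, owner, description, or tags.
        terms = query.lower().split()
        results = []
        for entry in self._entries.values():
            haystack = " ".join(
                [entry.name, entry.owner, entry.description, *entry.tags]
            ).lower()
            if all(term in haystack for term in terms):
                results.append(entry)
        return results


catalog = DataCatalog()
catalog.register(DatasetEntry(
    name="orders_raw",
    location="s3://example-lake/orders/raw/",   # made-up location
    owner="sales-eng",
    description="Unprocessed order events",
    tags=["orders", "raw"],
))
catalog.register(DatasetEntry(
    name="customers_clean",
    location="warehouse.analytics.customers",   # made-up location
    owner="analytics",
    description="Deduplicated customer records",
    tags=["customers", "pii"],
))

matches = [entry.name for entry in catalog.search("order events")]
print(matches)
```

The point of the sketch is the separation: the catalog never touches the data, only the descriptions of it, so one search can span a data lake, a warehouse, and anything else that has been registered.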
An interesting read on the origins of the Data Catalog is the 2017 paper about a Data Context Service.
See a High-Level Feature Comparison by the Awesome Data Discovery and Observability list on GitHub (check out the link for more):
Or a great overview by Sarah Krasnik on Choosing a Data Catalog: