What is Data Observability?
Data observability, sometimes described as data monitoring, is the practice of continuously collecting metrics about your data. Typical metrics include the number of rows, the number of columns, and other properties of each dataset, along with metadata such as when the dataset was last updated.
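To make that concrete, here is a minimal Python sketch of the kind of snapshot an observability tool might collect for a single table. The SQLite database and the `orders` table are made-up stand-ins, not part of any particular tool.

```python
import sqlite3
from datetime import datetime, timezone

def collect_table_metrics(conn: sqlite3.Connection, table: str) -> dict:
    """Collect basic observability metrics for one table."""
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return {
        "table": table,
        "row_count": row_count,          # volume
        "column_count": len(columns),    # schema width
        "columns": columns,              # schema snapshot, useful for drift detection
        "collected_at": datetime.now(timezone.utc).isoformat(),  # freshness reference
    }

# Demo against an in-memory database with a made-up "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.execute("INSERT INTO orders (amount, updated_at) VALUES (19.99, '2024-01-01')")
print(collect_table_metrics(conn, "orders"))
# A scheduler (cron, Airflow, etc.) would run this periodically and compare
# successive snapshots to alert on unexpected changes in volume, schema, or freshness.
```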
The article Choosing a Data Quality Tool by Sarah Krasnik groups observability tools into several categories, each with representative tools:
- Auto-profiling data
  - Bigeye: unique in its wide range of ML-driven automatic threshold tests and alerts
  - Datafold: unique GitHub integration presenting a Data Diff between environments, with custom tests
  - Monte Carlo: unique in being the most enterprise-ready, with many data lake integrations
  - Lightup: unique self-hosted deployment option, appealing to highly regulated industries
  - Metaplane: unique in its high level of configuration for a hosted tool, with both out-of-the-box and custom tests
- Pipeline testing (see the sketch after this list)
  - Great Expectations: unique in its data-quality-specific community and automatic documentation of tests
  - Soda: unique in its self-hosted cloud option
  - dbt tests: unique integration with dbt Core and dbt Cloud builds (naturally), but not as versatile outside the dbt ecosystem
- Infrastructure monitoring
- A little bit of everything
  - Databand: unique integration with Airflow and specific Airflow metric monitoring
  - Unravel: unique support for other data sources such as Spark, data lakes, and NoSQL databases
- Data catalogs: helping observe existing data
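The pipeline-testing tools above all come down to assertions that run against each batch of data and fail the pipeline when they do not hold. Below is a minimal, library-agnostic sketch of that idea; the `order_id` and `amount` columns are made-up examples, and real tools such as Great Expectations or dbt tests add test catalogs, documentation, and orchestration on top.

```python
import pandas as pd

def run_pipeline_tests(df: pd.DataFrame) -> list[str]:
    """Return a list of failed checks; an empty list means the batch passes."""
    failures = []
    if df["order_id"].isnull().any():
        failures.append("order_id contains nulls")
    if df["order_id"].duplicated().any():
        failures.append("order_id is not unique")
    if (df["amount"] < 0).any():
        failures.append("amount contains negative values")
    if len(df) == 0:
        failures.append("batch is empty")
    return failures

# Demo with a made-up batch; in a real pipeline the DataFrame would come
# from the load or transform step.
batch = pd.DataFrame({"order_id": [1, 2, 2], "amount": [10.0, -5.0, 7.5]})
failures = run_pipeline_tests(batch)
print(failures or "all checks passed")
# In a real pipeline you would raise an exception on failure so the orchestrator
# marks the run as failed instead of shipping bad data downstream.
```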
Related terms are Data Governance and Data Quality.