Data Glossary 🧠

What is Granularity

Last updated Oct 25, 2022

Data engineering

Declaring the granularity (or grain) is the pivotal step in Dimensional Modeling. The grain establishes exactly what a single fact table row represents. The grain declaration becomes a binding contract on the design. The grain must be declared before choosing dimensions or facts because every candidate dimension or fact must be consistent with the grain. This consistency enforces uniformity on all dimensional designs which is critical to Business Intelligence application performance and ease of use.
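As a minimal sketch of what "binding contract" can mean in practice, the snippet below checks that no two fact rows share the same grain key. The grain `(order_id, line_number)`, the column names, and the helper `grain_violations` are hypothetical and chosen only for illustration; they are not taken from Kimball's text.

```python
from collections import Counter

# Hypothetical grain declaration: one fact row per order line.
GRAIN = ("order_id", "line_number")


def grain_violations(rows, grain=GRAIN):
    """Return the grain keys that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = [
    {"order_id": 1, "line_number": 1, "amount": 10.0},
    {"order_id": 1, "line_number": 2, "amount": 5.0},
    {"order_id": 1, "line_number": 2, "amount": 5.0},  # repeats an existing grain key
]

violations = grain_violations(rows)
print(violations)
```

A non-empty result means at least one row does not represent what the declared grain says a single row should represent.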

For example, in the transformation layer, you must balance low and high granularity: at what level do you aggregate and store data (e.g., rolling up hourly data to daily to save storage), and which valuable dimensions do you add? Each added dimension can multiply the number of possible rows by the number of distinct values in its column, so row counts can grow exponentially with the number of dimensions, and we can't persist every one of these representations to the filesystem.
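The trade-off can be sketched with a small example. It assumes a hypothetical page-view feed with `(timestamp, country, views)` tuples; `rollup_daily` and `max_rows` are illustrative helpers, and the cardinalities are made-up numbers. The first part rolls hourly rows up to a daily grain, and the second computes the upper bound on row count as the product of the dimension cardinalities.

```python
from collections import defaultdict
from datetime import datetime
from math import prod


def rollup_daily(hourly_rows):
    """Aggregate (timestamp, country, views) rows to one row per (day, country)."""
    totals = defaultdict(int)
    for ts, country, views in hourly_rows:
        totals[(ts.date(), country)] += views
    return dict(totals)


hourly = [
    (datetime(2022, 10, 25, 9), "CH", 120),
    (datetime(2022, 10, 25, 10), "CH", 80),
    (datetime(2022, 10, 25, 10), "DE", 50),
    (datetime(2022, 10, 26, 9), "CH", 30),
]
daily = rollup_daily(hourly)  # 4 hourly rows become 3 daily rows


def max_rows(cardinalities):
    """Upper bound on rows: the product of distinct values per dimension."""
    return prod(cardinalities.values())


# Illustrative cardinalities only: adding a dimension multiplies the bound.
two_dims = max_rows({"day": 365, "country": 200})
three_dims = max_rows({"day": 365, "country": 200, "device": 5})
print(len(daily), two_dims, three_dims)
```

The rollup saves storage but discards the hour of day, so any later question at hourly grain can no longer be answered from the daily table.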

A Semantic Layer is much more flexible and makes the most sense on top of transformed data in a Data Warehouse. This way you avoid extensive reshuffling or reprocessing of large amounts of data. Think of OLAP cubes, where you can slice and dice ad hoc on significant amounts of data without storing every aggregate ahead of time.

Read more on Kimball Dimensional Modeling Techniques. Also related is Rollup.
