ishan's notes

Metrics layer tools


Reading about multiple projects on Twitter that provide a metrics layer against the database. For example - dbt metrics, Transform, Supergrain. Thoughts from what I make of it below.

The product webpages use the term Headless BI or metrics layer, none of them super informative. I'd expect the pitch roughly goes like this.

The typical analytics pipeline looks like this

Where it gets difficult

What a metrics layer does

How it helps

Could it be done with regular SQL?

Not sure, and that has to do with SQL not being very compositional. An analytics query might involve an aggregation over a column value when the data is grouped by a related column in another table. For example,

SELECT AVG(price) 
FROM orders
JOIN customers
    ON orders.customer_id = customers.id
GROUP BY 
    customers.country

The metrics layer approach is neat in that you kind of split the SQL definition between the model file and the runtime query.

// model definition
- relation: orders JOIN customers ON orders.customer_id = customers.id
- metric: average_order_size = AVG(price)

// runtime query looks like
GET average_order_size BY country

The query now look more comprehensible and more approachable to a business user. There is a clear delineation in the scope of what the engineers should deal with vs the analysts, even taking some autonomy away from the latter. Though I can definitely see the benefits from it.

Conclusion

So, the metrics layer concept is clearly more work, though YMMV in the utility you get from it. It does introduce discipline to the analytics workflow and better... umm, separation of concerns between engineering and analysts. When setting up an analytics workflow from scratch, the concept does feel like a useful primitive.