
Why Write Catalogs in MinusX

Increase reusability, reliability, and trust

Written by Sreejith Puthanpurayil
Updated this week

Catalogs are the foundation of reliable, interpretable answers in MinusX. They serve as a contract between the analyst and the model — defining exactly what entities, dimensions, and metrics the model is allowed to see and operate on.


0. Catalogs Promote Reusability (DRY philosophy)

Dashboards typically contain complex SQL. To run an ad-hoc analysis, users often copy that SQL into a new query and modify it, leaving behind changes that are unmaintained and unvetted. They also have to redefine, every time, any metrics used across the business, such as Avg. Customer Lifetime Value or Conversion Rate. If this practice frustrates you, you'll fall in love with MinusX Catalogs: because metrics are defined once in the catalog and reused, users never have to redefine anything.
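To make the DRY idea concrete, here is a minimal Python sketch. This article does not show MinusX's actual catalog format, so the `METRICS` dictionary, the `metric_query` helper, and all table and column names are hypothetical. The only point is that each metric's SQL lives in one place and every ad-hoc query pulls it from there instead of copying it.

```python
# Hypothetical catalog format (not MinusX's actual schema).
# Each metric's SQL is written once, in one place.
METRICS = {
    "conversion_rate": "COUNT(DISTINCT order_id) * 1.0 / COUNT(DISTINCT session_id)",
    "avg_order_value": "SUM(order_total) / COUNT(DISTINCT order_id)",
}


def metric_query(metric: str, table: str, group_by: str) -> str:
    """Build an ad-hoc query that reuses a shared metric definition."""
    expr = METRICS[metric]  # raises KeyError if the metric was never defined
    return (
        f"SELECT {group_by}, {expr} AS {metric} "
        f"FROM {table} GROUP BY {group_by}"
    )


# Two different analyses, one definition of conversion rate.
print(metric_query("conversion_rate", "web_sessions", "country"))
print(metric_query("conversion_rate", "web_sessions", "device_type"))
```

Because both queries read the expression from `METRICS`, correcting the conversion-rate definition once fixes every analysis that uses it, whereas a copied query would keep the old logic.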

1. Catalogs Eliminate Ambiguity

Without a catalog, MinusX has to reason over raw database tables — often filled with unclear names, redundant columns, and overlapping meanings. This leads to incorrect SQL and inconsistent answers.

Catalogs change that. Every field is curated and documented. Only relevant entities, dimensions, and metrics are exposed. Within a catalog, there is no ambiguity — questions map deterministically to SQL using a constrained semantic layer.
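The idea of a constrained semantic layer can be sketched as a small compiler that accepts only catalog field names and expands them into SQL. This is an illustrative assumption, not MinusX's implementation: the `DIMENSIONS` and `METRICS` dictionaries, the `compile_query` function, and the Postgres-style schema are all hypothetical.

```python
# Hypothetical catalog: only these fields are exposed.
# Dimension SQL is expanded at compile time; callers only use the names.
DIMENSIONS = {
    "signup_month": "DATE_TRUNC('month', u.created_at)",
    "plan": "COALESCE(s.plan_name, 'free')",
}
METRICS = {
    "active_users": "COUNT(DISTINCT u.id)",
}
BASE_FROM = "users u LEFT JOIN subscriptions s ON s.user_id = u.id"


def compile_query(dimensions: list[str], metrics: list[str]) -> str:
    """Compile catalog field names into one SQL query, rejecting unknown fields."""
    if not dimensions and not metrics:
        raise ValueError("request at least one field")
    unknown = [d for d in dimensions if d not in DIMENSIONS]
    unknown += [m for m in metrics if m not in METRICS]
    if unknown:
        raise ValueError(f"not in catalog: {unknown}")

    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select += [f"{METRICS[m]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {BASE_FROM}"
    if dimensions:
        # Group by the dimension columns, which come first in the SELECT list.
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dimensions)))
    return sql


print(compile_query(["signup_month", "plan"], ["active_users"]))
```

Two properties from the text carry over: a field that is not in the catalog (for example a raw `created_at` column) is rejected rather than guessed at, and the same request always produces the same SQL string.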


2. Catalogs Are a Formal Interface Between Analysts and the Model

Catalogs are written by analysts, but used by business users and LLMs. They function as a formal interface:

  • Analysts define trusted fields and business logic

  • The model uses only what’s exposed

  • Business users get consistent answers without needing to know SQL

Over time, as analysts expose more context — joins, computed fields, derived logic — catalogs can grow deeper. But the model stays bounded to just the curated interface, preventing misuse or confusion.


3. The Model Sees Metrics — But Not Dimension SQL

  • For metrics, the model sees both the name and the SQL expression. This allows it to reason over aggregations and reuse logic such as AVG(Revenue) or a predefined TotalProfit metric.

  • For dimensions, the model sees only the name, type, and description — not the SQL. This means that to the model, even a complex derived column appears just like a regular field on a table.

This design helps in two ways:

  • Complex business logic is abstracted: Derived columns can encapsulate logic the model doesn't need to reason about.

  • SQL generation improves: in our internal benchmarks, we've observed a clear lift in reliability when the model operates on top of a catalog rather than on raw schemas.
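The split between what the model sees for metrics and for dimensions can be sketched as a rendering step over the catalog. The `Dimension` and `Metric` classes, the `model_view` function, its text format, and the example fields are hypothetical; the article describes only which attributes are visible, not how they are presented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    data_type: str
    description: str
    sql: str  # kept by the catalog, never shown to the model


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    sql: str  # shown to the model so it can reason about the aggregation


def model_view(dimensions: list[Dimension], metrics: list[Metric]) -> str:
    """Render the catalog as the model would see it (hypothetical format)."""
    lines = ["Dimensions:"]
    for d in dimensions:
        # Only name, type, and description: the dimension's SQL is omitted.
        lines.append(f"- {d.name} ({d.data_type}): {d.description}")
    lines.append("Metrics:")
    for m in metrics:
        # Metrics include their SQL expression.
        lines.append(f"- {m.name} = {m.sql}: {m.description}")
    return "\n".join(lines)


dims = [
    Dimension(
        name="customer_segment",
        data_type="text",
        description="Enterprise, Mid-market, or SMB based on seat count",
        sql="CASE WHEN seats >= 500 THEN 'Enterprise' "
            "WHEN seats >= 50 THEN 'Mid-market' ELSE 'SMB' END",
    ),
]
mets = [
    Metric(name="total_profit", description="Revenue minus cost", sql="SUM(revenue - cost)"),
]
print(model_view(dims, mets))
```

The `CASE` expression behind `customer_segment` never reaches the model, so it looks like an ordinary text column, while `total_profit` arrives with its SQL so the model can reason about the aggregation.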


In Summary

Catalogs are the semantic boundary between raw data and the model. They:

  • Promote reusability

  • Remove ambiguity by curating a clean interface

  • Enable analysts to safely grow the model’s knowledge over time

  • Let business users ask natural language questions that map cleanly to SQL

  • Abstract dimension complexity and expose reusable metric logic

  • Improve accuracy and trust in answers
