Catalogs are the foundation of reliable, interpretable answers in MinusX. They serve as a contract between the analyst and the model — defining exactly what entities, dimensions, and metrics the model is allowed to see and operate on.
0. Catalogs Promote Reusability (DRY philosophy)
Dashboards typically rely on complex SQL. To run ad-hoc analyses, users copy that SQL into a new query and modify it. These copies go unmaintained and unvetted. Users also have to redefine any metrics used across the business (such as Avg. Customer Lifetime Value or Conversion Rate). If this practice frustrates you, you'll fall in love with MinusX Catalogs: users never have to worry about redefining anything.
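As a rough illustration of the DRY idea, the sketch below defines a metric once and reuses it in two ad-hoc queries. The METRICS dictionary, the build_query helper, and the table and column names are all hypothetical; this is not the MinusX catalog format, only a way to picture the reuse.

```python
# Hypothetical metric definitions, written once by an analyst.
METRICS = {
    "conversion_rate": "SUM(converted) * 1.0 / COUNT(*)",
    "avg_order_value": "AVG(amount)",
}


def build_query(metric: str, group_by: str, table: str = "orders") -> str:
    """Assemble a simple GROUP BY query from a named metric (illustrative only)."""
    expr = METRICS[metric]  # raises KeyError if the metric was never defined
    return (
        f"SELECT {group_by}, {expr} AS {metric} "
        f"FROM {table} GROUP BY {group_by}"
    )


# Two different analyses share one definition of conversion_rate.
by_channel = build_query("conversion_rate", "channel")
by_region = build_query("conversion_rate", "region")
```

If the definition of conversion_rate changes, it changes in one place and both queries pick it up, which is the property copied dashboard SQL lacks.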
1. Catalogs Eliminate Ambiguity
Without a catalog, MinusX has to reason over raw database tables — often filled with unclear names, redundant columns, and overlapping meanings. This leads to incorrect SQL and inconsistent answers.
Catalogs change that. Every field is curated and documented. Only relevant entities, dimensions, and metrics are exposed. Within a catalog, there is no ambiguity — questions map deterministically to SQL using a constrained semantic layer.
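One way to picture a constrained semantic layer is a compile step that accepts only names present in the catalog and always emits the same SQL for the same request. The Catalog class, its compile method, and the orders table below are assumptions made for this sketch, not MinusX internals; resolving a natural-language question to these names is not shown.

```python
from dataclasses import dataclass


@dataclass
class Catalog:
    """Hypothetical catalog: curated names mapped to SQL expressions."""
    table: str
    dimensions: dict  # dimension name -> SQL expression
    metrics: dict     # metric name -> SQL aggregate expression

    def compile(self, metric: str, dimensions: list) -> str:
        # Reject anything that is not part of the curated interface.
        unknown = [d for d in dimensions if d not in self.dimensions]
        if metric not in self.metrics:
            unknown.append(metric)
        if unknown:
            raise KeyError(f"not in catalog: {unknown}")

        select = [f"{self.dimensions[d]} AS {d}" for d in dimensions]
        select.append(f"{self.metrics[metric]} AS {metric}")
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if dimensions:
            sql += " GROUP BY " + ", ".join(self.dimensions[d] for d in dimensions)
        return sql


catalog = Catalog(
    table="orders",
    dimensions={"order_month": "DATE_TRUNC('month', created_at)"},
    metrics={"total_revenue": "SUM(amount)"},
)
sql = catalog.compile("total_revenue", ["order_month"])
```

Because the compile step is a pure lookup over curated definitions, a request for a field outside the catalog fails loudly instead of producing a guess against raw tables.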
2. Catalogs Are a Formal Interface Between Analysts and the Model
Catalogs are written by analysts, but used by business users and LLMs. They function as a formal interface:
Analysts define trusted fields and business logic
The model uses only what’s exposed
Business users get consistent answers without needing to know SQL
Over time, as analysts expose more context — joins, computed fields, derived logic — catalogs can grow deeper. But the model stays bounded to just the curated interface, preventing misuse or confusion.
3. The Model Sees Metrics — But Not Dimension SQL
For metrics, the model sees both the name and the SQL expression. This allows it to reason over aggregations and reuse logic like AVG(Revenue) or TotalProfit.
For dimensions, the model sees only the name, type, and description, not the SQL. This means that to the model, even a complex derived column appears just like a regular field on a table.
This design helps in two ways:
Complex business logic is abstracted: Derived columns can encapsulate logic the model doesn't need to reason about.
SQL generation improves: From internal benchmarks, we've observed a clear lift in reliability when the model operates on top of a catalog compared to raw schemas.
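The asymmetry between metrics and dimensions can be sketched as a projection over a catalog entry. The CATALOG structure, its field names, and the model_view helper are hypothetical and chosen only for illustration; the actual MinusX catalog schema may differ.

```python
# Hypothetical catalog entry; field names are illustrative, not the MinusX schema.
CATALOG = {
    "dimensions": [
        {
            "name": "customer_segment",
            "type": "string",
            "description": "Segment assigned from lifetime spend",
            "sql": "CASE WHEN lifetime_spend > 1000 THEN 'high' ELSE 'standard' END",
        },
    ],
    "metrics": [
        {"name": "TotalProfit", "sql": "SUM(revenue - cost)"},
    ],
}


def model_view(catalog: dict) -> dict:
    """Return the part of the catalog the model would be shown in this sketch."""
    return {
        # Dimensions: name, type, and description only; the SQL is withheld.
        "dimensions": [
            {key: dim[key] for key in ("name", "type", "description")}
            for dim in catalog["dimensions"]
        ],
        # Metrics: name and SQL expression are both exposed.
        "metrics": [dict(metric) for metric in catalog["metrics"]],
    }


view = model_view(CATALOG)
```

In this sketch the CASE expression behind customer_segment never reaches the model, so the derived column looks like an ordinary string field, while the TotalProfit expression stays visible for reasoning about aggregations.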
In Summary
Catalogs are the semantic boundary between raw data and the model. They:
Promote reusability
Remove ambiguity by curating a clean interface
Enable analysts to safely grow the model’s knowledge over time
Let business users ask natural language questions that map cleanly to SQL
Abstract dimension complexity and expose reusable metric logic
Improve accuracy and trust in answers