Prerequisite: Advanced Catalogs
Why write catalogs?
Large Language Models (LLMs) struggle to reliably generate SQL when given only a natural language question and a raw database schema. They often pick incorrect tables or columns, leading to broken or misleading results.
To solve this, MinusX introduces catalogs — YAML-based data models that define the key entities, dimensions, and metrics a model is allowed to access. The model generates SQL from this simplified and semantically meaningful layer, not the full schema.
Well-written catalogs are the single most effective way to improve the reliability of answers. Our internal measurements show a significant improvement in both accuracy and consistency when questions are answered through well-scoped catalogs.
What makes a catalog effective?
An effective catalog is unambiguous. It reduces room for misinterpretation and ensures that a user's question can be deterministically mapped to SQL.
Here are guidelines to ensure clarity:
Use distinct and descriptive names
Avoid similar names across entities, dimensions, and metrics. Ifsignup_date
andcreated_date
exist, make their purpose crystal clear by naming and description.Add clear descriptions where overlap is possible
When names alone aren't enough to eliminate ambiguity, use thedescription
field to explain the purpose, differences, and business meaning behind each field.Remember what the model sees
The model doesn't see the underlying SQL of custom dimensions, only metrics. It only sees the name, type, and description — as if it were navigating a table with columns. Treat naming & descriptions as your primary UX.
By default, if you expose raw database tables, the model operates over too many entities and too much noise — leading to errors. A catalog filters and reshapes this into a reliable, opinionated model of your business logic.
Examples: Good vs. Bad Entity Definitions
✅ Good: Clear, well-scoped, and descriptive
- name: ProductEntity
from_: Product
dimensions:
- name: Name
type: string
- name: ProductLine
type: string
description: "Product line; M for Mountain. Other values exist. Will be single capital letter."
- name: Profit
type: number
sql: "{Product}.ListPrice - {Product}.StandardCost"
- name: FinishedGoodsFlag
type: number
description: "Flag indicating if the product is a finished good and sellable; 1 if true, 0 if false."
This entity clearly defines a small, relevant set of fields with meaningful descriptions. Computed fields like Profit
are made explicit. Flags and encodings are explained with business logic
❌ Bad: Ambiguous, under-described, or overlapping
- name: ProductEntity
from_: Product
dimensions:
- name: Cost
type: number
- name: Price
type: number
- name: ProductFlag
type: number
- name: Class
type: string
- name: Line
type: string
What’s wrong here?
Ambiguous field names:
Cost
andPrice
don’t indicate whether they are standard cost, list price, historical, or discounted.ProductFlag
is an unclear boolean-like field — does it mean "is finished good", "is active", or something else?Line
could mean product line, order line, or even a text field. It's too vague.
Missing descriptions:
Without descriptions, the model has no way to differentiate similarly named fields across entities.
For example, several tables contain
Price
,Cost
, andClass
— but their meanings vary.
Measuring Performance
After editing & saving a catalog, you can validate its effectiveness directly:
Ask a natural language question in Metabase using the new catalog.
Inspect the generated SQL and verify the results.
If the result is incorrect or ambiguous, revisit your catalog definitions — especially naming, descriptions, and the set of exposed entities.