Writing Effective Catalogs in MinusX

Why write catalogs?

Large Language Models (LLMs) struggle to reliably generate SQL when given only a natural language question and a raw database schema. They often pick incorrect tables or columns, leading to broken or misleading results.

To solve this, MinusX introduces catalogs — YAML-based data models that define the key entities, dimensions, and metrics a model is allowed to access. The model generates SQL from this simplified and semantically meaningful layer, not the full schema.

Well-written catalogs are the single most effective way to improve the reliability of answers. Our internal measurements show a significant improvement in both accuracy and consistency when questions are answered through well-scoped catalogs.

What makes a catalog effective?

An effective catalog is unambiguous. It reduces room for misinterpretation and ensures that a user's question can be deterministically mapped to SQL.

Here are guidelines to ensure clarity:

Use distinct and descriptive names
Avoid similar names across entities, dimensions, and metrics. If signup_date and created_date exist, make their purpose crystal clear by naming and description.
Add clear descriptions where overlap is possible
When names alone aren't enough to eliminate ambiguity, use the description field to explain the purpose, differences, and business meaning behind each field.
Remember what the model sees
The model doesn't see the underlying SQL of custom dimensions, only metrics. It only sees the name, type, and description — as if it were navigating a table with columns. Treat naming & descriptions as your primary UX.

By default, if you expose raw database tables, the model operates over too many entities and too much noise — leading to errors. A catalog filters and reshapes this into a reliable, opinionated model of your business logic.

Examples: Good vs. Bad Entity Definitions

✅ Good: Clear, well-scoped, and descriptive

- name: ProductEntity
  sql_table: public.Product
  dimensions:
    - name: Name
      type: string
    - name: ProductLine
      type: string
      description: "Product line; M for Mountain. Other values exist. Will be single capital letter."
    - name: Profit
      type: number
      sql: "{Product}.ListPrice - {Product}.StandardCost"
    - name: FinishedGoodsFlag
      type: number
      description: "Flag indicating if the product is a finished good and sellable; 1 if true, 0 if false."

This entity clearly defines a small, relevant set of fields with meaningful descriptions. Computed fields like Profit are made explicit. Flags and encodings are explained with business logic

❌ Bad: Ambiguous, under-described, or overlapping

- name: ProductEntity
  sql_table: public.Product
  dimensions:
    - name: Cost
      type: number
    - name: Price
      type: number
    - name: ProductFlag
      type: number
    - name: Class
      type: string
    - name: Line
      type: string

What’s wrong here?

Ambiguous field names:
- Cost and Price don’t indicate whether they are standard cost, list price, historical, or discounted.
- ProductFlag is an unclear boolean-like field — does it mean "is finished good", "is active", or something else?
- Line could mean product line, order line, or even a text field. It's too vague.
Missing descriptions:
- Without descriptions, the model has no way to differentiate similarly named fields across entities.
- For example, several tables contain Price, Cost, and Class — but their meanings vary.

Measuring Performance

After editing & saving a catalog, you can validate its effectiveness directly:

Ask a natural language question in Metabase using the new catalog.
Inspect the generated SQL and verify the results.
If the result is incorrect or ambiguous, revisit your catalog definitions — especially naming, descriptions, and the set of exposed entities.

Data Catalogs

Default Tables

MinusX Catalogs

Why Write Catalogs in MinusX