Skip to main content

Writing Effective Catalogs in MinusX

A guide to writing catalogs & measuring performance

S
Written by Sreejith Puthanpurayil
Updated over 2 weeks ago

Prerequisite: Advanced Catalogs

Why write catalogs?

Large Language Models (LLMs) struggle to reliably generate SQL when given only a natural language question and a raw database schema. They often pick incorrect tables or columns, leading to broken or misleading results.

To solve this, MinusX introduces catalogs — YAML-based data models that define the key entities, dimensions, and metrics a model is allowed to access. The model generates SQL from this simplified and semantically meaningful layer, not the full schema.

Well-written catalogs are the single most effective way to improve the reliability of answers. Our internal measurements show a significant improvement in both accuracy and consistency when questions are answered through well-scoped catalogs.


What makes a catalog effective?

An effective catalog is unambiguous. It reduces room for misinterpretation and ensures that a user's question can be deterministically mapped to SQL.

Here are guidelines to ensure clarity:

  1. Use distinct and descriptive names
    Avoid similar names across entities, dimensions, and metrics. If signup_date and created_date exist, make their purpose crystal clear by naming and description.

  2. Add clear descriptions where overlap is possible
    When names alone aren't enough to eliminate ambiguity, use the description field to explain the purpose, differences, and business meaning behind each field.

  3. Remember what the model sees
    The model doesn't see the underlying SQL of custom dimensions, only metrics. It only sees the name, type, and description — as if it were navigating a table with columns. Treat naming & descriptions as your primary UX.


By default, if you expose raw database tables, the model operates over too many entities and too much noise — leading to errors. A catalog filters and reshapes this into a reliable, opinionated model of your business logic.

Examples: Good vs. Bad Entity Definitions

✅ Good: Clear, well-scoped, and descriptive

- name: ProductEntity
from_: Product
dimensions:
- name: Name
type: string
- name: ProductLine
type: string
description: "Product line; M for Mountain. Other values exist. Will be single capital letter."
- name: Profit
type: number
sql: "{Product}.ListPrice - {Product}.StandardCost"
- name: FinishedGoodsFlag
type: number
description: "Flag indicating if the product is a finished good and sellable; 1 if true, 0 if false."

This entity clearly defines a small, relevant set of fields with meaningful descriptions. Computed fields like Profit are made explicit. Flags and encodings are explained with business logic

❌ Bad: Ambiguous, under-described, or overlapping

- name: ProductEntity
from_: Product
dimensions:
- name: Cost
type: number
- name: Price
type: number
- name: ProductFlag
type: number
- name: Class
type: string
- name: Line
type: string

What’s wrong here?

  • Ambiguous field names:

    • Cost and Price don’t indicate whether they are standard cost, list price, historical, or discounted.

    • ProductFlag is an unclear boolean-like field — does it mean "is finished good", "is active", or something else?

    • Line could mean product line, order line, or even a text field. It's too vague.

  • Missing descriptions:

    • Without descriptions, the model has no way to differentiate similarly named fields across entities.

    • For example, several tables contain Price, Cost, and Class — but their meanings vary.

Measuring Performance

After editing & saving a catalog, you can validate its effectiveness directly:

  • Ask a natural language question in Metabase using the new catalog.

  • Inspect the generated SQL and verify the results.

  • If the result is incorrect or ambiguous, revisit your catalog definitions — especially naming, descriptions, and the set of exposed entities.

Did this answer your question?