Skip to main content

MinusX Catalogs

Data Models + Enshrining business concepts

Vivek Aithal avatar
Written by Vivek Aithal
Updated this week

A MinusX Catalog is a way to have total control over the business concepts, custom metrics, and any custom instructions you want the MinusX agent to have in context. Analysts, with the help of MinusX AE agent, write the catalog and upstream it to all their Business users, who can ask questions against any Catalog.

Understanding the Catalog

A catalog is a YAML representation of the data model. Here's the full reference to all the keys in the YAML.

entities:
- name: <EntityName>
description: <EntityDescription>
sql_table: <Schema.TableName>
sql: <SQL CTE> // optional; one of sql_table or sql must be defined
dimensions:
- name: <DimensionName>
type: <DimensionType>
description: <DimensionDescription>
sql: <SQL>
unique: <true/false>
metrics:
- name: <MetricName>
description: <MetricDescription>
sql: <SQL>
examples:
- name: <ExampleName>
description: <ExampleDescription>
sql: <SQL. This SQL should refer the entity name and not the base table name>

Notes:

  1. Entities:

    1. sql_table can be used to define the entity source table. eg. sql_table: public.orders. Note that it MUST contain the schema name as well.

    2. sql can also be a select expression like sql: SELECT orderID, customerID from public.orders. This allows you to have complex SQL as a "view"

    3. Only one of sql and sql_table needs to be defined

  2. Dimensions:

    1. These are the only columns included. Whatever is not mentioned here are hidden from MinusX.

    2. If sql is not specified, it is assumed to be the base column. If the sql statement is provided, it is considered a derived column (and not materialized).

    3. If you want to expose the unique values in a column to MinusX, mark unique: true. MinusX automatically fetches the values and adds it into context.

  3. Metrics:

    1. These are custom metrics (aggregations) that are relevant to the business. The model has access to simple metrics like COUNT or SUM.

    2. By definition, they are not materialized.

  4. Examples [optional]: These are named example queries.

Add a new Catalog

Here's how you can add a new catalog:

Actual Context to MinusX

You can always see what MinusX views as context by clicking the Preview tab. MinusX sees the following structure for the selected tables:

Technical Details

As you see above, the LLM does not even see the entire catalog. We send the catalog as if it is a materialized table, and then the MinusX SQL compiler converts the generated SQL into the actual SQL that is executed.

Example Catalogs

Example #1: Simple catalog built on top of base tables

entities:
- name: orders
sql_table: prod.orders
dimensions:
- name: id
type: numeric
description: Unique ID for the order
- name: order_date
type: date
description: Date the order was created at
- name: ship_date
type: date
description: Date the order was shipped
- name: customer_id
type: numeric
description: Unique ID for the customer
- name: cost_price
type: numeric
description: Cost price of the order
- name: amount
type: numeric
description: Total amount of the order
metrics:
- name: num_orders
sql: COUNT(DISTINCT id)
description: Number of orders
- name: total_revenue
sql: SUM(amount)
description: Total revenue from orders
- name: total_profit
sql: SUM(amount - cost_price)
description: Profit from orders
- name: num_customers
sql: COUNT(DISTINCT customer_id)
description: Number of customers

- name: customers
sql_table: prod.customer
dimensions:
- name: id
type: numeric
description: Unique ID for the customer
- name: first_name
type: string
description: First name of the customer
- name: last_name
type: string
description: Last name of the customer
- name: email
type: string
description: Email address of the customer
- name: phone
type: string
description: Phone number of the customer

Example #2: Complex Catalog with a "wide-query" as the SQL for the entity

- name: customer_order_lifetime
sql: |
SELECT
c.id AS customer_id,
c.first_name,
c.last_name,
c.email,
c.phone,
o.id AS order_id,
o.order_date,
o.ship_date,
o.amount,
o.cost_price,
SUM(o.amount) OVER (PARTITION BY c.id) AS customer_lifetime_value,
COUNT(DISTINCT o.id) OVER (PARTITION BY c.id) AS total_orders_per_customer,
SUM(o.amount - o.cost_price) OVER (PARTITION BY c.id) AS total_profit_per_customer
FROM prod.customer c
LEFT JOIN prod.orders o ON c.id = o.customer_id

dimensions:
- name: customer_id
type: numeric
description: Unique ID for the customer
- name: first_name
type: string
description: First name of the customer
- name: last_name
type: string
description: Last name of the customer
- name: email
type: string
description: Email address of the customer
- name: phone
type: string
description: Phone number of the customer
- name: order_id
type: numeric
description: Unique ID for the order
- name: order_date
type: date
description: Date the order was created at
- name: ship_date
type: date
description: Date the order was shipped
- name: amount
type: numeric
description: Total amount of the order
- name: cost_price
type: numeric
description: Cost price of the order
- name: customer_lifetime_value
type: numeric
description: Total revenue from all orders made by the customer
- name: total_orders_per_customer
type: numeric
description: Total number of orders made by the customer
- name: total_profit_per_customer
type: numeric
description: Total profit generated by the customer

metrics:
- name: num_orders
sql: COUNT(DISTINCT order_id)
description: Number of orders
- name: total_revenue
sql: SUM(amount)
description: Total revenue from orders
- name: total_profit
sql: SUM(amount - cost_price)
description: Profit from orders
- name: num_customers
sql: COUNT(DISTINCT customer_id)
description: Number of customers
- name: avg_customer_lifetime_value
sql: AVG(customer_lifetime_value)
description: Average revenue per customer
Did this answer your question?