A MinusX Catalog is a way to have total control over the business concepts, custom metrics, and any custom instructions you want the MinusX agent to have in context. Analysts, with the help of MinusX AE agent, write the catalog and upstream it to all their Business users, who can ask questions against any Catalog.
Understanding the Catalog
A catalog is a YAML
representation of the data model. Here's the full reference to all the keys in the YAML
.
entities:
- name: <EntityName>
description: <EntityDescription>
sql_table: <Schema.TableName>
sql: <SQL CTE> // optional; one of sql_table or sql must be defined
dimensions:
- name: <DimensionName>
type: <DimensionType>
description: <DimensionDescription>
sql: <SQL>
unique: <true/false>
metrics:
- name: <MetricName>
description: <MetricDescription>
sql: <SQL>
examples:
- name: <ExampleName>
description: <ExampleDescription>
sql: <SQL. This SQL should refer the entity name and not the base table name>
Notes:
Entities:
sql_table
can be used to define the entity source table. eg.sql_table: public.orders
. Note that it MUST contain the schema name as well.sql
can also be a select expression likesql: SELECT orderID, customerID from public.orders
. This allows you to have complex SQL as a "view"Only one of
sql
andsql_table
needs to be defined
Dimensions:
These are the only columns included. Whatever is not mentioned here are hidden from MinusX.
If
sql
is not specified, it is assumed to be the base column. If thesql
statement is provided, it is considered a derived column (and not materialized).If you want to expose the unique values in a column to MinusX, mark
unique: true
. MinusX automatically fetches the values and adds it into context.
Metrics:
These are custom metrics (aggregations) that are relevant to the business. The model has access to simple metrics like
COUNT
orSUM
.By definition, they are not materialized.
Examples [optional]: These are named example queries.
Add a new Catalog
Here's how you can add a new catalog:
Actual Context to MinusX
You can always see what MinusX views as context by clicking the Preview
tab. MinusX sees the following structure for the selected tables:
Technical Details
As you see above, the LLM does not even see the entire catalog. We send the catalog as if it is a materialized table, and then the MinusX SQL compiler converts the generated SQL into the actual SQL that is executed.
Example Catalogs
Example #1: Simple catalog built on top of base tables
entities:
- name: orders
sql_table: prod.orders
dimensions:
- name: id
type: numeric
description: Unique ID for the order
- name: order_date
type: date
description: Date the order was created at
- name: ship_date
type: date
description: Date the order was shipped
- name: customer_id
type: numeric
description: Unique ID for the customer
- name: cost_price
type: numeric
description: Cost price of the order
- name: amount
type: numeric
description: Total amount of the order
metrics:
- name: num_orders
sql: COUNT(DISTINCT id)
description: Number of orders
- name: total_revenue
sql: SUM(amount)
description: Total revenue from orders
- name: total_profit
sql: SUM(amount - cost_price)
description: Profit from orders
- name: num_customers
sql: COUNT(DISTINCT customer_id)
description: Number of customers
- name: customers
sql_table: prod.customer
dimensions:
- name: id
type: numeric
description: Unique ID for the customer
- name: first_name
type: string
description: First name of the customer
- name: last_name
type: string
description: Last name of the customer
- name: email
type: string
description: Email address of the customer
- name: phone
type: string
description: Phone number of the customer
Example #2: Complex Catalog with a "wide-query" as the SQL for the entity
- name: customer_order_lifetime
sql: |
SELECT
c.id AS customer_id,
c.first_name,
c.last_name,
c.email,
c.phone,
o.id AS order_id,
o.order_date,
o.ship_date,
o.amount,
o.cost_price,
SUM(o.amount) OVER (PARTITION BY c.id) AS customer_lifetime_value,
COUNT(DISTINCT o.id) OVER (PARTITION BY c.id) AS total_orders_per_customer,
SUM(o.amount - o.cost_price) OVER (PARTITION BY c.id) AS total_profit_per_customer
FROM prod.customer c
LEFT JOIN prod.orders o ON c.id = o.customer_id
dimensions:
- name: customer_id
type: numeric
description: Unique ID for the customer
- name: first_name
type: string
description: First name of the customer
- name: last_name
type: string
description: Last name of the customer
- name: email
type: string
description: Email address of the customer
- name: phone
type: string
description: Phone number of the customer
- name: order_id
type: numeric
description: Unique ID for the order
- name: order_date
type: date
description: Date the order was created at
- name: ship_date
type: date
description: Date the order was shipped
- name: amount
type: numeric
description: Total amount of the order
- name: cost_price
type: numeric
description: Cost price of the order
- name: customer_lifetime_value
type: numeric
description: Total revenue from all orders made by the customer
- name: total_orders_per_customer
type: numeric
description: Total number of orders made by the customer
- name: total_profit_per_customer
type: numeric
description: Total profit generated by the customer
metrics:
- name: num_orders
sql: COUNT(DISTINCT order_id)
description: Number of orders
- name: total_revenue
sql: SUM(amount)
description: Total revenue from orders
- name: total_profit
sql: SUM(amount - cost_price)
description: Profit from orders
- name: num_customers
sql: COUNT(DISTINCT customer_id)
description: Number of customers
- name: avg_customer_lifetime_value
sql: AVG(customer_lifetime_value)
description: Average revenue per customer