SQL Models
SQL models let you write queries against your BigQuery tables to create custom views, aggregations, and transformations. The output can be synced to any compatible destination.
Creating a SQL Model
- Navigate to Models and click New Model
- Select SQL Model from the template gallery
- Enter a name and optional description
- Choose a data type for the output (this determines which destinations can receive the data)
- Write your SQL query
- Click Validate to check the query against your BigQuery schema
- Save the model
Data Types
The data type tells Vendo how to treat the model’s output when syncing to destinations:
| Data Type | Use Case | Example |
|---|---|---|
| Events | Time-series event data | Aggregated daily ad spend by campaign |
| User Profiles | One row per user with attributes | Customer LTV scores, purchase segments |
| Group Profiles | One row per group/company | Account-level metrics for B2B |
| Lookup Tables | Reference/enrichment data | Product category mappings |
| Custom | Freeform output for export | Any table for BigQuery or S3 export |
Writing Queries
Your SQL runs against BigQuery. Reference tables in your Vendo dataset directly:
SELECT
customer_id,
SUM(total_price) AS lifetime_value,
COUNT(*) AS order_count,
MIN(created_at) AS first_order_date,
MAX(created_at) AS last_order_date
FROM orders
WHERE financial_status = 'paid'
GROUP BY customer_idValidation
When you click Validate, Vendo:
- Runs the query against your BigQuery schema (dry run — no data scanned)
- Returns the column names and types
- Flags any syntax or schema errors
Fix any errors before saving. Only validated models can be synced to destinations.
Incremental Syncing
For large datasets, configure incremental syncing to avoid re-processing all data on every run:
- Primary Key Columns — Columns that uniquely identify each row (e.g.,
customer_id) - Incremental Column — A timestamp or monotonically increasing column used to detect new/updated rows (e.g.,
updated_at)
When incremental syncing is enabled, Vendo only processes rows where the incremental column is greater than the last sync’s maximum value.
Scheduling
SQL models run when:
- Manually triggered from the model detail page
- On a schedule — set a frequency from every 15 minutes to weekly
- After a source syncs — trigger the model when a specific source app finishes importing
- After an upstream model — chain models together so downstream models run after upstream ones complete
Example: Daily Revenue by Channel
SELECT
DATE(o.created_at) AS date,
COALESCE(a.utm_source, 'direct') AS channel,
COUNT(DISTINCT o.id) AS orders,
SUM(o.total_price) AS revenue
FROM orders o
LEFT JOIN events a
ON o.customer_id = a.user_id
AND a.event = 'checkout_started'
WHERE o.financial_status = 'paid'
GROUP BY date, channel
ORDER BY date DESCRelated
- Models Overview
- Python Models — For advanced transformations with pandas and scikit-learn
- Audiences — For rule-based user segments
Last updated on