Python Models
Python models run custom Python code against your BigQuery data for advanced transformations — customer segmentation, predictive scoring, recommendation engines, and more. Start from a pre-built template or write your own.
Creating a Python Model
- Navigate to Models and click New Model
- Browse the template gallery and select a Python template, or start from scratch
- Configure input tables — map BigQuery tables to your model’s inputs
- Configure output — set the output table name and write mode
- Customize the Python code and parameters
- Save the model
Template Categories
| Category | Templates | Use Case |
|---|---|---|
| Recommendations | Product affinity, cross-sell models | Suggest products based on purchase history |
| Segmentation | RFM analysis, behavioral clustering | Group customers by behavior or value |
| Prediction | Churn prediction, LTV forecasting | Predict future customer behavior |
Input Tables
Map one or more BigQuery tables as inputs to your model. Each input table is available as a pandas DataFrame in your code:
| Setting | Description |
|---|---|
| Dataset | Your BigQuery dataset (e.g., vendo_myshop) |
| Table | The specific table to read |
| Variable Name | How the table is referenced in code (e.g., orders_df) |
Output Configuration
| Setting | Description |
|---|---|
| Output Table | Name of the BigQuery table to write results to |
| Write Mode | replace (overwrite table each run) or append (add new rows) |
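The difference between the two write modes can be sketched locally with pandas. This is only an illustration of the semantics described in the table: `apply_write_mode` is a hypothetical helper, not part of the platform, and the sample rows are made up.

```python
import pandas as pd


def apply_write_mode(existing: pd.DataFrame, result: pd.DataFrame, mode: str) -> pd.DataFrame:
    """Hypothetical helper: mimic how each write mode changes the output table."""
    if mode == "replace":
        # Previous contents are discarded; only this run's rows remain.
        return result.reset_index(drop=True)
    if mode == "append":
        # This run's rows are added beneath whatever is already in the table.
        return pd.concat([existing, result], ignore_index=True)
    raise ValueError(f"unknown write mode: {mode!r}")


# Made-up contents of the output table before the run, and the new run's result.
existing = pd.DataFrame({"customer_id": [1, 2], "segment": ["high", "low"]})
result = pd.DataFrame({"customer_id": [1, 2], "segment": ["high", "mid"]})

replaced = apply_write_mode(existing, result, "replace")
appended = apply_write_mode(existing, result, "append")

print(len(replaced))  # 2
print(len(appended))  # 4
```

Running the same model twice in append mode doubles the rows, which is why replace suits snapshot-style outputs and append suits tables that accumulate history.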
Writing Python Code
Your code runs in a secure environment with these libraries:
| Library | Version | Purpose |
|---|---|---|
| pandas | 2.x | Data manipulation |
| numpy | 1.x | Numerical computing |
| scikit-learn | 1.x | Machine learning |
| google-cloud-bigquery | 3.x | BigQuery client (advanced queries) |
Code Structure
Your code receives the mapped input tables as DataFrames and must return a DataFrame:
    import pandas as pd

    def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
        # RFM segmentation example
        rfm = orders_df.groupby('customer_id').agg({
            'created_at': 'max',    # Recency
            'id': 'count',          # Frequency
            'total_price': 'sum'    # Monetary
        }).reset_index()
        rfm.columns = ['customer_id', 'last_order', 'frequency', 'monetary']
        return rfm

Note: External network access is disabled for security. All data must come from the configured input tables.
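Because the function takes and returns plain DataFrames, it can be tried locally before saving the model. The following is a minimal sketch, assuming the column names used in the example above; the sample rows are invented and stand in for the mapped orders table.

```python
import pandas as pd


def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
    # Same RFM aggregation as the example above.
    rfm = orders_df.groupby("customer_id").agg({
        "created_at": "max",    # Recency
        "id": "count",          # Frequency
        "total_price": "sum",   # Monetary
    }).reset_index()
    rfm.columns = ["customer_id", "last_order", "frequency", "monetary"]
    return rfm


# Made-up sample rows standing in for the mapped orders table.
sample_orders = pd.DataFrame({
    "id": [101, 102, 103],
    "customer_id": ["a", "a", "b"],
    "created_at": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-01-20"]),
    "total_price": [20.0, 35.0, 50.0],
})

rfm = transform(sample_orders)
print(rfm)
```

Customer `a` should come back with a frequency of 2 and a monetary value of 55.0, and customer `b` with 1 and 50.0.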
Scheduling and Triggers
Python models support flexible execution:
| Trigger | Description |
|---|---|
| Manual | Run on demand from the model detail page |
| Schedule | Run at a set frequency, from every 15 minutes to weekly |
| Source trigger | Run automatically when a source app finishes syncing |
| Model trigger | Run after an upstream model completes |
Chaining Models
Set a model trigger on each downstream model to create a pipeline in which every model runs after its upstream model completes. For example:
- Source sync imports Shopify orders
- SQL model computes daily revenue (triggered by source sync)
- Python model runs RFM segmentation (triggered by the SQL model)
Vendo detects circular dependencies and prevents chains longer than 10 models.
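How Vendo performs these checks is not documented here. As a rough sketch of the idea only, the two rules (no circular dependencies, no chain longer than 10 models) can be expressed as a depth-first walk over the trigger graph. The `validate_chain` function and the dictionary format are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Set

MAX_CHAIN_LENGTH = 10  # limit stated above


def validate_chain(upstream: Dict[str, List[str]]) -> None:
    """Raise ValueError on a circular dependency or a chain of more than 10 models.

    `upstream` maps each model name to the models that trigger it
    (a hypothetical representation, not Vendo's internal format).
    """
    length_cache: Dict[str, int] = {}
    in_progress: Set[str] = set()

    def chain_length(model: str) -> int:
        # Number of models on the longest chain that ends at `model`.
        if model in length_cache:
            return length_cache[model]
        if model in in_progress:
            # We came back to a model we are still resolving: a cycle.
            raise ValueError(f"circular dependency involving {model!r}")
        in_progress.add(model)
        parents = upstream.get(model, [])
        longest = 1 + max((chain_length(p) for p in parents), default=0)
        in_progress.discard(model)
        length_cache[model] = longest
        return longest

    for model in upstream:
        if chain_length(model) > MAX_CHAIN_LENGTH:
            raise ValueError(f"chain ending at {model!r} exceeds {MAX_CHAIN_LENGTH} models")


# The example pipeline above: the SQL model is triggered by a source sync,
# so it has no upstream model; the Python model is triggered by the SQL model.
pipeline = {
    "daily_revenue": [],
    "rfm_segmentation": ["daily_revenue"],
}
validate_chain(pipeline)  # passes without raising
```

A graph such as `{"a": ["b"], "b": ["a"]}` would be rejected as circular, and a straight line of 11 models would be rejected for length.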
Best Practices
- Keep models focused: one transformation per model
- Use descriptive output table names, e.g., customer_segments, product_recommendations
- Handle edge cases: check for empty DataFrames and null values
- Start from templates: customize a pre-built template rather than writing from scratch
- Use replace mode for idempotent outputs, append for time-series accumulation
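The edge-case advice can be illustrated with a guarded variant of the RFM example. This is a sketch under stated assumptions: it reuses the column names from the earlier example, and the choices to drop rows without a customer and to treat a missing price as 0 are illustrative, not platform behavior.

```python
import pandas as pd

OUTPUT_COLUMNS = ["customer_id", "last_order", "frequency", "monetary"]


def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
    # Empty input: return an empty frame that still has the expected columns.
    if orders_df.empty:
        return pd.DataFrame(columns=OUTPUT_COLUMNS)

    # Drop rows that cannot be attributed to a customer, and treat a missing
    # price as 0 (an assumption; choose whatever suits your data).
    cleaned = orders_df.dropna(subset=["customer_id"]).copy()
    cleaned["total_price"] = cleaned["total_price"].fillna(0)

    # Named aggregation gives the output columns their final names directly.
    rfm = cleaned.groupby("customer_id").agg(
        last_order=("created_at", "max"),
        frequency=("id", "count"),
        monetary=("total_price", "sum"),
    ).reset_index()
    return rfm
```

Returning an empty frame with the expected columns, rather than raising, keeps the output shape stable on days when the input table has no rows.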
Related
- Models Overview
- SQL Models — For query-based transformations
- Audiences — For rule-based user segments
- Data Science Agent → Building Models — Create Python models with AI assistance