Python Models

Python models run custom Python code against your BigQuery data for advanced transformations — customer segmentation, predictive scoring, recommendation engines, and more. Start from a pre-built template or write your own.

Creating a Python Model

1. Navigate to Models and click New Model.
2. Browse the template gallery and select a Python template, or start from scratch.
3. Configure input tables: map BigQuery tables to your model's inputs.
4. Configure the output: set the output table name and write mode.
5. Customize the Python code and parameters.
6. Save the model.

Template Categories

Category	Templates	Use Case
Recommendations	Product affinity, cross-sell models	Suggest products based on purchase history
Segmentation	RFM analysis, behavioral clustering	Group customers by behavior or value
Prediction	Churn prediction, LTV forecasting	Predict future customer behavior

Input Tables

Map one or more BigQuery tables as inputs to your model. Each input table is available as a pandas DataFrame in your code:

Setting	Description
Dataset	Your BigQuery dataset (e.g., `vendo_myshop`)
Table	The specific table to read
Variable Name	How the table is referenced in code (e.g., `orders_df`)
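
As a rough sketch of how several mapped tables might be combined, the example below assumes that each Variable Name is passed to `transform` as a parameter of the same name, as in the single-table example on this page. The `customers_df` input and its `id` and `email` columns are hypothetical and only serve the illustration.

```python
import pandas as pd


def transform(orders_df: pd.DataFrame, customers_df: pd.DataFrame) -> pd.DataFrame:
    # Total spend per customer from the orders input.
    spend = orders_df.groupby("customer_id", as_index=False).agg(
        total_spent=("total_price", "sum")
    )
    # Left join keeps customers with no orders. customers_df and its
    # columns (id, email) are assumptions made for this illustration.
    result = customers_df.merge(
        spend, how="left", left_on="id", right_on="customer_id"
    )
    result["total_spent"] = result["total_spent"].fillna(0.0)
    return result[["id", "email", "total_spent"]]


# Local check with small in-memory frames standing in for BigQuery tables.
orders = pd.DataFrame({"customer_id": [1, 1, 2], "total_price": [10.0, 5.0, 7.5]})
customers = pd.DataFrame(
    {"id": [1, 2, 3], "email": ["a@example.com", "b@example.com", "c@example.com"]}
)
print(transform(orders, customers))
```

Running the function locally on a few hand-built rows like this is a cheap way to check the join and column names before saving the model.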

Output Configuration

Setting	Description
Output Table	Name of the BigQuery table to write results to
Write Mode	`replace` (overwrite table each run) or `append` (add new rows)
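
The write mode also affects what the returned DataFrame should contain. A minimal sketch, assuming `append` simply adds the returned rows to the existing table: stamping each row with the run date keeps successive runs distinguishable. The `snapshot_date` column, the `daily_order_snapshot` helper, and its `run_date` parameter are hypothetical.

```python
from datetime import date
from typing import Optional

import pandas as pd


def daily_order_snapshot(
    orders_df: pd.DataFrame, run_date: Optional[date] = None
) -> pd.DataFrame:
    """Summarise orders in one row and stamp it with the run date."""
    run_date = run_date or date.today()
    summary = pd.DataFrame(
        {
            "order_count": [len(orders_df)],
            "revenue": [float(orders_df["total_price"].sum())],
        }
    )
    # Hypothetical column: lets rows appended by different runs be told apart.
    summary.insert(0, "snapshot_date", pd.Timestamp(run_date))
    return summary


def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
    # Intended for a model whose Write Mode is `append`.
    return daily_order_snapshot(orders_df)
```

With `replace`, the date column is usually unnecessary because each run overwrites the previous result.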

Writing Python Code

Your code runs in a secure environment with these libraries:

Library	Version	Purpose
pandas	2.x	Data manipulation
numpy	1.x	Numerical computing
scikit-learn	1.x	Machine learning
google-cloud-bigquery	3.x	BigQuery client (advanced queries)

Code Structure

Your code receives the mapped input tables as DataFrames and must return a DataFrame:


import pandas as pd


def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
    # RFM segmentation example: one output row per customer
    rfm = orders_df.groupby('customer_id').agg(
        last_order=('created_at', 'max'),    # Recency: most recent order
        frequency=('id', 'count'),           # Frequency: number of orders
        monetary=('total_price', 'sum'),     # Monetary: total spend
    ).reset_index()

    return rfm

Note: External network access is disabled for security. All data must come from the configured input tables.
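
The example above stops at raw recency, frequency, and monetary values. One common next step, sketched here and not taken from a Vendo template, is to convert `last_order` into days since the last order and bucket each measure into a score. The `score_rfm` helper, the `as_of` and `n_bins` parameters, and the percentile-rank scoring are assumptions.

```python
import numpy as np
import pandas as pd


def score_rfm(rfm: pd.DataFrame, as_of: pd.Timestamp, n_bins: int = 4) -> pd.DataFrame:
    """Add recency_days and 1..n_bins scores to an RFM table (higher is better).

    Assumes no nulls in the scored columns and that `as_of` and
    `last_order` are both timezone-naive (or share a timezone).
    """
    out = rfm.copy()
    out["recency_days"] = (as_of - pd.to_datetime(out["last_order"])).dt.days

    def bucket(values: pd.Series, ascending: bool = True) -> pd.Series:
        # Percentile rank in (0, 1], scaled to an integer score in 1..n_bins.
        pct = values.rank(pct=True, method="average", ascending=ascending)
        return np.ceil(pct * n_bins).astype(int)

    out["r_score"] = bucket(out["recency_days"], ascending=False)  # recent = high
    out["f_score"] = bucket(out["frequency"])
    out["m_score"] = bucket(out["monetary"])
    return out


rfm = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4],
        "last_order": ["2024-01-01", "2024-01-10", "2024-01-20", "2024-01-30"],
        "frequency": [1, 2, 3, 4],
        "monetary": [10.0, 20.0, 30.0, 40.0],
    }
)
scored = score_rfm(rfm, as_of=pd.Timestamp("2024-01-31"))
print(scored[["customer_id", "recency_days", "r_score", "f_score", "m_score"]])
```

Percentile ranks are used instead of fixed thresholds so the scores adapt to each shop's order volume; fixed cut-offs are equally valid if the business defines them.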

Scheduling and Triggers

Python models support flexible execution:

Trigger	Description
Manual	Run on demand from the model detail page
Schedule	Run at a fixed interval, from every 15 minutes to once a week
Source trigger	Run automatically when a source app finishes syncing
Model trigger	Run after an upstream model completes

Chaining Models

Set a model trigger so that a model runs after an upstream model completes. Linking models this way creates a pipeline. For example:

1. Source sync imports Shopify orders
2. SQL model computes daily revenue (triggered by source sync)
3. Python model runs RFM segmentation (triggered by the SQL model)

Vendo detects circular dependencies and prevents chains longer than 10 models.
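
To make these two rules concrete, the sketch below walks a trigger chain and rejects cycles and chains longer than 10 models. It illustrates the idea only and is not Vendo's implementation; the `upstream` mapping, `chain_length`, and `ChainError` are hypothetical, and each model is assumed to have at most one upstream model trigger.

```python
from typing import Dict, Optional

MAX_CHAIN_LENGTH = 10  # limit stated in the text above


class ChainError(ValueError):
    """Raised for a circular dependency or an over-long chain (hypothetical)."""


def chain_length(model: str, upstream: Dict[str, Optional[str]]) -> int:
    """Count the models in the trigger chain that ends at `model`.

    `upstream` maps each model to the model that triggers it, or None when
    the model is started some other way (manual, schedule, source sync).
    """
    seen = set()
    current: Optional[str] = model
    while current is not None:
        if current in seen:
            raise ChainError(f"circular dependency involving {current!r}")
        seen.add(current)
        if len(seen) > MAX_CHAIN_LENGTH:
            raise ChainError(
                f"chain ending at {model!r} exceeds {MAX_CHAIN_LENGTH} models"
            )
        current = upstream.get(current)
    return len(seen)


triggers = {"daily_revenue": None, "rfm_segments": "daily_revenue"}
print(chain_length("rfm_segments", triggers))
```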

Best Practices

Keep models focused: one transformation per model
Use descriptive output table names: e.g., `customer_segments`, `product_recommendations`
Handle edge cases: check for empty DataFrames and null values
Start from templates: customize a pre-built template rather than writing from scratch
Choose the write mode deliberately: `replace` for idempotent outputs, `append` for time-series accumulation
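
The edge-case advice can be sketched as follows. This is one possible pattern, not a required structure: return an empty frame with the expected columns when there is nothing to process, and deal with nulls before aggregating. Column names follow the RFM example; dropping rows without a `customer_id` and counting a missing `total_price` as 0 are assumptions.

```python
import pandas as pd

OUTPUT_COLUMNS = ["customer_id", "order_count", "total_spent"]


def _empty_output() -> pd.DataFrame:
    # Empty frame that still carries the output column names.
    return pd.DataFrame(columns=OUTPUT_COLUMNS)


def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
    if orders_df.empty:
        return _empty_output()

    # Assumption: rows without a customer cannot be attributed, so drop them.
    clean = orders_df.dropna(subset=["customer_id"]).copy()
    if clean.empty:
        return _empty_output()

    # Assumption: a missing price is counted as 0.
    clean["total_price"] = clean["total_price"].fillna(0.0)

    result = clean.groupby("customer_id", as_index=False).agg(
        order_count=("id", "count"),
        total_spent=("total_price", "sum"),
    )
    return result[OUTPUT_COLUMNS]
```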

Models Overview
SQL Models — For query-based transformations
Audiences — For rule-based user segments
Data Science Agent → Building Models — Create Python models with AI assistance