Python Models

Python models run custom Python code against your BigQuery data for advanced transformations — customer segmentation, predictive scoring, recommendation engines, and more. Start from a pre-built template or write your own.

Creating a Python Model

  1. Navigate to Models and click New Model
  2. Browse the template gallery and select a Python template, or start from scratch
  3. Configure input tables — map BigQuery tables to your model’s inputs
  4. Configure output — set the output table name and write mode
  5. Customize the Python code and parameters
  6. Save the model

Template Categories

| Category | Templates | Use Case |
| --- | --- | --- |
| Recommendations | Product affinity, cross-sell models | Suggest products based on purchase history |
| Segmentation | RFM analysis, behavioral clustering | Group customers by behavior or value |
| Prediction | Churn prediction, LTV forecasting | Predict future customer behavior |

Input Tables

Map one or more BigQuery tables as inputs to your model. Each input table is available as a pandas DataFrame in your code:

| Setting | Description |
| --- | --- |
| Dataset | Your BigQuery dataset (e.g., vendo_myshop) |
| Table | The specific table to read |
| Variable Name | How the table is referenced in code (e.g., orders_df) |
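As a rough illustration, a model with two inputs might look like the sketch below. It assumes each mapped table is passed to `transform` as an argument named after its Variable Name; the second input `customers_df` and its columns are hypothetical.

```python
import pandas as pd


def transform(orders_df: pd.DataFrame, customers_df: pd.DataFrame) -> pd.DataFrame:
    # Total spend per customer from the orders input.
    spend = (
        orders_df.groupby("customer_id", as_index=False)["total_price"]
        .sum()
        .rename(columns={"total_price": "total_spend"})
    )
    # A left join keeps customers with no orders; their spend becomes 0.
    out = customers_df.merge(spend, on="customer_id", how="left")
    out["total_spend"] = out["total_spend"].fillna(0.0)
    return out
```

The left join is a design choice: an inner join would silently drop customers who have never ordered.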

Output Configuration

| Setting | Description |
| --- | --- |
| Output Table | Name of the BigQuery table to write results to |
| Write Mode | replace (overwrite table each run) or append (add new rows) |
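The difference between the two write modes can be sketched with plain pandas. This only simulates the semantics locally: `apply_write_mode` is a hypothetical helper, not part of Vendo, and the sample rows are made up. The actual write to BigQuery is handled by the platform.

```python
import pandas as pd


def apply_write_mode(existing: pd.DataFrame, result: pd.DataFrame, mode: str) -> pd.DataFrame:
    """Return what the output table would contain after a run (local simulation)."""
    if mode == "replace":
        # Previous contents are discarded; only this run's rows remain.
        return result.reset_index(drop=True)
    if mode == "append":
        # This run's rows are added after the rows already in the table.
        return pd.concat([existing, result], ignore_index=True)
    raise ValueError(f"unknown write mode: {mode!r}")


existing = pd.DataFrame({"day": ["2024-01-01"], "revenue": [100.0]})
result = pd.DataFrame({"day": ["2024-01-02"], "revenue": [120.0]})

replaced = apply_write_mode(existing, result, "replace")
appended = apply_write_mode(existing, result, "append")
```

Note that running an append model twice on the same input would add the same rows twice, which is why replace suits outputs that are fully recomputed on each run.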

Writing Python Code

Your code runs in a secure environment with these libraries:

| Library | Version | Purpose |
| --- | --- | --- |
| pandas | 2.x | Data manipulation |
| numpy | 1.x | Numerical computing |
| scikit-learn | 1.x | Machine learning |
| google-cloud-bigquery | 3.x | BigQuery client (advanced queries) |

Code Structure

Your code receives the mapped input tables as DataFrames and must return a DataFrame:

    import pandas as pd

    def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
        # RFM segmentation example
        rfm = orders_df.groupby('customer_id').agg({
            'created_at': 'max',   # Recency
            'id': 'count',         # Frequency
            'total_price': 'sum'   # Monetary
        }).reset_index()
        rfm.columns = ['customer_id', 'last_order', 'frequency', 'monetary']
        return rfm

Note: External network access is disabled for security. All data must come from the configured input tables.
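Since `transform` is an ordinary function, one way to sanity-check it before saving is to call it on a small hand-built DataFrame. The sketch below repeats the RFM aggregation using pandas named aggregation; the sample rows are made up, and the column names are the ones the RFM example assumes.

```python
import pandas as pd


def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
    # Same aggregation as the RFM example, written with named aggregation
    # so the output column names are set in one step.
    return orders_df.groupby("customer_id", as_index=False).agg(
        last_order=("created_at", "max"),   # Recency
        frequency=("id", "count"),          # Frequency
        monetary=("total_price", "sum"),    # Monetary
    )


# Made-up sample data for a local check.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "customer_id": ["a", "a", "b"],
        "created_at": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-15"]),
        "total_price": [20.0, 30.0, 5.0],
    }
)

rfm = transform(sample)
print(rfm)
```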

Scheduling and Triggers

Python models support flexible execution:

| Trigger | Description |
| --- | --- |
| Manual | Run on demand from the model detail page |
| Schedule | Set a frequency (15 minutes to weekly) |
| Source trigger | Run automatically when a source app finishes syncing |
| Model trigger | Run after an upstream model completes |

Chaining Models

Set a model trigger on a downstream model to chain models into a pipeline. For example:

  1. Source sync imports Shopify orders
  2. SQL model computes daily revenue (triggered by source sync)
  3. Python model runs RFM segmentation (triggered by the SQL model)

Vendo detects circular dependencies and prevents chains longer than 10 models.
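How Vendo implements these checks is not documented here. As a rough mental model, assuming each model has at most one upstream model trigger, both rules can be sketched by walking the chain upstream. The `upstream` mapping, the model names, and the `validate_chain` helper are hypothetical.

```python
from typing import Dict, List, Optional

MAX_CHAIN_LENGTH = 10  # limit stated in the text above


def validate_chain(model: str, upstream: Dict[str, Optional[str]]) -> int:
    """Walk upstream from `model`; return the chain length or raise on a violation."""
    seen: List[str] = []
    current: Optional[str] = model
    while current is not None:
        if current in seen:
            # We came back to a model already on the path: a cycle.
            raise ValueError("circular dependency: " + " -> ".join(seen + [current]))
        seen.append(current)
        if len(seen) > MAX_CHAIN_LENGTH:
            raise ValueError(f"chain longer than {MAX_CHAIN_LENGTH} models")
        current = upstream.get(current)
    return len(seen)


# Hypothetical pipeline: daily_revenue is triggered by a source sync
# (no upstream model), and rfm_segments is triggered by daily_revenue.
upstream = {"daily_revenue": None, "rfm_segments": "daily_revenue"}
chain_length = validate_chain("rfm_segments", upstream)
```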

Best Practices

  • Keep models focused — one transformation per model
  • Use descriptive output table names — e.g., customer_segments, product_recommendations
  • Handle edge cases — check for empty DataFrames and null values
  • Start from templates — customize a pre-built template rather than writing from scratch
  • Use replace mode for idempotent outputs, append for time-series accumulation
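The edge-case advice can be sketched as a guard at the top of `transform`. This is one possible approach, assuming the `orders_df` columns from the RFM example; `OUTPUT_COLUMNS` is a hypothetical constant, and whether an empty result or an error is preferable depends on the model.

```python
import pandas as pd

OUTPUT_COLUMNS = ["customer_id", "frequency", "monetary"]


def transform(orders_df: pd.DataFrame) -> pd.DataFrame:
    # Empty input: return an empty frame with the expected columns
    # instead of failing inside the aggregation.
    if orders_df.empty:
        return pd.DataFrame(columns=OUTPUT_COLUMNS)

    # Drop rows with no customer, and treat a missing price as 0.
    clean = orders_df.dropna(subset=["customer_id"]).copy()
    clean["total_price"] = clean["total_price"].fillna(0.0)

    return clean.groupby("customer_id", as_index=False).agg(
        frequency=("id", "count"),
        monetary=("total_price", "sum"),
    )
```

Treating a missing price as 0 is an assumption made for this sketch; dropping those rows or raising an error may suit other models better.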