Skip to main content

Full Query Example

Setup

Load the Predibase notebook extension:

from predibase import pql
%load_ext predibase_notebook

PQL

Example execution of PQL query using %%pql magic:

%%pql
PREDICT Survived GIVEN select * from titanic;

Connect Data

All data sources available in Predibase except local files (GCS, Azure, Redshift, MySQL, etc.) are available to connect via PQL query. Templates for connecting from specific data sources is available in the Data Connectors section.

Example connecting to an S3 data source:

# First create connection
pql.execute("""
CREATE CONNECTION personal_s3_connection
CONNECTION_TYPE = S3
AWS_ACCESS_KEY_ID = '***'
AWS_SECRET_ACCESS_KEY = '***'
CONNECTION_URI = 's3://<path to s3 bucket to connect>/'
""")

# Next create dataset
pql.execute("""
CREATE DATASET titanic
FROM `personal_s3_connection` AT 's3://<path to s3 bucket to connect>/titanic.csv'
""")

To get the connection you want to work with:

# List out available data connections
connection_df = pql.get_connections()

# Filter out desired connections
connection_df[connection_df.name == 'titanic']

Lastly, set the connection:

# Set the connection
pql.set_connection('titanic')

Run Queries

Query outputs will return a pandas dataframe. You can view these as is, or assign to a variable to run subsequent pandas operations.

Example query:

df = pql.execute("""
SELECT *
FROM titanic
LIMIT 10;
""")

df.head(10)

Templated Queries

You can use jinja2 templating to make your queries dynamic.

For example, you can define a variable in Python like x = 42 and embed it into your query:

%%pql
PREDICT Survived GIVEN Age={{ x }}

This renders to PREDICT Survived GIVEN Age=42.

Templating can also be used for passing in Pandas DataFrames:

%%pql
PREDICT Survived GIVEN {{ df }}

Model Creation

You can train a model via query:

%%pql

CREATE MODEL titanic_model
FROM titanic
CONFIG {
"input_features": [
{"name": "Pclass", "type": "category"},
{"name": "Sex", "type": "category"},
{"name": "Age", "type": "number"},
{"name": "SibSp", "type": "number"},
{"name": "Parch", "type": "number"},
{"name": "Fare", "type": "number"},
{"name": "Cabin", "type": "category"},
{"name": "Embarked", "type": "category"},
],
"output_features": [
{"name": "Survived", "type": "Binary"}
]
};

Model Predictions

You can run predictions on datasets in Predibase and pandas dataframes loaded in a notebook.

Example predicting on a dataset in Predibase (Using version 1 of titanic model - denoted titanic_model.1):

%%pql

PREDICT titanic.Survived
USING titanic_model VERSION 1
GIVEN
SELECT *
FROM titanic
LIMIT 10;

Example predicting on a pandas DF:

test_df = pd.read_csv('titanic_test.csv')

preds = pql.execute("""
PREDICT titanic.Survived
USING titanic_model
GIVEN {{ df }}
""", params={'df':test_df})

test_df.head()