Full Query Example
Setup
Load the Predibase notebook extension:
from predibase import pql
%load_ext predibase_notebook
PQL
Example execution of PQL query using %%pql
magic:
%%pql
PREDICT Survived GIVEN select * from titanic;
Connect Data
All data sources available in Predibase except local files (GCS, Azure, Redshift, MySQL, etc.) are available to connect via PQL query. Templates for connecting from specific data sources is available in the Data Connectors section.
Example connecting to an S3
data source:
# First create connection
pql.execute("""
CREATE CONNECTION personal_s3_connection
CONNECTION_TYPE = S3
AWS_ACCESS_KEY_ID = '***'
AWS_SECRET_ACCESS_KEY = '***'
CONNECTION_URI = 's3://<path to s3 bucket to connect>/'
""")
# Next create dataset
pql.execute("""
CREATE DATASET titanic
FROM `personal_s3_connection` AT 's3://<path to s3 bucket to connect>/titanic.csv'
""")
To get the connection you want to work with:
# List out available data connections
connection_df = pql.get_connections()
# Filter out desired connections
connection_df[connection_df.name == 'titanic']
Lastly, set the connection:
# Set the connection
pql.set_connection('titanic')
Run Queries
Query outputs will return a pandas dataframe. You can view these as is, or assign to a variable to run subsequent pandas operations.
Example query:
df = pql.execute("""
SELECT *
FROM titanic
LIMIT 10;
""")
df.head(10)
Templated Queries
You can use jinja2 templating to make your queries dynamic.
For example, you can define a variable in Python like x = 42
and embed it into your query:
%%pql
PREDICT Survived GIVEN Age={{ x }}
This renders to PREDICT Survived GIVEN Age=42
.
Templating can also be used for passing in Pandas DataFrames:
%%pql
PREDICT Survived GIVEN {{ df }}
Model Creation
You can train a model via query:
%%pql
CREATE MODEL titanic_model
FROM titanic
CONFIG {
"input_features": [
{"name": "Pclass", "type": "category"},
{"name": "Sex", "type": "category"},
{"name": "Age", "type": "number"},
{"name": "SibSp", "type": "number"},
{"name": "Parch", "type": "number"},
{"name": "Fare", "type": "number"},
{"name": "Cabin", "type": "category"},
{"name": "Embarked", "type": "category"},
],
"output_features": [
{"name": "Survived", "type": "Binary"}
]
};
Model Predictions
You can run predictions on datasets in Predibase and pandas dataframes loaded in a notebook.
Example predicting on a dataset in Predibase (Using version 1 of titanic model - denoted titanic_model.1):
%%pql
PREDICT titanic.Survived
USING titanic_model VERSION 1
GIVEN
SELECT *
FROM titanic
LIMIT 10;
Example predicting on a pandas DF:
test_df = pd.read_csv('titanic_test.csv')
preds = pql.execute("""
PREDICT titanic.Survived
USING titanic_model
GIVEN {{ df }}
""", params={'df':test_df})
test_df.head()