
pql.execute

pql.execute(query)

Executes the given PQL query using the Predibase PQL engine. To query a specific dataset, you must first set the corresponding connection with pql.set_connection.

Parameters:

   query: str
Query for execution.

Returns:

   pd.DataFrame: the query results.

Examples:

Select the first 10 rows from the Titanic dataset.

# First set the connection
pql.set_connection('file_uploads')

# Then execute the query
df = pql.execute("""
SELECT *
FROM titanic
LIMIT 10;
""")

# Check output of query
df.head(10)

Larger Query Results

Communication between the Predibase SDK and the server is constrained by network bandwidth. A query that returns a large amount of data produces a correspondingly large payload over the wire, which can cause network issues. To avoid them, split such a query into multiple queries that each return a smaller result. Here's an example of how to do this with data connections whose SQL dialect supports OFFSET:

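  1. Store the following query template as a string in a variable called query (the two {} placeholders are filled in with the LIMIT and OFFSET values, in that order):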
PREDICT Survived GIVEN
SELECT * FROM titanic
LIMIT {}
OFFSET {}
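  2. Format the string, iteratively increasing the value of OFFSET:
import pandas as pd

# Limit the size of each query to 100K rows at a time
LIMIT = int(1e5)

# `pc` is an existing Predibase client; the dataset's row count bounds the loop
dataset = pc.get_dataset("titanic", "file_uploads")

frames = []
for offset in range(0, dataset.row_count, LIMIT):
    # Fill in the LIMIT and OFFSET placeholders of the query template
    current_query = query.format(LIMIT, offset)
    frames.append(pql.execute(current_query))

# Concatenate once at the end instead of once per page
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()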
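The loop above needs the dataset's row count up front. As an alternative, here is a minimal sketch that keeps requesting pages until one comes back with fewer than page_size rows. The helper name fetch_pages is hypothetical (not part of the Predibase SDK); it takes the executor as a parameter so that pql.execute, or any other callable returning a sized result such as a DataFrame, can be passed in. It assumes each non-final page returns exactly page_size rows. When the total is an exact multiple of page_size, one extra query returns an empty page. Also note that without an ORDER BY clause, many SQL engines do not guarantee a stable row order across separate OFFSET queries.

```python
from typing import Callable, List, Sized, TypeVar

# Any result type that supports len(), e.g. a pandas DataFrame or a list
T = TypeVar("T", bound=Sized)


def fetch_pages(execute: Callable[[str], T], template: str, page_size: int) -> List[T]:
    """Run a LIMIT/OFFSET query template page by page until a short page comes back.

    Hypothetical helper. `template` must contain two `{}` placeholders,
    filled in the order LIMIT, OFFSET. `execute` maps a query string to a
    result, e.g. pql.execute.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages: List[T] = []
    offset = 0
    while True:
        page = execute(template.format(page_size, offset))
        pages.append(page)
        # A page shorter than page_size means there are no more rows to fetch
        if len(page) < page_size:
            break
        offset += page_size
    return pages
```

With the names from the steps above, the pages could then be combined in a single call: df = pd.concat(fetch_pages(pql.execute, query, LIMIT), ignore_index=True). Collecting the pages in a list and concatenating once avoids re-copying the accumulated DataFrame on every iteration.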