pql.execute
pql.execute(query)
Executes the given PQL query using the Predibase PQL engine. To query a specific dataset, you must first set the corresponding connection with `pql.set_connection`.
Parameters:
query: str
The PQL query to execute.
Returns:
pd.DataFrame
Examples:
Select the first 10 rows from the Titanic dataset.
```python
# First set the connection
pql.set_connection('file_uploads')

# Then execute the query
df = pql.execute("""
SELECT *
FROM titanic
LIMIT 10;
""")

# Check the output of the query
df.head(10)
```
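Because `pql.execute` relies on the connection set beforehand with `pql.set_connection`, it can help to keep the two calls together when a script queries more than one connection. The following is a minimal sketch under that assumption: the helper `query_on` and the `PQLLike` interface are hypothetical names, and `pql` is passed in as an argument so the sketch does not assume how the SDK object is imported.

```python
from typing import Any, Protocol


class PQLLike(Protocol):
    """Hypothetical interface: just the two calls documented on this page."""

    def set_connection(self, connection: str) -> None: ...

    def execute(self, query: str) -> Any: ...


def query_on(pql: PQLLike, connection: str, query: str) -> Any:
    """Hypothetical helper: set the connection, then run the query against it."""
    # Set the connection first, as required before querying its datasets
    pql.set_connection(connection)
    # Run the query; with the real SDK this returns a pd.DataFrame
    return pql.execute(query)


# Usage, assuming `pql` is the SDK object from the example above:
# df = query_on(pql, "file_uploads", "SELECT * FROM titanic LIMIT 10;")
```

Taking `pql` as a parameter also makes the helper easy to exercise with a stand-in object that records the calls it receives.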
Larger Query Results
Communication between the Predibase SDK and the server is constrained by network bandwidth. Queries that return a large amount of data produce a correspondingly large payload over the wire. To avoid network issues, split such queries into multiple smaller queries that each return a smaller payload.
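As a worked example of the split: paging through N rows in chunks of at most L rows takes ⌈N / L⌉ queries, at offsets 0, L, 2L, and so on, with the last chunk holding whatever rows remain. The sketch below computes those offsets; the function name `chunk_offsets` and the 250,000-row figure are illustrative only.

```python
from typing import List


def chunk_offsets(row_count: int, limit: int) -> List[int]:
    """Return the OFFSET for each query needed to page through row_count rows."""
    if limit <= 0:
        raise ValueError("limit must be a positive number of rows")
    # Offsets advance by `limit`: 0, limit, 2 * limit, ... while below row_count.
    # The number of queries is therefore ceil(row_count / limit).
    return list(range(0, row_count, limit))


# 250,000 rows in chunks of 100,000 rows -> 3 queries;
# the last one (OFFSET 200000) returns the remaining 50,000 rows
offsets = chunk_offsets(250_000, 100_000)
print(offsets)  # [0, 100000, 200000]
```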
Here's an example of how to do this with data connections whose SQL dialect supports `OFFSET`:
- Store the following query template as a string in a variable called `query`:
```
PREDICT Survived GIVEN
SELECT * FROM titanic
LIMIT {}
OFFSET {}
```
- Format the string in a loop, increasing the value of `OFFSET` by `LIMIT` on each iteration:
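```python
import pandas as pd

# Limit the size of each query to 100K rows at a time
LIMIT = int(1e5)

# Assumes `pc` is an existing Predibase client and `query` holds
# the template from the previous step
dataset = pc.get_dataset("titanic", "file_uploads")

# Run one query per chunk, advancing OFFSET by LIMIT each time
chunks = []
for offset in range(0, dataset.row_count, LIMIT):
    current_query = query.format(LIMIT, offset)
    chunks.append(pql.execute(current_query))

# Concatenate once at the end rather than inside the loop
df = pd.concat(chunks, ignore_index=True) if chunks else None
```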
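The same loop can be factored into a reusable function that takes the query runner as a parameter, which keeps it independent of the SDK and easy to test. This is a minimal sketch: `fetch_in_chunks` is a hypothetical name, and it assumes the template has two `{}` placeholders filled with LIMIT and then OFFSET, as in `query` above. One caveat: in many SQL engines, rows have no guaranteed order without an `ORDER BY`, so LIMIT/OFFSET pages can overlap or skip rows; if the connection's dialect allows it, ordering by a unique column keeps the pages stable.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fetch_in_chunks(
    execute: Callable[[str], T],
    template: str,
    row_count: int,
    limit: int,
) -> List[T]:
    """Run `template` once per chunk and return the per-chunk results in order.

    `template` must contain two `{}` placeholders, filled with LIMIT then OFFSET.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive number of rows")
    results: List[T] = []
    # One query per chunk: offsets 0, limit, 2 * limit, ... below row_count
    for offset in range(0, row_count, limit):
        results.append(execute(template.format(limit, offset)))
    return results
```

With the SDK objects from the example above, usage might look like `df = pd.concat(fetch_in_chunks(pql.execute, query, dataset.row_count, LIMIT), ignore_index=True)`, provided the dataset has at least one row.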