CREATE CONNECTION
CREATE CONNECTION creates a new data source from an external connection.
Syntax
The grammar for creating a connection follows; connection_parameters depend on the connection type (see below). For blob stores, a connection URI can optionally be provided as the base path.
CREATE [ OR REPLACE ] CONNECTION [ IF NOT EXISTS ] connection_name
CONNECTION_TYPE = { adls | gcs | s3 | mysql | postgresql | redshift | snowflake | bigquery | databricks }
connection_parameters
The Predibase parser adheres to ANSI/ISO SQL standards and supports a superset of the grammar supported by the Apache Calcite parser.
When a parameter is a choice of one or more values, the choices are enclosed in curly brackets { }. Optional parameters are enclosed in square brackets [ ].
Strings are enclosed in single quotes, e.g. 'string', and identifiers are escaped with double quotes, e.g. "identifier".
Connection name
A unique name for the data source in Predibase. This name can be used to resolve datasets and features in case of ambiguity, for example in a FROM clause:
SELECT feature_name FROM connection_name.dataset_name
Connection type
The type of external data source to connect to. One of:
adls: Azure Data Lake Storage.
gcs: Google Cloud Storage.
s3: Amazon S3.
mysql: MySQL database.
postgresql: PostgreSQL database.
redshift: Amazon Redshift data warehouse.
snowflake: Snowflake data warehouse.
bigquery: Google BigQuery data warehouse.
databricks: Databricks SQL.
Note that local file uploads cannot be connected in this way; use the Predibase UI instead.
Connection parameters
The required connection parameters depend on the connection type.
ADLS
ADLS requires either an access key or a connection string, and optionally a protocol, e.g.:
CONNECTION_TYPE = adls
[ AZURE_ACCESS_KEY = 'account_key' ]
[ AZURE_CONNECTION_STRING = 'connection_string' ]
[ AZURE_PROTOCOL = { 'abfs' } ]
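For example, a connection authenticating with an access key alone might look like the following (the connection name and key value are hypothetical placeholders):
-- hypothetical example; substitute your storage account key
CREATE CONNECTION adls_connection
CONNECTION_TYPE = adls
AZURE_ACCESS_KEY = 'my_account_key'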
GCS
GCS requires a GCP JSON token string:
CONNECTION_TYPE = gcs
GCP_TOKEN = 'json_string'
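For example (the connection name is hypothetical, and the token value stands in for the full JSON contents of a GCP service account key):
-- hypothetical example; pass the service account key JSON as a string
CREATE CONNECTION gcs_connection
CONNECTION_TYPE = gcs
GCP_TOKEN = 'contents_of_service_account_key_json'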
S3
S3 requires an access key ID and a secret access key, and optionally a role ARN:
CONNECTION_TYPE = s3
AWS_ACCESS_KEY_ID = 'access_key_id'
AWS_SECRET_ACCESS_KEY = 'secret_access_key'
[ AWS_ROLE_ARN = 'arn' ]
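When connecting via an assumed role, the optional ARN can be added, for example (all values are hypothetical placeholders):
-- hypothetical example including the optional role ARN
CREATE CONNECTION s3_role_connection
CONNECTION_TYPE = s3
AWS_ACCESS_KEY_ID = 'my_access_key_id'
AWS_SECRET_ACCESS_KEY = 'my_secret_access_key'
AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_role'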
Snowflake
Snowflake requires a warehouse, database, and schema name, as well as a username and password.
CONNECTION_TYPE = snowflake
WAREHOUSE_NAME = 'warehouse_name'
DATABASE_NAME = 'database_name'
SCHEMA_NAME = 'schema_name'
USERNAME = 'user'
PASSWORD = 'pwd'
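For example (all names and credentials are hypothetical placeholders):
-- hypothetical example
CREATE CONNECTION snowflake_connection
CONNECTION_TYPE = snowflake
WAREHOUSE_NAME = 'my_warehouse'
DATABASE_NAME = 'my_database'
SCHEMA_NAME = 'public'
USERNAME = 'my_username'
PASSWORD = 'my_password'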
BigQuery
BigQuery requires a GCP token, as well as a project and dataset name.
CONNECTION_TYPE = bigquery
GCP_TOKEN = 'json_string'
PROJECT_NAME = 'project_name'
DATASET_NAME = 'dataset_name'
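For example (placeholder values throughout; the token again stands in for the service account key JSON):
-- hypothetical example
CREATE CONNECTION bigquery_connection
CONNECTION_TYPE = bigquery
GCP_TOKEN = 'contents_of_service_account_key_json'
PROJECT_NAME = 'my_project'
DATASET_NAME = 'my_dataset'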
Databricks
Databricks SQL requires a personal access token, server host name, and HTTP path, and optionally a schema name.
CONNECTION_TYPE = databricks
ACCESS_TOKEN = 'access_token'
SERVER_HOST_NAME = 'server_host_name'
HTTP_PATH = 'http_path'
[ SCHEMA_NAME = 'schema_name' ]
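For example (all values are hypothetical placeholders; in Databricks, the server host name and HTTP path are typically found in a SQL warehouse's connection details):
-- hypothetical example
CREATE CONNECTION databricks_connection
CONNECTION_TYPE = databricks
ACCESS_TOKEN = 'my_personal_access_token'
SERVER_HOST_NAME = 'my_workspace.cloud.databricks.com'
HTTP_PATH = '/sql/1.0/warehouses/abc123'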
MySQL, PostgreSQL, Redshift
These databases require a database address, database name, and schema name, as well as a username and password.
CONNECTION_TYPE = { mysql | postgresql | redshift }
DATABASE_ADDRESS = 'host:port'
DATABASE_NAME = 'database_name'
SCHEMA_NAME = 'schema_name'
USERNAME = 'user'
PASSWORD = 'pwd'
Examples
Here is an example of how to create a connection for a MySQL database:
CREATE CONNECTION mysql_connection
CONNECTION_TYPE = mysql
DATABASE_ADDRESS = '127.0.0.1:3306'
DATABASE_NAME = 'my_database'
SCHEMA_NAME = 'public'
USERNAME = 'my_username'
PASSWORD = 'my_password'
Here is an example of how to create an S3 connection:
CREATE CONNECTION s3_connection
CONNECTION_TYPE = s3
AWS_ACCESS_KEY_ID = 'xxx'
AWS_SECRET_ACCESS_KEY = 'xxx'