Privacy
Learn About Data Privacy and Security in VPC Deployments
VPC deployments of Predibase offer a fully separate dataplane where models are fine-tuned and deployments run.
Data Flow to VPC Deployments
There are three ways to prompt a VPC deployment in Predibase:
UI
Requests sent via the Predibase Prompt UI are sent to public serving.app.predibase.com
and then sent through
our SaaS controlplane and our internal network before it reaches your dataplane, and ultimately your LLM deployment.
SDK/API
SDK or API requests to serving.app.predibase.com
are routed through our SaaS controlplane and internal network before reaching your deployment.
Direct Ingress
Direct ingress is Predibase feature which uses services like AWS PrivateLink to create a direct connection between your VPC and your Predibase dataplane. By routing through the cloud provider’s network, requests directly reach your deployment without external hops
Direct Ingress Setup
To enable direct ingress in your VPC:
- Contact Predibase support to enable the feature
- Create deployments with
direct_ingress=True
- Use VPC endpoint instead of
serving.app.predibase.com
Your requests will then route directly from your VPC to the Predibase dataplane VPC through the cloud provider’s network, bypassing our shared SaaS controlplane and external networks.
AWS PrivateLink Network Architecture
Data Storage
Fine-tuning Datasets
With the exception of explicitly uploading a file to Predibase (under the ‘dataset connection page’), data will never be saved to disk on our servers.
For dataset connections (ex. Amazon S3):
- Credentials are stored in a secure HashiCorp Vault instance
- Credentials are read at runtime by fine-tuning workers in your VPC
- Workers use these credentials to access your datasets
For file uploads:
- Files are saved to a private cloud storage bucket within your Predibase dataplane cloud account that you manage
Request Data
By default, Predibase will never log your prompts to your LLM deployments or the responses that are generated.
You can enable prompt and response logging via a flag when creating a deployment. Some customers find this valuable for creating new datasets of prompt-response pairs for further fine-tuning models, but this feature is turned off by default for all LLM deployments.
Data Stored By Predibase
Controlplane
Predibase Metadata Database
- Model metadata
- Example: Base model name, context length
- Adapter metadata
- Example: Name, description, number of training steps, name of dataset
- Dataset metadata
- Example: Number of rows, name of columns in dataset
Predibase Observability Database
- Debug logs from deployments
- Debug logs do not include prompts and responses by default. If we require more detailed logs to help triage issues, we will always contact you before turning them on.
- Debug logs from Predibase system components
- Metrics from LLM deployments and Predibase system level components
- Ex: GPU utilization, CPU utilization, filesystem utilization, number of requests
Both the Predibase Metadata Database and Predibase Observability Database encrypt data at rest. Requests going between our application and these databases are encrypted via TLS. Both databases require secure authentication to connect to them.