Direct ingress is a latency optimization feature for VPC customers that enables direct connection to your LLM deployment
during inference. By bypassing the Predibase control plane, direct ingress reduces network hops and decreases time to
first token (TTFT) latency by tens to hundreds of milliseconds. This bypass also enhances security by ensuring that
prompt requests and responses are only processed by customer-controlled infrastructure.Predibase direct ingress uses AWS PrivateLink (Azure and GCP also supported) to establish secure private connections
between your customer-controlled VPC (hosting application code) and the Predibase-configured VPC (hosting LLMs).
PrivateLink connections remain within AWS-managed networks, ensuring security and data privacy. When both VPCs are
located in the same AWS region, connections are established between regional hosts, minimizing network latency.Direct ingress is available exclusively to VPC customers and requires onboarding assistance from Predibase support. To
enable direct ingress for your tenant, please contact our support team.See private deployments for more details on how to prompt using
direct ingress.
Users can deploy Predibase into Amazon Web Services (AWS) or Microsoft Azure. While both cloud providers have a number
of regions, the curated list below are regions that have our recommended GPUs available.