Predibase provides the interfaces and infrastructure to fine-tune and serve open-source Large Language Models (LLMs). In this section, we will cover how to easily get started with inference.


Key Concepts

  • Model: a pretrained base model that you can deploy and query (e.g., llama-2-7b, mistral-7b)
  • Adapter: a set of (LoRA) weights produced by fine-tuning to specialize a base model
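The model/adapter relationship above can be sketched as a low-rank update to the base weights: a LoRA adapter stores two small matrices B and A, and applying it amounts to computing W + (alpha / r) * (B @ A). The toy function below is an illustration of that arithmetic, not Predibase code; the matrix values are made up.

```python
def lora_merge(W, A, B, alpha, r):
    """Merge a LoRA adapter into a base weight matrix.

    W: base weights (m x n), A: (r x n), B: (m x r).
    The adapter's contribution is the low-rank product B @ A,
    scaled by alpha / r.
    """
    scale = alpha / r
    rows, cols = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(cols)]
        for i in range(rows)
    ]


# Tiny example: 2x2 identity base weights, rank-1 adapter.
merged = lora_merge(
    W=[[1, 0], [0, 1]],
    A=[[1, 0]],
    B=[[1], [0]],
    alpha=2,
    r=1,
)  # -> [[3, 0], [0, 1]]
```

Because the update is low-rank, an adapter is a tiny fraction of the base model's size, which is what makes serving many adapters per base model practical.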

Inference Options

There are two main ways to run inference on Predibase:

  • Serverless Endpoints: Predibase hosts the most popular base models that can be queried or fine-tuned (via Adapters) at cost-effective prices.
  • Dedicated Deployments: Predibase can host nearly any open-source LLM on your behalf using dedicated hardware, ranging from a single T4 to a cluster of A100s.

See the pricing page for details on both options.

OpenAI-compatible API

For users migrating from OpenAI, Predibase supports OpenAI-compatible endpoints that serve as a drop-in replacement for the OpenAI SDK. Learn more here.

LoRAX (LoRA eXchange): Serving fine-tuned models at scale

LoRAX is an open-source framework released by the team at Predibase that serves up to hundreds of fine-tuned models (i.e., adapters) on a single GPU, dramatically reducing serving costs without compromising throughput or latency. LoRAX can be used with both serverless endpoints and dedicated deployments.
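The key idea in LoRAX is that the adapter is selected per request rather than per deployment, so one running base model can answer requests for many fine-tuned variants. The sketch below builds a LoRAX-style generate payload naming an adapter in the request parameters; the adapter name is a placeholder, and the exact request schema should be checked against the LoRAX documentation for your version.

```python
import json

# A LoRAX-style text-generation request: the base model is fixed by
# the deployment, while the adapter to apply is named per request.
payload = {
    "inputs": "What is machine learning?",
    "parameters": {
        "max_new_tokens": 64,
        "adapter_id": "my-org/my-adapter",  # placeholder adapter name
    },
}
body = json.dumps(payload)
# POSTing `body` to the deployment's generate endpoint would return a
# completion produced by the base model with the adapter applied.
```

Omitting `adapter_id` queries the unmodified base model, so the same deployment serves both base-model and fine-tuned traffic.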