Skip to main content


Predibase provides the interfaces and infrastructure to fine-tune and serve open-source Large Language Models (LLMs). In this section, we will cover how to easily get started with inference.


  • Model: A pretrained base model that you can deploy and query (e.g. llama-2-7b, mistral-7b)
  • Adapter: a set of (LoRA) weights produced from the fine-tuning process to specialize a base model

Inference Options

There are two main ways to run inference on Predibase:

  • Serverless Endpoints: Predibase hosts the most popular base models that can be queried or fine-tuned (via Adapters). These endpoints are intended for experimentation and fast iteration.
  • Dedicated Deployments: Predibase can host nearly any open-source LLM on your behalf using dedicated hardware ranging from a single T4 to a cluster of H100s.

See our pricing page for more details here.

OpenAI-compatible API

For users migrating from OpenAI, Predibase supports OpenAI compatible endpoints that serve as a drop-in replacement for the OpenAI SDK. Learn more here.

LoRAX (LoRA eXchange): Serving fine-tuned models at scale

LoRAX is an open-source framework released by the team at Predibase that allows users to serve up to hundreds of fine-tuned models (i.e. adapters) on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency. You can choose to use LoRAX with our serverless endpoints or dedicated deployments.

Deployment Statuses

Dedicated Deployments that are created in Predibase can be in any of the following states:

  • Pending — Deployment record has been created, but deployment has not been fully created yet
  • Initializing — The first replica is in the process of being spun up
  • Ready — At least 1 replica is up and live
  • Standby — 0 replicas are up but deployment is ready to scale up on request
  • Stopped — 0 replicas are up and deployment will not scale up until moved to Standby
  • Errored — 0 replicas are up and last Initializing state led to an error
  • Updating — at least 1 replica is up and either:
    • needs to be re-initialized following a config change OR
    • the LLM is in the process of being re-initialized
  • Deleted — The deployment has been deleted