This guide will show you how to quickly get started with using Predibase to deploy and prompt LLMs. We’ll walk through setting up your environment, running inference, and using streaming responses.

Prerequisites

  1. Sign up for a Predibase account
  2. Navigate to the Settings page and click Generate API Token
  3. Set up your environment and install the Python SDK:
pip install -U predibase
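Rather than hard-coding your token in source files, you can export it as an environment variable. (This assumes the SDK reads `PREDIBASE_API_TOKEN` when no `api_token` argument is passed; check the SDK docs to confirm.)

```shell
# Store the token in the environment instead of in code
export PREDIBASE_API_TOKEN="<PREDIBASE API TOKEN>"
```

With the variable set, you would initialize the client with `pb = Predibase()`.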

Create and prompt a private deployment

Let’s start by deploying a model and running inference.

from predibase import Predibase, DeploymentConfig

# Initialize the client with your API token
pb = Predibase(api_token="<PREDIBASE API TOKEN>")

# Create a deployment
deployment = pb.deployments.create(
    name="my-qwen3-8b",
    config=DeploymentConfig(base_model="qwen3-8b")
)

# Generate text
response = pb.deployments.client("my-qwen3-8b").generate(
    "What is a Large Language Model?",
    max_new_tokens=50
)
print(response.generated_text)
# "(LLM) Explained\n\nA large language model (LLM) is a type of artificial intelligence..."

Prompt a shared endpoint

For quick experimentation, you can use our shared endpoints (available to SaaS users only).

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE API TOKEN>")

# Get a list of available models
available_models = pb.deployments.list()

# Connect to a shared deployment
client = pb.deployments.client("qwen3-8b", max_new_tokens=32)

# Generate text
response = client.generate(
    "What are some popular tourist spots in San Francisco?"
)
print(response.generated_text)

Note that instruction- and chat-tuned models often expect special tokens (such as [INST] for Llama-2-style models) before and after the prompt; formatting your input with the model's chat template improves response quality. See Chat Templates for details.
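As an illustration, here is a minimal sketch of Llama-2-style prompt formatting. `format_llama2_prompt` is a hypothetical helper, not part of the SDK, and the exact template varies by model family (Qwen models use a different one), so prefer the model's official chat template in practice.

```python
def format_llama2_prompt(user_message: str, system_prompt: str = "") -> str:
    """Wrap a user message in Llama-2-style [INST] tags (hypothetical helper)."""
    # The optional system prompt goes inside a <<SYS>> block before the message
    sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n" if system_prompt else ""
    return f"<s>[INST] {sys_block}{user_message} [/INST]"

prompt = format_llama2_prompt("What are some popular tourist spots in San Francisco?")
print(prompt)
# <s>[INST] What are some popular tourist spots in San Francisco? [/INST]
```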

Stream responses

For longer responses, you might want to stream the tokens as they’re generated:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE API TOKEN>")
client = pb.deployments.client("qwen3-8b")

# Stream tokens as they're generated
for response in client.generate_stream(
    "What are some popular tourist spots in San Francisco?", max_new_tokens=256
):
    if not response.token.special:
        print(response.token.text, end="", flush=True)

All examples above use the Python SDK for simplicity. A REST API is also available if you prefer making direct HTTP calls. See our Chat Completions API for details.
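As a sketch of what a direct HTTP call might look like, the example below builds an OpenAI-compatible Chat Completions request. The URL pattern and `<TENANT>` placeholder are assumptions here; consult the Chat Completions API docs for your deployment's exact endpoint.

```python
import json

# Placeholder endpoint -- substitute your tenant ID and deployment name
# (the exact URL pattern is an assumption; see the REST API docs).
url = (
    "https://serving.app.predibase.com/<TENANT>"
    "/deployments/v2/llms/my-qwen3-8b/v1/chat/completions"
)
headers = {
    "Authorization": "Bearer <PREDIBASE API TOKEN>",
    "Content-Type": "application/json",
}
payload = {
    "model": "my-qwen3-8b",
    "messages": [
        {"role": "user", "content": "What is a Large Language Model?"}
    ],
    "max_tokens": 50,
}
body = json.dumps(payload)

# Send with any HTTP client, e.g.:
# import requests
# r = requests.post(url, headers=headers, data=body)
# print(r.json()["choices"][0]["message"]["content"])
```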
