force_bare_client: bool, optional, default False - When False, the SDK runs a
subprocess that queries the Predibase API and prints a helpful message if the
deployment is still scaling up. This is useful for experimentation and
notebooks. Set to True in production to skip these additional checks.
serving_url_override: str, optional, default None - Override the default URL used
to prompt deployments. Only used for direct-ingress VPC deployments. The available VPC
endpoints for a direct-ingress deployment are listed in the deployment's Configuration tab in the Predibase UI.
Returns
LoRAX Client - Client object for running inference
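As a minimal sketch of these options (the deployment names, token, and VPC URL below are placeholders, not values from this reference):

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Production-style client: skip the extra scaling-status checks.
client = pb.deployments.client("qwen3-8b", force_bare_client=True)

# Hypothetical direct-ingress VPC deployment; copy the real endpoint
# from the deployment's Configuration tab in the Predibase UI.
vpc_client = pb.deployments.client(
    "my-vpc-deployment",
    serving_url_override="https://<your-vpc-endpoint>",
)
```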
The returned client's generate() method accepts the following parameters (illustrated in the examples below):

adapter_id: str, optional – Adapter ID to apply to the base model (e.g. "adapter-name/1"); can include a checkpoint (e.g. "adapter-name/1@7")
adapter_source: str, optional – Where to load the adapter from: "hub", "local", "s3", or "pbase"
api_token: str, optional – Token used to access private adapters
max_new_tokens: int – Maximum number of tokens to generate
best_of: int – Generate best_of sequences and return the one with the highest log-probability
repetition_penalty: float – Penalty applied to repeated tokens (1.0 means no penalty)
return_full_text: bool – If True, prepend the original prompt to the generated text
seed: int – Random seed for reproducible sampling
stop_sequences: List[str] – Stop generation when any of these sequences is produced
temperature: float – Softmax temperature for sampling
top_k: int – Keep only the highest-probability k tokens for sampling
top_p: float – Use nucleus sampling to keep the smallest set of tokens whose cumulative probability ≥ top_p
truncate: int – Truncate input tokens to this length before generation
response_format: Dict[str, Any] | ResponseFormat, optional – Schema describing a structured format (e.g. a JSON object) to impose on the output
decoder_input_details: bool – Return log-probabilities and IDs for the decoder’s input tokens
details: bool – Return log-probabilities and IDs for all generated tokens
Returns
GenerationResponse - Object containing the generated text and metadata
Examples
```python
from predibase import Predibase

# Initialize the SDK (token placeholder).
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Basic prompting
client = pb.deployments.client("qwen3-8b")
print(client.generate("What is your name?").generated_text)

# Using an adapter with max_new_tokens
client = pb.deployments.client("qwen3-8b")
print(client.generate("hello", adapter_id="news-summarizer-model/1", max_new_tokens=100).generated_text)

# Using a specific adapter checkpoint
client = pb.deployments.client("qwen3-8b")
# Prompts using the 7th checkpoint of adapter version `news-summarizer-model/1`.
print(client.generate("hello", adapter_id="news-summarizer-model/1@7", max_new_tokens=100).generated_text)
```
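The sketch below exercises the sampling and output-control parameters documented above. The prompt and parameter values are illustrative, and it assumes the same `pb` and `client` objects as the example above:

```python
# Sampling-controlled generation; parameter names come from the reference above.
response = client.generate(
    "Summarize the plot of Hamlet in two sentences.",
    max_new_tokens=128,
    temperature=0.7,
    top_p=0.9,
    seed=42,                    # fixed seed for reproducible sampling
    stop_sequences=["\n\n"],    # stop at the first blank line
    details=True,               # request per-token log-probabilities and IDs
)
print(response.generated_text)
```

For structured output via response_format, a hedged sketch follows: the dict shape (a "json_object" type plus a JSON schema) follows common LoRAX usage, and the schema itself is a made-up illustration:

```python
import json

# Hypothetical schema the output must conform to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

response = client.generate(
    "Invent a character and describe them.",
    response_format={"type": "json_object", "schema": schema},
    max_new_tokens=128,
)
print(json.loads(response.generated_text))
```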
```python
from predibase import Predibase

# Initialize the SDK (token placeholder).
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Generate embeddings for your data
text = "Generate embeddings using your dedicated deployment."
response = pb.embeddings.create(model="my-embedding-model", input=text)
print(f"Generated {len(response.data[0].embedding)}-dimensional embedding")

# Process a batch of documents
documents = [
    "First document for embedding",
    "Second document with different content",
    "Third document to process in batch",
]
batch_embeddings = [
    pb.embeddings.create(model="my-embedding-model", input=doc).data[0].embedding
    for doc in documents
]
```
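Note that the batch loop above issues one API request per document: `pb.embeddings.create` is called once for each entry in `documents`.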