pb.deployments.client

Use a LoRAX Client to prompt your LLM

Parameters:

   deployment_ref: str
Name of the deployment to prompt

   force_bare_client: bool, default False
When False, the SDK performs additional readiness checks and prints helpful status messages, which is useful for experimentation. Set to True in production to skip these extra API calls.

Returns:

   LoRAX Client

Using LoRAX for inference

The LoRAX client provides several functions for inference, including streaming (see Example 5 below). See the LoRAX docs for all the parameters you can configure.

Examples:

Example 1: Prompt base model

lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate("What is your name?").generated_text)

Example 2: Prompt a fine-tuned adapter with max_new_tokens

lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate("hello", adapter_id="news-summarizer-model/1", max_new_tokens=100).generated_text)

Example 3: Prompt a specific checkpoint from an adapter version

lorax_client = pb.deployments.client("mistral-7b-instruct")
# Prompts using the 7th checkpoint of adapter version `news-summarizer-model/1`.
print(lorax_client.generate("hello", adapter_id="news-summarizer-model/1@7", max_new_tokens=100).generated_text)

Example 4: Use force_bare_client for production

When this flag is False (the default), the SDK runs a sub-process that queries the Predibase API and prints a helpful message if the deployment is still scaling up, which is useful for experimentation and notebooks. When you're ready for production, set force_bare_client=True to avoid these redundant API calls.

lorax_client = pb.deployments.client("mistral-7b-instruct", force_bare_client=True)
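
Example 5: Stream tokens as they are generated

The LoRAX client also supports streaming. The sketch below is a minimal example, assuming generate_stream yields response objects whose token.text field carries each new token and whose token.special flag marks special tokens (as in the LoRAX Python client); check the LoRAX docs for the exact response schema.

lorax_client = pb.deployments.client("mistral-7b-instruct")

# Stream the completion token by token instead of waiting for the full response.
for response in lorax_client.generate_stream("What is your name?", max_new_tokens=100):
    if not response.token.special:
        # Print each token as it arrives, without a trailing newline.
        print(response.token.text, end="", flush=True)
print()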