pb.deployments.client
Use a LoRAX Client to prompt your LLM
Parameters:
deployment_ref: str
Name of the deployment to prompt
force_bare_client: boolean, default False
When False, the SDK performs extra readiness checks and prints status messages while the deployment scales up. Set to True in production to avoid the extra API calls.
Returns:
A LoRAX client that provides several functions for inference, including streaming (see the sketch after the examples below). See the LoRAX docs for all possible parameters you can configure.
Examples:
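The examples below assume an initialized Predibase client. A minimal setup sketch; the token value is a placeholder you must replace with your own:
from predibase import Predibase
# Initialize the Predibase SDK client. <PREDIBASE_API_TOKEN> is a placeholder for your API token.
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")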
Example 1: Prompt base model
lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate("What is your name?").generated_text)
Example 2: Prompt a fine-tuned adapter with max_new_tokens
lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate("hello", adapter_id="news-summarizer-model/1", max_new_tokens=100).generated_text)
Example 3: Prompt a specific checkpoint from an adapter version
lorax_client = pb.deployments.client("mistral-7b-instruct")
# Prompts using the 7th checkpoint of adapter version `news-summarizer-model/1`.
print(lorax_client.generate("hello", adapter_id="news-summarizer-model/1@7", max_new_tokens=100).generated_text)
Example 4: When you're ready for production, set force_bare_client=True
When this flag is set to False, the SDK runs a sub-process that queries the Predibase API and prints a helpful message if the deployment is still scaling up, which is useful for experimentation and notebooks. When you're ready for production, set the flag to True to avoid these redundant API calls.
lorax_client = pb.deployments.client("mistral-7b-instruct", force_bare_client=True)