llm_deployment.prompt

llm_deployment.prompt(data, max_new_tokens=128, temperature=0.1)

This method allows you to query the specified deployment in your Predibase environment.

Parameters:

data: Union[str, Dict[str, str]]

The prompt data to pass to the specified LLM. If the LLM does not have a default prompt template, or if the template has a single interpolation slot, pass a single string. If the prompt template has multiple interpolation variables, pass a dictionary mapping variable names to the data to inject. This is helpful for structuring few-shot prompts, for example by injecting a block of worked examples into its own template variable.
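For example, with a deployment whose prompt template has two interpolation variables (the variable names below, context and question, are illustrative and depend on your template):

# Hypothetical template such as "Context: {context}\nQuestion: {question}"
llm_deployment.prompt(
    {
        "context": "Rome has been the capital of Italy since 1871.",
        "question": "What is the capital of Italy?",
    }
)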

max_new_tokens: Optional[int] (default: None)

The maximum number of new tokens to generate, not counting the tokens in the input prompt. If not set, this defaults to the deployment's maximum context window minus the number of tokens in your prompt.

Tip

We still recommend setting max_new_tokens explicitly for non-instruction-tuned models, since they tend to keep generating tokens.
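For example, to cap the completion at 64 new tokens:

# Generation stops once 64 new tokens have been produced,
# regardless of how many tokens the prompt itself uses.
llm_deployment.prompt("Write a short poem about the ocean.", max_new_tokens=64)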

temperature: float (default: 0.1)

Temperature controls the randomness of predictions. A high temperature (closer to 1) makes the output more diverse and random, while a lower temperature (closer to 0) makes the model's responses more deterministic and focused on the most likely outcome. In other words, temperature adjusts the probability distribution from which the model samples the next token.
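For example (the prompts below are illustrative):

# Near-deterministic output, useful for extraction or classification
llm_deployment.prompt("List the three primary colors.", temperature=0.05)

# More varied, creative output
llm_deployment.prompt("Suggest a name for a new coffee shop.", temperature=0.9)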

Returns:

A list of Predibase GeneratedResponse objects.

Examples:

Simple LLM query (e.g., using an llm_deployment obtained from the Predibase deployment link):

llm_deployment.prompt("What is the capital of Italy")
# [
# GeneratedResponse(
# prompt='What is the capital of Italy?',
# response='\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the users questions. USER: What is the capital of Italy? ASSISTANT:\n\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the users questions. USER: What is the capital of Italy? ASSISTANT:\n\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the users questions. USER: What is the capital of Italy? ASSISTANT:',
# raw_prompt=None,
# sample_index=0,
# sample_data=None,
# context=None,
# model_name='llama-2-13b',
# finish_reason=None,
# generated_tokens=None
# )
# ]
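
Since the return value is a list of GeneratedResponse objects, the generated text can be read from the response attribute of an element:

results = llm_deployment.prompt("What is the capital of Italy?")
print(results[0].response)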

Serverless LLMs

For our SaaS users, Predibase offers shared serverless LLMs, which you can prompt without needing to deploy anything yourself.
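
A minimal sketch of prompting a shared serverless deployment, assuming an authenticated PredibaseClient instance pc and that the llama-2-13b shared deployment is available in your environment (the deployment URI below is illustrative):

# pc is an authenticated PredibaseClient; the URI is illustrative and depends on
# which shared serverless deployments your environment exposes.
llm_deployment = pc.LLM("pb://deployments/llama-2-13b")
llm_deployment.prompt("What is the capital of Italy?", max_new_tokens=128)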