llm_deployment.prompt
llm_deployment.prompt(data, max_new_tokens=128, temperature=0.1)
This method allows you to query the specified deployment in your Predibase environment.
Parameters:
data: Union[str, Dict[str, str]]
The prompt data to pass to the specified LLM. If the LLM has no default prompt template, or if its template has a single interpolation slot, pass a single string. If the prompt template has multiple interpolation variables, pass a dictionary that maps each variable name to the data to inject. This is helpful for structuring few-shot learning prompts, since the examples can be passed in through the template's variables.
max_new_tokens: Optional[int] (default: None)
The maximum number of new tokens to generate, not counting the tokens in the input prompt. If not set, it defaults to the deployment's maximum context window minus the number of tokens in your prompt.
We still recommend setting max_new_tokens for non-instruction-tuned models, since they tend to keep generating tokens.
temperature: float (default: 0.1)
Temperature is used to control the randomness of predictions. A high temperature value (closer to 1) makes the output more diverse and random, while a lower temperature (closer to 0) makes the model's responses more deterministic and focused on the most likely outcome. In other words, temperature adjusts the probability distribution from which the model picks the next token.
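Temperature is commonly applied by dividing the model's raw next-token scores (logits) by the temperature before a softmax turns them into probabilities. The sketch below shows that standard formulation in plain Python; the logits are made-up numbers, and it illustrates the general technique rather than Predibase's serving code.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw next-token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # This sketch only handles positive temperatures.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 0.1 almost all of the probability lands on the highest-scoring token, while at 1.0 the lower-scoring tokens keep a meaningful share, which is why low temperatures give more deterministic output.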
Returns:
A list of Predibase GeneratedResponse objects
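Because the method returns a list, read the generated text from an element of that list rather than from the return value directly. The sketch below uses a hypothetical local stand-in dataclass whose fields mirror those printed in the pretrained example output; it is not the real Predibase class, and the field types are assumptions.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class GeneratedResponse:
    """Hypothetical stand-in mirroring the fields shown in the example output."""
    prompt: str
    response: str
    raw_prompt: Any = None
    sample_index: int = 0
    sample_data: Any = None
    context: Any = None
    model_name: Optional[str] = None
    finish_reason: Any = None
    generated_tokens: Any = None


def first_response_text(results: List[GeneratedResponse]) -> str:
    """Return the generated text of the first response in the list."""
    if not results:
        raise ValueError("no responses returned")
    return results[0].response


# Simulated return value with placeholder text; in practice the list comes
# from llm_deployment.prompt(...).
results = [
    GeneratedResponse(
        prompt="What is the capital of Italy?",
        response="(generated text)",
        model_name="llama-2-13b",
    )
]
print(first_response_text(results))
```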
Examples:
Examples are given for two cases: prompting a pretrained LLM and prompting a fine-tuned LLM.
To prompt a pretrained LLM, run a simple query (e.g., using llm_deployment from the Predibase deployment link):
llm_deployment.prompt("What is the capital of Italy?")
# [
# GeneratedResponse(
# prompt='What is the capital of Italy?',
# response='\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the users questions. USER: What is the capital of Italy? ASSISTANT:\n\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the users questions. USER: What is the capital of Italy? ASSISTANT:\n\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the users questions. USER: What is the capital of Italy? ASSISTANT:',
# raw_prompt=None,
# sample_index=0,
# sample_data=None,
# context=None,
# model_name='llama-2-13b',
# finish_reason=None,
# generated_tokens=None
# )
# ]
To prompt a fine-tuned LLM, first view its default prompt template, then prompt it. See our guide to serving your fine-tuned LLM, either via LoRAX (prompt instantly, no deployment needed) or on a dedicated deployment.
In this example, the model was fine-tuned with two input columns, so the prompt data is a dictionary. If you have only one input column, you can pass a single string instead.
# ft_dep: a deployment serving your fine-tuned LLM (see the serving guide mentioned above)
# View the prompt template used for fine-tuning
ft_dep_template = ft_dep.default_prompt_template
print(ft_dep_template)
# Now prompt!
# In the Code Alpaca example, the prompt template shows that the model
# was fine-tuned using a template that accepts an {instruction} and an {input}.
result = ft_dep.prompt(
    {
        "instruction": "Write an algorithm in Java to reverse the words in a string.",
        "input": "The quick brown fox",
    },
    max_new_tokens=256,
)
# prompt() returns a list of GeneratedResponse objects
print(result[0].response)
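The fine-tuned example passes a dictionary because its template has two slots, {instruction} and {input}. To show how string and dictionary prompt data might map onto template slots, here is a minimal sketch. The helper fill_template and both template strings are hypothetical illustrations built on Python's str.format; they are not Predibase's implementation, and the real Code Alpaca template may differ.

```python
import string
from typing import Dict, Optional, Union


def fill_template(template: Optional[str], data: Union[str, Dict[str, str]]) -> str:
    """Hypothetical helper: interpolate prompt data into a prompt template.

    - No template: the string is used as the prompt unchanged.
    - One slot: a plain string fills that slot.
    - Several slots: a dict maps each slot name to the text to inject.
    """
    if template is None:
        if not isinstance(data, str):
            raise TypeError("without a template, pass a single string")
        return data
    # Names of all {slot} fields in the template (literal text is skipped).
    slots = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    if isinstance(data, str):
        if len(slots) != 1:
            raise ValueError(f"template has slots {sorted(slots)}; pass a dict")
        return template.format(**{next(iter(slots)): data})
    return template.format(**data)


# Made-up templates for illustration only.
one_slot = "USER: {prompt} ASSISTANT:"
two_slots = "Instruction: {instruction}\nInput: {input}\nResponse:"

print(fill_template(one_slot, "What is the capital of Italy?"))
print(fill_template(two_slots, {
    "instruction": "Write an algorithm in Java to reverse the words in a string.",
    "input": "The quick brown fox",
}))
```

Passing a single string to the two-slot template raises an error in this sketch, mirroring the rule that multi-variable templates need a dictionary.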
Serverless LLMs
For our SaaS users, Predibase offers shared serverless LLMs that you can prompt without deploying anything.