Batch Inference
Process inference requests in bulk at a lower cost
Warning: This is a beta feature. APIs and behavior may change.
Use the batch inference API to process inference requests in bulk at a lower cost.
Dataset Format
Batch inference takes a dataset as input; each row is used as a prompt to the model. The input dataset must conform to the OpenAI batch inference format.
Example dataset row:
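Each record is a JSON object in the OpenAI batch request format. The row below is illustrative (shown pretty-printed here; each record must occupy a single line in the JSONL file), and `my-adapter/1` is a placeholder adapter ID:

```json
{
  "custom_id": "request-1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "my-adapter/1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }
}
```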
To use an adapter, specify the adapter ID in the `model` field. The base model will be specified when creating the batch job.
Once your JSONL file is ready, upload it to Predibase.
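For example, with the Predibase Python SDK (a minimal sketch; the file path and dataset name are placeholders, and `pb.datasets.from_file` is assumed to be the current SDK's file-upload helper):

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Upload the prepared JSONL file as a Predibase dataset.
dataset = pb.datasets.from_file("batch_requests.jsonl", name="my_batch_requests")
```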
For information about preparing your dataset, see Dataset Preparation.
Creating a Batch Job
Warning: Currently only base models up to 16B parameters are supported.
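A sketch of creating the job with the Python SDK, reusing the `pb` client and `dataset` from the upload step. The `pb.batch_inferences.create` method, its parameter names, and the model name are assumptions for illustration; see the batch inference API reference for the exact interface:

```python
# Hypothetical method and parameter names, shown for illustration only.
# `pb` and `dataset` come from the upload step above.
job = pb.batch_inferences.create(
    base_model="llama-3-1-8b-instruct",  # placeholder base model (must be <= 16B parameters)
    dataset=dataset,
)
print(job.id, job.status)
```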
Monitoring Jobs
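Jobs run asynchronously, so poll for status until the job reaches a terminal state. Below is a sketch; the `pb.batch_inferences.get` accessor and the non-`completed` status values are assumptions:

```python
import time

# Hypothetical accessor; `pb` and `job` come from the previous steps.
while True:
    job = pb.batch_inferences.get(job.id)
    if job.status in ("completed", "failed", "canceled"):  # assumed terminal statuses
        break
    time.sleep(30)  # poll every 30 seconds

print(f"Job {job.id} finished with status: {job.status}")
```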
Getting Results
Once the job status is `completed`, download the results:
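For example (a sketch; the `download_results` helper and its `dest` parameter are assumptions):

```python
# Hypothetical download helper; writes one JSON object per line, mirroring the input format.
if job.status == "completed":
    pb.batch_inferences.download_results(job.id, dest="results.jsonl")
```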
Example result row:
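Result rows follow the OpenAI batch output format; the IDs, message content, and token counts below are placeholders (shown pretty-printed, one object per line in the file):

```json
{
  "id": "batch-req-abc123",
  "custom_id": "request-1",
  "response": {
    "status_code": 200,
    "body": {
      "id": "chatcmpl-xyz789",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "message": {"role": "assistant", "content": "The capital of France is Paris."},
          "finish_reason": "stop"
        }
      ],
      "usage": {"prompt_tokens": 14, "completion_tokens": 8, "total_tokens": 22}
    }
  },
  "error": null
}
```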
Note: Result rows may not be in the same order as the input dataset. Use the `custom_id` field to match results with inputs.
Pricing
Batch inference is priced at $0.50 per million tokens, with no difference between input and output tokens.
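For example, a job that consumes 2 million input tokens and produces 1 million output tokens is billed for 3 million tokens, or $1.50.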