Inference API
Metrics
Prometheus metrics scrape endpoint
GET /metrics
curl --request GET \
--url https://serving.app.predibase.com/tenant_id/deployments/v2/llms/deployment_name/metrics
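The same request can be made from Python. This is a minimal sketch, not an official client: the `Authorization: Bearer` header and the `PREDIBASE_API_TOKEN` environment variable are assumptions about the auth setup, and `tenant_id` / `deployment_name` remain placeholders you must supply.

```python
# Sketch of scraping the metrics endpoint from Python (stdlib only).
# The bearer-token header and PREDIBASE_API_TOKEN env var are assumptions,
# not confirmed by this page; adapt to your actual auth configuration.
import os
import urllib.request

BASE = "https://serving.app.predibase.com"

def metrics_url(tenant_id: str, deployment_name: str) -> str:
    """Build the scrape URL for a given tenant and deployment."""
    return f"{BASE}/{tenant_id}/deployments/v2/llms/{deployment_name}/metrics"

def fetch_metrics(tenant_id: str, deployment_name: str) -> str:
    """Fetch the raw metrics payload as text."""
    req = urllib.request.Request(
        metrics_url(tenant_id, deployment_name),
        headers={"Authorization": f"Bearer {os.environ.get('PREDIBASE_API_TOKEN', '')}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```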
{
"lorax_request_count": 123,
"lorax_request_skipped_tokens": 123,
"lorax_queue_length": 123,
"lorax_batch_inference_count": 123,
"lorax_request_max_new_tokens": {},
"lorax_request_inference_duration": {},
"lorax_request_mean_time_per_token_duration": {},
"lorax_request_generated_tokens": {},
"lorax_request_success": 123,
"lorax_batch_next_size": {},
"lorax_request_failure": 123,
"lorax_request_input_length": {},
"lorax_batch_current_size": {},
"lorax_batch_inference_success": 123,
"lorax_batch_inference_duration": {},
"lorax_request_queue_duration": {},
"lorax_request_duration": {},
"lorax_request_validation_duration": {}
}
Response
200 - text/plain
Metrics Response
The response is of type object.
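Because the Content-Type is text/plain, Prometheus-style scrapers will receive the standard text exposition format rather than JSON. A minimal parsing sketch, assuming plain `name value` sample lines (labels and histogram buckets are left as-is in the metric name):

```python
# Minimal parser for Prometheus text exposition output.
# A sketch only: assumes simple "name value" lines; HELP/TYPE comment
# lines are skipped and malformed lines are ignored.
def parse_prometheus_text(body: str) -> dict:
    metrics = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # ignore lines that do not end in a numeric sample
    return metrics

sample = """# HELP lorax_request_count Total number of requests
# TYPE lorax_request_count counter
lorax_request_count 123
lorax_queue_length 4
"""
print(parse_prometheus_text(sample))  # → {'lorax_request_count': 123.0, 'lorax_queue_length': 4.0}
```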