Indexing refers to the practice of using an embedding index to fetch semantically similar chunks of context from an internal corpus and insert them into the LLM prompt, so the model can factually answer questions about data it was never trained on. This approach is also known in the literature as In-Context Retrieval-Augmented Language Modeling (RALM).

When to use Indexing

  • You want to factually answer questions about an internal corpus of documents.
  • You want to perform Few-Shot Learning using semantically relevant examples.
  • All the information needed to answer your question can fit within a single prompt to the model.

When NOT to use Indexing

  • You are trying to adapt the model to perform a generative task, rather than answer specific questions (see: Fine-Tuning).
  • You are trying to perform a predictive task and have thousands of ground truth examples (see: Supervised ML).
  • You want to answer questions that require more domain-specific information than will fit in a single prompt.
  • Your queries are latency- or throughput-sensitive, in which case augmenting prompts to a general-purpose LLM will be slower than fine-tuning a specialized model.

How Indexing Works

The indexing pipeline in Predibase consists of the following components:

  1. A Dataset to be indexed.
  2. An Embedding Model that converts samples of data / prompts into a semantic embedding space.
  3. An Embedding Index that provides fast retrieval of data samples from semantically similar embeddings for a given prompt.
  4. A Large Language Model that takes the context provided by the Embedding Index and uses it to answer the question in the prompt.
  5. An Engine to execute the query and operate over the dataset.
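To make these roles concrete, here is a minimal in-memory sketch of component 3, the Embedding Index. This is a hypothetical stand-in, not Predibase's actual implementation: it supports the two operations the flow below relies on, upserting embeddings and retrieving the top-K most similar samples, using cosine similarity.

```python
import numpy as np

class EmbeddingIndex:
    """Toy in-memory embedding index: one vector per data sample."""

    def __init__(self, dim):
        self.vectors = np.empty((0, dim))
        self.samples = []

    def upsert(self, sample, embedding):
        # Normalize on insert so a dot product equals cosine similarity.
        v = np.asarray(embedding, dtype=float)
        v = v / np.linalg.norm(v)
        self.vectors = np.vstack([self.vectors, v])
        self.samples.append(sample)

    def top_k(self, query_embedding, k):
        # Rank all stored samples by cosine similarity to the query.
        q = np.asarray(query_embedding, dtype=float)
        q = q / np.linalg.norm(q)
        scores = self.vectors @ q
        best = np.argsort(scores)[::-1][:k]
        return [self.samples[i] for i in best]
```

A production index would use an approximate-nearest-neighbor structure rather than a brute-force scan, but the interface — upsert embeddings, fetch top-K by similarity — is the same.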

Indexing Flow

  1. User submits "prompt" query for execution.
  2. Query Engine begins processing the query.
  3. Create the embedding index over the Dataset to Index if it does not exist or is out-of-date.
    1. Submit each row / chunk of data from the Dataset to Index to the Embedding Model to generate embeddings.
    2. Insert / update each embedding into the Embedding Index.
  4. Populate the prompt template with any additional context from the Dataset for Batch Inference, if provided (see: Batch Prediction).
  5. Submit the prompt to the Embedding Model to generate its embedding.
  6. Fetch the top K most relevant samples of data from the index based on similarity to the prompt embedding ("K" here is determined by the size of the LLM's context window).
  7. Augment the prompt with the K data samples.
  8. Submit the augmented prompt to the LLM to answer the query.
  9. Return results back to the user.
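The retrieval and augmentation steps above can be sketched end-to-end in Python. Everything here is a hypothetical stand-in for the Predibase internals: `embed` mimics the Embedding Model with simple bag-of-words counts, `chunks` plays the role of the indexed Dataset, and the string returned by `augment` is what would be submitted to the LLM in step 8.

```python
import re
import numpy as np

# Hypothetical two-chunk corpus standing in for the Dataset to Index.
chunks = [
    "A dry red wine with oak notes",
    "A sweet white wine with citrus aroma",
]

VOCAB = ["dry", "red", "wine", "oak", "citrus", "white", "sweet"]

def embed(text):
    """Toy Embedding Model: bag-of-words counts over a fixed vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def top_k(prompt, k):
    """Steps 5-6: embed the prompt, then rank chunks by cosine similarity."""
    q = embed(prompt)
    q = q / (np.linalg.norm(q) or 1.0)
    def score(chunk):
        v = embed(chunk)
        return float(v @ q) / (np.linalg.norm(v) or 1.0)
    return sorted(chunks, key=score, reverse=True)[:k]

def augment(prompt, context):
    """Step 7: insert the retrieved samples into the prompt template."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + prompt
```

A real pipeline would swap `embed` for the deployment's Embedding Model and pass the output of `augment` to the LLM; the retrieval-then-templating shape of steps 5 through 8 stays the same.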



From the LLM tab on the Query page, select a dataset from the dropdown titled Dataset to Index.

Then enter a query into the input field and click Run. The result will contain the prompt and response columns. By clicking on the prompt cell, you can view the full context (rows from your dataset) that was inserted into the prompt.


pbase prompt llm -t "What is a good dry red wine?" --model-name my-llm --index-name wine_reviews