Retrieval-Augmented Generation
Build an end-to-end RAG system with Predibase and LlamaIndex
This guide shows how to build an end-to-end Retrieval-Augmented Generation (RAG) system using Predibase and other frameworks in the ecosystem. Specifically, Predibase currently integrates with tools that specialize in RAG workflows, including LangChain and LlamaIndex.
The following walkthrough will show you how to use Predibase + LlamaIndex to set up all the moving parts of a RAG system, with Predibase as the LLM provider. Feel free to follow along in the Colab notebook!
Predibase + LlamaIndex: Building a RAG System
The following walkthrough shows you how to use Predibase-hosted LLMs with LlamaIndex to build a RAG system.
There are a few pieces required to build a RAG system:
- LLM provider
- Predibase is the LLM provider here. We can serve base LLMs and/or fine-tuned LLMs for whatever generative task you have.
- Embedding Model
- This model generates embeddings for the data that you are storing in your Vector Store
- In this example you have the option of using a local HuggingFace embedding model or OpenAI’s embedding model.
- Note: You need to have an OpenAI account with funds and an API token to use the OpenAI embedding model.
- In the near future, you will be able to train and deploy your own embedding models using Predibase
- Vector Store
- This is where we store the embedded data that we want to retrieve later at query time
- In this example we will use Pinecone for our Vector Store
Getting Started
Predibase
- If you don’t have a Predibase account already, sign up for a free trial here
- Once you’ve logged in, navigate to Settings > My profile
- Generate a new API token
- Copy the API token and paste it into the first setup cell below
OpenAI (Optional)
- If you don’t have an OpenAI account already, sign up here
- Navigate to OpenAI’s API keys page
- If you have not already, generate an API key
- Copy the API key and paste it into the second setup cell below
Pinecone
- If you don’t have a Pinecone account already, they have a free tier available for trial
- Navigate to the API Keys page
- If you have not already, generate an API key
Step 0: Setup
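A sketch of the first setup cell, which installs the client libraries and sets your Predibase API token (the package list and environment variable name here are assumptions; the exact cell is in the notebook):

```python
# Install the client libraries (package names/versions may differ by release)
# !pip install llama-index pinecone-client

import os

# Paste the API token you generated under Settings > My profile
os.environ["PREDIBASE_API_TOKEN"] = "<YOUR_PREDIBASE_API_TOKEN>"
```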
The following is only required if you’ll be using an OpenAI embedding model.
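A sketch of the second setup cell, which sets the standard OPENAI_API_KEY environment variable:

```python
# Only needed if you use OpenAI's embedding model below
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"
```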
Step 1: Setting up the Predibase LLM
There are a few parameters to keep in mind while setting up your Predibase LLM (a sketch of the setup follows the list):
- model_name: This must be an LLM currently deployed in your Predibase environment.
- Any of the models shown in the LLM query view dropdown are valid options.
- If you are running Predibase in a VPC, you’ll need to deploy an LLM first.
- adapter_id: An optional (Predibase or HuggingFace) ID of a fine-tuned LLM adapter whose base model is the one specified by model_name.
- The fine-tuned adapter must be compatible with its base model; otherwise, an error is raised.
- adapter_version: The version number of the fine-tuned LLM adapter.
- The version is required for Predibase adapters (and ignored for HuggingFace adapters).
- temperature: Controls the randomness of your model responses.
- A higher value will give the model more creative leeway
- A lower value will give a more reproducible and consistent response
- max_new_tokens: Controls the number of tokens the model can produce.
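A minimal sketch of this step, assuming a Llama-2-13B deployment and the legacy llama_index import path (the model name and the commented-out adapter values are placeholders; adapter support may depend on your integration version):

```python
from llama_index.llms import PredibaseLLM  # import path may differ by LlamaIndex version

# Reads PREDIBASE_API_TOKEN from the environment set in Step 0
llm = PredibaseLLM(
    model_name="llama-2-13b",    # must be an LLM deployed in your Predibase environment
    # adapter_id="my-adapter",   # optional fine-tuned adapter built on the same base model
    # adapter_version=1,         # required for Predibase adapters, ignored for HuggingFace adapters
    temperature=0.3,             # lower = more reproducible, higher = more creative
    max_new_tokens=512,          # cap on the number of generated tokens
)
```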
Step 2: Set up Embedding model
If you are using a local HuggingFace embedding model, you can use the following code to set up your embedding model:
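For example, something along these lines (the specific model name is an assumption; any sentence-embedding model from the HuggingFace Hub works):

```python
from llama_index.embeddings import HuggingFaceEmbedding  # import path may differ by version

# Downloads and runs the embedding model locally -- no API key required
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```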
If you are using OpenAI’s embedding model, you can use the following code to set up your embedding model:
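Roughly as follows; this uses the OPENAI_API_KEY set in Step 0 and defaults to OpenAI’s text-embedding-ada-002 model:

```python
from llama_index.embeddings import OpenAIEmbedding  # import path may differ by version

# Uses the OPENAI_API_KEY environment variable
embed_model = OpenAIEmbedding()
```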
With our embedding model set up, we can now create the service context that will be used to query the LLM and embed our data and queries.
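A sketch of that step, using the llm and embed_model objects created above (newer LlamaIndex releases replace ServiceContext with global Settings):

```python
from llama_index import ServiceContext  # superseded by Settings in newer LlamaIndex releases

# Bundle the Predibase LLM and the embedding model for indexing and querying
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
```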
Step 3: Set up Vector Store
As mentioned before, we’ll be using Pinecone for this example. Pinecone has a free tier that you can use to try out this example. You can also swap out any other Vector Store supported by LlamaIndex.
If you are using the HuggingFace embedding model, you can use the following code to set up your Vector Store:
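Something like the following, shown with the classic pinecone-client API (newer Pinecone clients use a Pinecone class instead of pinecone.init); the index name is a placeholder, and the dimension of 384 assumes the bge-small-en-v1.5 model above:

```python
import pinecone  # classic pinecone-client API; newer clients use `from pinecone import Pinecone`

pinecone.init(api_key="<YOUR_PINECONE_API_KEY>", environment="<YOUR_PINECONE_ENVIRONMENT>")

# The dimension must match your embedding model (384 for BAAI/bge-small-en-v1.5)
pinecone.create_index("predibase-rag-hf", dimension=384, metric="cosine")
pinecone_index = pinecone.Index("predibase-rag-hf")
```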
If you are using the OpenAI embedding model, you can use the following code to set up your Vector Store:
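The OpenAI variant differs only in the embedding dimension (1536 for text-embedding-ada-002); the index name is again a placeholder:

```python
# Same as above, but sized for OpenAI's text-embedding-ada-002 embeddings
pinecone.create_index("predibase-rag-openai", dimension=1536, metric="cosine")
pinecone_index = pinecone.Index("predibase-rag-openai")
```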
Note: You need to have OpenAI set up and configured for this option. If you do not have an OpenAI API key, we recommend you go with the HuggingFace Index option above.
Finally, we’ll select our index, create the storage context, and index our documents!
Step 4: Set up index
Here we create the index so that any query you make will pull the relevant context from your Vector Store.
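Putting the pieces together, roughly (SimpleDirectoryReader and the ./data path are placeholders for however you load your documents):

```python
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import PineconeVectorStore  # import path may differ by version

# Load the documents you want to query over (path is a placeholder)
documents = SimpleDirectoryReader("./data").load_data()

# Wrap the Pinecone index from Step 3 as a LlamaIndex vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Embed the documents with the service context and write them to Pinecone
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)
```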
Step 5: Querying the LLM with RAG
Now that we’ve set up our index, we can ask questions over the documents: LlamaIndex will retrieve the relevant context from the Vector Store, and Predibase will answer your question using that context.
Now we can ask questions over our documents!
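For example (the question below is just a placeholder for your own):

```python
# Retrieves the most relevant chunks from Pinecone and passes them to the Predibase LLM
query_engine = index.as_query_engine()
response = query_engine.query("What does the documentation say about fine-tuning adapters?")
```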
To see the response to your query, you can pass the response variable to a print statement:
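```python
print(response)
```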