Predibase + LlamaIndex: Building a RAG System
The following walkthrough shows you how to use Predibase-hosted LLMs with LlamaIndex to build a RAG system. There are a few pieces required to build a RAG system:
- LLM provider
- Predibase is the LLM provider here. We can serve base LLMs and/or fine-tuned LLMs for whatever generative task you have.
- Embedding Model
- This model generates embeddings for the data that you are storing in your Vector Store
- In this example you have the option of using a local HuggingFace embedding model or OpenAI’s embedding model.
- Note: You need to have an OpenAI account with funds and an API token to use the OpenAI embedding model.
- In the near future, you will be able to train and deploy your own embedding models using Predibase
- Vector Store
- This is where we store the embedded data that we want to retrieve later at query time
- In this example we will use Pinecone for our Vector Store
Getting Started
Predibase
- If you don’t have a Predibase account already, sign up for a free trial here
- Once you’ve logged in, navigate to Settings > My profile
- Generate a new API token
- Copy the API token and paste it into the first setup cell below
OpenAI (Optional)
- If you don’t have an OpenAI account already, sign up here
- Navigate to OpenAI’s API keys page
- If you have not already, generate an API key
- Copy the API key and paste it into the second setup cell below
Pinecone
- If you don’t have a Pinecone account already, they have a free tier available for trial
- Navigate to the API Keys page
- If you have not already, generate an API key
Step 0: Setup
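The original setup cells aren’t reproduced here; below is a minimal sketch of what they might contain. The package names assume a recent llama-index release with split integration packages, and the token values are placeholders:

```python
# Minimal setup sketch (package names are assumptions for recent llama-index releases):
# %pip install llama-index llama-index-llms-predibase \
#     llama-index-embeddings-huggingface llama-index-vector-stores-pinecone pinecone-client

import os

# Paste your tokens here (placeholders shown).
os.environ["PREDIBASE_API_TOKEN"] = "<YOUR PREDIBASE API TOKEN>"
os.environ["OPENAI_API_KEY"] = "<YOUR OPENAI API KEY>"  # optional, only for OpenAI embeddings
os.environ["PINECONE_API_KEY"] = "<YOUR PINECONE API KEY>"
```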
Step 1: Setting up the Predibase LLM
There are a few parameters to keep in mind while setting up your Predibase LLM (see the sketch after this list):
- model_name: This must be an LLM currently deployed in your Predibase environment.
- Any of the models shown in the LLM query view dropdown are valid options.
- If you are running Predibase in a VPC, you’ll need to deploy an LLM first.
- adapter_id: An optional Predibase or HuggingFace ID of a fine-tuned LLM adapter whose base model is the one specified by model_name.
- The fine-tuned adapter must be compatible with its base model; otherwise, an error is raised.
- adapter_version: The version number of the fine-tuned LLM adapter.
- The version is a required parameter for Predibase-hosted adapters (and is ignored otherwise, e.g., for HuggingFace adapters).
- temperature: Controls the randomness of your model responses.
- A higher value will give the model more creative leeway
- A lower value will give a more reproducible and consistent response
- max_new_tokens: Controls the maximum number of tokens the model can produce.
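Putting these parameters together, here is a minimal sketch; the deployment name mistral-7b is a placeholder, and the commented-out adapter fields are hypothetical examples:

```python
import os

from llama_index.llms.predibase import PredibaseLLM

llm = PredibaseLLM(
    model_name="mistral-7b",  # placeholder: any LLM deployed in your Predibase environment
    predibase_api_key=os.environ["PREDIBASE_API_TOKEN"],
    # To query a fine-tuned adapter instead of the base model, uncomment:
    # adapter_id="my-adapter-repo",  # hypothetical adapter ID
    # adapter_version=1,             # required for Predibase-hosted adapters
    temperature=0.3,      # lower = more reproducible, consistent responses
    max_new_tokens=512,   # cap on the number of generated tokens
)
```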
Step 2: Set up Embedding model
If you are using a local HuggingFace embedding model, you can use the following code to set up your embedding model.
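A minimal sketch, assuming BAAI/bge-small-en-v1.5 as the local HuggingFace embedding model (any sentence-embedding model works), with the OpenAI alternative commented out:

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local HuggingFace embedding model (no OpenAI account required).
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Or, to use OpenAI's embedding model instead (requires OPENAI_API_KEY):
# from llama_index.embeddings.openai import OpenAIEmbedding
# embed_model = OpenAIEmbedding()

# Register the LLM and embedding model as LlamaIndex defaults.
Settings.llm = llm
Settings.embed_model = embed_model
```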
Step 3: Set up Vector Store
As mentioned before, we’ll be using Pinecone for this example. Pinecone has a free tier that you can use to try out this example. You can also swap in any other Vector Store supported by LlamaIndex.
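A sketch using the v3+ Pinecone client with a serverless index; the index name, cloud, and region are placeholders, and the dimension must match your embedding model (384 for bge-small-en-v1.5, 1536 for OpenAI’s text-embedding-ada-002):

```python
import os

from pinecone import Pinecone, ServerlessSpec
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "predibase-rag-demo"  # hypothetical index name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,  # must match the embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

vector_store = PineconeVectorStore(pinecone_index=pc.Index(index_name))
```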
Step 4: Set up index
Here we create the index so that any query you make will pull the relevant context from your Vector Store.
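A sketch that loads documents from a local directory ("./data" is a placeholder path) and indexes them into the Pinecone-backed store:

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex

# Load documents from a local folder ("./data" is a placeholder path).
documents = SimpleDirectoryReader("./data").load_data()

# Embed the documents and write the vectors to Pinecone.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```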
Step 5: Querying the LLM with RAG
Now that we’ve set up our index, we can ask questions over the documents. Predibase + LlamaIndex will search for the relevant context and generate a response to your question grounded in that context; the response is displayed via a print statement.
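A sketch of a query against the index (the question is a placeholder; ask anything your documents can answer):

```python
# Build a query engine over the index; it retrieves relevant chunks
# from Pinecone and passes them to the Predibase LLM as context.
query_engine = index.as_query_engine()

response = query_engine.query("What is this document about?")  # placeholder question
print(response)
```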