Build an end-to-end RAG system with Predibase and LlamaIndex
Here we show how to build an end-to-end Retrieval Augmented Generation (RAG) system using Predibase and other frameworks in the ecosystem. Specifically, Predibase currently integrates with tools that specialize in RAG workflows, including LangChain and LlamaIndex.
The following walkthrough shows you how to use Predibase-hosted LLMs with LlamaIndex to set up all the moving parts of a RAG system, with Predibase as the LLM provider. Feel free to follow along in the Colab notebook!
There are a few pieces required to build a RAG system:
- LLM provider
  - Predibase is the LLM provider here. We can serve base LLMs and/or fine-tuned LLMs for whatever generative task you have (see the setup sketch after this list).
- Embedding Model
  - This model generates embeddings for the data that you are storing in your Vector Store.
  - In this example you have the option of using a local HuggingFace embedding model or OpenAI’s embedding model (see the setup sketch after this list).
    - Note: You need an OpenAI account with funds and an API token to use the OpenAI embedding model.
  - In the near future, you will be able to train and deploy your own embedding models using Predibase.
- Vector Store
  - This is where we store the embedded data that we want to retrieve later at query time.
  - In this example we will use Pinecone for our Vector Store.
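The snippets below reference a `predibase_llm` and a local `hf_embed_model` that are not defined in this excerpt. Here is a minimal sketch of what setting them up might look like, assuming the legacy LlamaIndex `PredibaseLLM` and `HuggingFaceEmbedding` integrations; the model name and API token value are illustrative placeholders:

```python
import os

from llama_index.llms import PredibaseLLM
from llama_index.embeddings import HuggingFaceEmbedding

# Predibase reads the API token from the environment (placeholder value shown)
os.environ["PREDIBASE_API_TOKEN"] = "YOUR API TOKEN HERE"

# Predibase-hosted LLM (model name is illustrative - use any LLM deployed in your Predibase account)
predibase_llm = PredibaseLLM(model_name="llama-2-13b", temperature=0.3, max_new_tokens=512)

# Local HuggingFace embedding model with 384-dimensional output,
# matching the "predibase-demo-hf" Pinecone index created below
hf_embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
```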
If you are using OpenAI’s embedding model, you can use the following code to set it up:

```python
from llama_index.embeddings import OpenAIEmbedding

# Loads OpenAI's text-embedding-ada-002 embedding model - run this for the OpenAI option
openai_embed_model = OpenAIEmbedding()
```
Now with our embedding model set up, we will create the service context that
will be used to query the LLM and embed our data/queries.
```python
from llama_index import ServiceContext, set_global_service_context

# Create a ServiceContext with our Predibase LLM and chosen embedding model
ctx = ServiceContext.from_defaults(llm=predibase_llm, embed_model=hf_embed_model)

# Set the Predibase LLM ServiceContext as the global default
set_global_service_context(ctx)
```
As mentioned before, we’ll be using Pinecone for this example. Pinecone has a
free tier that you can use to try
out this example. You can also swap out any other
Vector Store
supported by LlamaIndex.
```python
import pinecone

# Initialize Pinecone; the index itself is created below
pinecone.init(api_key="YOUR API TOKEN HERE", environment="gcp-starter")
```
If you are using the HuggingFace embedding model, you can use the following code
to set up your Vector Store:
```python
# HF Index - compatible with the local HF embedding model's output dimensions
pinecone.create_index("predibase-demo-hf", dimension=384, metric="euclidean", pod_type="p1")
```
If you are using the OpenAI embedding model, you can use the following code to set up your Vector Store:

Note: You need to have OpenAI set up and configured for this option. If you do not have an OpenAI API key, we recommend you go with the HuggingFace Index option above.

```python
# OpenAI Index - compatible with the OpenAI embedding model (text-embedding-ada-002) output dimensions
pinecone.create_index("predibase-demo-openai", dimension=1536, metric="euclidean", pod_type="p1")
```
Finally, we’ll select our index, create the storage context, and index our
documents!
```python
from llama_index import SimpleDirectoryReader, StorageContext
from llama_index.vector_stores import PineconeVectorStore

# Construct the vector store and a custom storage context
pinecone_vector_store = PineconeVectorStore(pinecone.Index("predibase-demo-hf"))
pinecone_storage_context = StorageContext.from_defaults(vector_store=pinecone_vector_store)

# Load in the documents you want to index
documents = SimpleDirectoryReader("/Users/connor/Documents/Projects/datasets/huffington_post_pdfs/").load_data()
```
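The snippet above stops after loading the documents. A minimal sketch of the indexing step itself, assuming the legacy `VectorStoreIndex.from_documents` API and the storage context and global ServiceContext created above:

```python
from llama_index import VectorStoreIndex

# Embed the documents (using the globally configured embedding model) and
# write them into the Pinecone-backed index via the custom storage context
index = VectorStoreIndex.from_documents(documents, storage_context=pinecone_storage_context)
```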
Now that we’ve set up our index, we can ask questions over the documents: LlamaIndex retrieves the relevant context and the Predibase-hosted LLM generates a response grounded in that context.
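For example, a query might look like the following sketch, assuming the legacy query-engine API; the `index` variable comes from the indexing step above and the question is purely illustrative:

```python
# Build a query engine on top of the index and ask a question over the indexed documents
query_engine = index.as_query_engine()
response = query_engine.query("What topics do these articles cover?")
print(response)
```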