Augment GPT-4 with External Knowledge
A prompt workflow that connects GPT-4 to Pinecone vector database to retrieve relevant context from LangChain documentation, reducing hallucinations by
Why it matters
Enhance GPT-4's responses by retrieving relevant information from a Pinecone vector database, reducing hallucinations and grounding answers in factual data.
Outcomes
What it gets done
Scrape and process documentation from web sources.
Embed text data into vectors using OpenAI's embedding models.
Index embeddings in a Pinecone vector database for efficient retrieval.
Query Pinecone for relevant context and feed it to GPT-4 for augmented generation.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/oai-gpt4retrievalaugmentation | bash Steps
Steps in the chain
Download the LangChain docs from langchain.readthedocs.io/. Get all .html files located on the site and download them into the `rtdocs` directory. Use LangChain's `ReadTheDocsLoader` to process these docs into hundreds of processed doc pages.
Chunk the processed documents into ~400 token chunks using langchain and tiktoken. Create a data list containing the plaintext page content and source information from each document.
Use `text-embedding-3-small` as the embedding model to embed text. Apply this embedding logic to the langchain docs dataset. Each vector embedding will contain 1536 dimensions (the output dimensionality of the `text-embedding-3-small` model).
Get a free API key from Pinecone and initialize your connection to Pinecone. Create a new index to store the embeddings and enable efficient vector search through them.
Populate the Pinecone index with OpenAI `text-embedding-3-small` built embeddings of all langchain docs. This adds all documents to the index for later retrieval.
Create a query vector `xq` from your search query. Use `xq` to retrieve the most relevant chunks from the LangChain docs stored in Pinecone.
Pass the retrieved document chunks into GPT-4 via the `ChatCompletions` endpoint. Add the retrieved information into the model by passing it into user prompts alongside the original query to generate answers backed by real data sources.
Overview
Retrieval Augmentation for GPT-4 using Pinecone
What it does
This workflow connects GPT-4 to a Pinecone vector database to implement retrieval-augmented generation.
How it connects
Use this when you need GPT-4 to answer questions about specific documentation or knowledge bases where accuracy matters and hallucinations must be minimized. It's ideal for technical documentation Q&A, customer support systems, or any scenario where responses must be grounded in verifiable, up-to-date source material that you control.
Source README
Retrieval Augmentation for GPT-4 using Pinecone
Fixing LLMs that Hallucinate
In this notebook we will learn how to query relevant contexts to our queries from Pinecone, and pass these to a GPT-4 model to generate an answer backed by real data sources.
GPT-4 is a big step up from previous OpenAI completion models. It also exclusively uses the ChatCompletion endpoint, so we must use it in a slightly different way to usual. However, the power of the model makes the change worthwhile, particularly when augmented with an external knowledge base like the Pinecone vector database.
Required installs for this notebook are:
Preparing the Data
In this example, we will download the LangChain docs from langchain.readthedocs.io/. We get all .html files located on the site like so:
This downloads all HTML into the rtdocs directory. Now we can use LangChain itself to process these docs. We do this using the ReadTheDocsLoader like so:
This leaves us with hundreds of processed doc pages. Let's take a look at the format each one contains:
We access the plaintext page content like so:
We can also find the source of each document:
We can use these to create our data list:
It's pretty ugly but it's good enough for now. Let's see how we can process all of these. We will chunk everything into ~400 token chunks, we can do this easily with langchain and tiktoken:
Process the data into more chunks using this approach.
Our chunks are ready so now we move onto embedding and indexing everything.
Initialize Embedding Model
We use text-embedding-3-small as the embedding model. We can embed text like so:
In the response res we will find a JSON-like object containing our new embeddings within the 'data' field.
Inside 'data' we will find two records, one for each of the two sentences we just embedded. Each vector embedding contains 1536 dimensions (the output dimensionality of the text-embedding-3-small model.
We will apply this same embedding logic to the langchain docs dataset we've just scraped. But before doing so we must create a place to store the embeddings.
Initializing the Index
Now we need a place to store these embeddings and enable a efficient vector search through them all. To do that we use Pinecone, we can get a free API key and enter it below where we will initialize our connection to Pinecone and create a new index.
We can see the index is currently empty with a total_vector_count of 0. We can begin populating it with OpenAI text-embedding-3-small built embeddings like so:
Now we've added all of our langchain docs to the index. With that we can move on to retrieval and then answer generation using GPT-4.
Retrieval
To search through our documents we first need to create a query vector xq. Using xq we will retrieve the most relevant chunks from the LangChain docs, like so:
With retrieval complete, we move on to feeding these into GPT-4 to produce answers.
Retrieval Augmented Generation
GPT-4 is currently accessed via the ChatCompletions endpoint of OpenAI. To add the information we retrieved into the model, we need to pass it into our user prompts alongside our original query. We can do that like so:
Now we ask the question:
To display this response nicely, we will display it in markdown.
Let's compare this to a non-augmented query...
If we drop the "I don't know" part of the primer?
Step 1: Preparing the Data
Download the LangChain docs from langchain.readthedocs.io/. Get all .html files located on the site and download them into the `rtdocs` directory. Use LangChain's `ReadTheDocsLoader` to process these docs into hundreds of processed doc pages.
Step 2: Process Documents into Chunks
Chunk the processed documents into ~400 token chunks using langchain and tiktoken. Create a data list containing the plaintext page content and source information from each document.
Step 3: Initialize Embedding Model
Use `text-embedding-3-small` as the embedding model to embed text. Apply this embedding logic to the langchain docs dataset. Each vector embedding will contain 1536 dimensions (the output dimensionality of the `text-embedding-3-small` model).
Step 4: Initialize Pinecone Index
Get a free API key from Pinecone and initialize your connection to Pinecone. Create a new index to store the embeddings and enable efficient vector search through them.
Step 5: Populate Index with Embeddings
Populate the Pinecone index with OpenAI `text-embedding-3-small` built embeddings of all langchain docs. This adds all documents to the index for later retrieval.
Step 6: Retrieval
Create a query vector `xq` from your search query. Use `xq` to retrieve the most relevant chunks from the LangChain docs stored in Pinecone.
Step 7: Retrieval Augmented Generation with GPT-4
Pass the retrieved document chunks into GPT-4 via the `ChatCompletions` endpoint. Add the retrieved information into the model by passing it into user prompts alongside the original query to generate answers backed by real data sources.
Discussion
Questions & comments · 0
Sign In Sign in to leave a comment.