Prompt Chain

Build Q&A System with Langchain, Tair & OpenAI

End-to-end question answering workflow using Langchain, Tair vector database, and OpenAI embeddings to build a knowledge base that retrieves context and

Works with openai, langchain, tair

Updated yesterday
Version 1.0.0

Why it matters

Implement an end-to-end question answering system leveraging Langchain, Tair for knowledge base storage, and OpenAI for embeddings and LLM capabilities.

Outcomes

What it gets done

01

Calculate document embeddings using the OpenAI API.

02

Store embeddings in Tair to create a knowledge base.

03

Perform nearest neighbor searches in Tair for relevant context.

04

Use an LLM to generate answers based on the retrieved context.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-qawithlangchaintairandopenai | bash

Steps

Steps in the chain

01
Install requirements

Install the following Python packages: openai, tiktoken, langchain, and tair. The openai package provides convenient access to the OpenAI API; tiktoken is a fast BPE tokeniser for use with OpenAI's models; langchain makes it easier to build applications with LLMs; and tair is the client library used to interact with the Tair vector database.

02
Prepare your OpenAI API key

The OpenAI API key is used for vectorization of the documents and queries. If you don't have an OpenAI API key, you can get one from https://platform.openai.com/account/api-keys. Once you have your key, enter it via getpass so that it is not echoed to the screen.

03
Prepare your Tair URL

To build the Tair connection, you need to have TAIR_URL.

04
Load data

Load the data containing some natural questions and answers to them. All the data will be used to create a Langchain application with Tair being the knowledge base.

05
Chain definition

Langchain is already integrated with Tair and performs all the indexing for a given list of documents. In our case we store the set of answers we have. Once all the possible answers are stored in Tair, we can define the whole QA chain.

06
Search data

Once the data is in Tair we can start asking questions. Each question is automatically vectorized by the OpenAI model, and the resulting vector is used to find the closest matching answers in Tair. Once retrieved, the most similar answers are incorporated into the prompt sent to the OpenAI Large Language Model.

07
Custom prompt templates

The stuff chain type in Langchain uses a specific prompt with the question and context documents incorporated. We can provide our own prompt template and change the behaviour of the OpenAI LLM, while still using the stuff chain type. It is important to keep {context} and {question} as placeholders.

08
Experimenting with custom prompts

Try using a different prompt template, so the model: 1. Responds with a single-sentence answer if it knows it. 2. Suggests a random song title if it doesn't know the answer to our question.

Overview

Question Answering with Langchain, Tair and OpenAI

What it does

This notebook demonstrates an end-to-end question answering system that combines Langchain, Tair vector database, and OpenAI embeddings. It calculates embeddings for documents using OpenAI API, stores them in Tair to build a knowledge base, converts text queries to embeddings, performs nearest neighbor search to find relevant context, and uses an LLM to generate answers. The workflow supports custom prompt templates while maintaining required placeholders for context and questions.

How it connects

Use this when you need to implement a QA system over a corpus of documents or answers, where users ask natural language questions and receive contextually grounded responses. It's ideal for building knowledge bases, customer support systems, or any application requiring semantic search combined with generative AI, especially when you want the flexibility to customize how the LLM responds based on retrieved context.

Source README

Question Answering with Langchain, Tair and OpenAI

This notebook shows how to implement a Question Answering system with Langchain, using Tair as a knowledge base and OpenAI embeddings. If you are not familiar with Tair, it is worth working through the Getting_started_with_Tair_and_OpenAI.ipynb notebook first.

This notebook presents an end-to-end process of:

  • Calculating the embeddings with the OpenAI API.
  • Storing the embeddings in a Tair instance to build a knowledge base.
  • Converting a raw text query to an embedding with the OpenAI API.
  • Using Tair to perform a nearest neighbour search in the created collection to find some context.
  • Asking the LLM to find the answer in the given context.

Each of these steps reduces to a call to the corresponding Langchain method.
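
To make the five steps concrete before handing them to Langchain, here is a minimal offline sketch. It is an illustration only: `toy_embed`, `cosine` and `nearest` are hypothetical helpers, the bag-of-words vector stands in for an OpenAI embedding, and the in-memory list stands in for Tair. The sample documents are made up and are not taken from the dataset.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query, store, k=2):
    """Return the k stored texts most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Steps 1-2: embed the documents and keep them as the "knowledge base".
documents = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "Mount Everest is the highest mountain on Earth.",
]
store = [(doc, toy_embed(doc)) for doc in documents]

# Steps 3-4: embed the query and retrieve the closest document.
question = "What is the capital of France?"
context = nearest(question, store, k=1)

# Step 5: build the prompt that an LLM would receive.
prompt = "Context: " + "\n".join(context) + "\nQuestion: " + question
print(prompt)
```

In the notebook itself, the embedding, storage and search parts are handled by OpenAI and Tair through Langchain; only the overall shape of the flow carries over from this sketch.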

Prerequisites

For the purposes of this exercise we need to prepare a couple of things:

  • A Tair cloud instance.
  • Langchain as a framework.
  • An OpenAI API key.

Install requirements

This notebook requires the following Python packages: openai, tiktoken, langchain and tair.

  • openai provides convenient access to the OpenAI API.
  • tiktoken is a fast BPE tokeniser for use with OpenAI's models.
  • langchain helps us to build applications with LLMs more easily.
  • tair is the client library used to interact with the Tair vector database.

! pip install openai tiktoken langchain tair

Prepare your OpenAI API key

The OpenAI API key is used for vectorization of the documents and queries.

If you don't have an OpenAI API key, you can get one from https://platform.openai.com/account/api-keys.

Once you have your key, enter it via getpass so that it is not echoed to the screen.

import getpass

openai_api_key = getpass.getpass("Input your OpenAI API key:")
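
If you run the notebook repeatedly, one possible variation is to read the key from an environment variable and fall back to the prompt only when it is unset. The sketch below assumes a hypothetical helper `read_secret` and made-up variable names (`EXAMPLE_API_KEY`, `EXAMPLE_UNSET_KEY`); neither comes from the original notebook.

```python
import getpass
import os


def read_secret(env_name, prompt, ask=getpass.getpass):
    """Return the secret from the environment, or prompt for it if unset."""
    value = os.environ.get(env_name)
    if value:
        return value
    return ask(prompt)


# Demonstration with a stubbed prompt so the snippet runs non-interactively.
os.environ["EXAMPLE_API_KEY"] = "sk-example"
os.environ.pop("EXAMPLE_UNSET_KEY", None)

key_from_env = read_secret("EXAMPLE_API_KEY", "Input your key:")
key_from_prompt = read_secret(
    "EXAMPLE_UNSET_KEY", "Input your key:", ask=lambda _prompt: "typed-key"
)
```

In real use you would drop the `ask` argument so that `getpass.getpass` handles the interactive case.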

Prepare your Tair URL

To build the Tair connection, you need to have TAIR_URL.

# The format of url: redis://[[username]:[password]]@localhost:6379/0
TAIR_URL = getpass.getpass("Input your tair url:")
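
A mistyped URL only surfaces later as a connection error, so it can help to check the value first. The sketch below parses a URL of the format shown in the comment above using the standard library. `check_tair_url` is a hypothetical helper, and the default port and database index are assumptions taken from that same example format.

```python
from urllib.parse import urlparse


def check_tair_url(url):
    """Parse a redis:// URL and return its parts, or raise ValueError."""
    parts = urlparse(url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL, got scheme %r" % parts.scheme)
    if not parts.hostname:
        raise ValueError("URL has no host")
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,  # assumed default, as in the example format
        "username": parts.username,
        "has_password": parts.password is not None,  # never echo the password
        "db": parts.path.lstrip("/") or "0",
    }


info = check_tair_url("redis://user:secret@localhost:6379/0")
print(info)
```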

Load data

In this section we are going to load the data containing some natural questions and answers to them. All the data will be used to create a Langchain application with Tair being the knowledge base.

import json

import wget  # third-party package, not in the install step above: pip install wget

# All the examples come from https://ai.google.com/research/NaturalQuestions
# This is a sample of the training set that we download for some
# further processing.
wget.download("https://storage.googleapis.com/dataset-natural-questions/questions.json")
wget.download("https://storage.googleapis.com/dataset-natural-questions/answers.json")

with open("questions.json", "r") as fp:
    questions = json.load(fp)

with open("answers.json", "r") as fp:
    answers = json.load(fp)

# Peek at the first question and the first answer.
print(questions[0])
print(answers[0])
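
The later cells pass the loaded answers straight to Tair as texts, which suggests each file holds a JSON list of strings. That is an assumption about the dataset, not something stated in the notebook. A small loader can make the assumption explicit and fail early if it does not hold. `load_text_list` is a hypothetical helper, and the demonstration uses a throwaway file rather than the downloaded data.

```python
import json
import os
import tempfile


def load_text_list(path):
    """Load a JSON file and check that it holds a list of strings."""
    with open(path, "r", encoding="utf-8") as fp:
        data = json.load(fp)
    if not isinstance(data, list) or not all(isinstance(x, str) for x in data):
        raise ValueError(path + " does not contain a JSON list of strings")
    return data


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json")
    with open(sample_path, "w", encoding="utf-8") as fp:
        json.dump(["first answer", "second answer"], fp)
    sample = load_text_list(sample_path)

print(len(sample), "texts loaded")
```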

Chain definition

Langchain is already integrated with Tair and performs all the indexing for a given list of documents. In our case we are going to store the set of answers we have.

from langchain.vectorstores import Tair
from langchain.embeddings import OpenAIEmbeddings
from langchain import VectorDBQA, OpenAI

# Embed every answer with OpenAI and index the resulting vectors in Tair.
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
doc_store = Tair.from_texts(
    texts=answers, embedding=embeddings, tair_url=TAIR_URL,
)

At this stage all the possible answers are already stored in Tair, so we can define the whole QA chain.

llm = OpenAI(openai_api_key=openai_api_key)
qa = VectorDBQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    vectorstore=doc_store,
    return_source_documents=False,
)

Search data

Once the data is in Tair we can start asking questions. Each question is automatically vectorized by the OpenAI model, and the resulting vector is used to find the closest matching answers in Tair. Once retrieved, the most similar answers are incorporated into the prompt sent to the OpenAI Large Language Model.

import random
import time

random.seed(52)
selected_questions = random.choices(questions, k=5)

for question in selected_questions:
    print(">", question)
    print(qa.run(question), end="\n\n")
    # wait 20 seconds because of the rate limit
    time.sleep(20)
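
The fixed 20-second pause is simple but slows every call, including those that would have succeeded immediately. One possible alternative, not used in the original notebook, is to retry only on failure with an exponentially growing delay. The sketch below uses a hypothetical helper `call_with_backoff` and a simulated failing function. It catches every exception for brevity, whereas real code should catch only the client's rate-limit error.

```python
import time


def call_with_backoff(func, *args, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on failure wait base_delay * 2**attempt and retry."""
    for attempt in range(retries):
        try:
            return func(*args)
        except Exception:  # assumption: narrow this to the rate-limit error
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Stand-in for a rate-limited call: fails twice, then succeeds.
calls = {"count": 0}


def flaky(question):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limit (simulated)")
    return "answer to " + question


# Record the delays instead of sleeping so the demonstration runs instantly.
waits = []
result = call_with_backoff(flaky, "example question", sleep=waits.append)
print(result, waits)
```

With the chain defined above, the call would look like `call_with_backoff(qa.run, question)`.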

Custom prompt templates

The stuff chain type in Langchain uses a specific prompt with question and context documents incorporated. This is what the default prompt looks like:

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:

We can, however, provide our own prompt template and change the behaviour of the OpenAI LLM, while still using the stuff chain type. It is important to keep {context} and {question} as placeholders.
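
To see why both placeholders must stay, it helps to mimic what a "stuff" style chain does with the template: join the retrieved documents into one context string and substitute it, together with the question, into the prompt. The sketch below uses plain Python string formatting, not Langchain. `has_required_placeholders` and `stuff` are hypothetical helpers, and the blank-line separator between documents is an assumption.

```python
from string import Formatter

default_prompt = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def has_required_placeholders(template):
    """Check that the template mentions both {context} and {question}."""
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    return {"context", "question"} <= fields


def stuff(template, documents, question):
    """Join all documents into one context block and fill in the template."""
    context = "\n\n".join(documents)  # assumed separator
    return template.format(context=context, question=question)


filled = stuff(default_prompt, ["doc one", "doc two"], "Q?")
print(filled)
```

A template without {context} would give the model no retrieved text to ground its answer in, and one without {question} would never show it what was asked.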

Experimenting with custom prompts

We can try using a different prompt template, so the model:

  1. Responds with a single-sentence answer if it knows it.
  2. Suggests a random song title if it doesn't know the answer to our question.

from langchain.prompts import PromptTemplate

custom_prompt = """
Use the following pieces of context to answer the question at the end. Please provide
a short single-sentence summary answer only. If you don't know the answer or if it's
not present in given context, don't try to make up an answer, but suggest me a random
unrelated song title I could listen to.
Context: {context}
Question: {question}
Helpful Answer:
"""

custom_prompt_template = PromptTemplate(
    template=custom_prompt, input_variables=["context", "question"]
)
custom_qa = VectorDBQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    vectorstore=doc_store,
    return_source_documents=False,
    chain_type_kwargs={"prompt": custom_prompt_template},
)

random.seed(41)
for question in random.choices(questions, k=5):
    print(">", question)
    print(custom_qa.run(question), end="\n\n")
    # wait 20 seconds because of the rate limit
    time.sleep(20)
