Prompt Chain

Build Question Answering System with Langchain and AnalyticDB

Name: Build Question Answering System with Langchain and AnalyticDB
Availability: OnlineOnly
Author: OpenAI Cookbook

End-to-end question answering workflow using Langchain, AnalyticDB vector database, and OpenAI embeddings to build a knowledge base that retrieves context and


Works with openai, langchain, analyticdb


Updated 3 months ago

Version 1.0.0

Models

gpt 4o


Why it matters

Implement an intelligent question-answering system by leveraging Langchain, OpenAI embeddings, and AnalyticDB as a knowledge base. This asset enables efficient retrieval and synthesis of information to provide accurate answers.

Outcomes

What it gets done

Calculate document embeddings using OpenAI API.

Store embeddings in AnalyticDB to create a searchable knowledge base.

Vectorize user queries and perform nearest neighbor searches in AnalyticDB.

Utilize retrieved context to generate answers with an LLM.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-qawithlangchainanalyticdbandopenai | bash

Steps

Steps in the chain

Install requirements

Install the following Python packages: openai, tiktoken, langchain and psycopg2cffi. openai provides convenient access to the OpenAI API. tiktoken is a fast BPE tokeniser for use with OpenAI's models. langchain makes it easier to build applications with LLMs. The psycopg2cffi library is used to interact with the vector database, but any other PostgreSQL client library is also acceptable.

Prepare your OpenAI API key

The OpenAI API key is used for vectorization of the documents and queries. If you don't have an OpenAI API key, get one from https://platform.openai.com/account/api-keys. Once you get your key, add it to your environment variables as OPENAI_API_KEY.

Prepare your AnalyticDB connection string

To build the AnalyticDB connection string, you need the following parameters: PG_HOST, PG_PORT, PG_DATABASE, PG_USER, and PG_PASSWORD. Export them as environment variables first, then build the connection string from them.

Load data

Load the data containing some natural questions and answers to them. All the data will be used to create a Langchain application with AnalyticDB being the knowledge base.

Chain definition

Langchain is already integrated with AnalyticDB and performs all the indexing for a given list of documents. Store the set of answers you have. Once all the possible answers are stored in AnalyticDB, you can define the whole QA chain.

Search data

Once the data is in AnalyticDB, you can start asking questions. Each question is automatically vectorized by an OpenAI model, and the resulting vector is used to find possibly matching answers in AnalyticDB. Once retrieved, the most similar answers are incorporated into the prompt sent to the OpenAI large language model.

Custom prompt templates

Provide your own prompt template to change the behaviour of the OpenAI LLM while still using the stuff chain type. Keep {context} and {question} as placeholders in your custom template.

Experimenting with custom prompts

Try using a different prompt template so the model: 1. Responds with a single-sentence answer if it knows it. 2. Suggests a random song title if it doesn't know the answer to your question.

Overview

Question Answering with Langchain, AnalyticDB and OpenAI

What it does

This prompt chain implements a complete question answering system that combines Langchain, AnalyticDB (Alibaba Cloud's PostgreSQL-compatible vector database), and OpenAI's embedding and language models. It calculates embeddings for documents using the OpenAI API, stores them in AnalyticDB to create a searchable knowledge base, converts user queries into embeddings, performs nearest neighbor searches to find relevant context, and uses an LLM to generate answers grounded in that context. The entire workflow is simplified through Langchain methods that handle indexing, retrieval, and answer gener

How it connects

Use this workflow when you need to build a question answering system over a custom corpus of documents, especially when working within the Alibaba Cloud ecosystem or requiring AnalyticDB's PostgreSQL compatibility. The notebook demonstrates loading data containing natural questions and answers to create a Langchain application with AnalyticDB as the knowledge base. Do NOT use this if you need real-time streaming answers or if your use case requires databases other than AnalyticDB, because the implementation is specifically tied to AnalyticDB's vector search capabilities. Avoid this approach if you don'

Source README

Question Answering with Langchain, AnalyticDB and OpenAI

This notebook shows how to implement a Question Answering system with Langchain, AnalyticDB as a knowledge base, and OpenAI embeddings. If you are not familiar with AnalyticDB, it is worth checking out the Getting_started_with_AnalyticDB_and_OpenAI.ipynb notebook first.

This notebook presents an end-to-end process of:

Calculating the embeddings with the OpenAI API.
Storing the embeddings in an AnalyticDB instance to build a knowledge base.
Converting a raw text query to an embedding with the OpenAI API.
Using AnalyticDB to perform a nearest neighbour search in the created collection to find some context.
Asking the LLM to find the answer in the given context.

All the steps will be simplified to calling some corresponding Langchain methods.

Prerequisites

For the purposes of this exercise we need to prepare a couple of things:
AnalyticDB cloud instance.
Langchain as a framework.
An OpenAI API key.

Install requirements

This notebook requires the following Python packages: openai, tiktoken, langchain and psycopg2cffi.

openai provides convenient access to the OpenAI API.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
langchain makes it easier to build applications with LLMs.
The psycopg2cffi library is used to interact with the vector database, but any other PostgreSQL client library is also acceptable.
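
Before going further, it can help to confirm that the four packages are importable. The snippet below is a minimal sketch using only the standard library; it assumes each package's import name matches the name listed above, and the helper name missing_packages is illustrative rather than part of the notebook.

```python
import importlib.util

# Assumption: the import names match the package names listed above.
REQUIRED_PACKAGES = ["openai", "tiktoken", "langchain", "psycopg2cffi"]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        # Suggest the usual pip command for whatever is not installed yet.
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

If you use a different PostgreSQL client library, replace psycopg2cffi in the list with that library's import name.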

Prepare your OpenAI API key

The OpenAI API key is used for vectorization of the documents and queries.

If you don't have an OpenAI API key, you can get one from https://platform.openai.com/account/api-keys.

Once you have your key, add it to your environment variables as OPENAI_API_KEY.
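
A quick way to check that the key is visible to Python is sketched below. The function name require_openai_api_key is hypothetical; the only detail taken from this notebook is the variable name OPENAI_API_KEY.

```python
import os


def require_openai_api_key(env=os.environ):
    """Return the key stored under OPENAI_API_KEY, or raise if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the notebook."
        )
    return key
```

Calling require_openai_api_key() at the top of the notebook should fail early with a clear message, rather than later during the first embedding request.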

Prepare your AnalyticDB connection string

To build the AnalyticDB connection string, you need the following parameters: PG_HOST, PG_PORT, PG_DATABASE, PG_USER, and PG_PASSWORD. Export them as environment variables first, then build the connection string from them.
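
As a rough sketch, the connection string can be assembled from those variables as a SQLAlchemy-style PostgreSQL URL. Several things here are assumptions and not stated in this notebook: the PG_DRIVER variable, every default value, the URL layout, and the helper name build_connection_string. Langchain's AnalyticDB integration is understood to offer a similar helper (connection_string_from_db_params); check the version you have installed before relying on either.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(env=os.environ):
    """Assemble a PostgreSQL URL from the PG_* environment variables."""
    # PG_DRIVER and all defaults below are illustrative assumptions.
    driver = env.get("PG_DRIVER", "psycopg2cffi")
    host = env.get("PG_HOST", "localhost")
    port = int(env.get("PG_PORT", "5432"))
    database = env.get("PG_DATABASE", "postgres")
    user = env.get("PG_USER", "postgres")
    password = env.get("PG_PASSWORD", "postgres")
    # Percent-encode credentials so characters such as '@' do not break the URL.
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

Encoding the user name and password is a design choice made for this sketch, so that special characters in credentials cannot change how the URL is parsed.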

Load data

In this section we are going to load the data containing some natural questions and answers to them. All the data will be used to create a Langchain application with AnalyticDB being the knowledge base.
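
The loading step might look like the sketch below. It assumes the questions and answers are each stored as a JSON array in a local file; the file names in the usage note and the helper load_json_list are hypothetical, since this page does not say where the data comes from.

```python
import json
from pathlib import Path


def load_json_list(path):
    """Read a JSON file and return its contents, which must be a list."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path} should contain a JSON array")
    return data
```

With two such files in place, loading would be questions = load_json_list("questions.json") and answers = load_json_list("answers.json").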

Chain definition

Langchain is already integrated with AnalyticDB and performs all the indexing for a given list of documents. In our case we are going to store the set of answers we have.

At this stage all the possible answers are already stored in AnalyticDB, so we can define the whole QA chain.
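
The two steps above, indexing the answers and defining the chain, could be sketched as follows. This is not taken verbatim from the notebook: the import paths follow older Langchain releases and may have moved in newer ones, the choice of RetrievalQA and the default OpenAI LLM is an assumption, and build_qa_chain is a hypothetical helper. Imports are deferred into the function so the snippet can be loaded even where Langchain is not installed.

```python
def build_qa_chain(answers, connection_string):
    """Index the answers in AnalyticDB and return a 'stuff' QA chain.

    Assumption: import paths match older Langchain releases; adjust them
    for the version you have installed.
    """
    from langchain.chains import RetrievalQA
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI
    from langchain.vectorstores import AnalyticDB

    # Embeddings are computed with the OpenAI API (reads OPENAI_API_KEY).
    embeddings = OpenAIEmbeddings()

    # Langchain embeds each answer and stores text plus vector in AnalyticDB.
    doc_store = AnalyticDB.from_texts(
        texts=answers,
        embedding=embeddings,
        connection_string=connection_string,
    )

    # The "stuff" chain type places all retrieved texts into a single prompt.
    return RetrievalQA.from_chain_type(
        llm=OpenAI(),
        chain_type="stuff",
        retriever=doc_store.as_retriever(),
    )
```

Running this requires a reachable AnalyticDB instance and a valid OpenAI API key, so treat it as a starting point to adapt and not as a tested recipe.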

Search data

Once the data is in AnalyticDB, we can start asking questions. Each question is automatically vectorized by an OpenAI model, and the resulting vector is used to find possibly matching answers in AnalyticDB. Once retrieved, the most similar answers are incorporated into the prompt sent to the OpenAI large language model.
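
To make the retrieval step concrete, here is a toy, in-memory version of nearest neighbour search. It is only an illustration: it assumes cosine similarity as the metric (this page does not state which metric AnalyticDB uses), it scans every stored vector instead of using an index, and the three-dimensional vectors stand in for real OpenAI embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(query_vector, stored, k=2):
    """Return the texts of the k stored (text, vector) pairs closest to the query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy knowledge base: hand-made vectors standing in for real embeddings.
STORE = [
    ("answer about cats", [1.0, 0.0, 0.0]),
    ("answer about dogs", [0.9, 0.1, 0.0]),
    ("answer about tax law", [0.0, 0.0, 1.0]),
]
```

In the real workflow the database performs this ranking, and the returned texts become the context passed to the LLM.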

Custom prompt templates

The stuff chain type in Langchain uses a specific prompt with question and context documents incorporated. This is what the default prompt looks like:

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:

We can, however, provide our own prompt template and change the behaviour of the OpenAI LLM, while still using the stuff chain type. It is important to keep {context} and {question} as placeholders.
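
Because a template without both placeholders would leave the model with no context or no question, a small check can catch the mistake early. The sketch below uses only the standard library; the helper names template_placeholders and check_template are hypothetical and not part of Langchain.

```python
from string import Formatter

REQUIRED_PLACEHOLDERS = {"context", "question"}


def template_placeholders(template):
    """Return the set of {name} placeholders that appear in the template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_template(template):
    """Return the template unchanged, or raise if a required placeholder is absent."""
    missing = REQUIRED_PLACEHOLDERS - template_placeholders(template)
    if missing:
        raise ValueError(f"Template is missing placeholders: {sorted(missing)}")
    return template
```

Running check_template on a custom prompt before handing it to the chain surfaces the problem before any request is sent.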

Experimenting with custom prompts

We can try using a different prompt template, so the model:

Responds with a single-sentence answer if it knows it.
Suggests a random song title if it doesn't know the answer to our question.
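
A template with those two behaviours, and the way a stuff-style chain would fill it, can be sketched in plain Python. The template wording below is an illustrative assumption, not the notebook's exact text, and stuff_prompt is a hypothetical stand-in for what the chain does internally: join the retrieved documents into one context block and substitute both placeholders.

```python
# Illustrative template; the exact wording is an assumption.
CUSTOM_TEMPLATE = """Use the following pieces of context to answer the question at the end.
Answer in a single sentence. If the answer is not in the context, do not make one up;
instead, suggest a random song title to listen to.
Context: {context}
Question: {question}
Helpful Answer:"""


def stuff_prompt(template, documents, question):
    """Join all retrieved documents into one context block and fill the template."""
    context = "\n\n".join(documents)
    return template.format(context=context, question=question)


if __name__ == "__main__":
    print(stuff_prompt(CUSTOM_TEMPLATE, ["Doc A.", "Doc B."], "Who wrote Doc A?"))
```

With Langchain itself, a template like this is typically wrapped in a PromptTemplate with input variables context and question and passed to the chain; the exact parameter names depend on the installed version.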
