Prompt Chain

Orchestrate Multi-Tool Responses with RAG

Name: Orchestrate Multi-Tool Responses with RAG
Availability: OnlineOnly
Author: OpenAI Cookbook

Multi-tool orchestration workflow using OpenAI Responses API to route queries between web search, external vector databases like Pinecone, and RAG retrieval

Copy chain

Works with openaipineconehuggingface

OpenAI Cookbook

Maintainer?

Spark score

out of 100

Updated 3 months ago

Version 1.0.0

Models

gpt 4o

Add to Favorites

Why it matters

Dynamically route user queries to the most appropriate tools, including web search and vector databases, to generate context-aware responses.

Outcomes

What it gets done

Implement RAG for intelligent tool selection.

Integrate with external vector databases like Pinecone.

Orchestrate sequential tool calls for complex queries.

Leverage OpenAI's Responses API for dynamic response generation.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-responsesapitoolorchestration | bash

Steps

Steps in the chain

Create a Pinecone Index Based on the Dataset

Use the dataset itself to determine the embedding dimensionality. For example, compute one embedding from the merged column and then create the index accordingly.

Upsert the Dataset into Pinecone index

Process the dataset in batches, generate embeddings for each merged text, prepare metadata (including separate Question and Answer fields), and upsert each batch into the index. You may also update metadata for specific entries if needed.

Query the Pinecone Index

Create a natural language query, compute its embedding, and perform a similarity search on the Pinecone index. The returned results include metadata that provides context for generating answers.

Generate a Response Using the Retrieved Context

Select the best matching result from your query results and use the OpenAI Responses API to generate a final answer by combining the retrieved context with the original question.

Orchestrate Multi-Tool Calls

Define the built-in function available through the Responses API, including the ability to invoke the external Vector Store - Pinecone. Configure Web Search Preview Tool for live web searches and Pinecone Search Tool for querying the vector database using semantic search.

Implement Multi-tool orchestration flow

Modify the input query and system instructions to the Responses API to follow a tool calling sequence. The model selects the appropriate tool based on the input query: general questions use web-search, medical inquiries use Pinecone retrieval, and other queries may not require tool calls.

Overview

Responses Api Tool Orchestration

What it does

This is a technical cookbook that guides developers through building multi-tool workflows with OpenAI's Responses API, demonstrating RAG implementation with external vector databases.

How it connects

Use this when you need to route queries intelligently between different data sources-web search for current information, vector databases for domain-specific content, or direct responses-and want to see a practical implementation using the Responses API with Pinecone.

Source README

Multi-Tool Orchestration with RAG approach using OpenAI's Responses API

This cookbook guides you through building dynamic, multi-tool workflows using OpenAI's Responses API. It demonstrates how to implement a Retrieval-Augmented Generation (RAG) approach that intelligently routes user queries to the appropriate in-built or external tools. Whether your query calls for general knowledge or requires accessing specific internal context from a vector database (like Pinecone), this guide shows you how to integrate function calls, web searches in-built tool, and leverage document retrieval to generate accurate, context-aware responses.

For a practical example of performing RAG on PDFs using the Responses API's file search feature, refer to this notebook.

This example showcases the flexibility of the Responses API, illustrating that beyond the internal file_search tool-which connects to an internal vector store-there is also the capability to easily connect to external vector databases. This allows for the implementation of a RAG approach in conjunction with hosted tooling, providing a versatile solution for various retrieval and generation tasks.

In this example we use a sample medical reasoning dataset from Hugging Face. We convert the dataset into a Pandas DataFrame and merge the “Question” and “Response” columns into a single string. This merged text is used for embedding and later stored as metadata.

Create a Pinecone Index Based on the Dataset

Use the dataset itself to determine the embedding dimensionality. For example, compute one embedding from the merged column and then create the index accordingly.

Upsert the Dataset into Pinecone index

Query the Pinecone Index

Create a natural language query, compute its embedding, and perform a similarity search on the Pinecone index. The returned results include metadata that provides context for generating answers.

Generate a Response Using the Retrieved Context

Select the best matching result from your query results and use the OpenAI Responses API to generate a final answer by combining the retrieved context with the original question.

Orchestrate Multi-Tool Calls

Now, we'll define the built-in function available through the Responses API, including the ability to invoke the external Vector Store - Pinecone as an example.

Web Search Preview Tool: Enables the model to perform live web searches and preview the results. This is ideal for retrieving real-time or up-to-date information from the internet.

Pinecone Search Tool: Allows the model to query a vector database using semantic search. This is especially useful for retrieving relevant documents-such as medical literature or other domain-specific content-that have been stored in a vectorized format.

As shown above, depending on the query, appropriate tool is invoked in order to determine the optimal response.

For instance, looking at the third example, when the model triggers the tool named "PineconeSearchDocuments", the code calls query_pinecone_index with the current query and then extracts the best match (or an appropriate context) as the result. For non health related inqueries or queries where explicit internet search is asked, the code calls the web_search_call function and for other queries, it may choose to not call any tool and rather provide a response based on the question under consideration.

Finally, the tool call and its output are appended to the conversation, and the final answer is generated by the Responses API.

Multi-tool orchestration flow

Now let us try to modify the input query and the system instructions to the responses API in order to follow a tool calling sequence and generate the output.

Here, we have seen how to utilize OpenAI's Responses API to implement a Retrieval-Augmented Generation (RAG) approach with multi-tool calling capabilities. It showcases an example where the model selects the appropriate tool based on the input query: general questions may be handled by built-in tools such as web-search, while specific medical inquiries related to internal knowledge are addressed by retrieving context from a vector database (such as Pinecone) via function calls. Additonally, we have showcased how multiple tool calls can be sequentially combined to generate a final response based on our instructions provided to responses API.

As you continue to experiment and build upon these concepts, consider exploring additional resources and examples to further enhance your understanding and applications

Happy coding!

Step 1: Create a Pinecone Index Based on the Dataset

Use the dataset itself to determine the embedding dimensionality. For example, compute one embedding from the merged column and then create the index accordingly.

Step 2: Upsert the Dataset into Pinecone index

Process the dataset in batches, generate embeddings for each merged text, prepare metadata (including separate Question and Answer fields), and upsert each batch into the index. You may also update metadata for specific entries if needed.

Step 3: Query the Pinecone Index

Create a natural language query, compute its embedding, and perform a similarity search on the Pinecone index. The returned results include metadata that provides context for generating answers.

Step 4: Generate a Response Using the Retrieved Context

Select the best matching result from your query results and use the OpenAI Responses API to generate a final answer by combining the retrieved context with the original question.

Step 5: Orchestrate Multi-Tool Calls

Define the built-in function available through the Responses API, including the ability to invoke the external Vector Store - Pinecone. Configure Web Search Preview Tool for live web searches and Pinecone Search Tool for querying the vector database using semantic search.

Step 6: Implement Multi-tool orchestration flow

Modify the input query and system instructions to the Responses API to follow a tool calling sequence. The model selects the appropriate tool based on the input query: general questions use web-search, medical inquiries use Pinecone retrieval, and other queries may not require tool calls.

Discussion

Orchestrate Multi-Tool Responses with RAG

What it gets done

Add it to your toolbox

Steps in the chain

Responses Api Tool Orchestration

What it does

How it connects

Multi-Tool Orchestration with RAG approach using OpenAI's Responses API

Create a Pinecone Index Based on the Dataset

Upsert the Dataset into Pinecone index

Query the Pinecone Index

Generate a Response Using the Retrieved Context

Orchestrate Multi-Tool Calls

Multi-tool orchestration flow

Step 1: Create a Pinecone Index Based on the Dataset

Step 2: Upsert the Dataset into Pinecone index

Step 3: Query the Pinecone Index

Step 4: Generate a Response Using the Retrieved Context

Step 5: Orchestrate Multi-Tool Calls

Step 6: Implement Multi-tool orchestration flow

Questions & comments · 0