What are the three main components in this semantic search workflow?

OpenAI's `text-embedding-3-small` model converts text into vector representations, LangChain's Oracle vector store integration provides a Python interface for writing and querying vectors, and Oracle AI Database Vector Search stores the embeddings and performs similarity searches.

Do I need to provision an Oracle database, or can I use an existing one?

The notebook does not provision a database for you. It only connects to an Oracle AI Database endpoint you already have, such as a local Oracle Database Free container, Oracle Autonomous Database, or an internal instance.

What similarity metric does this workflow use for vector search?

The demo uses cosine distance as the similarity metric to find the nearest stored documents.

How many results does the retrieval step return by default?

The retrieval uses `similarity_search(..., k=3)`, which returns the 3 nearest documents as LangChain `Document` objects.

Prompt Chain

Build Semantic Search with Oracle AI Database

Name: Building Semantic Search with OpenAI Embeddings, LangChain, and Oracle AI Database
Availability: OnlineOnly
Author: OpenAI Cookbook

Cookbook building semantic search with OpenAI embeddings, LangChain's Oracle vector store, and Oracle AI Database Vector Search.

Works with: openai, langchain, oracle

Updated last month

Version 1.0.0

Models

gpt 4o

Why it matters

Integrate OpenAI embeddings and LangChain with Oracle AI Database to add semantic search directly within your existing Oracle data infrastructure.

Outcomes

What it gets done

Embed text documents using OpenAI's models.

Store embeddings and metadata in Oracle AI Database.

Query Oracle AI Database for semantically similar documents.

Retrieve relevant text for RAG or application search.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-oraclevectorsearchlangchain | bash

Steps

Steps in the chain

Load credentials from environment variables

Connect to Oracle AI Database

Initialize embeddings and Oracle vector store

Insert rerunnable sample data

Run semantic similarity search

Inspect returned documents

Overview

Building Semantic Search with OpenAI Embeddings, LangChain, and Oracle AI Database

Cookbook building a semantic search workflow with OpenAI embeddings for vectorization, LangChain's Oracle vector store integration for the Python interface, and Oracle AI Database Vector Search for storing embeddings alongside relational/application data. Covers connecting to Oracle, inserting rerunnable sample data idempotently, and running similarity_search queries. Use it when building RAG or internal semantic search on data that already lives in Oracle, to add vector retrieval without standing up a separate vector database. Requires an OpenAI API key and an Oracle AI Database instance with Vector Search enabled.

What it does

This cookbook builds a semantic search workflow on top of three explicitly separated roles: OpenAI's text-embedding-3-small turns text into vector representations, LangChain's Oracle vector store integration provides a familiar Python interface for writing and querying those vectors, and Oracle AI Database Vector Search stores the embeddings alongside relational/application data and performs the similarity search itself. The architecture is a standard five-step RAG-retrieval flow: embed documents, write text/metadata/vectors into an Oracle table via LangChain, embed a natural-language query with the same model, retrieve the nearest stored documents via Oracle's Vector Search, and hand the retrieved text to a generation model in a larger application.

When to use - and when NOT to

Use this pattern for retrieval-augmented generation, internal semantic search, or any application where the source data already lives in Oracle - it adds semantic retrieval without introducing a separate vector database, while keeping LangChain's retrieval interface available for the rest of the application. It assumes access to an Oracle AI Database endpoint with Vector Search enabled; the notebook does not provision that database for you, only connects to one you already have (local Oracle Database Free container, Oracle Autonomous Database, or an existing internal instance).

Inputs and outputs

Requirements: Python 3.10+, an OpenAI API key with embeddings access, an Oracle AI Database with Vector Search enabled, and the packages langchain, langchain-openai, langchain-oracledb, oracledb, python-dotenv, numpy.

Credentials are read from environment variables or a local .env file: OPENAI_API_KEY, ORACLE_USER, ORACLE_PASSWORD, and ORACLE_DSN (for example, localhost:1521/FREEPDB1 for a local Oracle Database Free container). Oracle's Python driver runs in Thin mode by default (no separate Oracle client libraries needed) and can use Thick mode if Oracle Instant Client is installed.

The demo uses one table, LANGCHAIN_DEMO_VECTORS (columns ID, TEXT, METADATA, EMBEDDING), with cosine distance as the similarity metric. An incompatible pre-existing demo table is dropped and recreated, and sample rows are cleared before each insert so repeated notebook runs stay idempotent rather than accumulating duplicates or shifting top-k results.

Retrieval is a single call, similarity_search(..., k=3), which returns LangChain Document objects whose page_content (and, in larger workflows, metadata for source IDs, URLs, or sections) can be inspected or passed downstream.

Integrations

OpenAI's role is deliberately scoped to embedding generation only. The cookbook avoids implying that OpenAI provides a separate managed vector search feature; in this workflow, Oracle AI Database handles vector storage and similarity search. LangChain sits as the abstraction layer between the two, so the same vector store can be reused across additional queries without reinitializing the embedding model. The same pattern extends naturally into a full RAG workflow (passing retrieved documents to a generation model) or into a broader application search layer where Oracle remains the source of truth for both relational and vector data.

Who it's for

Developers already using Oracle for application or relational data who want to add semantic/vector search without adopting a separate vector database, using OpenAI embeddings and LangChain's familiar retrieval API as the glue.

Source README

Building Semantic Search with OpenAI Embeddings, LangChain, and Oracle AI Database

This cookbook shows how to build a semantic search workflow using:

OpenAI embeddings to turn text into vector representations
LangChain's Oracle vector store integration to write and query vectors through a familiar Python interface
Oracle AI Database Vector Search to store embeddings alongside relational/application data

OpenAI is used for embedding generation. LangChain provides the vector store abstraction, and Oracle AI Database provides vector storage and similarity search. This keeps the OpenAI role explicit without implying that OpenAI provides a separate managed vector search feature.

This pattern is useful for retrieval-augmented generation (RAG), internal semantic search, and applications where source data already lives in Oracle. It lets you add semantic retrieval without introducing a separate vector database, while still keeping the LangChain retrieval interface available for larger application workflows.

Architecture at a glance

Text documents are embedded with text-embedding-3-small.
LangChain writes the text, metadata, and vectors into an Oracle AI Database table.
A natural-language query is embedded with the same OpenAI model.
Oracle AI Database Vector Search returns the nearest stored documents.
The retrieved text can be passed to a generation model in a larger RAG application.

What this notebook demonstrates

Load credentials from environment variables or a local .env file.
Connect to Oracle AI Database with the Python oracledb driver.
Initialize OpenAI embeddings and LangChain's Oracle vector store.
Insert a small rerunnable sample dataset without accumulating duplicates.
Run semantic similarity search with similarity_search(..., k=3).

Requirements

Python 3.10+
OpenAI API key with embeddings access
Oracle AI Database with Vector Search enabled
Python packages: langchain, langchain-openai, langchain-oracledb, oracledb, python-dotenv, numpy

Install dependencies

If these packages are already installed in your environment, this cell will not make any changes.
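As a rough illustration of what "already installed" means here, the sketch below checks which of the required distributions are missing before anything is installed. It is not the notebook's install cell; the helper name missing_packages is hypothetical, and the package list is copied from the Requirements section.

```python
from importlib import metadata

# Distribution names copied from the Requirements section.
REQUIRED_PACKAGES = [
    "langchain",
    "langchain-openai",
    "langchain-oracledb",
    "oracledb",
    "python-dotenv",
    "numpy",
]


def missing_packages(names):
    """Return the distributions from `names` that are not installed (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    to_install = missing_packages(REQUIRED_PACKAGES)
    if to_install:
        print("Missing packages; install with: pip install " + " ".join(to_install))
    else:
        print("All required packages are already installed.")
```

Running the check first makes it clear whether an install step would change the environment at all.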

Configure credentials and Oracle connectivity

The notebook reads configuration from environment variables. You can set them in your shell, in your notebook environment, or in a local .env file next to this notebook:

OPENAI_API_KEY=your-openai-api-key
ORACLE_USER=your-oracle-user
ORACLE_PASSWORD=your-oracle-password
ORACLE_DSN=host:port/service_name

For local experiments, one common option is an Oracle Database Free container with port 1521 exposed and a pluggable database service such as FREEPDB1, making ORACLE_DSN look like localhost:1521/FREEPDB1. You can also use an Oracle Autonomous Database or another Oracle AI Database instance that has Vector Search enabled.
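A minimal sketch of reading the four variables above and failing early when one is missing. The Settings class and the load_settings function are hypothetical names used for illustration, not the notebook's own code.

```python
import os
from dataclasses import dataclass

# Variable names as listed in the configuration block above.
REQUIRED_VARS = ("OPENAI_API_KEY", "ORACLE_USER", "ORACLE_PASSWORD", "ORACLE_DSN")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the notebook's configuration values."""
    openai_api_key: str
    oracle_user: str
    oracle_password: str
    oracle_dsn: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ); raise if any value is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        oracle_user=env["ORACLE_USER"],
        oracle_password=env["ORACLE_PASSWORD"],
        oracle_dsn=env["ORACLE_DSN"],
    )
```

If the values live in a local .env file, calling load_dotenv() from python-dotenv before load_settings() copies them into os.environ first.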

Oracle's Python driver can run in Thin mode, which needs no separate Oracle client libraries. If Oracle Instant Client is installed, the same driver can optionally use Thick mode for compatibility with some advanced client features. The setup cell below tries to enable Thick mode when available and otherwise continues in Thin mode.
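The Thick-then-Thin fallback described above could look roughly like the sketch below. It assumes the driver argument is the imported oracledb module, whose init_oracle_client() call loads the Oracle Client libraries and raises when they cannot be found. The function name and the broad exception handling are illustrative choices, not the notebook's exact setup cell.

```python
def enable_thick_mode_if_available(driver):
    """Try to enable Thick mode; return "thick" on success and "thin" otherwise.

    `driver` is expected to be the imported `oracledb` module (assumption).
    """
    try:
        # Loads Oracle Client libraries; raises if they are not installed.
        driver.init_oracle_client()
    except Exception as exc:  # broad on purpose: any failure means staying in Thin mode
        print(f"Thick mode unavailable, continuing in Thin mode: {exc}")
        return "thin"
    return "thick"
```

A typical call would be enable_thick_mode_if_available(oracledb) right after import oracledb, before any connection is opened, since the driver expects the mode to be chosen before the first connection.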

Oracle setup options

This notebook assumes you already have access to an Oracle AI Database endpoint. Any of these setup paths can work:

Local development: Oracle Database Free in a container or local install, with a service name such as FREEPDB1.
Managed cloud: Oracle Autonomous Database with Vector Search enabled.
Existing environment: an internal Oracle AI Database instance provided by your team.

The notebook only needs a SQL connection string through ORACLE_DSN. It does not create a database instance for you.

Connect to Oracle AI Database

This connection is used by LangChain's Oracle vector store to create or reuse the demo table, insert embeddings, and run similarity search.
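A hedged sketch of opening that connection from the environment variables and confirming it responds. It assumes driver is the imported oracledb module, whose connect() accepts user, password, and dsn keyword arguments. The helper names open_connection and connection_is_alive are hypothetical.

```python
import os


def open_connection(driver, env=None):
    """Open a connection using ORACLE_USER, ORACLE_PASSWORD, and ORACLE_DSN.

    `driver` is expected to be the imported `oracledb` module (assumption).
    """
    env = os.environ if env is None else env
    return driver.connect(
        user=env["ORACLE_USER"],
        password=env["ORACLE_PASSWORD"],
        dsn=env["ORACLE_DSN"],
    )


def connection_is_alive(connection):
    """Run a trivial query against DUAL to confirm the session works."""
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT 1 FROM DUAL")
        return cursor.fetchone()[0] == 1
    finally:
        cursor.close()
```

Checking the connection before handing it to the vector store surfaces credential or DSN problems early, separately from any embedding or LangChain errors.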

Choose a demo table and distance metric

The demo uses one table, LANGCHAIN_DEMO_VECTORS, and cosine distance. Cosine distance is a common default for text embeddings because it compares vector direction rather than raw magnitude.
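To make "direction rather than raw magnitude" concrete, here is a small standard-library sketch of cosine distance, taken as one minus cosine similarity. It illustrates the metric only; in the notebook the database computes the distance, and the function name is hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Scaling a vector leaves the distance unchanged: [1, 2] and [2, 4] point in the same direction, so their distance is approximately 0, while orthogonal vectors such as [1, 0] and [0, 1] have distance 1.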

Initialize embeddings and the Oracle vector store

We initialize the OpenAI embedding model close to the vector store setup, because LangChain uses the embedding function when documents are inserted and when natural language queries are searched.

If an incompatible demo table already exists from an earlier version of the notebook, it is dropped so LangChain can recreate it with the columns it expects: ID, TEXT, METADATA, and EMBEDDING.
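One way to sketch the compatibility check is to read the table's column names from Oracle's USER_TAB_COLUMNS view and drop the table only when they differ from the expected set. The exact-match rule, the helper names, and the use of PURGE are assumptions made for illustration; the notebook's own check may differ.

```python
# Columns the text above says LangChain expects.
EXPECTED_COLUMNS = {"ID", "TEXT", "METADATA", "EMBEDDING"}


def is_compatible(existing_columns):
    """A missing table (no columns) or an exact column match needs no drop (assumed rule)."""
    existing = {name.upper() for name in existing_columns}
    return not existing or existing == EXPECTED_COLUMNS


def drop_if_incompatible(connection, table_name="LANGCHAIN_DEMO_VECTORS"):
    """Drop the demo table when its columns do not match; return True if it was dropped."""
    if not table_name.isidentifier():
        raise ValueError("unexpected table name")  # table names cannot be bound as parameters
    cursor = connection.cursor()
    try:
        cursor.execute(
            "SELECT column_name FROM user_tab_columns WHERE table_name = :name",
            {"name": table_name.upper()},
        )
        columns = [row[0] for row in cursor.fetchall()]
        if is_compatible(columns):
            return False
        cursor.execute(f"DROP TABLE {table_name} PURGE")
        return True
    finally:
        cursor.close()
```

Only the demo table is ever touched, which keeps the reset safe to run against a schema that also holds application data.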

Insert rerunnable sample data

The sample data is intentionally small so the retrieval behavior is easy to inspect. Before inserting, the cell clears the demo table. This keeps the notebook idempotent: repeated runs do not accumulate duplicate rows or change retrieval results.

The DELETE FROM in the next cell is intentional. It resets only the demo table's sample rows before insertion, making repeated notebook runs deterministic. Without that reset, rerunning the notebook would append the same documents again and could change the top-k retrieval results.
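The reset-then-insert pattern can be sketched as follows. The sample texts and metadata are hypothetical placeholders, not the notebook's dataset; vector_store is assumed to expose LangChain's standard add_texts(texts, metadatas=...) method, and connection is assumed to be a DB-API connection such as the one returned by oracledb.

```python
# Hypothetical sample rows for illustration only.
SAMPLE_TEXTS = [
    "Oracle AI Database can store vectors next to relational data.",
    "Embeddings map text to points in a vector space.",
]
SAMPLE_METADATAS = [{"source": "sample-1"}, {"source": "sample-2"}]


def reset_and_insert(connection, vector_store, texts, metadatas,
                     table_name="LANGCHAIN_DEMO_VECTORS"):
    """Clear the demo table, then insert the samples so reruns never accumulate rows."""
    if not table_name.isidentifier():
        raise ValueError("unexpected table name")  # table names cannot be bound as parameters
    cursor = connection.cursor()
    try:
        # Reset only the demo table's rows before inserting.
        cursor.execute(f"DELETE FROM {table_name}")
    finally:
        cursor.close()
    connection.commit()
    vector_store.add_texts(texts, metadatas=metadatas)
    return len(texts)
```

Calling reset_and_insert several times in a row leaves the table with exactly len(texts) rows each time, which is what keeps top-k results stable across reruns.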

Inspect embedding and table details

These checks are optional and are guarded so the notebook can continue gracefully if embedding quota is unavailable. The table inspection does not call the OpenAI API.
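A sketch of what "guarded" could mean in practice: the embedding probe is wrapped so that a quota or network failure is reported and skipped, while the row count touches only the database. It assumes embeddings exposes LangChain's embed_query(text) method; both helper names are hypothetical.

```python
def embedding_dimension(embeddings, probe_text="hello"):
    """Return the embedding length for a probe string, or None if the call fails."""
    try:
        vector = embeddings.embed_query(probe_text)
    except Exception as exc:  # e.g. quota or network errors; the check is optional
        print(f"Skipping embedding check: {exc}")
        return None
    return len(vector)


def count_rows(connection, table_name="LANGCHAIN_DEMO_VECTORS"):
    """Count rows in the demo table; this never calls the OpenAI API."""
    if not table_name.isidentifier():
        raise ValueError("unexpected table name")
    cursor = connection.cursor()
    try:
        cursor.execute(f"SELECT COUNT(*) FROM {table_name}")
        return cursor.fetchone()[0]
    finally:
        cursor.close()
```

Returning None instead of raising lets the rest of the notebook continue when the embedding probe cannot run.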

Run semantic similarity search

LangChain embeds the natural language query and searches the Oracle vector store for the closest stored documents. The example uses similarity_search(..., k=3), which returns the top matching documents.
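In code, the search step is a single method call on the vector store. The wrapper below is a thin sketch around similarity_search(query, k=3); the function name top_matches and the empty-query check are illustrative additions.

```python
def top_matches(vector_store, query, k=3):
    """Return the k nearest documents for a natural-language query."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # LangChain embeds the query and delegates the nearest-neighbor search to the store.
    return vector_store.similarity_search(query, k=k)
```

With a store initialized as described earlier, top_matches(vector_store, "How does Oracle store embeddings?") would return up to three Document objects, nearest first. The query string here is a made-up example.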

Inspect returned documents

LangChain returns Document objects. In this simple example the important field is page_content, but in larger workflows metadata can be used to track source IDs, URLs, document sections, or other application-specific attributes.
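A small sketch of inspecting the results, reading page_content and an optional metadata entry from each document. The "source" metadata key is a hypothetical example of an application-specific attribute, and describe_documents is an illustrative name.

```python
def describe_documents(documents):
    """Return one printable line per document: rank, source (if any), and text."""
    lines = []
    for rank, doc in enumerate(documents, start=1):
        # "source" is a hypothetical metadata key; fall back when it is absent.
        source = doc.metadata.get("source", "unknown")
        lines.append(f"{rank}. [{source}] {doc.page_content}")
    return lines
```

Printing these lines after each search makes it easy to compare how the ranking changes from one query to the next.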

Try another query

The same vector store can be reused for additional questions without reinitializing the embedding model.

Conclusion

This notebook implemented semantic vector search with OpenAI embeddings, LangChain, and Oracle AI Database. OpenAI converts text and queries into embeddings, LangChain provides the vector store API, and Oracle AI Database stores and searches the vectors alongside application data.

The same pattern can be extended into a RAG workflow by passing retrieved documents to a generation model, or into an application search layer where Oracle remains the source of truth for both relational and vector data.
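As a hedged sketch of the RAG extension, the retrieved documents can be folded into a prompt string for whichever generation model the application uses. The prompt wording and the function name build_rag_prompt are assumptions; the notebook itself stops at retrieval.

```python
def build_rag_prompt(question, documents):
    """Combine retrieved page_content into a numbered context block followed by the question."""
    context = "\n\n".join(
        f"[{index}] {doc.page_content}"
        for index, doc in enumerate(documents, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the context entries gives the generation model a simple way to cite which retrieved document supports each part of its answer.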
