Query & move data

Build Semantic Search with Oracle AI Database

Leverage OpenAI embeddings and LangChain to implement semantic search on data stored in Oracle AI Database, enhancing retrieval for RAG and applications.

Without it

Piece it together by hand, every time.

With it

Integrate OpenAI embeddings and LangChain with Oracle AI Database to enable powerful semantic search capabilities directly within your existing Oracle data infrastructure.

What you get

  • Embed text documents using OpenAI's models.
  • Store embeddings and metadata in Oracle AI Database.
  • Query Oracle AI Database for semantically similar documents.
  • Retrieve relevant text for RAG or application search.

Use this prompt chain

OpenAI Cookbook RAG indexQuery a databaseExtractSummarize

Building Semantic Search with OpenAI Embeddings, LangChain, and Oracle AI Database

This cookbook shows how to build a semantic search workflow using:

  • OpenAI embeddings to turn text into vector representations
  • LangChain's Oracle vector store integration to write and query vectors through a familiar Python interface
  • Oracle AI Database Vector Search to store embeddings alongside relational/application data

OpenAI is used for embedding generation. LangChain provides the vector store abstraction, and Oracle AI Database provides vector storage and similarity search. This keeps the OpenAI role explicit without implying that OpenAI provides a separate managed vector search feature.

This pattern is useful for retrieval-augmented generation (RAG), internal semantic search, and applications where source data already lives in Oracle. It lets you add semantic retrieval without introducing a separate vector database, while still keeping the LangChain retrieval interface available for larger application workflows.

Architecture at a glance

  1. Text documents are embedded with text-embedding-3-small.
  2. LangChain writes the text, metadata, and vectors into an Oracle AI Database table.
  3. A natural-language query is embedded with the same OpenAI model.
  4. Oracle AI Database Vector Search returns the nearest stored documents.
  5. The retrieved text can be passed to a generation model in a larger RAG application.

What this notebook demonstrates

  1. Load credentials from environment variables or a local .env file.
  2. Connect to Oracle AI Database with the Python oracledb driver.
  3. Initialize OpenAI embeddings and LangChain's Oracle vector store.
  4. Insert a small rerunnable sample dataset without accumulating duplicates.
  5. Run semantic similarity search with similarity_search(..., k=3).

Requirements

  • Python 3.10+
  • OpenAI API key with embeddings access
  • Oracle AI Database with Vector Search enabled
  • Python packages: langchain, langchain-openai, langchain-oracledb, oracledb, python-dotenv, numpy

Install dependencies

If these packages are already installed in your environment, this cell will not make any changes.

Configure credentials and Oracle connectivity

The notebook reads configuration from environment variables. You can set them in your shell, in your notebook environment, or in a local .env file next to this notebook:

OPENAI_API_KEY=your-openai-api-key
ORACLE_USER=your-oracle-user
ORACLE_PASSWORD=your-oracle-password
ORACLE_DSN=host:port/service_name

For local experiments, one common option is an Oracle Database Free container with port 1521 exposed and a pluggable database service such as FREEPDB1, making ORACLE_DSN look like localhost:1521/FREEPDB1. You can also use an Oracle Autonomous Database or another Oracle AI Database instance that has Vector Search enabled.

Oracle's Python driver can run in Thin mode, which needs no separate Oracle client libraries. If Oracle Instant Client is installed, the same driver can optionally use Thick mode for compatibility with some advanced client features. The setup cell below tries to enable Thick mode when available and otherwise continues in Thin mode.

Oracle setup options

This notebook assumes you already have access to an Oracle AI Database endpoint. Any of these setup paths can work:

  • Local development: Oracle Database Free in a container or local install, with a service name such as FREEPDB1.
  • Managed cloud: Oracle Autonomous Database with Vector Search enabled.
  • Existing environment: an internal Oracle AI Database instance provided by your team.

The notebook only needs a SQL connection string through ORACLE_DSN. It does not create a database instance for you.

Connect to Oracle AI Database

This connection is used by LangChain's Oracle vector store to create or reuse the demo table, insert embeddings, and run similarity search.

Choose a demo table and distance metric

The demo uses one table, LANGCHAIN_DEMO_VECTORS, and cosine distance. Cosine distance is a common default for text embeddings because it compares vector direction rather than raw magnitude.

Initialize embeddings and the Oracle vector store

We initialize the OpenAI embedding model close to the vector store setup, because LangChain uses the embedding function when documents are inserted and when natural language queries are searched.

If an incompatible demo table already exists from an earlier version of the notebook, it is dropped so LangChain can recreate it with the columns it expects: ID, TEXT, METADATA, and EMBEDDING.

Insert rerunnable sample data

The sample data is intentionally small so the retrieval behavior is easy to inspect. Before inserting, the cell clears the demo table. This keeps the notebook idempotent: repeated runs do not accumulate duplicate rows or change retrieval results.

The DELETE FROM in the next cell is intentional. It resets only the demo table's sample rows before insertion, making repeated notebook runs deterministic. Without that reset, rerunning the notebook would append the same documents again and could change the top-k retrieval results.

Inspect embedding and table details

These checks are optional and are guarded so the notebook can continue gracefully if embedding quota is unavailable. The table inspection does not call the OpenAI API.

Run semantic similarity search

LangChain embeds the natural language query and searches the Oracle vector store for the closest stored documents. The example uses similarity_search(..., k=3), which returns the top matching documents.

Inspect returned documents

LangChain returns Document objects. In this simple example the important field is page_content, but in larger workflows metadata can be used to track source IDs, URLs, document sections, or other application-specific attributes.

Try another query

The same vector store can be reused for additional questions without reinitializing the embedding model.

Conclusion

This notebook implemented semantic vector search with OpenAI embeddings, LangChain, and Oracle AI Database. OpenAI converts text and queries into embeddings, LangChain provides the vector store API, and Oracle AI Database stores and searches the vectors alongside application data.

The same pattern can be extended into a RAG workflow by passing retrieved documents to a generation model, or into an application search layer where Oracle remains the source of truth for both relational and vector data.

Comments (0)

Sign In Sign in to leave a comment.