Prompt Chain

Search Code with Embeddings

Implement semantic code search with OpenAI embeddings (`text-embedding-3-small`) to find functions in Python codebases using natural language queries.

Works with GitHub, OpenAI

Spark score: 91 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Leverage AI embeddings to perform semantic code search within your codebase. Find relevant functions and code snippets using natural language queries.

Outcomes

What it gets done

  1. Parse and extract functions from Python files.
  2. Generate vector embeddings for code snippets.
  3. Index code embeddings for efficient searching.
  4. Query code embeddings using natural language descriptions.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-codesearchusingembeddings | bash

Steps

Steps in the chain

01
Embed query string with text-embedding-3-small

We first embed our query string (`code_query`) with `text-embedding-3-small`. The reasoning is that a query such as 'a function that reverses a string' and a function such as 'def reverse(string): return string[::-1]' should map to nearby vectors when embedded.

02
Calculate cosine similarity between embeddings

We then calculate the cosine similarity between the query embedding and the embedding of every function in our database. This gives a similarity score for each function, where a higher score means a closer match to the query.
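As a rough illustration of the scoring in this step, cosine similarity can be computed in plain Python as below. This is a minimal sketch, not the chain's actual implementation: the function name `cosine_similarity` and the two-dimensional vectors are assumptions made for the example, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Note: an all-zero vector would cause a division by zero here.
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (illustrative values only).
query_vec = [1.0, 0.0]
same_direction = cosine_similarity(query_vec, [2.0, 0.0])
orthogonal = cosine_similarity(query_vec, [0.0, 3.0])
```

Vectors pointing the same way score near 1 regardless of their length, while unrelated (orthogonal) vectors score near 0, which is why the score can be used directly to rank candidates.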

03
Sort results by similarity and return

We finally sort all of our data points by their similarity to the query string, highest first, and return the number of results requested in the function parameters.
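The ranking in this step could look roughly like the sketch below. Because cosine similarity is higher for closer matches, the sort is descending. The helper name `top_matches`, the parameter `n`, and the scores themselves are hypothetical and chosen only for illustration.

```python
def top_matches(scored, n=3):
    """Return the n highest-scoring (label, similarity) pairs, best first."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Made-up similarity scores for four hypothetical functions.
scores = [
    ("parse_args", 0.12),
    ("reverse", 0.91),
    ("flip_text", 0.74),
    ("open_file", 0.08),
]
best = top_matches(scores, n=2)
```

For large collections, `heapq.nlargest` from the standard library avoids sorting the full list, but a plain `sorted` call is usually enough at the scale of a single repository.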

Overview

What it does

This prompt chain enables semantic code search within Python codebases. It parses code, extracts functions, generates vector embeddings using the `text-embedding-3-small` model, and indexes them. This allows users to query for functions using plain-English descriptions.

How it connects

Use this when you need to efficiently locate specific functions in a Python project without knowing their exact names. It's ideal for navigating codebases like `openai-python` by describing the desired functionality in plain English.

Source README

Code search using embeddings

This notebook shows how embeddings from the `text-embedding-3-small` model can be used to implement semantic code search. For this demonstration, we use our own openai-python code repository. We implement a simple version of file parsing and function extraction for Python files, so that the extracted functions can be embedded, indexed, and queried.

Helper Functions

We first set up some simple parsing functions that allow us to extract important information from our codebase.
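The notebook's own helpers are not reproduced here. As one possible sketch of such a helper, the standard-library `ast` module can pull each function's name and source text out of a file. The function name `extract_functions` and the record keys are assumptions for this example, and the notebook may well use a simpler text-based approach.

```python
import ast


def extract_functions(source, filepath="<unknown>"):
    """Return one record per function definition found in `source`."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append(
                {
                    "function_name": node.name,
                    # Recovers the text of the definition (decorators excluded).
                    "code": ast.get_source_segment(source, node),
                    "filepath": filepath,
                }
            )
    return records


sample = "def reverse(string):\n    return string[::-1]\n"
functions = extract_functions(sample, "example.py")
```

Parsing with `ast` also picks up methods and nested functions, and it raises `SyntaxError` on files that do not parse, so a caller walking a whole repository would likely want to catch that.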

Data Loading

We'll first load the openai-python folder and extract the needed information using the functions we defined above.
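A loading step along these lines might be sketched as a directory walk that gathers the text of every `.py` file, ready to be handed to the parsing helpers. The name `load_python_sources` is hypothetical, and the temporary directory below merely stands in for a local checkout of openai-python.

```python
from pathlib import Path
import tempfile


def load_python_sources(root):
    """Map each .py file under `root` (as a relative path) to its source text."""
    root_path = Path(root)
    sources = {}
    for path in sorted(root_path.rglob("*.py")):
        relative = path.relative_to(root_path).as_posix()
        # errors="ignore" keeps one badly encoded file from aborting the walk.
        sources[relative] = path.read_text(encoding="utf-8", errors="ignore")
    return sources


# Demo on a throwaway directory (stand-in for the real repository folder).
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg"
    pkg.mkdir()
    (pkg / "util.py").write_text("def add(a, b):\n    return a + b\n", encoding="utf-8")
    (pkg / "notes.txt").write_text("not code", encoding="utf-8")
    demo_sources = load_python_sources(tmp)
```

Keeping paths relative to the repository root makes the resulting records portable between machines.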

Now that we have our content, we can pass the data to the text-embedding-3-small model and get back our vector embeddings.
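A minimal sketch of the embedding call follows, assuming the v1-style `openai` Python client, whose `client.embeddings.create(model=..., input=...)` call returns a response with a `data` list of items carrying `index` and `embedding`. The helper name `embed_texts` is an assumption, and the stub client exists only so the sketch runs without an API key; its vectors are meaningless.

```python
from types import SimpleNamespace


def embed_texts(client, texts, model="text-embedding-3-small"):
    """Return one embedding vector per input text, in input order."""
    response = client.embeddings.create(model=model, input=texts)
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


# Offline stand-in that mimics the response shape (not real embeddings).
def _fake_create(model, input):
    data = [
        SimpleNamespace(index=i, embedding=[float(len(text)), 1.0])
        for i, text in enumerate(input)
    ]
    return SimpleNamespace(data=data)


fake_client = SimpleNamespace(embeddings=SimpleNamespace(create=_fake_create))
vectors = embed_texts(fake_client, ["def f(): pass", "def g(): return 1"])

# With the real client (requires OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   vectors = embed_texts(OpenAI(), list_of_function_sources)
```

Passing the client in as a parameter keeps the helper easy to test and lets the same code run against a stub or the real service.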

Testing

Let's test our endpoint with some simple queries. If you're familiar with the openai-python repository, you'll see that we're able to easily find the functions we're looking for using only a simple English description.

We define a `search_functions` method that takes our data containing the embeddings, a query string, and some other configuration options. The search process works as follows:

  1. We first embed our query string (`code_query`) with `text-embedding-3-small`. The reasoning is that a query such as 'a function that reverses a string' and a function such as 'def reverse(string): return string[::-1]' should map to nearby vectors when embedded.
  2. We then calculate the cosine similarity between the query embedding and the embedding of every function in our database. This gives a similarity score for each function, where a higher score means a closer match to the query.
  3. We finally sort all of our data points by their similarity to the query string, highest first, and return the number of results requested in the function parameters.
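The three steps above can be wired together roughly as follows. This is a sketch under stated assumptions rather than the notebook's code: the signature, the record keys, and the `embed_fn` hook are hypothetical, and `toy_embed` is a word-counting stand-in used only so the example runs offline. In practice `embed_fn` would call `text-embedding-3-small`.

```python
import math


def search_functions(records, code_query, embed_fn, n=3):
    """Return the n records whose embeddings are most similar to the query."""
    query_vec = embed_fn(code_query)  # step 1: embed the query

    def similarity(vec):  # step 2: cosine similarity against one record
        dot = sum(q * v for q, v in zip(query_vec, vec))
        return dot / (math.hypot(*query_vec) * math.hypot(*vec))

    scored = [(similarity(rec["embedding"]), rec) for rec in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # step 3: best first
    return [rec for _, rec in scored[:n]]


# Toy embedder: counts a few keywords. NOT a real embedding model.
VOCAB = ("reverse", "string", "file")


def toy_embed(text):
    lowered = text.lower()
    # The 0.1 offset avoids all-zero vectors, which would divide by zero.
    return [lowered.count(word) + 0.1 for word in VOCAB]


snippets = {
    "reverse": "def reverse(string): return string[::-1]",
    "read_file": "def read_file(path): return open(path).read()",
}
records = [
    {"function_name": name, "code": code, "embedding": toy_embed(code)}
    for name, code in snippets.items()
]
results = search_functions(records, "a function that reverses a string", toy_embed, n=1)
```

Injecting the embedding function keeps the search logic independent of any particular model or client, so the ranking can be exercised without network access.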
