Prompt Chain

Structure and Navigate Codebases

LlamaIndex pack that splits long code files into hierarchical chunks by scope (function, class, method) and creates an agent to navigate the code tree

Works with OpenAI

Spark score: 91 out of 100 · Updated 3 months ago · Version 1.0.0


Why it matters

Deconstruct large code files into a navigable hierarchy, enabling LLMs to efficiently understand and query complex code structures. This pack creates a tree-like representation of code, linking sections and allowing for targeted information retrieval.

Outcomes

What it gets done

01

Split long code files into manageable, hierarchical chunks.

02

Create an agent to navigate the code hierarchy.

03

Generate a map of repository structure and contents.

04

Enable LLMs to reference specific files or directories within a codebase.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-pack-packs-code-hierarchy | bash

Steps

Steps in the chain

01
Install CodeHierarchyAgentPack

pip install llama-index-packs-code-hierarchy

02
Download source code

llamaindex-cli download-llamapack CodeHierarchyAgentPack -d ./code_hierarchy_pack

03
Load and parse code documents

Use SimpleDirectoryReader to load code files and CodeHierarchyNodeParser with CodeSplitter to parse them into nodes. Configure language, max_chars, and chunk_lines parameters as needed.

04
Create CodeHierarchyAgentPack

Initialize CodeHierarchyAgentPack with split_nodes from the parser and an LLM instance (e.g., OpenAI gpt-4).

05
Run queries on code hierarchy

Call pack.run() with natural language queries to navigate and understand the code structure, such as asking about specific functions or implementation details.

06
Create CodeHierarchyKeywordQueryEngine

Initialize CodeHierarchyKeywordQueryEngine with the split nodes to generate a map of the repository's structure and contents.

07
Create QueryEngineTool for agent

Use QueryEngineTool.from_defaults() to wrap the CodeHierarchyKeywordQueryEngine as a tool with name 'code_lookup' and appropriate description.

08
Initialize FunctionAgent with tool

Create a FunctionAgent with the QueryEngineTool, system_prompt from query_engine.get_tool_instructions(), and an LLM instance.

09
Add new language support

Edit _DEFAULT_SIGNATURE_IDENTIFIERS in code_hierarchy.py to add a new language. Follow the docstring guidelines for proper configuration and test the new language implementation.
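Step 03 leaves max_chars and chunk_lines up to you. To build intuition for what those two limits bound, here is a minimal stdlib-only sketch that cuts text into windows of at most chunk_lines lines and at most max_chars characters. It is a deliberate simplification and not the CodeSplitter algorithm, which splits along parsed syntax nodes. The function name naive_split is hypothetical.

```python
from typing import List


def naive_split(text: str, chunk_lines: int = 10, max_chars: int = 1000) -> List[str]:
    """Hypothetical line-window splitter, NOT CodeSplitter's algorithm.

    A chunk is closed as soon as adding the next line would exceed either
    chunk_lines lines or max_chars characters. A single line longer than
    max_chars still becomes its own (oversized) chunk. The defaults mirror
    the values passed to CodeSplitter in the README example.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_chars = 0
    for line in text.splitlines(keepends=True):
        too_many_lines = len(current) >= chunk_lines
        too_many_chars = current_chars + len(line) > max_chars
        if current and (too_many_lines or too_many_chars):
            chunks.append("".join(current))  # close the current window
            current, current_chars = [], 0
        current.append(line)
        current_chars += len(line)
    if current:
        chunks.append("".join(current))  # flush the last partial window
    return chunks


source = "\n".join(f"x{i} = {i}" for i in range(25))
chunks = naive_split(source)
print([len(c.splitlines()) for c in chunks])  # [10, 10, 5]
```

Smaller limits produce more, shorter chunks. A real code splitter additionally tries not to cut through the middle of a function or class.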

Overview

CodeHierarchyAgentPack

What it does

A code hierarchy navigation pack.

How it connects

Splits long code files into scope-based hierarchical chunks that an LLM can navigate.

Source README

CodeHierarchyAgentPack

### install
pip install llama-index-packs-code-hierarchy

### download source code
llamaindex-cli download-llamapack CodeHierarchyAgentPack -d ./code_hierarchy_pack

The CodeHierarchyAgentPack splits long code files into more manageable chunks and creates an agent on top of them to navigate the code. The result is a "hierarchy" of sorts: each scope's body is replaced with a short comment telling the LLM to look up the referenced node if it needs to read that body.
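The following sketch is a rough illustration of that skeleton idea. It uses Python's built-in ast module to replace each top-level function body with a one-line pointer comment, and it keeps the full bodies in a dict keyed by node id. The ast module is an assumption here; it is not how the pack parses code. The comment wording, the use of the function name as the node id, and the name skeletonize are also hypothetical.

```python
import ast
from typing import Dict, List, Tuple

# Hypothetical comment wording; the pack's own template may differ.
TEMPLATE = "# Code replaced for brevity. See node_id {}"


def skeletonize(source: str) -> Tuple[str, Dict[str, str]]:
    """Replace each top-level function body with a pointer comment.

    Returns the skeleton text plus a dict mapping each node id (here simply
    the function name, an assumption) to the function's full source.
    Needs Python 3.8+ for end_lineno. Assumes bodies start on a new line;
    classes and one-line `def f(): ...` functions are left as-is.
    """
    lines = source.splitlines()
    bodies: Dict[str, str] = {}
    out: List[str] = []
    cursor = 0  # next source line (0-based) not yet copied to `out`
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        start = node.lineno - 1               # line of the `def`
        body_start = node.body[0].lineno - 1  # first statement of the body
        if body_start == start:
            continue  # one-line function: copied unchanged later
        end = node.end_lineno                 # exclusive 0-based end
        bodies[node.name] = "\n".join(lines[start:end])
        out.extend(lines[cursor:body_start])  # decorators + signature lines
        indent = " " * node.body[0].col_offset
        out.append(indent + TEMPLATE.format(node.name))
        cursor = end
    out.extend(lines[cursor:])
    return "\n".join(out), bodies


src = '''import math


def area(r):
    """Circle area."""
    return math.pi * r * r


def double(x):
    y = x * 2
    return y
'''

skeleton, bodies = skeletonize(src)
print(skeleton)
```

The printed skeleton keeps both signatures but shows a pointer comment in place of each body. A reader that needs the details then looks up bodies["area"] or bodies["double"].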

Nodes in this hierarchy will be split based on scope, like function, class, or method scope, and will have links to their children and parents so the LLM can traverse the tree.

from pathlib import Path

from llama_index.core import SimpleDirectoryReader
from llama_index.core.text_splitter import CodeSplitter
from llama_index.llms.openai import OpenAI
from llama_index.packs.code_hierarchy import (
    CodeHierarchyAgentPack,
    CodeHierarchyNodeParser,
)

llm = OpenAI(model="gpt-4", temperature=0.2)

# Load the file to index, storing its path in each document's metadata.
documents = SimpleDirectoryReader(
    input_files=[
        Path("../llama_index/packs/code_hierarchy/code_hierarchy.py")
    ],
    file_metadata=lambda x: {"filepath": x},
).load_data()

split_nodes = CodeHierarchyNodeParser(
    language="python",
    # You can further parameterize the CodeSplitter to split the code
    # into "chunks" that match your context window size using the
    # chunk_lines and max_chars parameters; here they are set explicitly.
    code_splitter=CodeSplitter(
        language="python", max_chars=1000, chunk_lines=10
    ),
).get_nodes_from_documents(documents)

pack = CodeHierarchyAgentPack(split_nodes=split_nodes, llm=llm)

pack.run(
    "How does the get_code_hierarchy_from_nodes function from the code hierarchy node parser work? Provide specific implementation details."
)

A full example can be found in the [CodeHierarchyNodeParserUsage.ipynb notebook](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-code-hierarchy/examples/CodeHierarchyNodeParserUsage.ipynb).
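Here is another small, stdlib-only sketch of the parent and child links described earlier. It walks Python's ast, standing in for the parser the pack actually uses, and creates one node per module, class, method, and function. Each node links to its parent and its children. ScopeNode, build_scope_tree, and walk are hypothetical names and are not part of the pack's API.

```python
import ast
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class ScopeNode:
    """Hypothetical hierarchy node with links to its parent and children."""

    name: str
    kind: str  # "module", "class", "method" or "function"
    parent: Optional["ScopeNode"] = field(default=None, repr=False)
    children: List["ScopeNode"] = field(default_factory=list)

    def path(self) -> str:
        """Dotted path from the root, e.g. 'module.Greeter.greet'."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}.{self.name}"


def build_scope_tree(source: str, module_name: str = "module") -> ScopeNode:
    """Split Python source into a tree of class/function/method scopes."""
    root = ScopeNode(module_name, "module")

    def visit(ast_node: ast.AST, parent: ScopeNode) -> None:
        for child in ast.iter_child_nodes(ast_node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                if isinstance(child, ast.ClassDef):
                    kind = "class"
                elif parent.kind == "class":
                    kind = "method"
                else:
                    kind = "function"
                scope = ScopeNode(child.name, kind, parent)
                parent.children.append(scope)
                visit(child, scope)  # nested scopes become grandchildren
            else:
                visit(child, parent)  # e.g. an `if` block stays in the same scope

    visit(ast.parse(source), root)
    return root


def walk(node: ScopeNode) -> Iterator[ScopeNode]:
    """Pre-order traversal over the parent/child links."""
    yield node
    for child in node.children:
        yield from walk(child)


src = '''
class Greeter:
    def greet(self, name):
        def shout(text):
            return text.upper()
        return shout("hi " + name)


def main():
    print(Greeter().greet("bob"))
'''

tree = build_scope_tree(src)
for node in walk(tree):
    print(node.path(), node.kind)
```

Because every node stores its parent, a navigator can move up from a nested helper such as shout to its enclosing method as easily as it moves down.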

Repo Maps

The pack contains a CodeHierarchyKeywordQueryEngine that uses a CodeHierarchyNodeParser to generate a map of a repository's structure and contents. This is useful for the LLM to understand the structure of a codebase, and to be able to reference specific files or directories.

For example:

  • code_hierarchy
    • _SignatureCaptureType
    • _SignatureCaptureOptions
    • _ScopeMethod
    • _CommentOptions
    • _ScopeItem
    • _ChunkNodeOutput
    • CodeHierarchyNodeParser
      • class_name
      • __init__
      • _get_node_name
        • recur
      • _get_node_signature
        • find_start
        • find_end
      • _chunk_node
      • get_code_hierarchy_from_nodes
        • get_subdict
        • recur_inclusive_scope
        • dict_to_markdown
      • _parse_nodes
      • _get_indentation
      • _get_comment_text
      • _create_comment_line
      • _get_replacement_text
      • _skeletonize
      • _skeletonize_list
        • recur
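A map like the one above is essentially a nested mapping rendered as an indented bullet list. The sketch below shows one way to render such a mapping. The {name: children} input shape and the name render_map are assumptions for illustration. They do not describe the internals of CodeHierarchyKeywordQueryEngine or its dict_to_markdown helper.

```python
from typing import Dict

# Hypothetical input shape: each name maps to a dict of its child scopes.
Hierarchy = Dict[str, "Hierarchy"]


def render_map(tree: Hierarchy, depth: int = 0) -> str:
    """Render a nested {name: children} dict as an indented bullet list."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + f"- {name}")
        if children:  # leaves map to an empty dict and add no sub-list
            lines.append(render_map(children, depth + 1))
    return "\n".join(lines)


repo = {
    "code_hierarchy": {
        "CodeHierarchyNodeParser": {
            "_get_node_signature": {"find_start": {}, "find_end": {}},
            "_chunk_node": {},
        }
    }
}
print(render_map(repo))
```

Handing a compact map like this to an LLM lets it name a specific scope before requesting that scope's code.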

Usage as a Tool with an Agent

You can create a tool for any agent using the nodes from the node parser:

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import QueryEngineTool
from llama_index.packs.code_hierarchy import CodeHierarchyKeywordQueryEngine

query_engine = CodeHierarchyKeywordQueryEngine(
    nodes=split_nodes,
)

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="code_lookup",
    description="Useful for looking up information about the code hierarchy codebase.",
)

agent = FunctionAgent(
    tools=[tool],
    system_prompt=query_engine.get_tool_instructions(),
    llm=OpenAI(model="gpt-4.1"),
)
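Conceptually, a code_lookup tool answers a request by matching it against node identifiers and returning the matching code. An agent can then keep expanding a skeleton by looking up the node ids that its comments reference. The stdlib-only sketch below illustrates that loop under stated assumptions. The KeywordCodeIndex class, its matching rules (exact node id first, then a case-insensitive short name), and the "See node_id" comment format are all hypothetical. The real CodeHierarchyKeywordQueryEngine may behave differently.

```python
import re
from typing import Dict, List

# Hypothetical pointer format left in skeletonized code.
REFERENCE = re.compile(r"See node_id (\S+)")


class KeywordCodeIndex:
    """Hypothetical keyword lookup over code chunks (illustration only)."""

    def __init__(self, chunks: Dict[str, str]) -> None:
        # chunks maps a dotted node id, e.g. "mod.Parser.parse", to its code.
        self.chunks = chunks
        self.by_name: Dict[str, List[str]] = {}
        for node_id in chunks:
            short_name = node_id.rsplit(".", 1)[-1].lower()
            self.by_name.setdefault(short_name, []).append(node_id)

    def query(self, keyword: str) -> str:
        """Exact node id first, then case-insensitive match on the short name."""
        keyword = keyword.strip()
        if keyword in self.chunks:
            return self.chunks[keyword]
        node_ids = self.by_name.get(keyword.lower(), [])
        if not node_ids:
            return f"No code found for {keyword!r}."
        return "\n\n".join(self.chunks[i] for i in node_ids)


def referenced_ids(code: str) -> List[str]:
    """Node ids that could be looked up next to expand a skeleton."""
    return REFERENCE.findall(code)


index = KeywordCodeIndex({
    "mod.Parser": "class Parser:\n    # See node_id mod.Parser.parse",
    "mod.Parser.parse": "def parse(self, text):\n    return text.split()",
})
skeleton = index.query("mod.Parser")
for node_id in referenced_ids(skeleton):  # follow links one level down
    print(index.query(node_id))
```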

Adding new languages

To add a new language you need to edit _DEFAULT_SIGNATURE_IDENTIFIERS in code_hierarchy.py.

The docstrings explain how to do this and its nuances; the approach should work for most languages.
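Before editing _DEFAULT_SIGNATURE_IDENTIFIERS, check the docstrings in code_hierarchy.py for its real shape. The sketch below is a loosely analogous illustration of the underlying idea: for each language, which scope headers to capture, and where a signature ends and its body begins. It keeps hypothetical per-language rules in a plain dict, so adding a language means adding one entry. SIGNATURE_RULES and find_signatures are invented for this example. Matching regexes against individual lines is likely far cruder than the pack's parser-based capture.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical per-language rules (NOT _DEFAULT_SIGNATURE_IDENTIFIERS):
# a regex that recognizes a scope header and captures its name, plus the
# token that ends the signature and opens the body (excluded from the result).
SIGNATURE_RULES: Dict[str, Tuple[re.Pattern, str]] = {
    "python": (re.compile(r"^\s*(?:async\s+)?(?:def|class)\s+(\w+)"), ":"),
    "javascript": (re.compile(r"^\s*(?:function|class)\s+(\w+)"), "{"),
}


def find_signatures(source: str, language: str) -> List[Tuple[str, str]]:
    """Return (name, signature) pairs for scope headers in `source`.

    Assumes each header fits on one line that ends with the body token;
    multi-line signatures and one-line bodies are skipped by this sketch.
    """
    pattern, body_token = SIGNATURE_RULES[language]
    found = []
    for line in source.splitlines():
        match = pattern.match(line)
        header = line.rstrip()
        if match and header.endswith(body_token):
            found.append((match.group(1), header[: -len(body_token)].strip()))
    return found


# In this sketch, supporting a new language is one more entry, e.g. Go
# functions and methods (with an optional receiver before the name).
SIGNATURE_RULES["go"] = (re.compile(r"^\s*func\s+(?:\([^)]*\)\s*)?(\w+)"), "{")

go_src = "func (s *Server) Start() error {\n\treturn nil\n}\n"
print(find_signatures(go_src, "go"))
```

Whatever form a new rule takes, test it on real files in the target language, including edge cases such as methods, nested scopes, and comments.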

Please test your new language by
