Structure and Navigate Codebases
LlamaIndex pack that splits long code files into hierarchical chunks by scope (function, class, method) and creates an agent to navigate the code tree
Why it matters
Deconstruct large code files into a navigable hierarchy so LLMs can understand and query complex code structures efficiently. The pack builds a tree-like representation of the code, linking each section to its parent and child scopes so specific pieces can be retrieved on demand.
Outcomes
What it gets done
Split long code files into manageable, hierarchical chunks.
Create an agent to navigate the code hierarchy.
Generate a map of repository structure and contents.
Enable LLMs to reference specific files or directories within a codebase.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/li-pack-packs-code-hierarchy | bash
Steps
Steps in the chain
pip install llama-index-packs-code-hierarchy
llamaindex-cli download-llamapack CodeHierarchyAgentPack -d ./code_hierarchy_pack
Use SimpleDirectoryReader to load code files and CodeHierarchyNodeParser with CodeSplitter to parse them into nodes. Configure language, max_chars, and chunk_lines parameters as needed.
Initialize CodeHierarchyAgentPack with split_nodes from the parser and an LLM instance (e.g., OpenAI gpt-4).
Call pack.run() with natural language queries to navigate and understand the code structure, such as asking about specific functions or implementation details.
Initialize CodeHierarchyKeywordQueryEngine with the split nodes to generate a map of the repository's structure and contents.
Use QueryEngineTool.from_defaults() to wrap the CodeHierarchyKeywordQueryEngine as a tool named 'code_lookup' with an appropriate description.
Create a FunctionAgent with the QueryEngineTool, system_prompt from query_engine.get_tool_instructions(), and an LLM instance.
Edit _DEFAULT_SIGNATURE_IDENTIFIERS in code_hierarchy.py to add a new language. Follow the docstring guidelines for proper configuration and test the new language implementation.
Overview
CodeHierarchyAgentPack
What it does
code hierarchy navigation pack
How it connects
splitting long code files into scope-based hierarchical chunks for LLM navigation
Source README
CodeHierarchyAgentPack
```bash
# install
pip install llama-index-packs-code-hierarchy

# download source code
llamaindex-cli download-llamapack CodeHierarchyAgentPack -d ./code_hierarchy_pack
```
The CodeHierarchyAgentPack splits long code files into more manageable chunks and creates an agent on top of them to navigate the code. The result is a "hierarchy" in which each scope's body is replaced by a short comment telling the LLM to look up the referenced node if it needs to read that body.
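To make the skeleton idea concrete, here is a minimal standard-library sketch that replaces each top-level function or class body in a Python file with a placeholder comment and keeps the removed code keyed by a node id. It is an illustration under stated assumptions, not the pack's implementation: the `skeletonize` helper, the placeholder wording, and the use of the scope name as the node id are all hypothetical; it uses Python's `ast` module rather than the tree-sitter parsing the pack relies on; and it only handles top-level scopes whose bodies start on their own line (Python 3.8+).

```python
import ast
from typing import Dict, List, Tuple

# Hypothetical placeholder wording; the pack's real comment text may differ.
PLACEHOLDER = "# Code replaced for brevity. See node_id {node_id}"
SCOPE_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def skeletonize(source: str) -> Tuple[str, Dict[str, str]]:
    """Replace each top-level function/class body with a placeholder comment.

    Returns the skeleton text and a dict mapping node ids (here: scope
    names) to the body text that was removed.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    bodies: Dict[str, str] = {}
    out: List[str] = []
    cursor = 0  # 0-based index of the next source line to copy verbatim

    for node in tree.body:
        if not isinstance(node, SCOPE_TYPES):
            continue  # imports, assignments, etc. are copied unchanged below
        body_start = node.body[0].lineno - 1  # first body line, 0-based
        body_end = node.end_lineno  # 1-based last line == 0-based exclusive end
        bodies[node.name] = "\n".join(lines[body_start:body_end])
        # Copy everything before this body: earlier statements, decorators,
        # and the signature itself.
        out.extend(lines[cursor:body_start])
        indent = " " * node.body[0].col_offset
        out.append(indent + PLACEHOLDER.format(node_id=node.name))
        cursor = body_end

    out.extend(lines[cursor:])
    return "\n".join(out), bodies
```

Calling `skeletonize(open("module.py").read())` returns a short outline the LLM can read first, plus a lookup table it can consult when it wants a specific body. A fuller version would recurse into nested scopes (methods, inner functions) instead of stopping at the top level.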
Nodes in this hierarchy are split by scope (function, class, or method) and link to their parent and child nodes, so the LLM can traverse the tree.
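The parent and child links described above can be illustrated with Python's standard `ast` module. The hypothetical `build_scope_tree` below indexes every class and function scope in a Python source, gives each a dotted id (an assumed scheme, not necessarily the pack's node ids), and records its parent and children; `path_to_root` then walks the parent links upward, roughly the kind of navigation such links make possible. This is a sketch, not the pack's code: it assumes no two sibling scopes share a name, since duplicates would overwrite each other.

```python
import ast
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SCOPE_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


@dataclass
class ScopeNode:
    node_id: str
    name: str
    parent_id: Optional[str] = None
    children: List[str] = field(default_factory=list)


def build_scope_tree(source: str, module_name: str) -> Dict[str, ScopeNode]:
    """Index every class/function scope, linking each to its parent and children."""
    nodes: Dict[str, ScopeNode] = {module_name: ScopeNode(module_name, module_name)}

    def visit(ast_node: ast.AST, parent_id: str) -> None:
        for child in ast.iter_child_nodes(ast_node):
            if isinstance(child, SCOPE_TYPES):
                # Hypothetical id scheme: dotted path from the module root.
                child_id = f"{parent_id}.{child.name}"
                nodes[child_id] = ScopeNode(child_id, child.name, parent_id)
                nodes[parent_id].children.append(child_id)
                visit(child, child_id)
            else:
                # Keep descending (e.g. into if/try blocks) without opening a scope.
                visit(child, parent_id)

    visit(ast.parse(source), module_name)
    return nodes


def path_to_root(nodes: Dict[str, ScopeNode], node_id: str) -> List[str]:
    """Follow parent links upward, returning scope names from the root down."""
    path: List[str] = []
    current: Optional[str] = node_id
    while current is not None:
        path.append(nodes[current].name)
        current = nodes[current].parent_id
    return list(reversed(path))
```

Because every node stores both directions of the link, an agent-style consumer can move down into `children` to read details or up via `parent_id` to regain context.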
```python
from pathlib import Path

from llama_index.core import SimpleDirectoryReader
from llama_index.core.text_splitter import CodeSplitter
from llama_index.llms.openai import OpenAI
from llama_index.packs.code_hierarchy import (
    CodeHierarchyAgentPack,
    CodeHierarchyNodeParser,
)

llm = OpenAI(model="gpt-4", temperature=0.2)

# Load the source file(s) to index; each document records its filepath as metadata.
documents = SimpleDirectoryReader(
    input_files=[
        Path("../llama_index/packs/code_hierarchy/code_hierarchy.py")
    ],
    file_metadata=lambda x: {"filepath": x},
).load_data()

split_nodes = CodeHierarchyNodeParser(
    language="python",
    # You can further parameterize the CodeSplitter to split the code
    # into "chunks" that match your context window size using the
    # chunk_lines and max_chars parameters.
    code_splitter=CodeSplitter(
        language="python", max_chars=1000, chunk_lines=10
    ),
).get_nodes_from_documents(documents)

pack = CodeHierarchyAgentPack(split_nodes=split_nodes, llm=llm)

pack.run(
    "How does the get_code_hierarchy_from_nodes function from the code hierarchy node parser work? Provide specific implementation details."
)
```
A full example can be found [here](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-code-hierarchy/examples/CodeHierarchyNodeParserUsage.ipynb).
Repo Maps
The pack contains a CodeHierarchyKeywordQueryEngine that uses the nodes produced by CodeHierarchyNodeParser to generate a map of a repository's structure and contents. The map helps the LLM understand how a codebase is organized and reference specific files or directories.
For example:
```
- code_hierarchy
  - _SignatureCaptureType
  - _SignatureCaptureOptions
  - _ScopeMethod
  - _CommentOptions
  - _ScopeItem
  - _ChunkNodeOutput
  - CodeHierarchyNodeParser
    - class_name
    - __init__
    - _get_node_name
      - recur
    - _get_node_signature
      - find_start
      - find_end
    - _chunk_node
    - get_code_hierarchy_from_nodes
      - get_subdict
      - recur_inclusive_scope
      - dict_to_markdown
    - _parse_nodes
    - _get_indentation
    - _get_comment_text
    - _create_comment_line
    - _get_replacement_text
    - _skeletonize
    - _skeletonize_list
      - recur
```
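A map like this is essentially a nested dictionary of scope names rendered as an indented list. The sketch below assumes that shape (a hypothetical `render_repo_map` helper, where each value holds a scope's children) and produces the same two-space-indented bullet format; it is an illustration, not the pack's own rendering code.

```python
from typing import Dict


def render_repo_map(tree: Dict[str, dict], depth: int = 0) -> str:
    """Render a nested {scope_name: children} dict as an indented bullet list."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + f"- {name}")
        if children:  # leaf scopes map to an empty dict
            lines.append(render_repo_map(children, depth + 1))
    return "\n".join(lines)


tree = {
    "code_hierarchy": {
        "CodeHierarchyNodeParser": {
            "_get_node_signature": {"find_start": {}, "find_end": {}},
            "_chunk_node": {},
        }
    }
}
print(render_repo_map(tree))
```

Recursion keeps the renderer independent of nesting depth, and dict insertion order (guaranteed since Python 3.7) preserves the source order of the scopes.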
Usage as a Tool with an Agent
You can create a tool for any agent using the nodes from the node parser:
```python
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI
from llama_index.packs.code_hierarchy import CodeHierarchyKeywordQueryEngine

# split_nodes is the output of CodeHierarchyNodeParser(...).get_nodes_from_documents(...)
# from the earlier example.
query_engine = CodeHierarchyKeywordQueryEngine(
    nodes=split_nodes,
)

# Wrap the query engine as a named tool the agent can call.
tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="code_lookup",
    description="Useful for looking up information about the code hierarchy codebase.",
)

# get_tool_instructions() supplies a system prompt explaining how to use the lookup tool.
agent = FunctionAgent(
    tools=[tool],
    system_prompt=query_engine.get_tool_instructions(),
    llm=OpenAI(model="gpt-4.1"),
)
```
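Conceptually, the `code_lookup` tool maps a name the agent asks for to the code stored under that node. The toy function below is a hypothetical stand-in for that behavior, not `CodeHierarchyKeywordQueryEngine` itself: it assumes nodes are stored as a dict from dotted ids to code text, accepts either a full id or a bare scope name, and falls back to returning the repo map so the agent can pick a valid name.

```python
from typing import Dict


def code_lookup(query: str, nodes: Dict[str, str], repo_map: str) -> str:
    """Toy keyword lookup over {dotted_node_id: code_text} (hypothetical)."""
    key = query.strip()
    if key in nodes:
        return nodes[key]
    # Allow a bare scope name to match the last segment of a dotted id.
    matches = [node_id for node_id in nodes if node_id.rsplit(".", 1)[-1] == key]
    if len(matches) == 1:
        return nodes[matches[0]]
    if matches:
        return "Ambiguous name; try one of: " + ", ".join(sorted(matches))
    # Nothing matched: show the map so the caller can choose a valid name.
    return f"No node named {key!r}. Known scopes:\n{repo_map}"
```

Returning the map on a miss, rather than an empty result, gives the agent enough context to retry with a name that exists; reporting ambiguity handles repeated inner names such as the two `recur` functions in the map above.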
Adding new languages
To add a new language you need to edit _DEFAULT_SIGNATURE_IDENTIFIERS in code_hierarchy.py.
Its docstrings explain how to do this and the nuances involved; the approach should work for most languages.
Please test your new language implementation thoroughly.