Analyze Financial Documents with LlamaIndex
Streamline financial document analysis by indexing and querying 10-K reports with LlamaIndex for rapid insight generation and comparison.
Without it
Piece it together by hand, every time.
With it
Automate the extraction and synthesis of insights from lengthy financial documents like 10-K forms, enabling faster and more informed financial analysis.
What you get
- Load and index financial documents (e.g., 10-K forms) using LlamaIndex.
- Perform simple question-answering over indexed financial data.
- Conduct advanced compare-and-contrast analysis across multiple financial documents.
- Leverage RAG systems for efficient information retrieval and insight generation.
Financial Document Analysis with LlamaIndex
In this example notebook, we showcase how to perform financial analysis over 10-K documents using the LlamaIndex framework in just a few lines of code.
Notebook Outline
- Introduction
- Setup
- Data Loading and Indexing
- Simple QA
- Advanced QA - Compare and Contrast
Introduction
LlamaIndex
LlamaIndex is a data framework for LLM applications.
You can get started with just a few lines of code and build a retrieval-augmented generation (RAG) system in minutes.
For more advanced users, LlamaIndex offers a rich toolkit for ingesting and indexing your data, modules for retrieval and re-ranking, and composable components for building custom query engines.
See the full documentation for more details.
Financial Analysis over 10-K documents
A key part of a financial analyst's job is to extract information and synthesize insight from long financial documents.
A great example is the 10-K form, an annual report required by the U.S. Securities and Exchange Commission (SEC) that gives a comprehensive summary of a company's financial performance.
These documents typically run hundreds of pages and contain domain-specific terminology that makes them challenging for a layperson to digest quickly.
We showcase how LlamaIndex can support a financial analyst in quickly extracting information and synthesizing insights across multiple documents with very little coding.
Setup
To begin, we install the llama-index library (for example, with pip install llama-index).
Next, we import all modules used in this tutorial.
Before we start, we can configure the LLM provider and model that will power our RAG system.
Here, we pick gpt-3.5-turbo-instruct from OpenAI.
We construct a ServiceContext and set it as the global default, so all subsequent operations that depend on LLM calls will use the model we configured here.
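The configuration step described above might look like the following. This is a minimal sketch, not the notebook's exact cell: it assumes the legacy (pre-0.10) llama_index package layout, in which ServiceContext and set_global_service_context are importable from the top-level package, and it assumes an OPENAI_API_KEY environment variable is set. The helper name configure_global_llm is hypothetical. Newer releases organize these imports differently, so adjust them to your installed version.

```python
def configure_global_llm(model="gpt-3.5-turbo-instruct", temperature=0.0):
    """Create a ServiceContext for `model` and register it as the global default.

    Hypothetical helper; assumes the legacy (pre-0.10) llama-index layout.
    """
    # Deferred imports: the function can be defined even if llama-index is
    # not installed; the library is only needed when the function is called.
    from llama_index import ServiceContext, set_global_service_context
    from llama_index.llms import OpenAI

    # A temperature of 0 keeps answers as repeatable as the model allows.
    llm = OpenAI(model=model, temperature=temperature)
    service_context = ServiceContext.from_defaults(llm=llm)

    # Every index and query engine built afterwards picks up this LLM.
    set_global_service_context(service_context)
    return service_context
```

Calling configure_global_llm() once, before any index is built, is enough; the defaults mirror the model named in the text.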
Data Loading and Indexing
Now, we load and parse two PDFs (one for Uber's 2021 10-K and another for Lyft's 2021 10-K).
Under the hood, the PDFs are converted to plain-text Document objects, separated by page.
Note: this operation might take a while to run, since each document is more than 100 pages.
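A loading step along these lines could be sketched as follows. It assumes the legacy (pre-0.10) llama_index layout and that a PDF parser such as pypdf is installed alongside it. The helper name load_10k and the file paths in the usage comments are placeholders, not the notebook's actual values.

```python
from pathlib import Path


def load_10k(pdf_path):
    """Parse one 10-K PDF into a list of per-page Document objects."""
    path = Path(pdf_path)
    if not path.is_file():
        # Fail early with a clear message before touching the library.
        raise FileNotFoundError(f"10-K PDF not found: {path}")

    # Deferred import; assumes the legacy (pre-0.10) llama-index layout.
    from llama_index import SimpleDirectoryReader

    # input_files restricts the reader to exactly this file.
    return SimpleDirectoryReader(input_files=[str(path)]).load_data()


# Example usage (placeholder paths):
# lyft_docs = load_10k("data/lyft_2021.pdf")
# uber_docs = load_10k("data/uber_2021.pdf")
# print(f"Loaded Lyft 10-K with {len(lyft_docs)} pages")
```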
Now, we can build an (in-memory) VectorStoreIndex over the documents that we've loaded.
Note: this operation might take a while to run, since it calls the OpenAI API to compute vector embeddings over the document chunks.
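Building the index could look like the sketch below, again assuming the legacy (pre-0.10) llama_index layout and a configured OpenAI API key. The helper name build_index is hypothetical; one index is built per company so that each filing can later be queried on its own.

```python
def build_index(documents):
    """Embed the documents' chunks and store them in an in-memory vector index."""
    if not documents:
        raise ValueError("no documents to index")

    # Deferred import; assumes the legacy (pre-0.10) llama-index layout.
    from llama_index import VectorStoreIndex

    # Splits the documents into chunks (Nodes) and embeds each one. With the
    # default settings this calls the OpenAI embedding API, hence the wait.
    return VectorStoreIndex.from_documents(documents)


# Example usage, given document lists loaded from each PDF:
# lyft_index = build_index(lyft_docs)
# uber_index = build_index(uber_docs)
```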
Simple QA
Now we are ready to run some queries against our indices!
To do so, we first configure a QueryEngine, which just captures a set of configurations for how we want to query the underlying index.
For a VectorStoreIndex, the most common configuration to adjust is similarity_top_k, which controls how many document chunks (which we call Node objects) are retrieved and used as context for answering our question.
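To make the role of similarity_top_k concrete, here is a toy illustration of top-k retrieval in plain Python. It is a simplified model of the idea, not LlamaIndex's implementation: the three-dimensional vectors and chunk names are made up, and cosine similarity is assumed as the ranking measure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query_vec, chunk_vecs, k):
    """Return the names of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Made-up embeddings for three chunks of a filing.
chunks = {
    "revenue_page": [0.9, 0.1, 0.0],
    "risk_page": [0.1, 0.9, 0.0],
    "legal_page": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # made-up embedding of a revenue question

print(top_k_chunks(query, chunks, k=2))  # ['revenue_page', 'risk_page']
```

A larger k gives the LLM more context to draw on, at the cost of a longer prompt and a higher chance of including irrelevant chunks.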
Let's see some queries in action!
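A simple query could be sketched as follows. The helper ask is hypothetical, and the question in the usage comment is only illustrative. The function relies on two calls, as_query_engine(similarity_top_k=...) on the index and query(...) on the resulting engine, and takes the index as an argument so it does not import anything itself.

```python
def ask(index, question, top_k=3):
    """Answer `question` from `index`, retrieving `top_k` chunks as context."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # The query engine bundles the retrieval settings for this index.
    engine = index.as_query_engine(similarity_top_k=top_k)
    return engine.query(question)


# Example usage, given an index built over the Lyft 10-K:
# response = ask(lyft_index, "What was Lyft's revenue in 2021? Answer in millions.")
# print(response)
```

Asking the model to cite page numbers in the question is one way to make answers easier to verify against the source filing.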
Advanced QA - Compare and Contrast
For more complex financial analysis, one often needs to reference multiple documents.
As an example, let's take a look at how to do compare-and-contrast queries over both Lyft and Uber financials.
For this, we build a SubQuestionQueryEngine, which breaks a complex compare-and-contrast query into simpler sub-questions and executes each one on the sub query engine backed by the corresponding index.
Let's see these queries in action!
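The compare-and-contrast setup could be sketched like this, assuming the legacy (pre-0.10) llama_index layout. The helper build_compare_engine, the tool names, and the descriptions in the usage comments are illustrative assumptions. Each per-company query engine is wrapped as a tool with a name and description, which the sub-question engine uses to decide where to send each sub-question.

```python
def build_compare_engine(sources):
    """Wrap per-company query engines as tools for a SubQuestionQueryEngine.

    `sources` maps a tool name to a (query_engine, description) pair.
    """
    if len(sources) < 2:
        raise ValueError("compare-and-contrast needs at least two sources")

    # Deferred imports; assumes the legacy (pre-0.10) llama-index layout.
    from llama_index.query_engine import SubQuestionQueryEngine
    from llama_index.tools import QueryEngineTool, ToolMetadata

    tools = [
        QueryEngineTool(
            query_engine=engine,
            # The description guides routing of sub-questions to this tool.
            metadata=ToolMetadata(name=name, description=description),
        )
        for name, (engine, description) in sources.items()
    ]
    return SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)


# Example usage, given one query engine per company:
# s_engine = build_compare_engine({
#     "lyft_10k": (lyft_engine, "Lyft financials for the year 2021"),
#     "uber_10k": (uber_engine, "Uber financials for the year 2021"),
# })
# print(s_engine.query("Compare the revenue growth of Uber and Lyft in 2021"))
```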