Prompt Chain

Build and Query Temporal Knowledge Graphs

Name: Build and Query Temporal Knowledge Graphs
Availability: OnlineOnly
Author: OpenAI Cookbook

Build and query temporally-aware knowledge graphs for multi-hop retrieval, ensuring data accuracy and enabling complex reasoning.

Works with openai

Version 1.0.0

Why it matters

Construct temporally-aware knowledge graphs and perform multi-hop retrieval directly over them, enabling advanced data analysis and complex query answering.

Outcomes

What it gets done

Build temporally-aware knowledge graphs by updating and validating entries as new data arrives.

Perform multi-hop retrieval by combining LLMs with structured graph queries for complex reasoning.

Decompose raw documents into time-stamped triplets for precise, time-based querying.

Iteratively traverse graph relationships to uncover complex dependencies and latent connections.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-temporalagents | bash

Steps

Steps in the chain

Creating a Temporally-Aware Knowledge Graph with a Temporal Agent

Build a pipeline that extracts entities and relations from unstructured text, resolves temporal conflicts, and keeps your graph up-to-date as new information arrives.

Multi-Step Retrieval Over a Knowledge Graph

Use structured queries and language model reasoning to chain multiple hops across your graph and answer complex questions.

Prototype to Production

Move from experimentation to deployment. This section covers architectural tips, integration patterns, and considerations for scaling reliably.

Overview

1. Executive Summary

What it does

A hands-on cookbook for engineers, architects, and analysts building temporally-aware knowledge graphs and performing multi-hop retrieval. Covers two core workflows: (1) constructing and validating time-stamped knowledge graphs as new data arrives, and (2) combining OpenAI models with structured graph queries to traverse multiple hops across entities and relationships. Includes practical decision frameworks, plug-and-play code examples, and a clear path from prototype to production with best practices for scaling and reliability.

How it connects

Use this when you need to maintain a knowledge base that captures how facts evolve over time, answer complex questions requiring reasoning across multiple linked entities, or move beyond single-hop retrieval limitations. Ideal for prototyping temporal graph systems, deploying at scale, or exploring new ways to leverage structured, time-aware data in AI applications.

Source README

1. Executive Summary

1.1. Purpose and Audience

This notebook provides a hands-on guide for building temporally-aware knowledge graphs and performing multi-hop retrieval directly over those graphs.

It's designed for engineers, architects, and analysts working on temporally-aware knowledge graphs. Whether you’re prototyping, deploying at scale, or exploring new ways to use structured data, you’ll find practical workflows, best practices, and decision frameworks to accelerate your work.

This cookbook presents two hands-on workflows you can use, extend, and deploy right away:

Temporally-aware knowledge graph (KG) construction

A key challenge in developing knowledge-driven AI systems is maintaining a database that stays current and relevant. While much attention is given to boosting retrieval accuracy with techniques like semantic similarity and re-ranking, this guide focuses on a fundamental yet frequently overlooked aspect: systematically updating and validating your knowledge base as new data arrives.

No matter how advanced your retrieval algorithms are, their effectiveness is limited by the quality and freshness of your database. This cookbook demonstrates how to routinely validate and update knowledge graph entries as new data arrives, helping ensure that your knowledge base remains accurate and up to date.
Multi-hop retrieval using knowledge graphs

Learn how to combine OpenAI models (such as o3, o4-mini, GPT-4.1, and GPT-4.1-mini) with structured graph queries via tool calls, enabling the model to traverse your graph in multiple steps across entities and relationships.

This method lets your system answer complex, multi-faceted questions that require reasoning over several linked facts, going well beyond what single-hop retrieval can accomplish.

Inside, you'll discover:

Practical decision frameworks for choosing models and prompting techniques at each stage
Plug-and-play code examples for easy integration into your ML and data pipelines
Links to in-depth resources on OpenAI tool use, fine-tuning, graph backend selection, and more
A clear path from prototype to production, with actionable best practices for scaling and reliability

Note: All benchmarks and recommendations are based on the best available models and practices as of June 2025. As the ecosystem evolves, periodically revisit your approach to stay current with new capabilities and improvements.

1.2. Key takeaways

Creating a Temporally-Aware Knowledge Graph with a Temporal Agent

Why make your knowledge graph temporal?

Traditional knowledge graphs treat facts as static, but real-world information evolves constantly. What was true last quarter may be outdated today, risking errors or misinformed decisions if the graph does not capture change over time. Temporal knowledge graphs allow you to precisely answer questions like “What was true on a given date?” or analyse how facts and relationships have shifted, ensuring decisions are always based on the most relevant context.
What is a Temporal Agent?

A Temporal Agent is a pipeline component that ingests raw data and produces time-stamped triplets for your knowledge graph. This enables precise time-based querying, timeline construction, trend analysis, and more.
How does the pipeline work?

The pipeline starts by semantically chunking your raw documents. These chunks are decomposed into statements ready for our Temporal Agent, which then creates time-aware triplets. An Invalidation Agent can then perform temporal validity checks, spotting and handling any existing statements that are invalidated by new statements arriving in the graph.

Multi-Step Retrieval Over a Knowledge Graph

Why use multi-step retrieval?

Direct, single-hop queries frequently miss salient facts distributed across a graph's topology. Multi-step (multi-hop) retrieval enables iterative traversal, following relationships and aggregating evidence across several hops. This methodology surfaces complex dependencies and latent connections that would remain hidden with one-shot lookups, providing more comprehensive and nuanced answers to sophisticated queries.
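
As a rough illustration of iterative traversal, the sketch below walks a tiny in-memory graph breadth-first and collects every path of up to `max_hops` edges. The graph contents and the `multi_hop` helper are hypothetical and are not part of the cookbook's code; in the cookbook's approach a model chooses which edges to follow via tool calls rather than expanding everything.

```python
from collections import deque

# Hypothetical toy graph: subject -> list of (predicate, object) edges.
GRAPH: dict[str, list[tuple[str, str]]] = {
    "Company A": [("acquired", "Company B")],
    "Company B": [("supplies", "Company C")],
    "Company C": [("headquartered_in", "Austin")],
}


def multi_hop(start: str, max_hops: int) -> list[list[tuple[str, str, str]]]:
    """Return every path of 1..max_hops edges reachable from `start` (breadth-first)."""
    paths: list[list[tuple[str, str, str]]] = []
    queue: deque = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget exhausted for this path
        # Entities already on this path; skipping them avoids cycles.
        visited = {start} | {triple[2] for triple in path}
        for predicate, obj in GRAPH.get(node, []):
            if obj in visited:
                continue
            new_path = path + [(node, predicate, obj)]
            paths.append(new_path)
            queue.append((obj, new_path))
    return paths


one_hop = multi_hop("Company A", max_hops=1)
three_hops = multi_hop("Company A", max_hops=3)
```

With `max_hops=1` only the acquisition edge is found; the link from Company A to Austin appears only once three hops are allowed, which is the kind of fact a one-shot lookup would miss.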
Planners

Planners orchestrate the retrieval process. Task-orientated planners decompose queries into concrete, sequential subtasks. Hypothesis-orientated planners, by contrast, propose claims to confirm, refute, or evolve. Choosing the optimal strategy depends on where the problem lies on the spectrum from deterministic reporting (well-defined paths) to exploratory research (open-ended inference).
Tool Design Paradigms

Tool design spans a continuum: Fixed tools provide consistent, predictable outputs for specific queries (e.g., a service that always returns today’s weather for San Francisco). At the other end, Free-form tools offer broad flexibility, such as code execution or open-ended data retrieval. Semi-structured tools fall between these extremes, restricting certain actions while allowing tailored flexibility; specialized sub-agents are a typical example. Selecting the appropriate paradigm is a trade-off between control, adaptability, and complexity.
Evaluating Retrieval Systems

High-fidelity evaluation hinges on expert-curated "golden" answers, though these are costly and labor-intensive to produce. Automated judgments, such as those from LLMs or tool traces, can be quickly generated to supplement or pre-screen, but may lack the precision of human evaluation. As your system matures, transition towards leveraging real user feedback to measure and optimize retrieval quality in production.

A proven workflow: Start with synthetic tests, benchmark on your curated human-annotated "golden" dataset, and iteratively refine using live user feedback and ratings.

Prototype to Production

Keep the graph lean

Establish archival policies and assign numeric relevance scores to each edge (e.g., recency × trust × query-frequency). Automate the archival or sparsification of low-value nodes and edges, ensuring only the most critical and frequently accessed facts remain for rapid retrieval.
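
One way such a score could be computed is sketched below. The `Edge` fields, the exponential half-life for recency, the query-count cap, and the archival threshold are all assumptions chosen for illustration; the cookbook does not prescribe a specific formula.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Edge:
    """Hypothetical edge record; field names are illustrative only."""
    subject: str
    predicate: str
    obj: str
    last_confirmed: datetime
    trust: float       # 0..1, reliability of the source
    query_count: int   # retrievals within the scoring window


def relevance(edge: Edge, now: datetime, half_life_days: float = 180.0,
              max_queries: int = 100) -> float:
    """Score = recency x trust x query-frequency, each factor in [0, 1]."""
    age_days = max((now - edge.last_confirmed).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    frequency = min(edge.query_count, max_queries) / max_queries
    return recency * edge.trust * frequency


def split_for_archival(edges: list[Edge], now: datetime, threshold: float = 0.05):
    """Partition edges into (keep, archive) by relevance score."""
    keep, archive = [], []
    for edge in edges:
        (keep if relevance(edge, now) >= threshold else archive).append(edge)
    return keep, archive


now = datetime(2025, 6, 1)
fresh = Edge("Company A", "acquired", "Company B", now, trust=1.0, query_count=100)
stale = Edge("Company A", "supplies", "Company C", now - timedelta(days=360),
             trust=0.5, query_count=10)
keep, archive = split_for_archival([fresh, stale], now)
```

Because the factors are multiplied, an edge that scores near zero on any single dimension is archived regardless of the others; a weighted sum would be a gentler alternative.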
Parallelize the ingestion pipeline

Transition from a linear document → chunk → extraction → resolution pipeline to a staged, asynchronous architecture. Assign each processing phase its own queue and dedicated worker pool. Apply clustering or network-based batching for invalidation jobs to maximize efficiency. Batch external API requests (e.g., OpenAI) and database writes wherever possible. This design increases throughput, introduces backpressure for reliability, and allows you to scale each pipeline stage independently.
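
A minimal sketch of the staged design using `asyncio` follows. Each stage has its own bounded queue and worker pool, and a full queue makes the upstream stage wait, which is the backpressure mentioned above. The stage functions are placeholders standing in for real chunking and extraction calls, and all names here are hypothetical.

```python
import asyncio

SENTINEL = object()  # marks the end of a stage's input


async def chunk_document(doc: str) -> list[str]:
    """Placeholder chunker: split on full stops (stands in for semantic chunking)."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return [s.strip() for s in doc.split(".") if s.strip()]


async def extract_statements(chunks: list[str]) -> list[str]:
    """Placeholder extractor: upper-case each chunk (stands in for an LLM call)."""
    await asyncio.sleep(0)
    return [c.upper() for c in chunks]


async def run_stage(func, inbox: asyncio.Queue, outbox: asyncio.Queue, workers: int) -> None:
    """Run `workers` consumers; each stops when it receives SENTINEL."""

    async def worker() -> None:
        while True:
            item = await inbox.get()
            if item is SENTINEL:
                return
            # put() waits while the outbox is full, which gives backpressure.
            await outbox.put(await func(item))

    await asyncio.gather(*(worker() for _ in range(workers)))


async def run_pipeline(docs: list[str], workers: int = 2, queue_size: int = 4) -> list[list[str]]:
    q_docs: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    q_chunks: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    q_out: asyncio.Queue = asyncio.Queue()  # unbounded sink for results

    async def feed() -> None:
        for doc in docs:
            await q_docs.put(doc)
        for _ in range(workers):  # one stop signal per chunking worker
            await q_docs.put(SENTINEL)

    async def chunk_stage() -> None:
        await run_stage(chunk_document, q_docs, q_chunks, workers)
        for _ in range(workers):  # then stop the extraction workers
            await q_chunks.put(SENTINEL)

    async def extract_stage() -> None:
        await run_stage(extract_statements, q_chunks, q_out, workers)

    await asyncio.gather(feed(), chunk_stage(), extract_stage())
    return [q_out.get_nowait() for _ in range(q_out.qsize())]


results = asyncio.run(run_pipeline(["A rose. B fell.", "C held."]))
```

Inside a notebook, where an event loop is already running, call `await run_pipeline(...)` instead of `asyncio.run(...)`. Result order is not guaranteed because workers finish independently.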
Integrate Robust Production Safeguards

Enforce rigorous output validation: standardise temporal fields (e.g., ISO-8601 date formatting), constrain entity types to your controlled vocabulary, and apply lightweight model-based sanity checks for output consistency. Employ structured logging with traceable identifiers, and monitor quality and performance metrics in real time to proactively detect data drift, regressions, or pipeline anomalies before they impact downstream applications.
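
The rule-based part of these checks could look roughly like the sketch below, which uses only the standard library. The record layout, the field names `valid_at`, `invalid_at`, and `entity_type`, and the vocabulary are assumptions for illustration.

```python
from datetime import datetime

ALLOWED_ENTITY_TYPES = {"Company", "Person", "Product"}  # hypothetical controlled vocabulary


def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors: list[str] = []
    parsed: dict[str, datetime] = {}
    for field in ("valid_at", "invalid_at"):
        value = record.get(field)
        if value is None:
            continue  # missing or open-ended timestamps are allowed
        try:
            parsed[field] = datetime.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"{field} is not an ISO-8601 string: {value!r}")
    if record.get("entity_type") not in ALLOWED_ENTITY_TYPES:
        errors.append(f"entity_type not in vocabulary: {record.get('entity_type')!r}")
    # Sanity check (assumes both timestamps are naive or both are timezone-aware).
    if len(parsed) == 2 and parsed["invalid_at"] < parsed["valid_at"]:
        errors.append("invalid_at precedes valid_at")
    return errors


good = {"entity_type": "Company", "valid_at": "2024-05-01", "invalid_at": None}
bad = {"entity_type": "Planet", "valid_at": "01/05/2024"}
reversed_span = {"entity_type": "Person", "valid_at": "2024-05-01", "invalid_at": "2024-01-01"}
```

Returning a list of errors rather than raising on the first one makes it easier to log every problem for a record against a single trace identifier.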

2. How to Use This Cookbook

This cookbook is designed for flexible engagement:

Use it as a comprehensive technical guide: read from start to finish for a deep understanding of temporally-aware knowledge graph systems.
Skim for advanced concepts, methodologies, and implementation patterns if you prefer a high-level overview.
Jump into any of the three modular sections; each is self-contained and directly applicable to real-world scenarios.

Inside, you'll find:

Creating a Temporally-Aware Knowledge Graph with a Temporal Agent

Build a pipeline that extracts entities and relations from unstructured text, resolves temporal conflicts, and keeps your graph up-to-date as new information arrives.
Multi-Step Retrieval Over a Knowledge Graph

Use structured queries and language model reasoning to chain multiple hops across your graph and answer complex questions.
Prototype to Production

Move from experimentation to deployment. This section covers architectural tips, integration patterns, and considerations for scaling reliably.

2.1. Pre-requisites

Before diving into building temporal agents and knowledge graphs, let's set up your environment. Install all required dependencies with pip, and set your OpenAI API key as an environment variable. Python 3.12 or later is required.

!python -V
%pip install --upgrade pip
%pip install -qU chonkie ipykernel jinja2 matplotlib networkx numpy openai plotly pydantic rapidfuzz scipy tenacity tiktoken tqdm pandas
%pip install -q "datasets<3.0"

import os

if "OPENAI_API_KEY" not in os.environ:
    import getpass
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Paste your OpenAI API key here: ")

3. Creating a Temporally-Aware Knowledge Graph with a Temporal Agent

Accurate data is the foundation of any good business decision.
OpenAI’s latest models like o3, o4-mini, and the GPT-4.1 family are enabling businesses to build state-of-the-art retrieval systems for their most important workflows. However, information evolves rapidly: facts ingested confidently yesterday may already be outdated today.

Benefits of a Temporal Knowledge Base

Without the ability to track when each fact was valid, retrieval systems risk returning answers that are outdated, non-compliant, or misleading. The consequences of missing temporal context can be severe in any industry, as illustrated by the following examples.

Industry	Example question	Risk if database is not temporal
Financial Services	"How has Moody’s long‑term rating for Bank YY evolved since Feb 2023?"	Mispricing credit risk by mixing historical & current ratings
	"Who was the CFO of Retailer ZZ when the FY‑22 guidance was issued?"	Governance/insider‑trading analysis may blame the wrong executive
	"Was Fund AA sanctioned under Article BB at the time it bought Stock CC in Jan 2024?"	Compliance report could miss an infraction if rules changed later
Manufacturing / Automotive	"Which ECU firmware was deployed in model Q3 cars shipped between 2022‑05 and 2023‑03?"	Misdiagnosing field failures due to firmware drift
	"Which robot‑controller software revision ran on Assembly Line 7 during Lot 8421?"	Root‑cause analysis may blame the wrong software revision
	"What torque specification applied to steering‑column bolts in builds produced in May 2024?"	Safety recall may miss affected vehicles

While we've called out some specific examples here, this theme is true across many industries including pharmaceuticals, law, consumer goods, and more.

Looking beyond standard retrieval

A temporally-aware knowledge graph allows you to go beyond static fact lookup. It enables richer retrieval workflows such as factual Q&A grounded in time, timeline generation, change tracking, counterfactual analysis, and more. We dive into these in more detail in our retrieval section later in the cookbook.

[Figure: Question types suitable for temporal knowledge bases]

3.1. Introducing our Temporal Agent

A temporal agent is a specialized pipeline that converts raw, free-form statements into time-aware triplets ready for ingestion into a knowledge graph, which can then be queried with questions of the form “What was true at time T?”.

Triplets are the basic building blocks of knowledge graphs. A triplet represents a single fact or piece of knowledge using three parts (hence, "triplet"):

Subject - the entity you are talking about
Predicate - the type of relationship or property
Object - the value or other entity that the subject is connected to

You can think of this as a sentence with the structure [Subject] - [Predicate] - [Object]. As a clearer example:

"London" - "isCapitalOf" - "United Kingdom"

The Temporal Agent implemented in this cookbook draws inspiration from Zep and Graphiti, while introducing tighter control over fact invalidation and a more nuanced approach to episodic typing.
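
To make the triplet structure and the "What was true at time T?" question concrete, here is a minimal sketch of a time-aware triplet with a point-in-time filter. The class, the `valid_at` and `invalid_at` field names, and the example dates are illustrative assumptions and not the data model defined later in this cookbook.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TemporalTriplet:
    """Hypothetical time-aware triplet: a fact plus its validity interval."""
    subject: str
    predicate: str
    obj: str
    valid_at: datetime | None = None    # None: no known start
    invalid_at: datetime | None = None  # None: still valid

    def holds_at(self, when: datetime) -> bool:
        """True if `when` falls in the half-open interval [valid_at, invalid_at)."""
        starts_ok = self.valid_at is None or self.valid_at <= when
        ends_ok = self.invalid_at is None or when < self.invalid_at
        return starts_ok and ends_ok


# Illustrative data only; the dates are made up for the example.
TRIPLETS = [
    TemporalTriplet("London", "isCapitalOf", "United Kingdom"),
    TemporalTriplet("Person YY", "isCEOOf", "Company XX",
                    datetime(2014, 10, 23), datetime(2020, 1, 1)),
    TemporalTriplet("Person ZZ", "isCEOOf", "Company XX", datetime(2020, 1, 1)),
]


def facts_at(triplets: list[TemporalTriplet], when: datetime) -> list[TemporalTriplet]:
    """Return the triplets that were true at time `when`."""
    return [t for t in triplets if t.holds_at(when)]


ceo_2016 = [t.subject for t in facts_at(TRIPLETS, datetime(2016, 6, 1)) if t.predicate == "isCEOOf"]
ceo_2021 = [t.subject for t in facts_at(TRIPLETS, datetime(2021, 6, 1)) if t.predicate == "isCEOOf"]
```

The half-open interval means that on the handover date only the newer triplet holds, so a point-in-time query never returns two CEOs at once.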

3.1.1. Key enhancements introduced in this cookbook

Temporal validity extraction

Builds on Graphiti's prompt design to identify temporal spans and episodic context without requiring auxiliary reference statements.
Fact invalidation logic

Introduces bidirectionality checks and constrains comparisons by episodic type. This retains Zep's non-lossy approach while reducing unnecessary evaluations.
Temporal & episodic typing

Differentiates between Fact, Opinion, Prediction, as well as between temporal classes Static, Dynamic, Atemporal.
Multi‑event extraction

Handles compound sentences and nested date references in a single pass.

This process allows us to update our sources of truth efficiently and reliably:

[Figure: Statement invalidation in practice]

Note: While the implementation in this cookbook is focused on a graph-based implementation, this approach is generalizable to other knowledge base structures e.g., pgvector-based systems.
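
The non-lossy invalidation idea can be sketched as follows: an older statement is never deleted, only closed off and linked to the statement that superseded it. This sketch assumes a deliberately naive conflict rule (same subject and predicate, different object) in place of the model-based comparison the agent performs, and the `Statement` class and its field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Statement:
    """Hypothetical stored statement with validity bookkeeping."""
    id: int
    subject: str
    predicate: str
    obj: str
    valid_at: datetime
    invalid_at: datetime | None = None
    invalidated_by: int | None = None


def apply_new_statement(existing: list[Statement], new: Statement) -> list[Statement]:
    """Mark older conflicting statements as invalid instead of deleting them."""
    for old in existing:
        same_slot = (old.subject, old.predicate) == (new.subject, new.predicate)
        conflicts = same_slot and old.obj != new.obj
        still_open = old.invalid_at is None
        if conflicts and still_open and old.valid_at < new.valid_at:
            old.invalid_at = new.valid_at   # the old fact stops holding here
            old.invalidated_by = new.id     # keep a link to the newer fact
    existing.append(new)
    return existing


old = Statement(1, "Company XX", "hasCEO", "Person YY", datetime(2014, 10, 23))
new = Statement(2, "Company XX", "hasCEO", "Person ZZ", datetime(2020, 1, 1))
store = apply_new_statement([old], new)
```

Keeping the superseded row is what allows historical questions to be answered later; a destructive update would lose the earlier CEO entirely.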

3.1.2. The Temporal Agent Pipeline

The Temporal Agent processes incoming statements through a three-stage pipeline:

Temporal Classification

Labels each statement as Atemporal, Static, or Dynamic:
- Atemporal statements never change (e.g., “The speed of light in a vacuum is ≈3×10⁸ m s⁻¹”).
- Static statements are valid from a point in time but do not change afterwards (e.g., "Person YY was CEO of Company XX on October 23rd 2014.").
- Dynamic statements evolve (e.g., "Person YY is CEO of Company XX.").
Temporal Event Extraction

Identifies relative or partial dates (e.g., “Tuesday”, “three months ago”) and resolves them to an absolute date using the document timestamp or fallback heuristics (e.g., default to the 1st or last of the month if only the month is known).
Temporal Validity Check

Ensures every statement includes a t_created timestamp and, when applicable, a t_expired timestamp. The agent then compares the candidate triplet to existing knowledge graph entries to:
- Detect contradictions and mark outdated entries with t_invalid
- Link newer statements to those they invalidate with invalidated_by
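
As an illustration of the fallback heuristics in the Temporal Event Extraction stage, the sketch below resolves a few relative or partial date phrases against a document date. It is a rule-based stand-in that handles only numeric phrases such as "3 months ago" and "YYYY-MM"; the function name and the `prefer` parameter are hypothetical, and the agent itself relies on a model for this step.

```python
from __future__ import annotations

import calendar
import re
from datetime import date, timedelta


def resolve_partial_date(text: str, doc_date: date, prefer: str = "start") -> date | None:
    """Resolve a relative or partial date phrase to an absolute date, or return None."""
    text = text.strip().lower()

    # "N months ago": step back whole months, clamping the day to the month length.
    m = re.fullmatch(r"(\d+) months? ago", text)
    if m:
        total = doc_date.year * 12 + (doc_date.month - 1) - int(m.group(1))
        year, month_index = divmod(total, 12)
        month = month_index + 1
        day = min(doc_date.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)

    # "N days ago": plain date arithmetic.
    m = re.fullmatch(r"(\d+) days? ago", text)
    if m:
        return doc_date - timedelta(days=int(m.group(1)))

    # "YYYY-MM": only the month is known, so default to its first or last day.
    m = re.fullmatch(r"(\d{4})-(0[1-9]|1[0-2])", text)
    if m:
        year, month = int(m.group(1)), int(m.group(2))
        day = 1 if prefer == "start" else calendar.monthrange(year, month)[1]
        return date(year, month, day)

    return None  # unrecognised phrase: leave it to the model or a fuller parser


doc_date = date(2020, 5, 31)
three_months_back = resolve_partial_date("3 months ago", doc_date)
month_start = resolve_partial_date("2023-02", doc_date, prefer="start")
month_end = resolve_partial_date("2023-02", doc_date, prefer="end")
```

Choosing the first day for a start date and the last day for an end date keeps the resolved interval as wide as the source text allows, which avoids wrongly excluding dates inside the stated month.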

[Figure: Temporal Agent]

3.1.3. Selecting the right model for a Temporal Agent

When building systems with LLMs, it is good practice to start with larger models and then optimize and shrink later.

The GPT-4.1 series is particularly well-suited for building Temporal Agents due to its strong instruction-following ability. On benchmarks like Scale’s MultiChallenge, GPT-4.1 outperforms GPT-4o by 10.5 percentage points (absolute), demonstrating superior ability to maintain context, reason in-conversation, and adhere to instructions, all key traits for extracting time-stamped triplets. These capabilities make it an excellent choice for prototyping agents that rely on time-aware data extraction.

Recommended development workflow

Prototype with GPT-4.1

Maximize correctness and reduce prompt-debug time while you build out the core pipeline logic.
Swap to GPT-4.1-mini or GPT-4.1-nano

Once prompts and logic are stable, switch to smaller variants for lower latency and cost-effective inference.
Distill onto GPT-4.1-mini or GPT-4.1-nano

Use OpenAI's Model Distillation to train smaller models with high-quality outputs from a larger 'teacher' model such as GPT-4.1, preserving (or even improving) performance relative to GPT-4.1.

Model	Relative cost	Relative latency	Intelligence	Ideal Role in Workflow
GPT-4.1	★★★	★★	★★★ (highest)	Ground-truth prototyping, generating data for distillation
GPT-4.1-mini	★★	★	★★	Balanced cost-performance, mid to large scale production systems
GPT-4.1-nano	★ (lowest)	★ (fastest)	★	Cost-sensitive and ultra-large scale bulk processing

In practice, this looks like: prototype with GPT-4.1 → measure quality → step down the ladder until the trade-offs no longer meet your needs.

3.2. Building our Temporal Agent Pipeline

Before diving into the implementation details, it's useful to understand the ingestion pipeline at a high level:

Load transcripts
Creating a Semantic Chunker
Laying the Foundations for our Temporal Agent
Statement Extraction
Temporal Range Extraction
Creating our Triplets
Temporal Events
Defining our Temporal Agent
Entity Resolution
Invalidation Agent
Building our pipeline

Architecture diagram

[Figure: Temporal Agent architecture]

3.2.1. Load transcripts

For the purposes of this cookbook, we have selected the "Earnings Calls Dataset" (jlh-ibm/earnings_call) which is made available under the Creative Commons Zero v1.0 license. This dataset contains a collection of 188 earnings call transcripts originating in the period 2016-2020 in relation to the NASDAQ stock market. We believe this dataset is a good choice for this cookbook as extracting information from - and subsequently querying information from - earnings call transcripts is a common problem in many financial institutions around the world.

Moreover, the often variable character of statements and topics from the same company across multiple earnings calls provides a useful vector through which to demonstrate the temporal knowledge graph concept.

Despite this dataset's focus on the financial world, we build up the Temporal Agent in a general structure, so it will be quick to adapt to similar problems in other industries such as pharmaceuticals, law, automotive, and more.

For the purposes of this cookbook we are limiting the processing to two companies - AMD and Nvidia - though in practice this pipeline can easily be scaled to any company.

Let’s start by loading the dataset from HuggingFace.

from datasets import load_dataset

hf_dataset_name = "jlh-ibm/earnings_call"
subset_options = ["stock_prices", "transcript-sentiment", "transcripts"]

hf_dataset = load_dataset(hf_dataset_name, subset_options[2])
my_dataset = hf_dataset["train"]

my_dataset

row = my_dataset[0]
row["company"], row["date"], row["transcript"][:200]

from collections import Counter

company_counts = Counter(my_dataset["company"])
company_counts

Database Set-up

Before we get to processing this data, let’s set up our database.

For convenience within a notebook format, we've chosen SQLite as our database for this implementation. In the "Prototype to Production" section, and in Appendix section A.1 "Storing and Retrieving High-Volume Graph Data", we go into more detail on the considerations around different database choices in a production environment.

If you are running this cookbook locally, you may choose to set memory=False to save the database to disk. By default the database is stored at my_database.db, or you can pass your own db_path argument to make_connection.

We will set up several tables to store the following information:

Transcripts
Chunks
Temporal Events
Triplets
Entities (including canonical mappings)

This code is abstracted behind a make_connection method which creates the new SQLite database. The details of this method can be found in the db_interface.py script in the GitHub repository for this cookbook.

from db_interface import make_connection

sqlite_conn = make_connection(memory=False, refresh=True)

3.2.2. Creating a Semantic Chunker

Before diving into building the Chunker class itself, we begin by defining our first data models. As is generally considered good practice when working with Python, Pydantic is used to ensure type safety and clarity in our model definitions. Pydantic provides a clean, declarative way to define data structures whilst automatically validating and parsing input data, making our data models both robust and easy to work with.

Chunk model

This is a core data model that we'll use to store individual segments of text extracted from transcripts, along with any associated metadata. As we process the transcripts by breaking them into semantically meaningful chunks, each piece will be saved as a separate Chunk.

Each Chunk contains:

id: A unique identifier automatically generated for each chunk. This helps us identify and track chunks of text throughout the pipeline
text: A string field that contains the text content of the chunk
metadata: A dictionary to allow for flexible metadata storage

import uuid
from typing import Any

from pydantic import BaseModel, Field


class Chunk(BaseModel):
    """A chunk of text from an earnings call."""

    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    text: str
    metadata: dict[str, Any]

Transcript model

As the name suggests, we will use the Transcript model to represent the full content of an earnings call transcript. It captures several key pieces of information:

id: Analogous to Chunk, this gives us a unique identifier
text: The full text of the transcript
company: The name of the company that the earnings call was about
date: The date of the earnings call
quarter: The fiscal quarter that the earnings call was in
chunks: A list of Chunk objects, each representing a meaningful segment of the full transcript

To ensure the date field is handled correctly, the to_datetime validator is used to convert the value to datetime format.

from datetime import datetime

from pydantic import field_validator


class Transcript(BaseModel):
    """A transcript of a company earnings call."""

    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    text: str
    company: str
    date: datetime
    quarter: str | None = None
    chunks: list[Chunk] | None = None

    @field_validator("date", mode="before")
    @classmethod
    def to_datetime(cls, d: Any) -> datetime:
        """Convert input to a datetime object."""
        if isinstance(d, datetime):
            return d
        if hasattr(d, "isoformat"):
            return datetime.fromisoformat(d.isoformat())
        return datetime.fromisoformat(str(d))

Chunker class

Now, we define the Chunker class to split each transcript into semantically meaningful chunks. Instead of relying on arbitrary rules like character count or line break, we apply semantic chunking to preserve more of the contextual integrity of the original transcript. This ensures that each chunk is a self-contained unit that keeps contextually linked ideas together. This is particularly helpful for downstream tasks like statement extraction, where context heavily influences accuracy.

The chunker class contains two methods:

find_quarter

This method attempts to extract the fiscal quarter (e.g., "Q1 2023") directly from the transcript text using a simple regular expression. In this case, this is straightforward as the data format of quarters in the transcripts is consistent and well defined.

However, in real-world scenarios, detecting the quarter reliably may require more work. Across multiple sources or document types, the quarter is likely to be formatted differently. LLMs can help here: try using GPT-4.1-mini with a prompt written specifically to extract the quarter from the wider document context.
generate_transcripts_and_chunks

This is the core method that takes in a dataset (as an iterable of dictionaries) and returns a list of Transcript objects each populated with semantically derived Chunks. It performs the following steps:
1. Transcript creation: Initializes Transcript objects using the provided text, company, and date fields
2. Semantic chunking: Optionally filters the transcripts by company, then uses the SemanticChunker from chonkie along with OpenAI's text-embedding-3-small model to split each transcript into logical segments
3. Chunk assignment: Wraps each semantic segment into a Chunk model, attaching relevant metadata like start and end indices

The chunker fits into this part of our pipeline:

[Figure: Temporal Agent pipeline, Chunker stage]

import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any

from chonkie import OpenAIEmbeddings, SemanticChunker
from tqdm import tqdm


class Chunker:
    """
    Takes in transcripts of earnings calls and extracts quarter information and splits
    the transcript into semantically meaningful chunks using embedding-based similarity.
    """

    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model

    def find_quarter(self, text: str) -> str | None:
        """Extract the quarter (e.g., 'Q1 2023') from the input text if present, otherwise return None."""
        # In this dataset we can just use regex to find the quarter as it is consistently defined
        search_results = re.findall(r"[Q]\d\s\d{4}", text)

        if search_results:
            quarter = str(search_results[0])
            return quarter

        return None


    def generate_transcripts_and_chunks(
        self,
        dataset: Any,
        company: list[str] | None = None,
        text_key: str = "transcript",
        company_key: str = "company",
        date_key: str = "date",
        threshold_value: float = 0.7,
        min_sentences: int = 3,
        num_workers: int = 50,
    ) -> list[Transcript]:
        """Populate Transcript objects with semantic chunks."""
        # Populate the Transcript objects with the passed data on the transcripts
        transcripts = [
            Transcript(
                text=d[text_key],
                company=d[company_key],
                date=d[date_key],
                quarter=self.find_quarter(d[text_key]),
            )
            for d in dataset
        ]

        if company:
            transcripts = [t for t in transcripts if t.company in company]

        def _process(t: Transcript) -> Transcript:
            if not hasattr(_process, "chunker"):
                embed_model = OpenAIEmbeddings(self.model)
                _process.chunker = SemanticChunker(
                    embedding_model=embed_model,
                    threshold=threshold_value,
                    min_sentences=max(min_sentences, 1),
                )
            semantic_chunks = _process.chunker.chunk(t.text)
            t.chunks = [
                Chunk(
                    text=c.text,
                    metadata={
                        "start_index": getattr(c, "start_index", None),
                        "end_index": getattr(c, "end_index", None),
                    },
                )
                for c in semantic_chunks
            ]
            return t

        # Create the semantic chunks and add them to their respective Transcript object using a thread pool
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            futures = [pool.submit(_process, t) for t in transcripts]
            transcripts = [
                f.result()
                for f in tqdm(
                    as_completed(futures),
                    total=len(futures),
                    desc="Generating Semantic Chunks",
                )
            ]

        return transcripts

raw_data = list(my_dataset)

chunker = Chunker()
transcripts = chunker.generate_transcripts_and_chunks(raw_data)

Alternatively, we can load just the AMD and NVDA pre-chunked transcripts from pre-processed files in transcripts/

import pickle
from pathlib import Path


def load_transcripts_from_pickle(directory_path: str = "transcripts/") -> list[Transcript]:
    """Load all pickle files from a directory into a list of Transcript objects."""
    loaded_transcripts = []
    dir_path = Path(directory_path).resolve()


    for pkl_file in sorted(dir_path.glob("*.pkl")):
        try:
            with open(pkl_file, "rb") as f:
                transcript = pickle.load(f)
                # Ensure it's a Transcript object
                if not isinstance(transcript, Transcript):
                    transcript = Transcript(**transcript)
                loaded_transcripts.append(transcript)
                print(f"✅ Loaded transcript from {pkl_file.name}")
        except Exception as e:
            print(f"❌ Error loading {pkl_file.name}: {e}")

    return loaded_transcripts

# transcripts = load_transcripts_from_pickle()

Now we can inspect a couple of chunks:

chunks = transcripts[0].chunks
if chunks is not None:
    for i, chunk in enumerate(chunks[21:23]):
        print(f"Chunk {i+21}:")
        print(f"  ID: {chunk.id}")
        # Show the first 200 characters and add an ellipsis only if text was cut off
        print(f"  Text: {repr(chunk.text[:200])}{'...' if len(chunk.text) > 200 else ''}")
        print(f"  Metadata: {chunk.metadata}")
        print()
else:
    print("No chunks found for the first transcript.")

With this, we have successfully split our transcripts into semantically sectioned chunks. We can now move on to the next steps in our pipeline.

3.2.3. Laying the Foundations for our Temporal Agent

Before we move on to defining the TemporalAgent class, we will first define the prompts and data models that are needed for it to function.

Formalizing our label definitions

For our temporal agent to be able to accurately extract the statement and temporal types we need to provide it with sufficiently detailed and specific context. For convenience, we define these within a structured format below.

Each label contains three crucial pieces of information that we will later pass to our LLMs in prompts.

definition

Provides a concise description of what the label represents. It establishes the conceptual boundaries of the statement or temporal type and ensures consistency in interpretation across examples.
date_handling_guidance

Explains how to interpret the temporal validity of a statement associated with the label. It describes how the valid_at and invalid_at dates should be derived when processing instances of that label.
date_handling_example

Includes illustrative examples of how real-world statements would be labelled and temporally annotated under this label. These will be used as few-shot examples to the LLMs downstream.

LABEL_DEFINITIONS: dict[str, dict[str, dict[str, str]]] = {
    "episode_labelling": {
        "FACT": dict(
            definition=(
                "Statements that are objective and can be independently "
                "verified or falsified through evidence."
            ),
            date_handling_guidance=(
                "These statements can be made up of multiple static and "
                "dynamic temporal events marking for example the start, end, "
                "and duration of the fact described in the statement."
            ),
            date_handling_example=(
                "'Company A owns Company B in 2022', 'X caused Y to happen', "
                "or 'John said X at Event' are verifiable facts which currently "
                "hold true unless we have a contradictory fact."
            ),
        ),
        "OPINION": dict(
            definition=(
                "Statements that contain personal opinions, feelings, values, "
                "or judgments that are not independently verifiable. It also "
                "includes hypothetical and speculative statements."
            ),
            date_handling_guidance=(
                "This statement is always static. It is a record of the date the "
                "opinion was made."
            ),
            date_handling_example=(
                "'I like Company A's strategy', 'X may have caused Y to happen', "
                "or 'The event felt like X' are opinions and down to the reporter's "
                "interpretation."
            ),
        ),
        "PREDICTION": dict(
            definition=(
                "Uncertain statements about the future on something that might happen, "
                "a hypothetical outcome, unverified claims. It includes interpretations "
                "and suggestions. If the tense of the statement changed, the statement "
                "would then become a fact."
            ),
            date_handling_guidance=(
                "This statement is always static. It is a record of the date the "
                "prediction was made."
            ),
            date_handling_example=(
                "'It is rumoured that Dave will resign next month', 'Company A expects "
                "X to happen', or 'X suggests Y' are all predictions."
            ),
        ),
    },
    "temporal_labelling": {
        "STATIC": dict(
            definition=(
                "Often past tense, think -ed verbs, describing single points-in-time. "
                "These statements are valid from the day they occurred and are never "
                "invalid. Refer to single points in time at which an event occurred, "
                "the fact X occurred on that date will always hold true."
            ),
            date_handling_guidance=(
                "The valid_at date is the date the event occurred. The invalid_at date "
                "is None."
            ),
            date_handling_example=(
                "'John was appointed CEO on 4th Jan 2024', 'Company A reported X percent "
                "growth from last FY', or 'X resulted in Y to happen' are valid the day "
                "they occurred and are never invalid."
            ),
        ),
        "DYNAMIC": dict(
            definition=(
                "Often present tense, think -ing verbs, describing a period of time. "
                "These statements are valid for a specific period of time and are usually "
                "invalidated by a Static fact marking the end of the event or start of a "
                "contradictory new one. The statement could already be referring to a "
                "discrete time period (invalid) or may be an ongoing relationship (not yet "
                "invalid)."
            ),
            date_handling_guidance=(
                "The valid_at date is the date the event started. The invalid_at date is "
                "the date the event or relationship ended, for ongoing events this is None."
            ),
            date_handling_example=(
                "'John is the CEO', 'Company A remains a market leader', or 'X is continuously "
                "causing Y to decrease' are valid from when the event started and are invalidated "
                "by a new event."
            ),
        ),
        "ATEMPORAL": dict(
            definition=(
                "Statements that will always hold true regardless of time therefore have no "
                "temporal bounds."
            ),
            date_handling_guidance=(
                "These statements are assumed to be atemporal and have no temporal bounds. Both "
                "their valid_at and invalid_at are None."
            ),
            date_handling_example=(
                "'A stock represents a unit of ownership in a company', 'The earth is round', or "
                "'Europe is a continent'. These statements are true regardless of time."
            ),
        ),
    },
}
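
As a rough illustration of how these definitions might be turned into prompt text, the sketch below flattens one labelling task into a plain string. The helper name `render_label_block` and the trimmed `sample_definitions` dictionary are assumptions made for this example; they mirror the shape of LABEL_DEFINITIONS["temporal_labelling"] but are not part of the pipeline itself.

```python
# Hypothetical helper, not part of the original pipeline.
def render_label_block(definitions: dict[str, dict[str, str]]) -> str:
    """Format one labelling task's definitions as plain prompt text."""
    lines: list[str] = []
    for label, info in definitions.items():
        lines.append(f"{label}:")
        lines.append(f"  Definition: {info['definition']}")
        lines.append(f"  Date handling: {info['date_handling_guidance']}")
        lines.append(f"  Example: {info['date_handling_example']}")
    return "\n".join(lines)


# Trimmed stand-in with the same keys as LABEL_DEFINITIONS["temporal_labelling"].
sample_definitions = {
    "STATIC": dict(
        definition="Describes a single point in time.",
        date_handling_guidance="valid_at is the event date; invalid_at is None.",
        date_handling_example="'John was appointed CEO on 4th Jan 2024'",
    ),
    "ATEMPORAL": dict(
        definition="Holds true regardless of time.",
        date_handling_guidance="Both valid_at and invalid_at are None.",
        date_handling_example="'Europe is a continent'",
    ),
}

prompt_fragment = render_label_block(sample_definitions)
print(prompt_fragment)
```

Keeping the definitions in one dictionary and rendering them at prompt-build time means a label can be reworded in a single place without touching the prompt itself.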

3.2.4. Statement Extraction

"Statement Extraction" refers to the process of splitting our semantic chunks into the smallest possible "atomic" facts. Within our Temporal Agent, this is achieved by:

Finding every standalone, declarative claim: Extract statements that can stand on their own as complete subject-predicate-object expressions without relying on surrounding context.
Ensuring atomicity: Break down complex or compound sentences into minimal, indivisible factual units, each expressing a single relationship.
Resolving references: Replace pronouns or abstract references (e.g., "he" or "The Company") with specific entities (e.g., "John Smith", "AMD") using the main subject for disambiguation.
Preserving temporal and quantitative precision: Retain explicit dates, durations, and quantities to anchor each fact precisely in time and scale.
Labelling each extracted statement: Annotate every statement with a StatementType and a TemporalType.
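
To make these steps concrete, here is a hand-written illustration (not model output) of the shape an extraction could take for a single compound sentence. The sentence, the names in it, and the `AtomicStatement` dataclass are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicStatement:
    """Hypothetical container for one extracted statement and its two labels."""

    text: str
    statement_type: str  # FACT, OPINION, or PREDICTION
    temporal_type: str  # STATIC, DYNAMIC, or ATEMPORAL


# Hypothetical chunk text: two claims joined together, one using a pronoun.
chunk_text = (
    "Jane Doe was appointed CEO of Acme Corp on 4 January 2024, "
    "and she expects revenue to grow next year."
)

# Hand-written target: one statement per claim, pronoun resolved, date retained.
expected_statements = [
    AtomicStatement(
        text="Jane Doe was appointed CEO of Acme Corp on 4 January 2024.",
        statement_type="FACT",
        temporal_type="STATIC",
    ),
    AtomicStatement(
        text="Jane Doe expects Acme Corp's revenue to grow next year.",
        statement_type="PREDICTION",
        temporal_type="STATIC",
    ),
]

for s in expected_statements:
    print(f"[{s.statement_type}/{s.temporal_type}] {s.text}")
```

Each output statement can be read without the original sentence, which is the property the later pipeline stages rely on.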

Temporal Types

The TemporalType enum provides a standardized set of temporal categories that make it easier to classify and work with statements extracted from earnings call transcripts.

Each category captures a different kind of temporal reference:

Atemporal: Statements that are universally true and invariant over time (e.g., “The speed of light in a vacuum is ≈3×10⁸ m s⁻¹.”).
Static: Statements that became true at a specific point in time and remain unchanged thereafter (e.g., “Person YY was CEO of Company XX on October 23rd, 2014.”).
Dynamic: Statements that may change over time and require temporal context to interpret accurately (e.g., “Person YY is CEO of Company XX.”).

from enum import StrEnum


class TemporalType(StrEnum):
    """Enumeration of temporal types of statements."""

    ATEMPORAL = "ATEMPORAL"
    STATIC = "STATIC"
    DYNAMIC = "DYNAMIC"
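
The three categories map onto validity windows in a simple way, following the date-handling guidance in the label definitions. The sketch below is one possible reading of that guidance; `validity_window` is a hypothetical helper, the dates are made up, and the enum is re-declared with `str, Enum` only so the snippet also runs on Python versions before 3.11.

```python
from datetime import date
from enum import Enum
from typing import Optional


class TemporalType(str, Enum):
    """Stand-in for the StrEnum above, re-declared so the sketch is self-contained."""

    ATEMPORAL = "ATEMPORAL"
    STATIC = "STATIC"
    DYNAMIC = "DYNAMIC"


def validity_window(
    temporal_type: TemporalType,
    started: Optional[date] = None,
    ended: Optional[date] = None,
) -> tuple[Optional[date], Optional[date]]:
    """Return (valid_at, invalid_at) for a statement. Hypothetical helper."""
    if temporal_type is TemporalType.ATEMPORAL:
        return None, None  # no temporal bounds at all
    if temporal_type is TemporalType.STATIC:
        return started, None  # true from the event date, never invalidated
    return started, ended  # DYNAMIC: ended stays None while the event is ongoing


# Made-up dates, for illustration only.
print(validity_window(TemporalType.STATIC, date(2024, 1, 4)))
print(validity_window(TemporalType.DYNAMIC, date(2024, 1, 4), date(2025, 3, 1)))
print(validity_window(TemporalType.ATEMPORAL))
```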

Statement Types

Similarly, the StatementType enum classifies the nature of each extracted statement, capturing its epistemic characteristics.

Fact: A statement that asserts a verifiable claim considered true at the time it was made. However, it may later be superseded or contradicted by other facts (e.g., updated information or corrections).
Opinion: A subjective statement reflecting a speaker’s belief, sentiment, or judgment. By nature, opinions are considered temporally true at the moment they are expressed.
Prediction: A forward-looking or hypothetical statement about a potential future event or outcome. Temporally, a prediction is assumed to hold true from the time of utterance until the conclusion of the inferred prediction window.

class StatementType(StrEnum):
    """Enumeration of statement types for statements."""

    FACT = "FACT"
    OPINION = "OPINION"
    PREDICTION = "PREDICTION"

Raw Statement

The RawStatement model represents an individual statement extracted by an LLM, annotated with both its semantic type (StatementType) and temporal classification (TemporalType). These raw statements serve as intermediate representations and are intended to be transformed into TemporalEvent objects in later processing stages.

Core fields:

statement: The textual content of the extracted statement
statement_type: The type of statement (Fact, Opinion, Prediction), based on the StatementType enum
temporal_type: The temporal classification of the statement (Static, Dynamic, Atemporal), drawn from the TemporalType enum

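The model includes field-level validators that strip whitespace, upper-case the incoming label, and map it onto the corresponding enum (falling back to ATEMPORAL or FACT when the value is None). This provides a layer of robustness against inconsistently formatted or invalid input.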

The companion model RawStatementList contains the output of the statement extraction step: a list of RawStatement instances.

from pydantic import BaseModel, field_validator


class RawStatement(BaseModel):
    """Model representing a raw statement with type and temporal information."""

    statement: str
    statement_type: StatementType
    temporal_type: TemporalType

    @field_validator("temporal_type", mode="before")
    @classmethod
    def _parse_temporal_label(cls, value: str | None) -> TemporalType:
        if value is None:
            return TemporalType.ATEMPORAL
        cleaned_value = value.strip().upper()
        try:
            return TemporalType(cleaned_value)
        except ValueError as e:
            raise ValueError(f"Invalid temporal type: {value}. Must be one of {[t.value for t in TemporalType]}") from e

    @field_validator("statement_type", mode="before")
    @classmethod
    def _parse_statement_label(cls, value: str | None) -> StatementType:
        if value is None:
            return StatementType.FACT
        cleaned_value = value.strip().upper()
        try:
            return StatementType(cleaned_value)
        except ValueError as e:
            raise ValueError(f"Invalid statement type: {value}. Must be one of {[t.value for t in StatementType]}") from e

class RawStatementList(BaseModel):
    """Model representing a list of raw statements."""

    statements: list[RawStatement]
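
The normalisation performed by the two validators can be sketched without pydantic as a single plain function. `coerce_label` is a hypothetical name used only here, and the enum is re-declared with `str, Enum` so the snippet is self-contained.

```python
from enum import Enum
from typing import Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


class StatementType(str, Enum):
    """Stand-in for the StrEnum above, re-declared so the sketch is self-contained."""

    FACT = "FACT"
    OPINION = "OPINION"
    PREDICTION = "PREDICTION"


def coerce_label(value: Optional[str], enum_cls: Type[E], default: E) -> E:
    """Mirror the validators: None gives the default, otherwise strip, upper-case, look up."""
    if value is None:
        return default
    cleaned = value.strip().upper()
    try:
        return enum_cls(cleaned)
    except ValueError as exc:
        allowed = [member.value for member in enum_cls]
        raise ValueError(f"Invalid label: {value!r}. Must be one of {allowed}") from exc


# Untidy casing and whitespace are tolerated.
print(coerce_label("  opinion ", StatementType, StatementType.FACT))
# None falls back to the default label.
print(coerce_label(None, StatementType, StatementType.FACT))
# Anything outside the enum is rejected with a descriptive message.
try:
    coerce_label("rumour", StatementType, StatementType.FACT)
except ValueError as err:
    print(err)
```

One caveat for the pydantic version: in pydantic v2 a `mode="before"` validator should only run on values that are actually supplied, so the None fallbacks apply when a label is explicitly null rather than when the key is missing altogether.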

Statement Extraction Prompt

This is the core prompt that powers our Temporal Agent's ability to extract and label atomic statements. It is written in Jinja, which lets us compose dynamic inputs modularly without rewriting the core logic.

Anatomy of the prompt
