Prompt Chain

Cluster and Describe Transactional Data

Name: Cluster and Describe Transactional Data
Availability: OnlineOnly
Author: OpenAI Cookbook

Multi-step prompt workflow that uses embeddings and K-Means clustering to automatically group unlabeled transaction data, then generates human-readable cluster descriptions.

Updated 3 months ago

Version 1.0.0

Models

GPT-4o

Why it matters

Use unsupervised learning to cluster unlabeled transactional data based on embeddings. Then use an LLM to generate a human-readable description for each cluster, so that previously unclassified transactions can be labeled.

Outcomes

What it gets done

Generate embeddings for transactional data.

Apply K-Means clustering to group similar transactions.

Use an LLM to create descriptive labels for the identified clusters.

Visualize the clusters and refine them to improve classification.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-clusteringfortransactionclassification | bash

Steps

Steps in the chain

Setup

Prepare the environment and data for clustering analysis. Load the embeddings created with the approach from the Multiclass classification for transactions notebook, applied to all 359 transactions in the dataset.
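As a rough illustration of the setup step, the sketch below loads embeddings that were saved to CSV with each vector serialized as a string, and stacks them into a matrix ready for clustering. The column names (description, embedding) and the tiny in-memory CSV are assumptions made for the example; the real file and its columns are not specified here.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd

# Hypothetical stand-in for the saved embeddings file: a tiny in-memory CSV
# whose "embedding" column holds each vector serialized as a string.
csv_text = io.StringIO(
    "description,embedding\n"
    '"Legal deposit services","[0.1, 0.2, 0.3]"\n'
    '"Office stationery","[0.4, 0.5, 0.6]"\n'
)

df = pd.read_csv(csv_text)

# Parse each serialized list back into Python floats, then stack the rows
# into a (n_transactions, n_dimensions) matrix.
matrix = np.array(df["embedding"].apply(literal_eval).tolist(), dtype=float)

print(matrix.shape)
```

Real embeddings have far more dimensions than the three used here, but the parsing step is the same.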

Clustering

Reuse the approach from the Clustering notebook: apply K-Means to the feature embeddings created previously, then use the Completions endpoint to generate cluster descriptions and judge their effectiveness.
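A minimal sketch of the K-Means step, assuming scikit-learn is available. The matrix here is synthetic (five well-separated blobs standing in for real transaction embeddings), and the choice of five clusters and the random seeds are illustrative assumptions rather than settings taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the embedding matrix: five well-separated blobs
# of 20 points each in 8 dimensions (real embeddings are much larger).
n_clusters = 5
centers = 10.0 * np.eye(n_clusters, 8)
matrix = np.vstack([c + rng.normal(scale=0.3, size=(20, 8)) for c in centers])

# Fit K-Means and assign every row to a cluster.
kmeans = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(matrix)

# Count how many transactions landed in each cluster.
cluster_sizes = np.bincount(labels, minlength=n_clusters)
print(cluster_sizes)
```

With real data the labels would typically be written back to the transaction table so that each cluster's members can be sampled for description.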

Overview

Clustering for Transaction Classification

What it does

A notebook demonstrating how to apply K-Means clustering to transaction embeddings and use GPT-3 to generate human-readable descriptions for the resulting clusters, enabling the labeling of previously unlabeled data.

How it connects

Use this when you have unlabeled transaction data with features that can be grouped into meaningful categories, and you need to generate interpretable cluster descriptions to create labels for classification tasks.

Source README

Clustering for Transaction Classification

This notebook covers use cases where your data is unlabelled but has features that can be used to cluster it into meaningful categories. The challenge with clustering is making the features that distinguish each cluster human-readable, and that is where we'll use GPT-3 to generate meaningful cluster descriptions for us. We can then use these descriptions to apply labels to a previously unlabelled dataset.

To feed the model, we use embeddings created with the approach shown in the Multiclass classification for transactions notebook, applied to all 359 transactions in the dataset to give us a bigger pool for learning.

Setup

Clustering

We'll reuse the approach from the Clustering notebook, applying K-Means to the feature embeddings we created previously. We'll then use the Completions endpoint to generate cluster descriptions for us and judge their effectiveness.
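To make the description step concrete, here is a small sketch of how we might build the prompt for one cluster by sampling a few of its transactions. The helper build_cluster_prompt, the field names, the example rows, and the prompt wording are all hypothetical; the sketch stops short of calling the Completions endpoint, so the returned string is what we would send as the prompt.

```python
import random

# Hypothetical clustered rows: each transaction carries the cluster id
# assigned by K-Means.
transactions = [
    {"description": "Legal deposit services", "cluster": 0},
    {"description": "Archival storage boxes", "cluster": 0},
    {"description": "Office stationery", "cluster": 1},
    {"description": "Printer toner", "cluster": 1},
]


def build_cluster_prompt(rows, cluster_id, sample_size=3, seed=0):
    """Sample transactions from one cluster and format them as a prompt."""
    members = [r["description"] for r in rows if r["cluster"] == cluster_id]
    rng = random.Random(seed)
    sample = rng.sample(members, min(sample_size, len(members)))
    lines = "\n".join(f"- {d}" for d in sample)
    return (
        "We want to group these transactions into meaningful clusters.\n"
        "What do the following transactions have in common?\n\n"
        f"Transactions:\n{lines}\n\nTheme:"
    )


prompt = build_cluster_prompt(transactions, cluster_id=0)
print(prompt)
```

Sampling only a handful of rows per cluster keeps the prompt short, at the cost of the description depending on which rows happen to be drawn.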

Conclusion

We now have five new clusters that we can use to describe our data. Looking at the visualisation, some of our clusters overlap and we'll need some tuning to get to the right place, but already we can see that GPT-3 has made some effective inferences. In particular, it picked up that items including legal deposits were related to literature archival, which is true but is something the model was given no clues about. Very cool, and with some tuning we can create a base set of clusters that we can then use with a multiclass classifier to generalise to other transactional datasets we might use.
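One possible way to approach the tuning mentioned above is to compare cluster counts with a silhouette score, which is higher when clusters are tight and well separated. This is an assumed add-on rather than something the notebook does, and the data below is synthetic (three blobs in two dimensions) purely to keep the sketch self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)

# Synthetic stand-in for the embeddings: three well-separated 2-D blobs.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
matrix = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Score several candidate cluster counts; values closer to 1 mean
# less overlap between clusters.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(matrix)
    scores[k] = silhouette_score(matrix, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real transaction embeddings the scores will be less clear-cut, so we would treat them as one signal alongside the visualisation and the quality of the generated descriptions.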
