Prompt Chain

Classify Transactions Using Multiple AI Methods

Classify transactions into predefined categories using zero-shot, embedding-based, or fine-tuned models. Handles labelled and unlabelled data.

Works with github

91
Spark score
out of 100
Updated 3 months ago
Version 1.0.0

Add to Favorites

Why it matters

Leverage AI to categorize financial transactions into predefined classes. This asset explores zero-shot, embedding-based, and fine-tuned classification approaches for robust transaction analysis.

Outcomes

What it gets done

01

Perform zero-shot classification on transaction data using prompts.

02

Generate embeddings from transaction features for classification.

03

Train and apply a fine-tuned model for transaction categorization.

04

Analyze and compare the performance of different classification techniques.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-multiclassclassificationfortransactions | bash

Steps

Steps in the chain

01
Zero-shot Classification

Assess the performance of the base models at classifying transactions using a simple prompt. Provide the model with 5 categories and a catch-all of 'Could not classify' for ones that it cannot place. Start with a small sample and expand to 25 transactions to evaluate performance with no labelled examples.

02
Create embeddings

Create embeddings from the labelled set of transactions that were classified and manually corrected. Reuse the approach from the Get_embeddings_from_dataset Notebook to create embeddings from a combined field concatenating all features (Supplier, Description, Value).

03
Use embeddings for classification

Use the created embeddings to classify transactions into the named categories. Apply a template from the Classification_using_embeddings notebook to test if classifying embeddings gives better success than zero-shot classification alone.

04
Prepare training and validation sets

Prepare data for fine-tuning by creating message sequences. The first message for each will be the user prompt formatted with transaction details, and the final message will be the expected classification response from the model.

05
Prepare test set

Create a test set containing the initial user prompt for each transaction, along with the corresponding expected class label. This will be used to generate actual classifications from the fine-tuned model.

06
Apply Fine-tuned Classifier

Apply the fine-tuned classifier to unseen transactions to evaluate its performance. Compare results against the zero-shot and embedding-based classification approaches using the labelled dataset of 101 transactions.

Overview

Multiclass Classification for Transactions

What it does

This workflow provides multiple approaches to multiclass classification for transactional data. It enables the categorization of transactions into predefined buckets using zero-shot classification, classification with embeddings, and fine-tuned models.

How it connects

Use this workflow when you need to categorize transactional data into specific, predefined categories. It is effective for both labelled and unlabelled datasets.

Source README

Multiclass Classification for Transactions

For this notebook we will be looking to classify a public dataset of transactions into a number of categories that we have predefined. These approaches should be replicable to any multiclass classification use case where we are trying to fit transactional data into predefined categories, and by the end of running through this you should have a few approaches for dealing with both labelled and unlabelled datasets.

The different approaches we'll be taking in this notebook are:

  • Zero-shot Classification: First we'll do zero shot classification to put transactions in one of five named buckets using only a prompt for guidance
  • Classification with Embeddings: Following this we'll create embeddings on a labelled dataset, and then use a traditional classification model to test their effectiveness at identifying our categories
  • Fine-tuned Classification: Lastly we'll produce a fine-tuned model trained on our labelled dataset to see how this compares to the zero-shot and few-shot classification approaches

Setup

Load dataset

We're using a public transaction dataset of transactions over £25k for the Library of Scotland. The dataset has three features that we'll be using:

  • Supplier: The name of the supplier
  • Description: A text description of the transaction
  • Value: The value of the transaction in GBP

Source:

https://data.nls.uk/data/organisational-data/transactions-over-25k/

Zero-shot Classification

We'll first assess the performance of the base models at classifying these transactions using a simple prompt. We'll provide the model with 5 categories and a catch-all of "Could not classify" for ones that it cannot place.

Our first attempt is correct, M & J Ballantyne Ltd are a house builder and the work they performed is indeed Building Improvement.

Lets expand the sample size to 25 and see how it performs, again with just a simple prompt to guide it

Initial results are pretty good even with no labelled examples! The ones that it could not classify were tougher cases with few clues as to their topic, but maybe if we clean up the labelled dataset to give more examples we can get better performance.

Classification with Embeddings

Lets create embeddings from the small set that we've classified so far - we've made a set of labelled examples by running the zero-shot classifier on 101 transactions from our dataset and manually correcting the 15 Could not classify results that we got

Create embeddings

This initial section reuses the approach from the Get_embeddings_from_dataset Notebook to create embeddings from a combined field concatenating all of our features

Use embeddings for classification

Now that we have our embeddings, let see if classifying these into the categories we've named gives us any more success.

For this we'll use a template from the Classification_using_embeddings notebook

Performance for this model is pretty strong, so creating embeddings and using even a simpler classifier looks like an effective approach as well, with the zero-shot classifier helping us do the initial classification of the unlabelled dataset.

Lets take it one step further and see if a fine-tuned model trained on this same labelled datasets gives us comparable results

Fine-tuned Transaction Classification

For this use case we're going to try to improve on the few-shot classification from above by training a fine-tuned model on the same labelled set of 101 transactions and applying this fine-tuned model on group of unseen transactions

Building Fine-tuned Classifier

We'll need to do some data prep first to get our data ready. This will take the following steps:

  • To prepare our training and validation sets, we'll create a set of message sequences. The first message for each will be the user prompt formatted with the details of the transaction, and the final message will be the expected classification response from the model
  • Our test set will contain the initial user prompt for each transaction, along with the corresponding expected class label. We will then use the fine-tuned model to generate the actual classification for each transaction.

Applying Fine-tuned Classifier

Now we'll apply our classifier to see how it performs. We only had 31 unique observations in our training set and 8 in our validation set, so lets see how the performance is

Step 1: Zero-shot Classification

Assess the performance of the base models at classifying transactions using a simple prompt. Provide the model with 5 categories and a catch-all of 'Could not classify' for ones that it cannot place. Start with a small sample and expand to 25 transactions to evaluate performance with no labelled examples.

Step 2: Create embeddings

Create embeddings from the labelled set of transactions that were classified and manually corrected. Reuse the approach from the Get_embeddings_from_dataset Notebook to create embeddings from a combined field concatenating all features (Supplier, Description, Value).

Step 3: Use embeddings for classification

Use the created embeddings to classify transactions into the named categories. Apply a template from the Classification_using_embeddings notebook to test if classifying embeddings gives better success than zero-shot classification alone.

Step 4: Prepare training and validation sets

Prepare data for fine-tuning by creating message sequences. The first message for each will be the user prompt formatted with transaction details, and the final message will be the expected classification response from the model.

Step 5: Prepare test set

Create a test set containing the initial user prompt for each transaction, along with the corresponding expected class label. This will be used to generate actual classifications from the fine-tuned model.

Step 6: Apply Fine-tuned Classifier

Apply the fine-tuned classifier to unseen transactions to evaluate its performance. Compare results against the zero-shot and embedding-based classification approaches using the labelled dataset of 101 transactions.

Discussion

Questions & comments · 0

Sign In Sign in to leave a comment.