What problem does this prompt chain solve?

It addresses Whisper's tendency to misspell unfamiliar proper nouns like company and product names by comparing two correction strategies: using Whisper's prompt parameter to guide initial transcription, or using GPT-4 post-processing to fix misspellings after transcription.

What are the tradeoffs between the two correction approaches?

Whisper's prompt parameter is faster and cheaper but limited to 244 tokens and less reliable for longer term lists. GPT-4 post-processing is more reliable and scalable to larger lists but costs more, adds latency from an extra API call, and is still bounded by GPT-4's context window.

When should I use this pattern?

Use it when transcribing audio containing proprietary or unusual proper nouns that Whisper misspells and you have a known list of correct spellings. For short term lists, Whisper's prompt may suffice; for longer lists, GPT-4 post-processing is more reliable.

What models and APIs does this use?

It uses the OpenAI Python SDK with the `whisper-1` model for transcription and `gpt-4` via chat completions at `temperature=0` for post-processing spell correction.

Prompt Chain

Enhance Transcription Accuracy with Prompting and Post-Processing

Name: Addressing transcription misspellings: prompt vs post-processing
Availability: OnlineOnly
Author: OpenAI Cookbook

Fix Whisper transcription misspellings of proper nouns using its prompt parameter or GPT-4 post-processing.

Copy chain

Works with openai

OpenAI Cookbook

Maintainer?

Spark score

out of 100

Updated 19 days ago

Version 1.0.0

Models

gpt 4ogpt 4

Add to Favorites

Why it matters

Improve the accuracy of audio transcriptions, especially for proper nouns like company and product names, by leveraging both Whisper's prompt capabilities and GPT-4's post-processing.

Outcomes

What it gets done

Guide Whisper transcriptions using a list of correct spellings in the prompt.

Utilize GPT-4 for post-processing to correct misspellings identified in transcriptions.

Compare the effectiveness of prompt-based guidance versus GPT-4 post-processing for accuracy.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-whispercorrectmisspelling | bash

Steps

Steps in the chain

Import OpenAI library and download audio

Set baseline with fictitious audio recording

Pass correct names list to Whisper prompt

Use GPT-4 for post-processing spell correction

Overview

Addressing transcription misspellings: prompt vs post-processing

An OpenAI Cookbook comparison of fixing Whisper transcription misspellings on proper nouns via Whisper's prompt parameter versus a GPT-4 post-processing correction pass. Use Whisper's prompt parameter for short spelling lists (244-token limit). Use GPT-4 post-processing for larger or uncertain lists, accepting higher cost and latency for better reliability.

What it does

This notebook compares two ways to fix Whisper transcription errors on unfamiliar proper nouns - company names, product names, acronyms - using a fictitious company, ZyntriQix, with products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, and DigiFractal Matrix, plus acronyms like PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., and F.L.I.N.T. Approach one feeds a list of correct spellings directly into Whisper's prompt parameter to guide the initial transcription; approach two instead uses GPT-4 to correct misspellings after transcription, using that same spelling list in its prompt. Both start from a baseline recording where Whisper transcribes the company name, product names, and acronym capitalization incorrectly without any guidance.

When to use - and when NOT to

Use Whisper's prompt parameter when your list of correct spellings is short, since it has a hard 244-token limit; some product names still come through misspelled even with the list supplied. Use GPT-4 post-processing when the spelling list is too large for that limit, or when you don't know which terms will actually appear in a given transcription - it's more scalable and, in this notebook's testing, more reliable at correctly spelling proprietary names, including when tested against a larger list mixing known and new product names. The tradeoff is cost and latency: GPT-4 post-processing costs more and is slower than relying on Whisper's prompt alone.

Inputs and outputs

Both approaches take the same input - a list of correct spellings - but apply it at different points: as Whisper's prompt parameter before transcription, or as GPT-4's prompt after transcription, acting purely as a spell-checking pass over Whisper's raw output.

Integrations

GPT-4 post-processing itself has a ceiling too: it's bounded by the model's context window, so companies with thousands of SKUs may still find a single GPT-4 pass insufficient and need an alternative approach, such as retrieval or chunking the correction list.

Who it's for

Developers building transcription pipelines for domains full of unfamiliar proper nouns - company names, product catalogs, brand terms - who need to choose between Whisper's built-in prompt guidance and a GPT-4 post-processing pass based on list size, cost, and latency tolerance. The baseline test in this notebook is a monologue that was itself generated by ChatGPT from author-supplied prompts and then read aloud and recorded by the author, giving a controlled way to know exactly which proper nouns should appear in the transcript and to measure precisely which ones Whisper got wrong before any correction technique was applied.

Source README

Addressing transcription misspellings: prompt vs post-processing

We are addressing the problem of enhancing the precision of transcriptions, particularly when it comes to company names and product references. Our solution involves a dual strategy that utilizes both the Whisper prompt parameter and GPT-4's post-processing capabilities.

Two approaches to correct inaccuracies are:

We input a list of correct spellings directly into Whisper's prompt parameter to guide the initial transcription.
We utilized GPT-4 to fix misspellings post transcription, again using the same list of correct spellings in the prompt.

These strategies aimed at ensuring precise transcription of unfamilar proper nouns.

Setup

To get started, let's:

Import the OpenAI Python library (if you don't have it, you'll need to install it with pip install openai)
Download the audio file example

Setting our baseline with a fictitious audio recording

Our reference point is a monologue, which was generated by ChatGPT from prompts given by the author. The author then voiced this content. So, the author both guided the ChatGPT's output with prompts and brought it to life by speaking it.

Our fictitious company, ZyntriQix, offers a range of tech products. These include Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, and DigiFractal Matrix. We also spearhead several initiatives such as PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., and F.L.I.N.T.

Whisper transcribed our company name, product names, and miscapitalized our acronyms incorrectly. Let's pass the correct names as a list in the prompt.

When passing the list of product names, some of the product names are transcribed correctly while others are still misspelled.

You can use GPT-4 to fix spelling mistakes

Leveraging GPT-4 proves especially useful when the speech content is unknown beforehand and we have a list of product names readily available.

The post-processing technique using GPT-4 is notably more scalable than depending solely on Whisper's prompt parameter, which has a token limit of 244. GPT-4 allows us to process larger lists of correct spellings, making it a more robust method for handling extensive product lists.

However, this post-processing technique isn't without limitations. It's constrained by the context window of the chosen model, which may pose challenges when dealing with vast numbers of unique terms. For instance, companies with thousands of SKUs may find that the context window of GPT-4 is insufficient to handle their requirements, and they might need to explore alternative solutions.

Interestingly, the GPT-4 post-processing technique seems more reliable than using Whisper alone. This method, which leverages a product list, enhances the reliability of our results. However, this increased reliability comes at a price, as using this approach can increase costs and can result in higher latency.

Now, let's input the original product list into GPT-4 and evaluate its performance. By doing so, we aim to assess the AI model's ability to correctly spell the proprietary product names, even with no prior knowledge of the exact terms to appear in the transcription. In our experiment, GPT-4 was successful in correctly spelling our product names, confirming its potential as a reliable tool for ensuring transcription accuracy.

In this case, we supplied a comprehensive product list that included all the previously used spellings, along with additional new names. This scenario simulates a real-life situation where we have a substantial SKU list and uncertain about the exact terms to appear in the transcription. Feeding this extensive list of product names into the system resulted in a correctly transcribed output.

We are employing GPT-4 as a spell checker, using the same list of correct spellings that was previously used in the prompt.

FAQ

Common questions

Discussion

Enhance Transcription Accuracy with Prompting and Post-Processing

What it gets done

Add it to your toolbox

Steps in the chain

Addressing transcription misspellings: prompt vs post-processing

What it does

When to use - and when NOT to

Inputs and outputs

Integrations

Who it's for

Addressing transcription misspellings: prompt vs post-processing

Setup

Setting our baseline with a fictitious audio recording

You can use GPT-4 to fix spelling mistakes

Common questions

Questions & comments · 0