Enhancing Whisper transcriptions: pre- & post-processing techniques
This notebook offers a guide to improve the Whisper's transcriptions. We'll streamline your audio data via trimming and segmentation, enhancing Whisper's transcription quality. After transcriptions, we'll refine the output by adding punctuation, adjusting product terminology (e.g., 'five two nine' to '529'), and mitigating Unicode issues. These strategies will help improve the clarity of your transcriptions, but remember, customization based on your unique use-case may be beneficial.
Get this prompt chain
Enhancing Whisper transcriptions: pre- & post-processing techniques
This notebook offers a guide to improve the Whisper's transcriptions. We'll streamline your audio data via trimming and segmentation, enhancing Whisper's transcription quality. After transcriptions, we'll refine the output by adding punctuation, adjusting product terminology (e.g., 'five two nine' to '529'), and mitigating Unicode issues. These strategies will help improve the clarity of your transcriptions, but remember, customization based on your unique use-case may be beneficial.
Setup
To get started let's import a few different libraries:
PyDub is a simple and easy-to-use Python library for audio processing tasks such as slicing, concatenating, and exporting audio files.
The
Audioclass from theIPython.displaymodule allows you to create an audio control that can play sound in Jupyter notebooks, providing a straightforward way to play audio data directly in your notebook.For our audio file, we'll use a fictional earnings call written by ChatGPT and read aloud by the author.This audio file is relatively short, but hopefully provides you with an illustrative idea of how these pre and post processing steps can be applied to any audio file.
At times, files with long silences at the beginning can cause Whisper to transcribe the audio incorrectly. We'll use Pydub to detect and trim the silence.
Here, we've set the decibel threshold of 20. You can change this if you would like.
At times, we've seen unicode character injection in transcripts, removing any non-ASCII characters should help mitigate this issue.
Keep in mind you should not use this function if you are transcribing in Greek, Cyrillic, Arabic, Chinese, etc
This function will add formatting and punctuation to our transcript. Whisper generates a transcript with punctuation but without formatting.
Our audio file is a recording from a fake earnings call that includes a lot of financial products. This function can help ensure that if Whisper transcribes these financial product names incorrectly, that they can be corrected.
This function will create a new file with 'trimmed' appended to the original file name
Our fake earnings report audio file is fairly short in length, so we'll adjust the segments accordingly. Keep in mind you can adjust the segment length as you need.