Introducing Whisper

7 April 2024

Other existing approaches frequently use smaller, more closely paired audio-text training datasets [1, 2, 3] or use broad but unsupervised audio pretraining [4, 5, 6]. Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper's zero-shot performance across many diverse datasets, we find it is much more robust and makes 50% fewer errors than those models.
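This passage does not name the error metric. Speech recognition benchmarks such as LibriSpeech are conventionally scored by word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model's output, divided by the number of reference words. The sketch below assumes WER is the measure behind the "50% fewer errors" comparison and shows how that count and a relative reduction are computed. The function names and sample numbers are hypothetical illustrations, not Whisper's evaluation code or reported results.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum word substitutions + deletions + insertions turning reference into hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between ref[:i] and hyp[:j] for the previous row i.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors that the new system avoids (0.5 means 50% fewer)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical WERs: halving 20% to 10% is a 50% relative reduction.
    print(relative_error_reduction(0.20, 0.10))
```

Under this reading, "50% fewer errors" is a relative statement: it compares two error rates averaged over the evaluation datasets rather than claiming an absolute WER of 50%.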

About a third of Whisper's audio dataset is non-English, and the model is alternately given the task of transcribing in the original language or translating to English. We find this approach is particularly effective at learning speech-to-text translation, and it outperforms the supervised state of the art (SOTA) on CoVoST2 translation into English zero-shot.