Whisper WebGPU

Nil Seri
5 min read · Nov 26, 2024


A short overview of WebGPU, ONNX (Open Neural Network Exchange), ONNX Runtime, Transformers.js, Whisper from OpenAI and Whisper WebGPU by Xenova

Photo by Jp Valery on Unsplash

What is WebGPU?

WebGPU provides faster operations and access to more advanced GPU features. It supports graphics rendering, but it also has first-class support for general-purpose GPU (GPGPU) computations.

The WebGPU API is a JavaScript API provided by a web browser that enables web developers to use the underlying system’s GPU (Graphics Processing Unit).

WebGPU General Model

A browser’s WebGPU implementation handles communication with the GPU via a native GPU API driver. In your code, a WebGPU adapter effectively represents a physical GPU and its driver available on the underlying system.
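
To make the model concrete, below is a minimal sketch of the typical flow: request an adapter, request a device from it, then record and submit GPU commands. The example runs a tiny compute pass that doubles every element of an array; the workgroup size and buffer setup are purely illustrative, and it assumes an async/module context in a browser where navigator.gpu is available.

// Minimal WebGPU compute sketch (assumes navigator.gpu is available).
const adapter = await navigator.gpu.requestAdapter(); // physical GPU + driver
const device = await adapter.requestDevice();         // logical device used by your code

// A tiny WGSL compute shader that doubles every element of a storage buffer.
const module = device.createShaderModule({
  code: `
    @group(0) @binding(0) var<storage, read_write> data: array<f32>;
    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) id: vec3<u32>) {
      if (id.x < arrayLength(&data)) {
        data[id.x] = data[id.x] * 2.0;
      }
    }`,
});

const input = new Float32Array([1, 2, 3, 4]);
const storage = device.createBuffer({
  size: input.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(storage, 0, input);

const readback = device.createBuffer({
  size: input.byteLength,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});

const pipeline = device.createComputePipeline({
  layout: 'auto',
  compute: { module, entryPoint: 'main' },
});
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: { buffer: storage } }],
});

// Record the compute pass, copy the result to a mappable buffer, and submit.
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(input.length / 64));
pass.end();
encoder.copyBufferToBuffer(storage, 0, readback, 0, input.byteLength);
device.queue.submit([encoder.finish()]);

await readback.mapAsync(GPUMapMode.READ);
console.log(new Float32Array(readback.getMappedRange())); // [2, 4, 6, 8]

The same request-adapter / request-device handshake is what libraries like ONNX Runtime Web perform under the hood before they upload model weights and dispatch their own compute shaders.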

For browser compatibility, you can visit this link. You can read the W3C specification here. If you would like to try writing an app, there is a code lab by Google that walks you through building Conway’s Game of Life with WebGPU. You can also read this link to learn what is next for WebGPU.

Whisper WebGPU

Whisper WebGPU is a real-time, in-browser speech recognition application built on OpenAI’s Whisper.

The core of Whisper WebGPU lies in the Whisper-base model, a 73-million-parameter speech recognition model optimized for web inference. Once the model is downloaded, it is cached for future use.

ONNX (Open Neural Network Exchange) is an open-source format for AI models, allowing models trained in different frameworks to be shared and utilized seamlessly.

ONNX Runtime is a cross-platform machine-learning model accelerator, with a flexible interface to integrate hardware-specific libraries.

Transformers.js (State-of-the-art Machine Learning for the Web) is a JavaScript library that allows you to use pre-trained models from Hugging Face in a JavaScript environment. It uses ONNX Runtime to run models in the browser. It is designed to be functionally equivalent to Hugging Face’s transformers Python library, meaning you can run the same pretrained models using a very similar API.
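
As an illustration, here is a minimal sketch of what running Whisper through Transformers.js with the WebGPU backend can look like. The package name (@huggingface/transformers; Xenova’s branch may pin an older @xenova/transformers build) and the model id are assumptions for this example, so check the repository for the exact setup.

// A minimal sketch of in-browser speech recognition with Transformers.js and WebGPU.
// Package name and model id are assumptions for illustration only.
import { pipeline } from '@huggingface/transformers';

// Create an automatic-speech-recognition pipeline backed by WebGPU.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'onnx-community/whisper-base',  // assumed ONNX export of the Whisper-base weights
  { device: 'webgpu' }            // 'wasm' is the usual fallback where WebGPU is unavailable
);

// "audio" is mono PCM at 16 kHz as a Float32Array (see the recording sketch further below).
const output = await transcriber(audio, {
  language: 'english',
  task: 'transcribe',
  chunk_length_s: 30,             // split long recordings into 30-second chunks
});
console.log(output.text);

On first use, the model files are downloaded from the Hugging Face Hub and then cached by the browser, which matches the caching behaviour described above.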

Whisper from OpenAI

Whisper is an advanced speech recognition system developed by OpenAI. It is designed to transcribe spoken language into written text and can also translate speech from other languages into English. Whisper is known for its accuracy and its ability to handle a variety of accents, languages, and even background noise, making it one of the most reliable tools for converting audio to text.

Utilizing Hugging Face Transformers.js and ONNX Runtime Web, the Whisper-base model runs entirely within the user’s browser, eliminating the need to send data to a server and enabling functionality even when the device is offline.

As a developer, you no longer need to deal with back-end API code, server infrastructure, or the data privacy issues associated with cloud processing.

It is possible to create a web application that can transcribe meetings in real time, provide instant translations during international video calls, or enable voice commands to control web interfaces without the latency or privacy concerns associated with server-based processing.
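
As a rough sketch of what the audio side of such an app involves, the snippet below records a short clip from the microphone, decodes it to the 16 kHz mono PCM that Whisper expects, and hands it to the hypothetical transcriber from the earlier example. A real-time version would instead stream audio chunks into the model, but the ingredients are the same.

// Sketch: record a short clip and transcribe it entirely client-side.
// "transcriber" is the hypothetical pipeline from the earlier snippet.
async function recordAndTranscribe(transcriber, durationMs = 5000) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);

  const stopped = new Promise((resolve) => (recorder.onstop = resolve));
  recorder.start();
  await new Promise((resolve) => setTimeout(resolve, durationMs));
  recorder.stop();
  await stopped;
  stream.getTracks().forEach((track) => track.stop());

  // Decode the recording to 16 kHz mono PCM, which Whisper expects as input.
  const audioCtx = new AudioContext({ sampleRate: 16000 });
  const arrayBuffer = await new Blob(chunks).arrayBuffer();
  const audioBuffer = await audioCtx.decodeAudioData(arrayBuffer);
  const pcm = audioBuffer.getChannelData(0);

  const { text } = await transcriber(pcm);
  return text;
}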

You can find Xenova’s code at this GitHub URL:

Make sure you are on the “experimental-webgpu” branch. It is a React web application. You can try it out on this website:

It first checks whether WebGPU is supported by the browser (navigator.gpu), since it is still an experimental feature. You can check yourself by typing “navigator.gpu” in your DevTools Console and pressing Enter:

Chrome Console — navigator.gpu
Firefox Console — navigator.gpu
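
A slightly more complete check, which you can paste into the Console as-is, also requests an adapter, since a browser can expose navigator.gpu and still not return one:

// Paste into the DevTools Console; top-level await works there.
if ('gpu' in navigator) {
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? 'WebGPU adapter found' : 'navigator.gpu exists, but no adapter was returned');
} else {
  console.log('WebGPU is not supported by this browser');
}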

You can check whether your browser and its version support it via this URL. If it is supported but “navigator.gpu” is undefined, or you are still getting “WebGPU is not supported by this browser” from Xenova’s Whisper WebGPU application: in Chrome, go to chrome://flags, find “Unsafe WebGPU Support” and check whether it is enabled. If not, enable it and relaunch the browser.

chrome://flags — webgpu

In Chrome, you can check the WebGPU status by opening chrome://gpu/.

chrome://gpu/

I have developed an Angular version of the Whisper WebGPU project, using Angular v18 and Bootstrap. You can find the code here:

You can try it out here.

Here are some screenshots:

Whisper WebGPU — Angular version
Whisper WebGPU — Angular version — Transcription Settings
Whisper WebGPU — Angular version — Recording Audio
Whisper WebGPU — Angular version — Loading models
Whisper WebGPU — Angular version — Transcription Result
Whisper WebGPU — Angular version — Cached models

Happy Coding!

