Whisper WebGPU

Nil Seri
5 min read · Nov 26, 2024


A short overview of WebGPU, ONNX (Open Neural Network Exchange), ONNX Runtime, Transformers.js, Whisper from OpenAI and Whisper WebGPU by Xenova

Photo by Jp Valery on Unsplash

What is WebGPU?

WebGPU provides faster operations and access to more advanced GPU features. It supports graphics rendering, but it also has first-class support for general-purpose GPU (GPGPU) computations.

The WebGPU API is a JavaScript API provided by a web browser that enables web developers to use the underlying system’s GPU (Graphics Processing Unit).

WebGPU General Model

A browser’s WebGPU implementation handles communication with the GPU via a native GPU API driver. In your code, a WebGPU adapter effectively represents a physical GPU and its driver available on the underlying system.
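
To make the model concrete, below is a minimal sketch of the typical flow: request an adapter, request a device from it, then record and submit GPU commands. The example runs a tiny compute pass that doubles every element of an array; the workgroup size and buffer setup are purely illustrative, and it assumes an async/module context in a browser where navigator.gpu is available.

// Minimal WebGPU compute sketch (assumes navigator.gpu is available).
const adapter = await navigator.gpu.requestAdapter(); // physical GPU + driver
const device = await adapter.requestDevice();         // logical device used by your code

// A tiny WGSL compute shader that doubles every element of a storage buffer.
const module = device.createShaderModule({
  code: `
    @group(0) @binding(0) var<storage, read_write> data: array<f32>;
    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) id: vec3<u32>) {
      if (id.x < arrayLength(&data)) {
        data[id.x] = data[id.x] * 2.0;
      }
    }`,
});

const input = new Float32Array([1, 2, 3, 4]);
const storage = device.createBuffer({
  size: input.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(storage, 0, input);

const readback = device.createBuffer({
  size: input.byteLength,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});

const pipeline = device.createComputePipeline({
  layout: 'auto',
  compute: { module, entryPoint: 'main' },
});
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: { buffer: storage } }],
});

// Record the compute pass, copy the result to a mappable buffer, and submit.
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(input.length / 64));
pass.end();
encoder.copyBufferToBuffer(storage, 0, readback, 0, input.byteLength);
device.queue.submit([encoder.finish()]);

await readback.mapAsync(GPUMapMode.READ);
console.log(new Float32Array(readback.getMappedRange())); // [2, 4, 6, 8]

The same request-adapter / request-device handshake is what libraries like ONNX Runtime Web perform under the hood before they upload model weights and dispatch their own compute shaders.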

For browser compatibility, you can visit this link. You can read the W3C specification here. If you would like to try writing an app, there is a code lab by Google that walks you through building Conway’s Game of Life with WebGPU. You can also read this link to learn what is next for WebGPU.

Whisper WebGPU

Whisper WebGPU is a real-time, in-browser speech recognition application built on OpenAI’s Whisper.

The core of Whisper WebGPU lies in the Whisper-base model, a 73-million-parameter speech recognition model optimized for web inference. Once the model is downloaded, it is cached for future use.

ONNX (Open Neural Network Exchange) is an open-source format for AI models, allowing models trained in different frameworks to be shared and utilized seamlessly.

ONNX Runtime is a cross-platform machine-learning model accelerator, with a flexible interface to integrate hardware-specific libraries.

Transformers.js (State-of-the-art Machine Learning for the Web) is a JavaScript library that allows you to use pre-trained models from Hugging Face in a JavaScript environment. It uses ONNX Runtime to run models in the browser. It is designed to be functionally equivalent to Hugging Face’s transformers Python library, meaning you can run the same pretrained models using a very similar API.
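
As an illustration, here is a minimal sketch of what running Whisper through Transformers.js with the WebGPU backend can look like. The package name (@huggingface/transformers; Xenova’s branch may pin an older @xenova/transformers build) and the model id are assumptions for this example, so check the repository for the exact setup.

// A minimal sketch of in-browser speech recognition with Transformers.js and WebGPU.
// Package name and model id are assumptions for illustration only.
import { pipeline } from '@huggingface/transformers';

// Create an automatic-speech-recognition pipeline backed by WebGPU.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'onnx-community/whisper-base',  // assumed ONNX export of the Whisper-base weights
  { device: 'webgpu' }            // 'wasm' is the usual fallback where WebGPU is unavailable
);

// "audio" is mono PCM at 16 kHz as a Float32Array (see the recording sketch further below).
const output = await transcriber(audio, {
  language: 'english',
  task: 'transcribe',
  chunk_length_s: 30,             // split long recordings into 30-second chunks
});
console.log(output.text);

On first use, the model files are downloaded from the Hugging Face Hub and then cached by the browser, which matches the caching behaviour described above.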

Whisper from OpenAI

Whisper is an advanced speech recognition system developed by OpenAI. It is designed to transcribe spoken language into written text and can also translate speech from other languages into English. Whisper is known for its accuracy and its ability to handle a variety of accents, languages, and even background noise, making it one of the most reliable tools for converting audio to text.

Utilizing Hugging Face Transformers.js and ONNX Runtime Web, the Whisper-base model runs entirely within the user’s browser, eliminating the need to send data to a server and enabling functionality even when the device is offline.

As a developer, you no longer need to deal with back-end API code, server infrastructure, or the data privacy issues associated with cloud processing.

It is possible to create a web application that can transcribe meetings in real time, provide instant translations during international video calls, or enable voice commands to control web interfaces without the latency or privacy concerns associated with server-based processing.
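
As a rough sketch of what the audio side of such an app involves, the snippet below records a short clip from the microphone, decodes it to the 16 kHz mono PCM that Whisper expects, and hands it to the hypothetical transcriber from the earlier example. A real-time version would instead stream audio chunks into the model, but the ingredients are the same.

// Sketch: record a short clip and transcribe it entirely client-side.
// "transcriber" is the hypothetical pipeline from the earlier snippet.
async function recordAndTranscribe(transcriber, durationMs = 5000) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);

  const stopped = new Promise((resolve) => (recorder.onstop = resolve));
  recorder.start();
  await new Promise((resolve) => setTimeout(resolve, durationMs));
  recorder.stop();
  await stopped;
  stream.getTracks().forEach((track) => track.stop());

  // Decode the recording to 16 kHz mono PCM, which Whisper expects as input.
  const audioCtx = new AudioContext({ sampleRate: 16000 });
  const arrayBuffer = await new Blob(chunks).arrayBuffer();
  const audioBuffer = await audioCtx.decodeAudioData(arrayBuffer);
  const pcm = audioBuffer.getChannelData(0);

  const { text } = await transcriber(pcm);
  return text;
}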

You can find Xenova’s code at this GitHub URL:

Make sure you are on the “experimental-webgpu” branch. It is a React web application. You can try it out on this website:

It first checks whether WebGPU is supported by the browser (navigator.gpu), since it is still an experimental feature. You can check yourself by typing “navigator.gpu” in your DevTools Console and pressing Enter:

Chrome Console — navigator.gpu
Firefox Console — navigator.gpu
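
A slightly more complete check, which you can paste into the Console as-is, also requests an adapter, since a browser can expose navigator.gpu and still not return one:

// Paste into the DevTools Console; top-level await works there.
if ('gpu' in navigator) {
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? 'WebGPU adapter found' : 'navigator.gpu exists, but no adapter was returned');
} else {
  console.log('WebGPU is not supported by this browser');
}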

You can check whether your browser and its version support it via this URL. If it is supported but “navigator.gpu” is undefined, or you are still getting “WebGPU is not supported by this browser” from Xenova’s Whisper WebGPU application: in Chrome, go to chrome://flags, find “Unsafe WebGPU Support” and check whether it is enabled. If not, enable it and relaunch the browser.

chrome://flags — webgpu

In Chrome, you can check the WebGPU status by opening chrome://gpu/.

chrome://gpu/

I have developed an Angular version of the Whisper WebGPU project, using Angular v18 and Bootstrap. You can find the code here:

You can try it out here.

Here are some screenshots:

Whisper WebGPU — Angular version
Whisper WebGPU — Angular version — Transcription Settings
Whisper WebGPU — Angular version — Recording Audio
Whisper WebGPU — Angular version — Loading models
Whisper WebGPU — Angular version — Transcription Result
Whisper WebGPU — Angular version — Cached models

Happy Coding!

