In-Browser Speech to Text Using the Web Speech API

Nil Seri
4 min read · Nov 28, 2024

--

Exploring the Web Speech API through an Angular App

Photo by Zdeněk Macháček on Unsplash

SpeechRecognition

The SpeechRecognition interface of the Web Speech API is the controller interface for the recognition service.

The Web Speech API has two functions:
speech synthesis — text to speech
speech recognition — speech to text.

You can check the browser compatibility table here.

It is apparently not supported in the Brave browser; it gave me a “network” error there, so I switched back to Chrome.

Brave Browser Error
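Because of these support gaps, it is worth feature-detecting the API before constructing a recognizer: Chrome still exposes the constructor under a webkit prefix, and unsupported browsers expose neither name. A minimal sketch (the helper name is mine, and the lookup takes an explicit global object only so the pattern is easy to test):

```javascript
// Feature-detect the SpeechRecognition constructor.
// Chrome exposes it as webkitSpeechRecognition; browsers without
// support expose neither name, so null is returned.
function getSpeechRecognition(globalObj = globalThis) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In a browser you would then do:
// const Ctor = getSpeechRecognition();
// const recognition = Ctor ? new Ctor() : null;
```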

The SpeechRecognitionEvent interface of the Web Speech API represents the event object for the results and nomatch events, and contains all the data associated with an interim or final speech recognition result.

The results property is a list of SpeechRecognitionResult objects. Inspecting a result shows a list of SpeechRecognitionAlternative objects; the first one includes the transcript of what you said and a confidence value between 0 and 1.

SpeechRecognitionEvent
SpeechRecognitionResultList
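That nested structure can be walked with plain indexing. A sketch (extractTranscript and the fixture shape are my own, but the indexing mirrors event.results[i][0].transcript as described above):

```javascript
// Join the first (highest-confidence) alternative of each result into
// one transcript string. Works on a real SpeechRecognitionResultList
// or any array-like of array-likes with the same shape.
function extractTranscript(resultList) {
  let transcript = '';
  for (let i = 0; i < resultList.length; i++) {
    transcript += resultList[i][0].transcript;
  }
  return transcript;
}

// Inside an onresult handler you would call:
//   extractTranscript(event.results)
```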

SpeechRecognition Properties:

continuous:
true — results are captured continuously for the whole session.
false — just a single result is returned each time recognition is started.

interimResults:
true — the speech recognition system returns interim (partial) results.
false — the speech recognition system returns just final results.

lang:
a string representing the BCP 47 language tag such as en-US, en-GB, de-DE, fr-FR, es-ES, tr-TR, etc.

BCP 47 is the IETF Best Current Practice for language tags. You will most commonly find language tags written with two subtags: language and region.
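Put together, the three properties might be set like this (a sketch; configureRecognition is a hypothetical helper, and recognition stands for a SpeechRecognition instance):

```javascript
// Apply the three properties discussed above to a recognition instance.
function configureRecognition(recognition) {
  recognition.continuous = true;      // keep capturing results until stop()
  recognition.interimResults = true;  // deliver partial transcripts too
  recognition.lang = 'en-US';         // BCP 47 language-region tag
  return recognition;
}
```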

SpeechRecognition Methods:

start(): starts the speech recognition service listening to incoming audio with intent to recognize grammars.

stop(): stops the speech recognition service from listening to incoming audio, and attempts to return a SpeechRecognitionResult using the audio captured so far.
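A common UI pattern is a single button that alternates between these two methods. A sketch (makeToggle is a hypothetical helper; recognition is anything exposing start() and stop()):

```javascript
// Returns a function that starts recognition on the first call,
// stops it on the next, and so on. The return value reports whether
// recognition is now listening.
function makeToggle(recognition) {
  let listening = false;
  return function toggle() {
    if (listening) {
      recognition.stop();
    } else {
      recognition.start();
    }
    listening = !listening;
    return listening;
  };
}
```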

Events:

  • listen to these events using addEventListener(), or assign a handler to the corresponding oneventname property (e.g. onresult).

start: Fired when the speech recognition service has begun listening to incoming audio with intent to recognize grammars.

result: Fired when the speech recognition service returns a result — a word or phrase has been positively recognized.

error: Fired when a speech recognition error occurs.

end: Fired when the speech recognition service has disconnected.

Besides start, there are also audiostart, soundstart, and speechstart events, along with their end counterparts (audioend, soundend, speechend).

start vs audiostart vs soundstart vs speechstart events

Their flow occurs as follows:

start → audiostart → soundstart → speechstart → speechend → soundend → audioend → result → end
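One way to observe this order yourself is to register a listener for every lifecycle event and log the names as they arrive. A sketch (logLifecycleEvents is a hypothetical helper; it works with any EventTarget, so a real SpeechRecognition instance fits):

```javascript
// Lifecycle events in the order the article describes.
const LIFECYCLE_EVENTS = [
  'start', 'audiostart', 'soundstart', 'speechstart',
  'speechend', 'soundend', 'audioend', 'result', 'end',
];

// Push the name of each lifecycle event into `log` as it fires.
function logLifecycleEvents(target, log = []) {
  for (const type of LIFECYCLE_EVENTS) {
    target.addEventListener(type, () => log.push(type));
  }
  return log;
}
```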


for Speech Recognition (Speech-to-Text):

Calling this feature speech recognition “in the browser” is not exactly accurate. When using the SpeechRecognition interface of the Web Speech API, your speech input is often sent to remote servers (e.g., Google’s servers in Chrome) for processing.

for Speech Synthesis (Text-to-Speech):

SpeechSynthesis, the text-to-speech part of the API, generally does not send data to servers. It works locally using the speech synthesis engines installed in the operating system or the browser itself.

I have developed an Angular project with Angular v19 and Bootstrap. You can find the code here:

Since SpeechRecognition event callbacks fire outside of Angular's zone, Angular's change detection may not notice changes to the transcript property. To fix this, you need to wrap the update in Angular's NgZone.
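The fix can be sketched without pulling in Angular itself: in a real component you would inject NgZone and call zone.run() inside the onresult callback. Below, zone is any object with a run(fn) method, component stands for the component instance, and the event shape follows results[0][0].transcript from earlier (the helper name is mine):

```javascript
// Wrap the transcript update in zone.run() so Angular's change
// detection notices the new value. In a real component, `zone` is the
// injected NgZone and `component` is `this`.
function makeZonedResultHandler(zone, component) {
  return function onResult(event) {
    const transcript = event.results[0][0].transcript;
    zone.run(() => {
      component.transcript = transcript;
    });
  };
}
```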

You can try it out here.

Here are some screenshots:

start screen
started — interim results
stopped — result

Happy Coding!

