August 2, 2024
Creating an Audio Activity Indicator in the Browser
5 minute read
During a recent video conferencing app project, I faced the challenge of implementing audio activity detection on participants' thumbnails. It's a rather common feature, yet during my research I found that resources on the topic were quite scarce. That's why I decided to share my experience with you.
In this article, we will explore how to detect a participant's audio activity using a MediaStream and the browser's Web Audio API.
Audio in the Browser
To develop this feature, we will need a MediaStream (representing the participant's stream) and the browser's Web Audio API (to extract information from this stream). For those not familiar with these concepts, here's a quick overview.
MediaStream
A MediaStream represents a live stream of multimedia data. Imagine it as a pipe through which the sound and/or video from your microphone or camera passes. If you're wondering how to obtain one, I recommend this article.
In this article, we assume you have successfully obtained a MediaStream to analyze. What interests us in this stream are the audio tracks.
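For context, here is a minimal sketch of how such a stream can be obtained from the user's microphone (the getMicrophoneStream helper is just an illustrative name):

// Minimal sketch: request an audio-only MediaStream from the microphone.
// The browser will prompt the user for permission.
async function getMicrophoneStream(): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  console.log(stream.getAudioTracks()) // the audio tracks we will analyze
  return stream
}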
Web Audio API & AnalyserNode
The Web Audio API is a powerful set of tools provided by the browser to manipulate and analyze sound. This API is generally well supported by browsers. Among these tools, we will focus on the analyser (AnalyserNode): a sophisticated device that gives us detailed, real-time information about a sound's characteristics. Once connected to a MediaStream, this analyser will provide the information we're looking for at any given moment.
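Concretely, the wiring is just a few lines (a sketch, assuming mediaStream was obtained as described above; we will revisit this in the React integration below):

// Sketch: connect a MediaStream to an AnalyserNode.
const audioContext = new AudioContext()
const analyser = audioContext.createAnalyser()
const source = audioContext.createMediaStreamSource(mediaStream) // mediaStream: the stream to analyze
source.connect(analyser)
// The analyser can now report real-time frequency data for this stream.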
Analyze What?
To detect a participant's audio activity, we need a function that, at regular intervals, can tell us whether an audio stream is active or not.
To do this, we will use the concept of a threshold.
We will define a threshold value representing an intensity above which we consider our audio signal to be active. Below this threshold, we consider it inactive.
This threshold is a parameter that can be adjusted through usage (depending on background noise, microphone quality, etc.).
Through an AnalyserNode, we can copy the intensity of each frequency bin of an audio signal at a given moment. With a simple average calculation, we can deduce a global intensity and compare it to our threshold value.
This translates to the following function:
/**
 * Check if the audio level is exceeding the threshold
 */
export function isAudioLevelExceedingThreshold({
  threshold,
  analyser,
}: {
  threshold: number
  analyser: AnalyserNode
}) {
  const dataArray = new Uint8Array(analyser.frequencyBinCount)
  analyser.getByteFrequencyData(dataArray)
  const average = dataArray.reduce((sum, value) => sum + value, 0) / dataArray.length
  const normalizedValue = average / 255
  return normalizedValue > threshold
}
The Web Audio API uses 8-bit frequency values: each value in the dataArray is an integer between 0 and 255, so our average is a number between 0 and 255 as well. We normalize this value to bring it into a range between 0 and 1 (0 representing absolute silence and 1 the maximum possible intensity).
At regular intervals, we will need to call the isAudioLevelExceedingThreshold function and then, depending on its return value (true or false), perform an action.
/**
 * Check if the audio level is exceeding the threshold,
 * and call the appropriate callback
 */
export function checkAudioLevel({
  threshold,
  analyser,
  onExceedThreshold,
  onBelowThreshold,
}: {
  threshold: number
  analyser: AnalyserNode
  onExceedThreshold: () => void
  onBelowThreshold: () => void
}) {
  if (isAudioLevelExceedingThreshold({ threshold, analyser })) {
    onExceedThreshold()
  } else {
    onBelowThreshold()
  }
}
That's it for the logic part of our audio activity indicator. Now, let's integrate it into our application.
Integration in React
Here, I will show you how to integrate this logic into a React component.
To do this, I will isolate the logic in a custom hook, useAudioLevelDetector, which takes a mediaStream, a threshold value, and an interval value (the time between two checks) as parameters.
This hook will manage the creation of the AudioContext and the AnalyserNode, and start the audio activity checks at regular intervals.
It will return a state, isExceedingThreshold, which will be true if audio activity is detected, and false otherwise.
import { useEffect, useRef, useState } from 'react'
import { checkAudioLevel } from './utils'

interface AudioLevelDetectorOptions {
  threshold?: number // a number between 0 and 1, where 1 represents the maximum audio level and 0 the minimum
  interval?: number // time in milliseconds between two checks
}

const useAudioLevelDetector = (
  mediaStream: MediaStream | null,
  options: AudioLevelDetectorOptions = {}
): boolean => {
  const [isExceedingThreshold, setIsExceedingThreshold] = useState(false)
  const audioContextRef = useRef<AudioContext | null>(null)
  const analyserRef = useRef<AnalyserNode | null>(null)
  const intervalIdRef = useRef<number | null>(null)

  const threshold = options.threshold ?? 0.1
  const interval = options.interval ?? 100

  function onInterval() {
    if (!analyserRef.current) return
    checkAudioLevel({
      threshold,
      analyser: analyserRef.current,
      onExceedThreshold: () => {
        setIsExceedingThreshold(true)
      },
      onBelowThreshold: () => {
        setIsExceedingThreshold(false)
      },
    })
  }

  function onCleanUp() {
    if (intervalIdRef.current) {
      clearInterval(intervalIdRef.current) // Stop the periodic checks
      intervalIdRef.current = null
    }
    if (audioContextRef.current && audioContextRef.current.state !== 'closed') {
      // Close the audio context if it is not already closed
      audioContextRef.current.close()
    }
    audioContextRef.current = null
    analyserRef.current = null
  }

  useEffect(() => {
    if (!mediaStream || !mediaStream.getAudioTracks().length) {
      setIsExceedingThreshold(false)
      return
    }
    if (!audioContextRef.current) {
      // Create an AudioContext (webkitAudioContext for older Safari)
      audioContextRef.current = new (window.AudioContext || (window as any).webkitAudioContext)()
    }
    if (!analyserRef.current) {
      analyserRef.current = audioContextRef.current.createAnalyser() // Create an AnalyserNode
    }
    const source = audioContextRef.current.createMediaStreamSource(mediaStream) // Add the media stream to the audio context
    source.connect(analyserRef.current) // Connect the source to the analyser
    intervalIdRef.current = window.setInterval(onInterval, interval)
    onInterval()
    return () => {
      onCleanUp()
    }
  }, [mediaStream, threshold, interval])

  return isExceedingThreshold
}

export default useAudioLevelDetector
Now, we just need to use this hook in a React component and apply the style of your choice when the threshold is exceeded.
In the following example, I chose to display white borders on the active thumbnail.
The <AudioActivityIndicator> component handles this.
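As an illustration, here is a sketch of what such a component could look like (the actual implementation in the repository may differ; the file name useAudioLevelDetector and the styling are assumptions):

import useAudioLevelDetector from './useAudioLevelDetector'

// Sketch: highlight the thumbnail with a white border while the participant is speaking.
function AudioActivityIndicator({ mediaStream }: { mediaStream: MediaStream | null }) {
  const isSpeaking = useAudioLevelDetector(mediaStream, { threshold: 0.1, interval: 100 })
  return (
    <div style={{ border: isSpeaking ? '2px solid white' : '2px solid transparent' }}>
      {/* participant thumbnail goes here */}
    </div>
  )
}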
GitHub repository of the example
You can, of course, customize the style and even modify the overall behavior of the audio activity indicator to suit your needs. For instance, instead of treating audio activity in a binary manner (active or inactive), you might consider varying the intensity of the indicator based on the audio signal's intensity. In that case, you'd use a range of intensities rather than a single threshold value.
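For example, a variant of our analysis function could return the normalized level itself instead of a boolean (a sketch reusing the same averaging logic; getAudioLevel is a name introduced here for illustration):

/**
 * Sketch: return the normalized audio level (0 to 1),
 * so the UI can vary the indicator's intensity continuously
 * (e.g. drive an opacity or a glow) instead of toggling it.
 */
export function getAudioLevel(analyser: AnalyserNode): number {
  const dataArray = new Uint8Array(analyser.frequencyBinCount)
  analyser.getByteFrequencyData(dataArray)
  const average = dataArray.reduce((sum, value) => sum + value, 0) / dataArray.length
  return average / 255
}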
It's more visually pleasing to have a reactive, near-instant animation when switching from inactive → active, and conversely to animate with a small delay and more gradually in the reverse transition.
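One way to achieve this, sketched with inline styles (the durations and delay are arbitrary values to tune):

import type { CSSProperties } from 'react'

// Sketch: near-instant highlight when activity starts,
// delayed and gradual fade when it stops.
function indicatorStyle(isSpeaking: boolean): CSSProperties {
  return {
    border: isSpeaking ? '2px solid white' : '2px solid transparent',
    transition: isSpeaking
      ? 'border-color 50ms ease-in'
      : 'border-color 400ms ease-out 150ms', // property duration timing-function delay
  }
}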
The Web Audio API is a powerful tool that allows you to manipulate and analyze sounds within the browser. In this article, we have barely scratched the surface by focusing on audio activity detection, but much more can be done with this API if you wish to dive deeper. I recommend this article on how to play music in your browser using the Web Audio API, or you can check out the official documentation to learn more about it.