Implementing Virtual Background with React using MediaPipe

Tauqueer Khan

Software Engineer

5 min

In this article, I will show you how to implement virtual background replacement. We will be using the WebRTC APIs to capture a stream from a camera, and then use Machine Learning with MediaPipe and WebAssembly to remove the background. If you want to see the full code sample, please scroll to the bottom of this article.

MediaPipe

MediaPipe is a cross-platform Machine Learning solution for live and streaming media that was developed and open-sourced by Google. Background replacement is just one of many features it has. It also includes real-time hand, iris and body pose tracking. For more information on what MediaPipe can do, you can read about it here.

The MediaPipe flow

Getting Started

Before we go further, let's start with the basics of getting a stream from the camera. To do this, first create a new React app, or simply use an existing one.

npx create-react-app virtual-background

cd virtual-background

Capturing a Video Stream from a User's Camera

We'll do all of our work in App.js. Clear out any unnecessary code so that your App.js looks similar to the example below.

import './App.css';

function App() {
  return (
    <div className="App"></div>
  );
}

export default App;

If you run npm start you should just see a white screen.

Next, we want to add a useEffect hook (imported from react) that requests the user's media when the app loads. Add the following code before the return statement.

useEffect(() => {
  const constraints = {
    video: { width: { min: 1280 }, height: { min: 720 } },
  };
  navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
    console.log(stream);
  });
}, []);

navigator.mediaDevices.getUserMedia is the API that lets us request access to the audio and video devices the browser can reach. For simplicity, we will only request video, and provide some requirements such as a minimum width and height. As soon as the browser reloads, you should see a permission prompt asking for access to the camera.

Since we are simply logging the stream, you should see a MediaStream in your developer console.
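
One thing this snippet doesn't handle yet is the user denying camera access (or no camera being available). Here is a minimal sketch of how you might guard against that, assuming you just want to log the failure; the exact error names vary by browser:

navigator.mediaDevices
  .getUserMedia(constraints)
  .then((stream) => {
    console.log(stream);
  })
  .catch((error) => {
    // e.g. NotAllowedError (permission denied) or NotFoundError (no camera)
    console.error("Could not access the camera:", error);
  });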

Viewing our camera stream

Now we can create a video element and set its source to our media stream. Since we are using React and need access to the video element, we will use the useRef hook. We'll first declare inputVideoRef and assign useRef() to it.

Next, we add a <video> element inside the top level div and give it the inputVideoRef. We are setting the srcObject of the video element to the stream we get from the camera. We will also need a canvas to draw our new view, so we create two more refs, canvasRef and contextRef.

Lastly, in our useEffect, we will set the contextRef to the 2D context. Your App component should look like this:

import { useEffect, useRef } from "react";
import "./App.css";

function App() {
  const inputVideoRef = useRef();
  const canvasRef = useRef();
  const contextRef = useRef();

  useEffect(() => {
    contextRef.current = canvasRef.current.getContext("2d");
    const constraints = {
      video: { width: { min: 1280 }, height: { min: 720 } },
    };
    navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
      inputVideoRef.current.srcObject = stream;
    });
  }, []);

  return (
    <div className="App">
      <video autoPlay ref={inputVideoRef} />
      <canvas ref={canvasRef} width={1280} height={720} />
    </div>
  );
}

export default App;

Once you save and refresh the app, you should see someone you recognize in your browser!

Installing Dependencies

We will now install MediaPipe's selfie_segmentation library.

npm install @mediapipe/selfie_segmentation

We can now import SelfieSegmentation at the top of the file.

import { SelfieSegmentation } from "@mediapipe/selfie_segmentation";

Flow of Bytes

Before we go further, let's talk about the flow of bytes in our app. Our camera sends a stream of data to our browser, which gets passed to the video element. Then a frame/image at a point in time from the video element is sent to MediaPipe (locally), which is then run through the ML model to provide two types of results:

  1. a segmentation mask
  2. an image to draw on top of the segmentation mask

What that means is that we will have to constantly feed MediaPipe with data and MediaPipe will keep returning segmented data.
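
To make that flow concrete, here is a rough sketch of the loop we are about to build; videoElement stands in for the video element we created earlier, and the real implementation is built step by step in the next sections:

// Conceptual sketch only - the actual code follows below
const feedFrame = () => {
  selfieSegmentation.send({ image: videoElement }); // feed the current frame to the model
  requestAnimationFrame(feedFrame); // and repeat on the next animation frame
};

selfieSegmentation.onResults((results) => {
  // results.segmentationMask -> where the model thinks the person is
  // results.image            -> the original camera frame to composite on top
});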

Selfie Segmentation

SelfieSegmentation has a constructor, a setOptions method and an onResults method. We will continue building our app by utilizing each of these.

Selfie Segmentation constructor - Inside our existing useEffect, we create an instance of SelfieSegmentation. The locateFile option tells MediaPipe where to load its model files from; you can host these files yourself, or use the default CDN URL provided by MediaPipe, shown below.

const selfieSegmentation = new SelfieSegmentation({
  locateFile: (file) =>
    `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
});

Next, we call setOptions, also inside the same useEffect.

selfieSegmentation.setOptions({
  modelSelection: 1,
  selfieMode: true,
});

modelSelection is 0 or 1: 0 selects the general model and 1 the landscape model. Google Meet uses a variant of the landscape model. selfieMode simply flips the image horizontally.
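
For comparison, a setup using the general model without mirroring would look like this; it is just an illustration of the two options, and the rest of this article sticks with the landscape model:

selfieSegmentation.setOptions({
  modelSelection: 0, // general model
  selfieMode: false, // don't flip the image horizontally
});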

Send a Frame and Loop

Lastly, we want to create a sendToMediaPipe function (also inside the useEffect). This function calls the send method on selfieSegmentation with the current frame from inputVideoRef and keeps calling itself via requestAnimationFrame. One thing to note is that there is a delay between the browser requesting a feed from the camera and the camera sending back data. During this delay, the video element has no frames, but selfieSegmentation doesn't know that and always expects a frame, otherwise it will crash. So we have to check that videoWidth is not 0 before sending a frame.

const sendToMediaPipe = async () => {
  // If the video isn't ready yet, skip sending and just loop again
  if (inputVideoRef.current.videoWidth) {
    await selfieSegmentation.send({ image: inputVideoRef.current });
  }
  requestAnimationFrame(sendToMediaPipe);
};
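
We kick this loop off once the stream is attached, by calling sendToMediaPipe() inside the getUserMedia callback. This is exactly how the full code sample at the bottom wires it up:

navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
  inputVideoRef.current.srcObject = stream;
  sendToMediaPipe(); // start feeding frames to MediaPipe
});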

Let's deal with the result!

Finally, when we get results back, we want to do something with these results. Let's create a function called onResults. This can be inside or outside the useEffect.

const onResults = (results) => {
  contextRef.current.save();
  contextRef.current.clearRect(
    0,
    0,
    canvasRef.current.width,
    canvasRef.current.height
  );
  contextRef.current.drawImage(
    results.segmentationMask,
    0,
    0,
    canvasRef.current.width,
    canvasRef.current.height
  );
  // Fill everything outside the person mask (the background) with green.
  contextRef.current.globalCompositeOperation = "source-out";
  contextRef.current.fillStyle = "#00FF00";
  contextRef.current.fillRect(
    0,
    0,
    canvasRef.current.width,
    canvasRef.current.height
  );

  // Draw the camera frame behind the existing content so the person shows through.
  contextRef.current.globalCompositeOperation = "destination-atop";
  contextRef.current.drawImage(
    results.image,
    0,
    0,
    canvasRef.current.width,
    canvasRef.current.height
  );

  contextRef.current.restore();
};

Here are a few of the things that are happening.

  • We clear the canvas
  • We draw the segmentationMask we get back, which marks the pixels on our 1280 x 720 canvas that the model determines to belong to the person
  • With source-out we then turn everything outside that mask (the surroundings) green; source-in would colour the subject instead (see the sketch after this list)
  • Finally, destination-atop draws the camera frame (results.image) behind what is already on the canvas, so the person shows through where the mask is
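
If you want a real virtual background instead of a flat green colour, one option is to draw a background image where we currently fill with green. This is only a minimal sketch, assuming you have preloaded a backgroundImage yourself (for example an HTMLImageElement); it is not part of the original sample:

const onResults = (results) => {
  contextRef.current.save();
  contextRef.current.clearRect(
    0,
    0,
    canvasRef.current.width,
    canvasRef.current.height
  );
  contextRef.current.drawImage(
    results.segmentationMask,
    0,
    0,
    canvasRef.current.width,
    canvasRef.current.height
  );
  // Paint the replacement background only outside the person mask
  contextRef.current.globalCompositeOperation = "source-out";
  contextRef.current.drawImage(
    backgroundImage, // assumed: an image element you loaded yourself
    0,
    0,
    canvasRef.current.width,
    canvasRef.current.height
  );
  // Draw the camera frame behind it so the person shows through
  contextRef.current.globalCompositeOperation = "destination-atop";
  contextRef.current.drawImage(
    results.image,
    0,
    0,
    canvasRef.current.width,
    canvasRef.current.height
  );
  contextRef.current.restore();
};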

We pass the onResults method to selfieSegmentation.onResults (inside the useEffect).

selfieSegmentation.onResults(onResults);

For more information, check out MediaPipe.

If you have any questions, feel free to reach out to me @_tkoriginal on Twitter.

Full Code Sample

import { useEffect, useRef } from "react";
import { SelfieSegmentation } from "@mediapipe/selfie_segmentation";
import "./App.css";

function App() {
  const inputVideoRef = useRef();
  const canvasRef = useRef();
  const contextRef = useRef();

  useEffect(() => {
    contextRef.current = canvasRef.current.getContext("2d");
    const constraints = {
      video: { width: { min: 1280 }, height: { min: 720 } },
    };
    navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
      inputVideoRef.current.srcObject = stream;
      sendToMediaPipe();
    });

    const selfieSegmentation = new SelfieSegmentation({
      locateFile: (file) =>
        `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
    });

    selfieSegmentation.setOptions({
      modelSelection: 1,
      selfieMode: true,
    });

    selfieSegmentation.onResults(onResults);

    const sendToMediaPipe = async () => {
      // If the video isn't ready yet, skip sending and just loop again
      if (inputVideoRef.current.videoWidth) {
        await selfieSegmentation.send({ image: inputVideoRef.current });
      }
      requestAnimationFrame(sendToMediaPipe);
    };
  }, []);

  const onResults = (results) => {
    contextRef.current.save();
    contextRef.current.clearRect(
      0,
      0,
      canvasRef.current.width,
      canvasRef.current.height
    );
    contextRef.current.drawImage(
      results.segmentationMask,
      0,
      0,
      canvasRef.current.width,
      canvasRef.current.height
    );
    // Fill everything outside the person mask (the background) with green.
    contextRef.current.globalCompositeOperation = "source-out";
    contextRef.current.fillStyle = "#00FF00";
    contextRef.current.fillRect(
      0,
      0,
      canvasRef.current.width,
      canvasRef.current.height
    );

    // Draw the camera frame behind the existing content so the person shows through.
    contextRef.current.globalCompositeOperation = "destination-atop";
    contextRef.current.drawImage(
      results.image,
      0,
      0,
      canvasRef.current.width,
      canvasRef.current.height
    );

    contextRef.current.restore();
  };

  return (
    <div className="App">
      <video autoPlay ref={inputVideoRef} />
      <canvas ref={canvasRef} width={1280} height={720} />
    </div>
  );
}

export default App;