The Beginner's Guide to Understanding WebRTC

Software Engineer

July 5, 2022

WebRTC, short for Web Real-Time Communication, is an open-source project supported by Apple, Google, Microsoft, Mozilla, and many others. It allows voice, video, and data to be sent directly between peers (two or more connected computers or devices). WebRTC is currently supported by all major browsers and by native clients on all major platforms.

WebRTC can be slightly overwhelming to learn. Many concepts and terms are scattered throughout the official documentation, and they take a while to fully grasp. WebRTC is built on top of many open standards and protocols, most of which pre-date it by many years, stitched together so that the whole is greater than the sum of its parts. Fortunately, there are many awesome tools available to make writing WebRTC applications easier, such as Simple WebRTC!

Why WebRTC?

Traditionally, when two nodes (computers or mobile devices) want to communicate with each other, they use a server. All the video, audio, and data pass through this server as a relay. This approach has a few disadvantages. First, it can get very expensive to host. Second, communication through the server can add latency: imagine two devices on a video call that are a block apart while the relay server is hosted somewhere very far away. Third, it is hard to scale. As more and more users begin to use the server, more resources are needed to handle the growing amount of data.

Audio and Video transmitted via server

The ideal scenario is for the devices to talk directly to each other (peer-to-peer), rather than streaming audio and video through a server (peer-to-server-to-peer). The benefits are lower cost, a smaller environmental footprint, and improved privacy for users. All of this sounds great, but how does it work?

Server used to transmit the initial WebRTC-related data, after which the video and audio are transmitted peer-to-peer

How WebRTC Works

There are so many concepts baked into WebRTC, and it's very easy to get lost when you're picking it up. We'll start with a super simple example to help clarify some concepts.

In our example, Rick and Morty want to hop on a video call. Each of them has his own computer on his own home internet connection, but how does Rick's computer know how to find Morty's computer? We can break WebRTC down into four steps from the moment Rick and Morty decide that they want to talk.

Signaling - A server called a signaling server tells Morty that Rick is calling, and lets Rick know once Morty has accepted the call. Once they are both ready, the signaling server relays the details each of them needs about the other, such as network addresses, the type of call (video/audio), etc.
STUN - This stands for Session Traversal Utilities for NAT. Rick and Morty's computers usually sit behind home routers, so neither one knows the public address that the rest of the internet sees for it. A STUN server tells each computer its own public IP address and port, and the two then share those addresses through the signaling server so that they can talk to each other. There is also something called a TURN server (Traversal Using Relays around NAT), which is used when a direct connection can't be established because of firewalls or other network restrictions. In this case, the TURN server acts as a traditional relay server (a backup) and streams Rick and Morty's audio and video through it to each other (peer-to-server-to-peer).
Securing - Since these calls carry audio and video, Rick and Morty might be talking about sensitive things. WebRTC requires data, audio, and video to be encrypted between them (using the DTLS and SRTP protocols). Video and audio streams that leave Rick's computer are encrypted in flight and not decrypted until they reach Morty's computer, for everyone's privacy and safety.
Communicating - Once all the necessary data has been exchanged through the signaling server and STUN server, the peers can then communicate through their audio and video streams.
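
To make the STUN and TURN roles above a little more concrete, here is a minimal TypeScript sketch of the server list a browser is typically handed before a call. The helper name `buildPeerConnectionConfig`, the interface names, the `example.org` hosts, and the credentials are all illustrative assumptions. Only the `iceServers` / `urls` / `username` / `credential` shape is meant to mirror the configuration object that `RTCPeerConnection` accepts.

```typescript
// Shapes modelled on the browser's RTCPeerConnection configuration.
// The interface names themselves are hypothetical, chosen for this sketch.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConnectionConfig {
  iceServers: IceServerEntry[];
}

interface TurnCredentials {
  url: string;
  username: string;
  credential: string;
}

// Lists a STUN server, plus an optional TURN relay that can be used
// when a direct peer-to-peer route cannot be established.
function buildPeerConnectionConfig(
  stunUrl: string,
  turn?: TurnCredentials
): PeerConnectionConfig {
  const iceServers: IceServerEntry[] = [{ urls: stunUrl }];
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Placeholder hosts and credentials (assumptions, not real services).
const exampleConfig = buildPeerConnectionConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "rick",
  credential: "placeholder-secret",
});

// In a browser, this object would be passed to the constructor:
//   const connection = new RTCPeerConnection(exampleConfig);
console.log(JSON.stringify(exampleConfig, null, 2));
```

The browser decides on its own whether the direct route discovered via STUN or the TURN relay ends up being used. The configuration only tells it which servers are available.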

Below is a more technical example of an actual WebRTC flow, and how the browsers communicate with each other.

Rick joins a chat room called /room/basement, and he is waiting for Morty to join. Rick's web browser tells the signaling server that he's ready to chat.
Morty then joins the same call in the /room/basement chat room, and his computer tells the signaling server that he has also joined.
If all the conditions are met to initiate the call, the signaling server will start a handshake between the two users, and many WebRTC events will fire off.
Rick's browser will call a getUserMedia method that captures audio and/or video from his computer.
Rick's computer will create a connection called RTCPeerConnection. Morty will also create his own unique connection later.
Rick will then choose which microphone and video camera he'd like to use for the call, and those "tracks" will be added to the connection that his browser created.
Rick's browser will then create an Offer and set it on the connection as its local description. The Offer is a session description that will be sent to Morty soon, and it tells Morty the details of the call Rick is proposing, such as the kinds of media and the codecs his browser supports.
Setting the Offer on the connection causes it to start talking to the STUN server.
The STUN server reports Rick's public address, and the browser turns what it learns into ICE Candidates.
Note: ICE stands for Interactive Connectivity Establishment. On a high level, an ICE candidate is a network address (IP address, port, and transport protocol) where a peer might be reachable: directly on its local network, through the public address discovered via STUN, or through a TURN relay. More granular details about the call itself, such as codec information, travel in the Offer and Answer.
The Offer is then sent to the signaling server.
The signaling server will send the Offer to Morty.
Morty's browser will then take the same steps that Rick's did. It creates an RTCPeerConnection, adds Rick's Offer to his own connection as the remote description, gets his audio and video device details, and adds those tracks to the connection.
Morty's browser will then create an Answer and add it to the connection as its local description. The Answer has the same format as the Offer, but it describes Morty's side of the call.
Adding the Answer to the connection will cause Morty's connection to start talking to the STUN server and gathering ICE candidates as well.
Morty will also send the Answer to the signaling server for Rick to receive and add to his connection.
Now that all of the details have been gathered about Rick and Morty, their browsers will each send their ICE candidates to the other through the signaling server, find a pair of addresses that can connect, and the call will begin.
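
The handshake above can be sketched in TypeScript roughly as follows. This is a simplified illustration, not a complete client: `PeerLike` is a hypothetical stand-in for the handful of `RTCPeerConnection` methods involved, and `Signaling`, `SignalMessage`, `startCall`, and `handleSignal` are made-up names for whatever transport (for example a WebSocket) carries messages through the signaling server. Capturing media with `getUserMedia` and adding tracks appear only as comments, since they need a real browser.

```typescript
// Hypothetical stand-ins for the browser types, so the sketch is self-contained.
interface Description {
  type: "offer" | "answer";
  sdp?: string;
}

interface Candidate {
  candidate?: string;
}

// The subset of RTCPeerConnection methods used by the handshake.
interface PeerLike {
  createOffer(): Promise<Description>;
  createAnswer(): Promise<Description>;
  setLocalDescription(description: Description): Promise<void>;
  setRemoteDescription(description: Description): Promise<void>;
  addIceCandidate(candidate: Candidate): Promise<void>;
}

// Messages relayed by the signaling server (the shape is an assumption).
type SignalMessage =
  | { kind: "offer"; description: Description }
  | { kind: "answer"; description: Description }
  | { kind: "candidate"; candidate: Candidate };

interface Signaling {
  send(message: SignalMessage): void;
}

// Caller side (Rick): create the Offer, store it locally, send it out.
// In a browser, media would be captured and attached first:
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
//   stream.getTracks().forEach((track) => connection.addTrack(track, stream));
async function startCall(connection: PeerLike, signaling: Signaling): Promise<void> {
  const offer = await connection.createOffer();
  await connection.setLocalDescription(offer);
  signaling.send({ kind: "offer", description: offer });
}

// Both sides: react to whatever the signaling server delivers.
async function handleSignal(
  connection: PeerLike,
  signaling: Signaling,
  message: SignalMessage
): Promise<void> {
  switch (message.kind) {
    case "offer": {
      // Callee side (Morty): store the Offer, reply with an Answer.
      await connection.setRemoteDescription(message.description);
      const answer = await connection.createAnswer();
      await connection.setLocalDescription(answer);
      signaling.send({ kind: "answer", description: answer });
      break;
    }
    case "answer":
      // Caller side (Rick): store Morty's Answer.
      await connection.setRemoteDescription(message.description);
      break;
    case "candidate":
      // Either side: add an ICE candidate received from the other peer.
      await connection.addIceCandidate(message.candidate);
      break;
  }
}

// In a browser, locally gathered candidates are forwarded as they arrive:
//   connection.onicecandidate = (event) => {
//     if (event.candidate) {
//       signaling.send({ kind: "candidate", candidate: event.candidate.toJSON() });
//     }
//   };
```

Keeping the signaling transport behind a small interface reflects the fact that WebRTC does not prescribe how these messages travel between the peers. Any channel that can deliver them to the other side will do.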

Whew! That's a lot of steps. As you can see, WebRTC is quite complicated, and it gets even messier when there are more than two users communicating with each other. Fortunately, there are a lot of amazing tools out there to make writing WebRTC less complicated, such as Simple WebRTC.

Keep an eye out for more articles about WebRTC and some practical examples very soon. We'll be walking through common use cases and how we create memorable digital experiences with WebRTC!

Need some more help with WebRTC? Reach out!