Updated: Jan 29, 2019
Welcome. I'm Jang, the CTO of AirenSoft. The first article on our blog covered “What is Ultra-low latency video streaming?” Today, I will explain Web Real-Time Communication (WebRTC), one of the protocols for achieving Ultra-Low Latency (ULL).
What is WebRTC?
WebRTC is a web communication specification proposed by the World Wide Web Consortium (W3C) in 2011. It supports Peer to Peer (P2P) communication with ultra-low latency, in near real time, without installing separate applications such as plug-ins in the web browser. As of 2018, if you want to implement ULL on the web, WebRTC is your best choice because many modern web browsers support it.
If you are studying ULL but have yet to experience it, visit AirenSoft's GitHub now and try OvenMediaEngine (OME) and OvenPlayer, our open source projects. Oh! We also have a demo page. If you are interested, please come and play.
Current status of WebRTC standardization and browser support.
The API standards required for communication on the web through WebRTC are specified in “WebRTC 1.0: Real-time Communication Between Browsers”. The specification was first published as a Working Draft on Oct. 27, 2011, and has reached the Candidate Recommendation stage through continuous revision. For the overall status of WebRTC standardization, see "Web Real Time Communication Current Status". As described earlier, a variety of recent web browsers support WebRTC. To find out which standards are implemented in the major browsers, see "Web-Platform-Tests Dashboard".
What you need to know for using WebRTC.
To use WebRTC, you need to know HTML5 well, because WebRTC relies on many HTML5 features. For example, to transmit video or voice captured from your webcam and microphone to the other person, or to play the other person's video/audio on your screen and speakers, you need to use a variety of APIs provided by HTML5.
What is Signaling?
Before peers can exchange data over WebRTC, they must agree on how and where that data will be sent. This essential prerequisite step is called "Signaling". During signaling, the peers exchange Session Description Protocol (SDP) information, which describes How data should be sent, and Interactive Connectivity Establishment (ICE) candidate information, which describes Where data should be sent. In principle, SDP alone can describe both how and where to deliver the data. However, I explain both SDP and ICE because modern browsers use them in combination. The figure below is a simple drawing of this process.
Here, there is one problem. Look at the [???] part at the top center of the figure above. Do you know what it means? Signaling is not standardized, and WebRTC does not even provide an object for it. This is intentional, so that you can implement it in your own way. In other words, you have the right to decide how to develop it for your situation. For instance, whether you exchange the information through an AJAX or WebSocket server, via e-mail or telephone, or by text message, there is nothing wrong, because there is no single correct answer. Accordingly, a developer who uses WebRTC needs to think about the Signaling process.
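Because the signaling format is left to the developer, one possible shape is a small JSON envelope for the SDP and ICE information. The sketch below is only an illustration: the `SignalingMessage` type, its `kind` field, and the `encodeSignal`/`decodeSignal` helpers are hypothetical names, not part of WebRTC or of OME's signaling specification.

```typescript
// Hypothetical signaling envelope. WebRTC itself does not define this format.
type SignalingMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string; sdpMid: string | null };

// Serialize a message for any text transport (WebSocket, AJAX, even e-mail).
function encodeSignal(msg: SignalingMessage): string {
  return JSON.stringify(msg);
}

// Parse and validate an incoming message; throws on anything unexpected.
function decodeSignal(raw: string): SignalingMessage {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("signaling message must be a JSON object");
  }
  const fields = data as Record<string, unknown>;
  const kind = fields["kind"];
  const sdp = fields["sdp"];
  const candidate = fields["candidate"];
  const sdpMid = fields["sdpMid"];
  if ((kind === "offer" || kind === "answer") && typeof sdp === "string") {
    return { kind, sdp };
  }
  if (kind === "candidate" && typeof candidate === "string") {
    return {
      kind,
      candidate,
      sdpMid: typeof sdpMid === "string" ? sdpMid : null,
    };
  }
  throw new Error("unknown signaling message");
}
```

A sender would pass the result of `encodeSignal` to whatever channel it chose, and the receiver would hand the decoded SDP or candidate to its peer connection.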
Does something come to mind? If so, you already have an answer. If not, here is a tip: as the CTO of AirenSoft, I will briefly tell you our story. OME includes a WebSocket-based Signaling server and defines its own signaling specification. A video/audio player therefore only needs to perform signaling according to that specification. Is it too simple? OvenPlayer, our player, already implements it. Please see the figure below.
#03. Session Traversal Utilities for NAT (STUN)
If you study WebRTC, the term you will run into most often is "STUN". STUN is a protocol for handling NAT traversal over TCP or UDP, which makes it one of the most widely used protocols in P2P. STUN consists of a Server and a Client. When the client sends a request to the server, it can obtain the following information:
Do I use a NAT network?
If I’m in a NAT network, what kind of NAT (Full cone/Restricted cone/Symmetric) is used?
What is my Public IP?
What other information related to NAT traversal is available?
ICE in WebRTC uses the information obtained by the client to decide, for example, whether peers can communicate directly or must communicate through Traversal Using Relays around NAT (TURN). I will explain TURN in the next section.
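To make the "What is my Public IP?" question concrete, here is a minimal sketch of the two message-level steps, assuming the STUN wire format of RFC 5389: building a Binding Request and reading the XOR-MAPPED-ADDRESS attribute from a Binding Success Response. It handles IPv4 only and does not send anything over the network. The function names are illustrative.

```typescript
const MAGIC_COOKIE = 0x2112a442; // fixed value defined by RFC 5389

// Build a 20-byte Binding Request: type, length, magic cookie, transaction ID.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer);
  view.setUint16(0, 0x0001); // message type: Binding Request
  view.setUint16(2, 0); // message length: no attributes
  view.setUint32(4, MAGIC_COOKIE);
  msg.set(transactionId, 8);
  return msg;
}

// Extract the public IPv4 address and port from a Binding Success Response.
function parseXorMappedAddress(
  msg: Uint8Array
): { ip: string; port: number } | null {
  if (msg.length < 20) return null;
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  if (view.getUint16(0) !== 0x0101) return null; // not a Binding Success Response
  const end = Math.min(msg.length, 20 + view.getUint16(2));
  let offset = 20;
  while (offset + 4 <= end) {
    const type = view.getUint16(offset);
    const length = view.getUint16(offset + 2);
    const value = offset + 4;
    const isIpv4Xor =
      type === 0x0020 && length >= 8 && value + 8 <= end &&
      view.getUint8(value + 1) === 0x01;
    if (isIpv4Xor) {
      // Port and address are XOR-ed with the magic cookie on the wire.
      const port = view.getUint16(value + 2) ^ (MAGIC_COOKIE >>> 16);
      const addr = (view.getUint32(value + 4) ^ MAGIC_COOKIE) >>> 0;
      const ip = [
        addr >>> 24, (addr >>> 16) & 255, (addr >>> 8) & 255, addr & 255,
      ].join(".");
      return { ip, port };
    }
    // Attribute values are padded to a multiple of 4 bytes.
    offset = value + length + ((4 - (length % 4)) % 4);
  }
  return null;
}
```

In a browser you never write this by hand, since the ICE agent performs the STUN exchange for you. The sketch only shows what the client and server exchange.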
If building a STUN server sounds daunting, use the “Google STUN Server”. Since many open servers are available, you don't need to run your own. Of course, you can also build a STUN server for your WebRTC service.
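As a rough sketch, pointing a peer connection at a public STUN server only takes a configuration object. The `IceServer` interface below is a local stand-in for the browser's `RTCIceServer` dictionary so that the snippet is self-contained, `buildRtcConfig` is a hypothetical helper, and `stun:stun.l.google.com:19302` is the commonly cited Google address, which you should verify before relying on it.

```typescript
// Local type mirroring the shape of the browser's RTCIceServer dictionary.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Hypothetical helper: validate STUN URLs and wrap them in a configuration.
function buildRtcConfig(stunUrls: string[]): { iceServers: IceServer[] } {
  for (const url of stunUrls) {
    if (!/^stuns?:/.test(url)) {
      throw new Error("not a STUN URL: " + url);
    }
  }
  return { iceServers: [{ urls: stunUrls }] };
}

const rtcConfig = buildRtcConfig(["stun:stun.l.google.com:19302"]);
// In a browser, this object would be passed to the peer connection:
//   const pc = new RTCPeerConnection(rtcConfig);
```

If you later run your own STUN server, only the URL list changes.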
#04. Traversal Using Relays around NAT (TURN)
As I mentioned earlier, I will explain TURN with an example in this section. First, assume that Peer1 is in a Public network and Peer2 is in a NAT network. Suppose the network environment discovery performed for Peer2 through STUN shows that Peer2 cannot receive data at its Public IP, for example because the NAT in use is Symmetric, so Peer2 fails to receive the data sent by Peer1. In this case, a TURN server can play a useful role by receiving the data sent by Peer1 first and then forwarding it to Peer2.
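One way to see whether a connection is direct or relayed is to look at the type field of the ICE candidates involved: `host` and `srflx` (discovered via STUN) are direct paths, while `relay` means the traffic goes through a TURN server. The sketch below parses the standard candidate attribute line. The helper names and the sample addresses are illustrative.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";
const CANDIDATE_TYPES: CandidateType[] = ["host", "srflx", "prflx", "relay"];

interface IceCandidateInfo {
  transport: string;
  address: string;
  port: number;
  type: CandidateType;
}

// Parse "candidate:<foundation> <component> <transport> <priority>
//        <address> <port> typ <type> ..." into its main fields.
function parseCandidate(line: string): IceCandidateInfo | null {
  const parts = line.trim().replace(/^a=/, "").split(/\s+/);
  if (parts.length < 8) return null;
  if (parts[0].indexOf("candidate:") !== 0 || parts[6] !== "typ") return null;
  const typeIndex = CANDIDATE_TYPES.indexOf(parts[7] as CandidateType);
  if (typeIndex === -1) return null;
  const port = Number(parts[5]);
  if (!(port >= 0 && port <= 65535) || port % 1 !== 0) return null;
  return {
    transport: parts[2].toLowerCase(),
    address: parts[4],
    port,
    type: CANDIDATE_TYPES[typeIndex],
  };
}

// A candidate pair is relayed when either side uses a TURN "relay" candidate.
function isRelayed(local: IceCandidateInfo, remote: IceCandidateInfo): boolean {
  return local.type === "relay" || remote.type === "relay";
}
```

In the scenario above, Peer2 would offer a `relay` candidate allocated on the TURN server, and `isRelayed` would return true for the pair that ICE ends up selecting.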
To understand how TURN works, take a look at the example in RFC 5766. Figure 1 below illustrates the communication between a TURN Client in a NAT network, Peer A behind a different NAT, and Peer B in yet another network.
#05. Transmission Type
In this section, I will briefly explain the main ways Video and Audio are transmitted in WebRTC. WebRTC uses a variety of protocols, including TCP/UDP and the Stream Control Transmission Protocol (SCTP). Usually, video and audio are sent over UDP, and the data channel is sent over SCTP.
Let me go into more detail about Video/Audio transmission. WebRTC uses the Real-time Transport Protocol (RTP) to carry Video/Audio data. RTP is the standard protocol for delivering real-time data between end-points on the internet, which makes it well suited to WebRTC, where real-time transmission matters. However, RTP generally runs over UDP, so data can be lost in transit. Without a plan for handling data loss, Video/Audio cannot be played back well. But don't worry. Fortunately, WebRTC uses various Forward Error Correction (FEC) mechanisms to minimize the damage caused by data loss. For more about the FEC standard, check “WebRTC Forward Error Correction Requirements”.
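The core idea behind FEC can be shown with a deliberately simplified scheme: send one extra parity packet that is the XOR of a group of media packets, so the receiver can rebuild any single lost packet of that group without a retransmission. This is only a toy sketch of the principle, not the wire format of the FEC mechanisms WebRTC actually uses, and the function names are made up for illustration.

```typescript
// XOR all packets together; shorter packets are treated as zero-padded.
function xorParity(packets: Uint8Array[]): Uint8Array {
  const size = Math.max(0, ...packets.map((p) => p.length));
  const parity = new Uint8Array(size);
  for (const packet of packets) {
    for (let i = 0; i < packet.length; i++) {
      parity[i] ^= packet[i];
    }
  }
  return parity;
}

// Recover the single missing packet of a group: XOR-ing the parity packet
// with every packet that did arrive cancels them out and leaves the lost one.
// The result has the padded group length; real schemes also protect the
// original packet length so padding can be removed.
function recoverLost(received: Uint8Array[], parity: Uint8Array): Uint8Array {
  return xorParity([...received, parity]);
}

// Usage: protect three packets, then lose the second one in transit.
const group = [
  new Uint8Array([1, 2, 3, 4]),
  new Uint8Array([9, 8, 7, 6]),
  new Uint8Array([5, 5, 5, 5]),
];
const parityPacket = xorParity(group);
const rebuilt = recoverLost([group[0], group[2]], parityPacket);
```

The trade-off is extra bandwidth for the parity packets in exchange for not waiting on retransmissions, which is why FEC suits low-latency streaming.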
First, I will ask you a question. Let's assume that you developed a 1-to-1 video chat service using the WebRTC APIs provided by a web browser. If two users access your service with the latest web browsers that support WebRTC, will your service work well?
The correct answer is both “Yes!” and “No!”. If the users have the same OS and web browser, your service will work well. However, if one user has iOS Safari and the other has Android Chrome, your service may not work well. Why? At the time of writing, iOS Safari supports only H.264 and Android Chrome supports only VP8, so the two browsers share no video Codec. In other words, as long as both users' web browsers support the same Codec, the service will work with no problem.
The standard allows many kinds of Codecs to be used. In reality, the web browsers we generally use support only a handful of them, so only a few Codecs are actually used. According to my research, the most widely used web browsers, such as Chrome, Firefox, Opera, and Safari, support either H.264 or VP8 as the Video Codec and Opus as the Audio Codec. If you want to know more about the Codecs supported by your web browser, take a look at the offer SDP it creates. In most cases, it lists H.264 or VP8 for video and Opus for audio.
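Reading the Codecs out of an offer SDP mostly means collecting the `a=rtpmap` lines under each `m=` section. The sketch below does that and then intersects two browsers' lists to see whether they share a video Codec. The helper names and the shortened sample SDP strings are illustrative, not real browser output.

```typescript
// Collect codec names from the a=rtpmap lines of each audio/video section.
function listCodecs(sdp: string): { video: string[]; audio: string[] } {
  const result = { video: [] as string[], audio: [] as string[] };
  let section: "video" | "audio" | null = null;
  for (const line of sdp.split(/\r?\n/)) {
    const media = /^m=(\w+)/.exec(line);
    if (media) {
      const kind = media[1];
      section = kind === "video" || kind === "audio" ? kind : null;
      continue;
    }
    // Format: a=rtpmap:<payload type> <codec name>/<clock rate>[/<channels>]
    const rtpmap = /^a=rtpmap:\d+ ([^/\s]+)\//.exec(line);
    if (section && rtpmap) {
      const name = rtpmap[1].toUpperCase();
      if (result[section].indexOf(name) === -1) {
        result[section].push(name);
      }
    }
  }
  return result;
}

// Codecs that both sides can use, in the first side's order.
function commonCodecs(a: string[], b: string[]): string[] {
  return a.filter((codec) => b.indexOf(codec) !== -1);
}

// Shortened, made-up offers for two hypothetical browsers.
const offerA = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");
const offerB = [
  "v=0",
  "m=video 9 UDP/TLS/RTP/SAVPF 102",
  "a=rtpmap:102 H264/90000",
].join("\r\n");

const sharedVideo = commonCodecs(
  listCodecs(offerA).video,
  listCodecs(offerB).video
);
```

An empty `sharedVideo` list is exactly the situation in which the 1-to-1 service from the question above fails.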
Now, how can we solve the problem? Developers need to think about it. OME solves this problem by using its Live Transcoder to convert the input video. For instance, if the input video is H.264, it is also encoded into VP8. As a result, the video is played as H.264 on iOS Safari and as VP8 on Android Chrome.
I have explained these basics to help you understand WebRTC. Do you still think it is complicated? If you use OME and OvenPlayer, you don't need to master all of this. But I think it is better for developers to understand the operating principles before development, which is why I chose this topic. Next time, I will explain Security and SFU/MCU and how their main functions work in OME.
Thank you for taking the time to read this long article.