WebRTC in the real world: STUN, TURN and signaling

What is signaling?

Signaling is the process of coordinating communication. In order for a WebRTC application to set up a 'call', its clients need to exchange information:

  • Session control messages used to open or close communication.
  • Error messages.
  • Media metadata such as codecs and codec settings, bandwidth and media types.
  • Key data, used to establish secure connections.
  • Network data, such as a host's IP address and port as seen by the outside world.

This signaling process needs a way for clients to pass messages back and forth. That mechanism is not implemented by the WebRTC APIs: you need to build it yourself. We describe below some ways to build a signaling service. First, however, a little context...

Why is signaling not defined by WebRTC?

To avoid redundancy and to maximize compatibility with established technologies, signaling methods and protocols are not specified by WebRTC standards. This approach is outlined by JSEP, the JavaScript Session Establishment Protocol:

The thinking behind WebRTC call setup has been to fully specify and control the media plane, but to leave the signaling plane up to the application as much as possible. The rationale is that different applications may prefer to use different protocols,
such as the existing SIP or Jingle call signaling protocols, or something custom to the particular application, perhaps for a novel use case. In this approach, the key information that needs to be exchanged is the multimedia session description, which specifies
the necessary transport and media configuration information necessary to establish the media plane.

JSEP's architecture also avoids a browser having to save state: that is, to function as a signaling state machine. This would be problematic if, for example, signaling data was lost each time a page was reloaded. Instead, signaling state can be saved on a server.

JSEP architecture

JSEP requires the exchange between peers of offer and answer: the media metadata mentioned above. Offers and answers are communicated in Session Description Protocol (SDP) format, which looks like this:

v=0
o=- 7614219274584779017 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS
m=audio 1 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:W2TGCZw2NZHuwlnf
a=ice-pwd:xdQEccP40E+P0L5qTyzDgfmW
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=mid:audio
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:9c1AHz27dZ9xPI91YNfSlI67/EMkjHHIHORiClQe
a=rtpmap:111 opus/48000/2
…

Want to know what all this SDP gobbledygook actually means? Take a look at the IETF examples.

Bear in mind that WebRTC is designed so that the offer or answer can be tweaked before being set as the local or remote description, by editing the values in the SDP text. For example, the preferAudioCodec() function in apprtc.appspot.com can be used to set the default codec and bitrate. SDP is somewhat painful to manipulate with JavaScript, and there is discussion about whether future versions of WebRTC should use JSON instead, but there are some advantages to sticking with SDP.
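To give a feel for what that tweaking involves, here is a minimal sketch (not the actual apprtc implementation; preferCodec() is a hypothetical helper) of how the payload types on the audio m= line might be reordered so that a preferred codec is negotiated first:

// Move the payload type for codecName to the front of the audio m= line,
// so the browser prefers that codec when negotiating.
function preferCodec(sdp, codecName) {
  var lines = sdp.split('\r\n');
  var mLineIndex = -1;
  var payloadType = null;

  for (var i = 0; i < lines.length; i++) {
    if (lines[i].indexOf('m=audio') === 0) {
      mLineIndex = i;
    }
    // e.g. 'a=rtpmap:111 opus/48000/2' maps payload type 111 to 'opus'
    var match = lines[i].match(/^a=rtpmap:(\d+) ([A-Za-z0-9_-]+)\//);
    if (match && match[2].toLowerCase() === codecName.toLowerCase()) {
      payloadType = match[1];
    }
  }
  if (mLineIndex === -1 || payloadType === null) {
    return sdp; // codec not found: leave the SDP untouched
  }

  // 'm=audio <port> <proto> <payload types...>': move the preferred
  // payload type to the front of the list
  var parts = lines[mLineIndex].split(' ');
  var header = parts.slice(0, 3);
  var payloads = parts.slice(3).filter(function (pt) {
    return pt !== payloadType;
  });
  lines[mLineIndex] = header.concat(payloadType).concat(payloads).join(' ');
  return lines.join('\r\n');
}

A call such as desc.sdp = preferCodec(desc.sdp, 'opus'); would then be made before passing the description to setLocalDescription().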

RTCPeerConnection + signaling: offer, answer and candidate

RTCPeerConnection is the API used by WebRTC applications to create a connection between peers and communicate audio and video.

To initialise this process, RTCPeerConnection has two tasks:

  • Ascertain local media conditions, such as resolution and codec capabilities. This is the metadata used for the offer and answer mechanism.
  • Get potential network addresses for the application's host, known as candidates.

Once this local data has been ascertained, it must be exchanged via a signaling mechanism with the remote peer.

Imagine Alice is trying to call Eve. Here's the full offer/answer mechanism in all its gory detail:

  1. Alice creates an RTCPeerConnection object.
  2. Alice creates an offer (an SDP session description) with the RTCPeerConnection createOffer() method.
  3. Alice calls setLocalDescription() with her offer.
  4. Alice stringifies the offer and uses a signaling mechanism to send it to Eve.
  5. Eve calls setRemoteDescription() with Alice's offer, so that her RTCPeerConnection knows about Alice's setup.
  6. Eve calls createAnswer(), and the success callback for this is passed a local session description: Eve's answer.
  7. Eve sets her answer as the local description by calling setLocalDescription().
  8. Eve then uses the signaling mechanism to send her stringified answer back to Alice.
  9. Alice sets Eve‘s answer as the remote session description using setRemoteDescription().

Strewth!
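As a rough sketch of Alice's side of steps 1–4 (using the same callback-style API as the full example below; sendToEve() is a hypothetical wrapper around whatever signaling mechanism is in use), the sequence looks something like this:

var configuration = {'iceServers': [{'url': 'stun:stun.example.org'}]};
var pc = new RTCPeerConnection(configuration);      // step 1

function logError(error) { console.log(error); }

pc.createOffer(function (offer) {                   // step 2
  pc.setLocalDescription(offer, function () {       // step 3
    // step 4: stringify the offer and hand it to the signaling mechanism
    // (sendToEve() is hypothetical: it could be a WebSocket send, an XHR, etc.)
    sendToEve(JSON.stringify({'sdp': pc.localDescription}));
  }, logError);
}, logError);

Eve's side mirrors this with setRemoteDescription(), createAnswer() and setLocalDescription(), as shown in the complete example further down.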

Alice and Eve also need to exchange network information. The expression 'finding candidates' refers to the process of finding network interfaces and ports using the ICE framework.

  1. Alice creates an RTCPeerConnection object with an onicecandidate handler.
  2. The handler is called when network candidates become available.
  3. In the handler, Alice sends stringified candidate data to Eve, via their signaling channel.
  4. When Eve gets a candidate message from Alice, she calls addIceCandidate(), to add the candidate to the remote peer description.

JSEP supports ICE Candidate Trickling, which allows the caller to incrementally provide candidates to the callee after the initial offer, and for the callee to begin acting on the call and setting up a connection without waiting for all candidates to arrive.

Coding WebRTC for signaling

Below is a W3C code example that summarises the complete signaling process. The code assumes the existence of some signaling mechanism, SignalingChannel. Signaling is discussed in greater detail below.

var signalingChannel = new SignalingChannel();
var configuration = {
  'iceServers': [{
    'url': 'stun:stun.example.org'
  }]
};
var pc;

// call start() to initiate

function start() {
  pc = new RTCPeerConnection(configuration);

  // send any ice candidates to the other peer
  pc.onicecandidate = function (evt) {
    if (evt.candidate)
      signalingChannel.send(JSON.stringify({
        'candidate': evt.candidate
      }));
  };

  // let the 'negotiationneeded' event trigger offer generation
  pc.onnegotiationneeded = function () {
    pc.createOffer(localDescCreated, logError);
  };

  // once remote stream arrives, show it in the remote video element
  pc.onaddstream = function (evt) {
    remoteView.src = URL.createObjectURL(evt.stream);
  };

  // get a local stream, show it in a self-view and add it to be sent
  navigator.getUserMedia({
    'audio': true,
    'video': true
  }, function (stream) {
    selfView.src = URL.createObjectURL(stream);
    pc.addStream(stream);
  }, logError);
}

function localDescCreated(desc) {
  pc.setLocalDescription(desc, function () {
    signalingChannel.send(JSON.stringify({
      'sdp': pc.localDescription
    }));
  }, logError);
}

signalingChannel.onmessage = function (evt) {
  if (!pc)
    start();

  var message = JSON.parse(evt.data);
  if (message.sdp)
    pc.setRemoteDescription(new RTCSessionDescription(message.sdp), function () {
      // if we received an offer, we need to answer
      if (pc.remoteDescription.type == 'offer')
        pc.createAnswer(localDescCreated, logError);
    }, logError);
  else
    pc.addIceCandidate(new RTCIceCandidate(message.candidate));
};

function logError(error) {
  log(error.name + ': ' + error.message);
}

To see the offer/answer and candidate exchange processes in action, take a look at the console log for the 'single-page' video chat example at simpl.info/pc. If you want more, download a complete dump of WebRTC signaling and stats from the chrome://webrtc-internals page in Chrome or the opera://webrtc-internals page in Opera.

Peer discovery

This is a fancy way of saying: how do I find someone to talk to?

For telephone calls we have telephone numbers and directories. For online video chat and messaging, we need identity and presence management systems, and a means for users to initiate sessions. WebRTC apps need a way for clients to signal to each other that
they want to start or join a call.

Peer discovery mechanisms are not defined by WebRTC and we won't go into the options here. The process can be as simple as emailing or messaging a URL: for video chat applications such as talky.io, tawk.com and browsermeeting.com, you invite people to a call by sharing a custom link. Developer Chris Ball has built an intriguing serverless-webrtc experiment that enables WebRTC call participants to exchange metadata by any messaging service they like, such as IM, email or homing pigeon.

How can I build a signaling service?

To reiterate: signaling protocols and mechanisms are not defined by WebRTC standards. Whatever you choose, you'll need an intermediary server to exchange signaling messages and application data between clients. Sadly, a web app cannot simply shout into the internet 'Connect me to my friend!'

Thankfully, signaling messages are small, and mostly exchanged at the start of a call. In testing with apprtc.appspot.com and samdutton-nodertc.jit.su we found that, for a video chat session, a total of around 30–45 messages were handled by the signaling service, with a total size for all messages of around 10kB.

As well as being relatively undemanding in terms of bandwidth, WebRTC signaling services don't consume much processing or memory, since they only need to relay messages and retain a small amount of session state data (such as which clients are connected).

The signaling mechanism used to exchange session metadata can also be used to communicate application data. It's just a messaging service!

Pushing messages from the server to the client

A message service for signaling needs to be bidirectional: client to server and server to client. Bidirectional communication goes against the HTTP client/server request/response model, but various hacks such as long polling have been developed over many years in order to push data from a service running on a web server to a web app running in a browser.

More recently, the EventSource API has been widely implemented. This enables 'server-sent events': data sent from a web server to a browser client via HTTP. There's a simple demo at simpl.info/es. EventSource is designed for one-way messaging, but it can be used in combination with XHR to build a service for exchanging signaling messages: a signaling service passes on a message from a caller, delivered by XHR request, by pushing it via EventSource to the callee.
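Here is a sketch of how that combination might look on the client (the URLs and message shape are hypothetical, and handleSignalingMessage() stands in for the RTCPeerConnection logic shown earlier):

// Receive: the signaling server pushes messages to this client over EventSource.
var source = new EventSource('/messages?client=callee');
source.onmessage = function (event) {
  handleSignalingMessage(JSON.parse(event.data));
};

// Send: outgoing signaling messages go back to the server via XHR.
function sendSignalingMessage(message) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/messages?client=caller', true);
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(JSON.stringify(message));
}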

WebSocket is a more natural solution, designed for full duplex client–server communication (messages can flow in both directions at the same time). One
advantage of a signaling service built with pure WebSocket or Server-Sent Events (EventSource) is that the back-end for these APIs can be implemented on a variety of web frameworks common to most web hosting packages, for languages such as PHP, Python and
Ruby.
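With WebSocket, a single object handles both directions, which keeps the client side of a signaling channel very small. A minimal sketch (the wss URL is hypothetical, and handleSignalingMessage() again stands in for the peer connection logic):

var ws = new WebSocket('wss://signaling.example.org/room/42');

// Incoming signaling messages arrive as WebSocket messages.
ws.onmessage = function (event) {
  handleSignalingMessage(JSON.parse(event.data));
};

// Outgoing signaling messages go out over the same connection.
function sendSignalingMessage(message) {
  ws.send(JSON.stringify(message));
}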

About three quarters of browsers support WebSocket and, more importantly, all browsers that support WebRTC also support WebSocket, both on desktop and mobile. TLS should be used for all connections, to ensure messages cannot be intercepted unencrypted, and also to reduce problems with proxy traversal. (For more information about WebSocket and proxy traversal see the WebRTC chapter in Ilya Grigorik's High Performance Browser Networking. Peter Lubbers' WebSocket Cheat Sheet has more information about WebSocket clients and servers.)

Signaling for the canonical apprtc.appspot.com WebRTC video chat application is accomplished via the Google App Engine Channel API, which uses Comet techniques (long polling) to enable signaling with push communication between the App Engine backend and the web client. (There's a long-standing bug for App Engine to support WebSocket. Star the bug to vote it up!) There is a detailed code walkthrough of this app in the HTML5 Rocks WebRTC article.

apprtc in action

It is also possible to handle signaling by getting WebRTC clients to poll a messaging server repeatedly via Ajax, but that leads to a lot of redundant network requests, which is especially problematic for mobile devices. Even after a session has been established, peers need to poll for signaling messages in case of changes or session termination by other peers. The WebRTC Book app example takes this option, with some optimizations for polling frequency.
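For illustration, a naive polling client might look like the sketch below (the URL, message format and interval are all hypothetical): every request costs a round trip even when nothing has changed, which is why push-based transports are usually preferred.

var lastMessageId = 0;

// Ask the signaling server for any messages newer than the last one seen.
function pollForMessages() {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/messages?since=' + lastMessageId, true);
  xhr.onload = function () {
    var messages = JSON.parse(xhr.responseText);
    messages.forEach(function (message) {
      lastMessageId = message.id;
      handleSignalingMessage(message);
    });
  };
  xhr.send();
}

// Poll every three seconds, whether or not anything has changed.
setInterval(pollForMessages, 3000);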

Scaling signaling

Although a signaling service consumes relatively little bandwidth and CPU per client, signaling servers for a popular application may have to handle a lot of messages, from different locations, with high levels of concurrency. WebRTC apps that get a lot
of traffic need signaling servers able to handle considerable load.

We won‘t go into detail here, but there are a number of options for high volume, high performance messaging, including the following:

  • eXtensible Messaging and Presence Protocol (XMPP), originally known as Jabber: a protocol developed for instant messaging that can be used for signaling. Server implementations include ejabberd and Openfire. JavaScript clients such as Strophe.js use BOSH to emulate bidirectional streaming, but for various reasons BOSH may not be as efficient as WebSocket, and for the same reasons may not scale well. (On a tangent: Jingle is an XMPP extension to enable voice and video; the WebRTC project uses network and transport components from the libjingle library, a C++ implementation of Jingle.)
  • Open source libraries such as ZeroMQ (as used by TokBox for their Rumour service) and OpenMQ. NullMQ applies ZeroMQ concepts to web platforms, using the STOMP protocol over WebSocket.
  • Commercial cloud messaging platforms that use WebSocket (though they may fall back to long polling), such as Pusher, Kaazing and PubNub. (PubNub also has an API for WebRTC.)
  • Commercial WebRTC platforms such as vLine.

(Developer Phil Leggetter's Real-Time Web Technologies Guide provides a comprehensive list of messaging services and libraries.)

Building a signaling service with Socket.io on Node

Below is code for a simple web application that uses a signaling service built with Socket.io on Node. The design of Socket.io makes it simple to build a service to exchange messages, and Socket.io is particularly suited to WebRTC signaling because of its built-in concept of 'rooms'. This example is not designed to scale as a production-grade signaling service, but works well for a relatively small number of users.

Socket.io uses WebSocket with the following fallbacks: Adobe Flash Socket, AJAX long polling, AJAX multipart streaming, Forever Iframe and JSONP polling. It has been ported to various backends, but is perhaps best known for its Node version, which we use
in the example below.

There's no WebRTC in this example: it's designed only to show how to build signaling into a web app. View the console log to see what's happening as clients join a room and exchange messages. Our WebRTC codelab gives step-by-step instructions on how to integrate this example into a complete WebRTC video chat application. You can download the code from step 5 of the codelab repo or try it out live at samdutton-nodertc.jit.su: open the URL in two browsers for video chat.

Here is the client, index.html:

<!DOCTYPE html>
<html>
  <head>
    <title>WebRTC client</title>
  </head>
  <body>
    <script src='/socket.io/socket.io.js'></script>
    <script src='js/main.js'></script>
  </body>
</html>

...and the JavaScript file main.js referenced in the client:

var isInitiator;

room = prompt('Enter room name:');

var socket = io.connect();

if (room !== '') {
  console.log('Joining room ' + room);
  socket.emit('create or join', room);
}

socket.on('full', function (room){
  console.log('Room ' + room + ' is full');
});

socket.on('empty', function (room){
  isInitiator = true;
  console.log('Room ' + room + ' is empty');
});

socket.on('join', function (room){
  console.log('Making request to join room ' + room);
  console.log('You are the initiator!');
});

socket.on('log', function (array){
  console.log.apply(console, array);
});

The complete server app:

var static = require('node-static');
var http = require('http');
var file = new(static.Server)();
var app = http.createServer(function (req, res) {
  file.serve(req, res);
}).listen(2013);

var io = require('socket.io').listen(app);

io.sockets.on('connection', function (socket){

  // convenience function to log server messages to the client
  function log(){
    var array = ['>>> Message from server: '];
    for (var i = 0; i < arguments.length; i++) {
      array.push(arguments[i]);
    }
    socket.emit('log', array);
  }

  socket.on('message', function (message) {
    log('Got message:', message);
    // for a real app, would be room only (not broadcast)
    socket.broadcast.emit('message', message);
  });

  socket.on('create or join', function (room) {
    var numClients = io.sockets.clients(room).length;

    log('Room ' + room + ' has ' + numClients + ' client(s)');
    log('Request to create or join room ' + room);

    if (numClients === 0){
      socket.join(room);
      socket.emit('created', room);
    } else if (numClients === 1) {
      io.sockets.in(room).emit('join', room);
      socket.join(room);
      socket.emit('joined', room);
    } else { // max two clients
      socket.emit('full', room);
    }
    // send confirmations through the 'log' event that clients listen for
    socket.emit('log', ['emit(): client ' + socket.id + ' joined room ' + room]);
    socket.broadcast.emit('log', ['broadcast(): client ' + socket.id + ' joined room ' + room]);

  });

});

(You don't need to learn about node-static for this: it just makes the server simpler.)

To run this app on localhost, you need to have Node, socket.io and node-static installed. Node can be downloaded from nodejs.org (installation is straightforward and quick). To install socket.io and node-static, run Node Package Manager from a terminal in your application directory:

npm install socket.io
npm install node-static

To start the server, run the following command from a terminal in your application directory:

node server.js

From your browser, open localhost:2013. Open a new tab page or window in any browser and open localhost:2013 again. To see what's happening, check the console: in Chrome and Opera, you can access this via the DevTools with Command-Option-J or Ctrl-Shift-J.

Whatever approach you choose for signaling, your backend and client app will — at the very least — need to provide services similar to this example.

Using RTCDataChannel for signaling

A signaling service is required to initiate a WebRTC session.

However, once a connection has been established between two peers, RTCDataChannel could, in theory, take over as the signaling channel. This might reduce latency for signaling — since messages fly direct — and help reduce signaling server bandwidth and processing
costs. We don't have a demo, but watch this space!
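A rough sketch of the idea (assuming pc is an already-connected RTCPeerConnection, and reusing the message format and logError() handler from the signaling example above):

// Create a dedicated channel for subsequent signaling messages
// (for example, renegotiation offers) once the peers are connected.
var signalingDataChannel = pc.createDataChannel('signaling');

signalingDataChannel.onmessage = function (event) {
  var message = JSON.parse(event.data);
  if (message.sdp) {
    pc.setRemoteDescription(new RTCSessionDescription(message.sdp),
        function () {}, logError);
  }
};

// Subsequent signaling messages travel peer to peer instead of via the server.
function sendViaDataChannel(message) {
  signalingDataChannel.send(JSON.stringify(message));
}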

Signaling gotchas

  • RTCPeerConnection won't start gathering candidates until setLocalDescription() is called: this is mandated in the JSEP IETF draft.
  • Take advantage of Trickle ICE (see above): call addIceCandidate() as soon as candidates arrive.

Readymade signaling servers

If you don't want to roll your own, there are several WebRTC signaling servers available, which use Socket.io like the example above, and are integrated with WebRTC client JavaScript libraries:

...and if you don't want to write any code at all, complete commercial WebRTC platforms are available from companies such as vLine, OpenTok and Asterisk.

For the record, Ericsson built a signaling server using PHP on Apache in the early days of WebRTC. This is now somewhat obsolete, but it's worth looking at the code if you're considering something similar.
