facebook chat 【转】

Facebook Chat, offered a nice set of software engineering challenges:

Real-time presence notification:

The most resource-intensive operation performed in a chat system is not sending messages. It is rather keeping each online user aware of the online-idle-offline states of their friends, so that conversations can begin.

The naive implementation of sending a notification to all friends whenever a user comes online or goes offline has a worst case cost of O(average friendlist size * peak users * churn rate) messages/second, where churn rate is the frequency with which users come online and go offline, in events/second. This is wildly inefficient to the point of being untenable, given that the average number of friends per user is measured in the hundreds, and the number of concurrent users during peak site usage is on the order of several millions.

Surfacing connected users‘ idleness greatly enhances the chat user experience but further compounds the problem of keeping presence information up-to-date. Each Facebook Chat user now needs to be notified whenever one of his/her friends
(a) takes an action such as sending a chat message or loads a Facebook page (if tracking idleness via a last-active timestamp) or
(b) transitions between idleness states (if representing idleness as a state machine with states like "idle-for-1-minute", "idle-for-2-minutes", "idle-for-5-minutes", "idle-for-10-minutes", etc.).
Note that approach (a) changes the sending a chat message / loading a Facebook page from a one-to-one communication into a multicast to all online friends, while approach (b) ensures that users who are neither chatting nor browsing Facebook are nonetheless generating server load.

Real-time messaging:

Another challenge is ensuring the timely delivery of the messages themselves. The method we chose to get text from one user to another involves loading an iframe on each Facebook page, and having that iframe‘s Javascript make an HTTP GET request over a persistent connection that doesn‘t return until the server has data for the client. The request gets reestablished if it‘s interrupted or times out. This isn‘t by any means a new technique: it‘s a variation of Comet, specifically XHR long polling, and/or BOSH.

Having a large-number of long-running concurrent requests makes the Apache part of the standard LAMP stack a dubious implementation choice. Even without accounting for the sizeable overhead of spawning an OS process that, on average, twiddles its thumbs for a minute before reporting that no one has sent the user a message, the waiting time could be spent servicing 60-some requests for regular Facebook pages. The result of running out of Apache processes over the entire Facebook web tier is not pretty, nor is the dynamic configuration of the Apache process limits enjoyable.

Distribution, Isolation, and Failover:

Fault tolerance is a desirable characteristic of any big system: if an error happens, the system should try its best to recover without human intervention before giving up and informing the user. The results of inevitable programming bugs, hardware failures, et al., should be hidden from the user as much as possible and isolated from the rest of the system.

The way this is typically accomplished in a web application is by separating the model and the view: data is persisted in a database (perhaps with a separate in-memory cache), with each short-lived request retrieving only the parts relevant to that request. Because the data is persisted, a failed read request can be re-attempted. Cache misses and database failure can be detected by the non-database layers and either reported to the user or worked around using replication.

While this architecture works pretty well in general, it isn‘t as successful in a chat application due to the high volume of long-lived requests, the non-relational nature of the data involved, and the statefulness of each request.

For Facebook Chat, we rolled our own subsystem for logging chat messages (in C++) as well as an epoll-driven web server (in Erlang) that holds online users‘ conversations in-memory and serves the long-polled HTTP requests. Both subsystems are clustered and partitioned for reliability and efficient failover. Why Erlang? In short, because the problem domain fits Erlang like a glove. Erlang is a functional concurrency-oriented language with extremely low-weight user-space "processes", share-nothing message-passing semantics, built-in distribution, and a "crash and recover" philosophy proven by two decades of deployment on large soft-realtime production systems.

Glueing with Thrift:

Despite those advantages, using Erlang for a component of Facebook Chat had a downside: that component needed to communicate with the other parts of the system. Glueing together PHP, Javascript, Erlang, and C++ is not a trivial matter. Fortunately, we have Thrift. Thrift translates a service description into the RPC glue code necessary for making cross-language calls (marshalling arguments and responses over the wire) and has templates for servers and clients. Since going open source a year ago (we had the gall to release it on April Fool‘s Day, 2007), the Thrift project has steadily grown and improved (with multiple iterations on the Erlang binding). Having Thrift available freed us to split up the problem of building a chat system and use the best available tool to approach each sub-problem.

facebook chat 【转】

时间: 2024-12-11 20:54:17

facebook chat 【转】的相关文章

Facebook的体系结构分析---外文转载

Facebook的体系结构分析---外文转载 From various readings and conversations I had, my understanding of Facebook's current architecture is: Web front-end written in PHP. Facebook's HipHop Compiler [1] then converts it to C++ and compiles it using g++, thus providi

facebook design question 总结

http://blog.csdn.net/sigh1988/article/details/9790337 这里原帖地址: http://www.mitbbs.com/article_t/JobHunting/32492515.html 以下为转载内容 ===========================我是分割线================== 稍微总结一下 1. 入门级的news feedhttp://www.quora.com/What-are-best-practices-for-

为什么我要选择erlang+go进行服务器架构

服务器非业余研究http://blog.csdn.net/erlib 作者Sunface 估计很多同学看到这里都会觉得迷惑,go的大名已经如雷贯耳了,但是erlang?这个东东是神马?难道是编程语言?怎么从来没听说过. 这里请允许我先介绍一下使用Erlang开发的比较有名的应用: 一:whatsapp 只凭32个技术人员,如何应付4.5亿的用户?对于刚刚被Facebook用190亿美元收购的WhatsApp来说,答案是Erlang--一种诞生于上世纪80年代的编程语言,终于在此时走到了聚光灯下.

Solutions to fix IDM has been registered with a fake serial number

Solutions to fix IDM has been registered with a fake serial number: There are two methods to fix IDM has been registered with a fake serial number error. Prefer using the first one method as it is the better trick to fix the error. If the first metho

What is a good EPUB reader on Linux

Last updated on August 20, 2014 Authored by Adrien Brochard 12 Comments If the habit on reading books on electronic tablets is still on its way, reading books on a computer is even rarer. It is hard enough to focus on the classics of the 16th century

浏览器过程

What really happens when you navigate to a URL --> As a software developer, you certainly have a high-level picture of how web apps work and what kinds of technologies are involved: the browser, HTTP, HTML, web server, request handlers, and so on. In

【转】WP8.1开发人员预览版本已知 Bug

偶的 Lumia 920 已经升级到最新的 8.1 开发人员预览版本,使用中没有发现什么问题. 可能是因为偶玩手机的情况比较少吧!忽然看到 MS 停止此版本的更新,并说明有很多的 BUG,偶就郁闷了. 以下是从网络上复制过来的,大家看看吧. Windows Phone 8.1开发者预览版推出近1周,抢先体验和测试多多少少遇到了一些问题,有的是系统固件方面的问题,有的官方或第三方应用方面的问题,还有的是系统功能的改变带来的问题. 不想尝试新版WP8.1或者嫌麻烦的机友可以直接等今夏开始的WP8.1

Android XMPP服务器, BOSH(Http-Binding)和WEB客户端搭建

目标: 搭建一个XMPP服务器, 实现在web page上用javascript与自己XMPP服务器通信, 匿名登录并与任何一个XMPP(Jabber)帐户通信. (Gtalk目前尚有问题) XMPP服务器可能不是必须的(见下文, 我没有尝试) 环境与配置: XMPP服务器:ejabberd文档HTTP-Binding: 使用ejabberd搭建, 5280端口.Javascript Client:Strophe文档 安装Ejabberd yuminstallejabberd #apt-get

WP8.1开发者预览版本号已知 Bug

偶的 Lumia 920 已经升级到最新的 8.1 开发者预览版本号,使用中没有发现什么问题. 可能是由于偶玩手机的情况比較少吧!忽然看到 MS 停止此版本号的更新,并说明有非常多的 BUG,偶就郁闷了. 下面是从网络上复制过来的,大家看看吧. Windows Phone 8.1开发人员预览版推出近1周,抢先体验和測试多多少少遇到了一些问题,有的是系统固件方面的问题,有的官方或第三方应用方面的问题,还有的是系统功能的改变带来的问题. 不想尝试新版WP8.1或者嫌麻烦的机友能够直接等今夏開始的WP