Analysis of Facebook's Architecture (Reposted from a Foreign-Language Source)

From various readings and conversations I had, my understanding of Facebook's current architecture is:

  • Web front-end written in PHP. Facebook's HipHop Compiler [1] then converts it to C++ and compiles it using g++, thus providing a high-performance templating and Web logic execution layer.
  • Because of the limitations of relying entirely on static compilation, Facebook started to work on a HipHop Interpreter [2] as well as a HipHop Virtual Machine, which translates PHP code to HipHop ByteCode [3].
  • Business logic is exposed as services using Thrift [4]. These services are implemented in PHP, C++ or Java depending on service requirements (some other languages are probably used as well).
  • Services implemented in Java don't use a conventional enterprise application server but rather Facebook's custom application server. At first this can look like reinventing the wheel, but as these services are exposed and consumed only (or mostly) through Thrift, the overhead of Tomcat, or even Jetty, was probably too high with no significant added value for their needs.
  • Persistence is done using MySQL, Memcached [5] and Hadoop's HBase [6]. Memcached is used as a cache for MySQL as well as a general-purpose cache.
  • Offline processing is done using Hadoop and Hive.
  • Data such as logs, clicks and feeds is transported using Scribe [7] and is aggregated and stored in HDFS using Scribe-HDFS [8], thus allowing extended analysis using MapReduce.
  • BigPipe [9] is their custom technology to accelerate page rendering using a pipelining logic
  • Varnish Cache [10] is used for HTTP proxying. They preferred it for its high performance and efficiency [11].
  • The storage of the billions of photos posted by users is handled by Haystack, an ad-hoc storage solution developed by Facebook which brings low-level optimizations and append-only writes [12].
  • Facebook Messages uses its own architecture, which is notably based on infrastructure sharding and dynamic cluster management. Business logic and persistence are encapsulated in a so-called 'Cell'. Each Cell handles a subset of users; new Cells can be added as popularity grows [13]. Persistence is achieved using HBase [14].
  • Facebook Messages' search engine is built with an inverted index stored in HBase [15].
  • The implementation details of the Facebook search engine are unknown as far as I know.
  • The typeahead search uses a custom storage and retrieval logic [16]
  • Chat is based on an Epoll server developed in Erlang and accessed using Thrift [17]
  • They've built an automated system that responds to monitoring alerts by launching the appropriate repair workflow, or by escalating to humans if the outage can't be overcome [18].
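To make the "Memcached as a cache for MySQL" item above more concrete, here is a minimal sketch of the cache-aside pattern, one common way to put a cache in front of a database. It is only an illustration: the list above does not say which caching strategy Facebook actually uses, the class name `CacheAsideStore` is hypothetical, and plain Python dicts stand in for both Memcached and MySQL.

```python
class CacheAsideStore:
    """Cache-aside wrapper: read from the cache first, fall back to the database."""

    def __init__(self, database):
        self.database = database  # stand-in for MySQL (a plain dict here)
        self.cache = {}           # stand-in for Memcached (also a plain dict)
        self.db_reads = 0         # how many reads fell through to the database

    def get(self, key):
        # Cache hit: serve the value without touching the database.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def set(self, key, value):
        # Write to the database, then invalidate the cached copy so the
        # next read reloads the fresh value.
        self.database[key] = value
        self.cache.pop(key, None)


store = CacheAsideStore({"user:1": "alice"})
first = store.get("user:1")    # miss, loaded from the database
second = store.get("user:1")   # hit, served from the cache
store.set("user:1", "alicia")  # write invalidates the cached entry
third = store.get("user:1")    # miss again, reloads the new value
print(first, second, third, store.db_reads)
```

Invalidating on write rather than updating the cache keeps the sketch simple; a real deployment also has to deal with expiry, concurrent writers and cache servers failing, none of which is modeled here.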

About the resources provisioned for each of these components, some information and numbers are known:

  • Facebook is estimated to own more than 60,000 servers [18]. Their recent datacenter in Prineville, Oregon is based on entirely self-designed hardware [19] that was recently unveiled as the Open Compute Project [20].
  • 300 TB of data is stored in Memcached processes [21]
  • Their Hadoop and Hive cluster is made of 3000 servers, each with 8 cores, 32 GB RAM and 12 TB of disk, for a total of 24k cores, 96 TB RAM and 36 PB of disk [22]
  • 100 billion hits per day, 50 billion photos, 3 trillion objects cached, 130 TB of logs per day as of July 2010 [22]
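The Hadoop and Hive cluster totals quoted above follow directly from the per-server figures. A quick check, assuming decimal units (1 TB = 1000 GB, 1 PB = 1000 TB); the variable names are only illustrative:

```python
# Per-server figures as quoted in the list above.
servers = 3000
cores_per_server = 8
ram_gb_per_server = 32
disk_tb_per_server = 12

# Cluster-wide totals, converted with decimal prefixes.
total_cores = servers * cores_per_server            # cores
total_ram_tb = servers * ram_gb_per_server / 1000   # GB -> TB
total_disk_pb = servers * disk_tb_per_server / 1000 # TB -> PB

print(total_cores, total_ram_tb, total_disk_pb)  # 24000 96.0 36.0
```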

[1] HipHop for PHP: http://developers.facebook.com/b...
[2] Making HPHPi Faster: http://www.facebook.com/note.php...
[3] The HipHop Virtual Machine: http://www.facebook.com/note.php...
[4] Thrift: http://thrift.apache.org/
[5] Memcached: http://memcached.org/
[6] HBase: http://hbase.apache.org/
[7] Scribe: https://github.com/facebook/scribe
[8] Scribe-HDFS: http://hadoopblog.blogspot.com/2...
[9] BigPipe: http://www.facebook.com/notes/fa...
[10] Varnish Cache: http://www.varnish-cache.org/
[11] Facebook goes for Varnish: http://www.varnish-software.com/...
[12] Needle in a haystack: efficient storage of billions of photos: http://www.facebook.com/note.php...
[13] Scaling the Messages Application Back End: http://www.facebook.com/note.php...
[14] The Underlying Technology of Messages: https://www.facebook.com/note.ph...
[15] The Underlying Technology of Messages Tech Talk: http://www.facebook.com/video/vi...
[16] Facebook's typeahead search architecture: http://www.facebook.com/video/vi...
[17] Facebook Chat: http://www.facebook.com/note.php...
[18] Who has the most Web Servers?: http://www.datacenterknowledge.c...
[19] Building Efficient Data Centers with the Open Compute Project: http://www.facebook.com/note.php...
[20] Open Compute Project: http://opencompute.org/
[21] Facebook's architecture presentation at Devoxx 2010: http://www.devoxx.com
[22] Scaling Facebook to 500 millions users and beyond: http://www.facebook.com/note.php...

Time: 2024-11-08 01:53:11
