http://www.tigase.net/blog-entry/1mln-or-more-onli

By admin on May 29, 2011

I have been working on clustering code improvements in the Tigase server for the last few months to make it more reliable and scale better. In the article about XMPP Service sharding - Tigase on Intel ATOMs I presented some preliminary results on a small scale.

In recent weeks I had a great opportunity to run several tests on a Tigase cluster of 10 nodes on much better hardware. The goal was to reach 1mln online users connected to the cluster and generating sensible traffic. More tests were run to see how the cluster behaves with different numbers of connections and under different loads.

Below are charts taken from two tests: one with a peak of 1mln 128k online users and moderate traffic, and a second with a peak of 1mln 685k online users and very reduced traffic.

All tests were carried out until the number of connections reached its maximum and for some time after that, to make sure the service remained stable when connections started dropping.

The test for 1mln online users ran with moderate traffic, that is, a message from each online user every 400 seconds and a status change every 2800 seconds.

The other test for 1mln 500k online users ran with no additional traffic except user login, roster retrieval, initial presence broadcast and offline presence broadcast on connection close.

The roster size for all online users was 150 elements, of which 20% (30) were online during the test, and the new connection rate was 100/sec.
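As a rough illustration of what these parameters imply, the sketch below works out how long the ramp-up takes at 100 connections per second and how many presence stanzas logins alone would generate. It assumes the connection rate is cluster-wide and that each login delivers exactly one presence stanza to each online contact; probes and replies are ignored, so the real numbers would differ. The function names are made up for this example.

```python
# Back-of-the-envelope numbers for the test parameters quoted above.
ROSTER_SIZE = 150        # contacts per user (from the article)
ONLINE_FRACTION = 0.20   # share of the roster online during the test (from the article)
CONNECTION_RATE = 100    # new connections per second (assumed to be cluster-wide)


def online_contacts(roster_size: int = ROSTER_SIZE,
                    fraction: float = ONLINE_FRACTION) -> int:
    """Number of roster contacts that are online during the test."""
    return round(roster_size * fraction)


def ramp_up_seconds(target_connections: int, rate: int = CONNECTION_RATE) -> float:
    """Time needed to open target_connections at a constant rate."""
    return target_connections / rate


def login_presence_rate(rate: int = CONNECTION_RATE) -> int:
    """Presence stanzas per second caused by logins alone.

    Assumption: one presence stanza is delivered to each online contact
    per login; probes and replies are not counted.
    """
    return rate * online_contacts()


if __name__ == "__main__":
    for target in (1_000_000, 1_500_000):
        hours = ramp_up_seconds(target) / 3600
        print(f"{target:>9} connections: {hours:.2f} h ramp-up")
    print(f"login presence stanzas/sec: {login_presence_rate()}")
```

Under these assumptions the 1mln target alone needs 10,000 seconds of connection ramp-up, which is why each test run takes hours.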

If you are interested in more details, please continue reading...

I guess the first question that comes to mind is why the traffic was so low. Looking at the presented charts, there is certainly room for more.

The CPU would most likely handle more, probably at least twice as much traffic, and memory usage shouldn't grow much either, as traffic generates only temporary objects.

Indeed, average traffic was estimated at a message every 200 seconds and a presence broadcast every 20 minutes on each user connection.

The "high" traffic was estimated at a message every 100 seconds and a presence broadcast every 10 minutes.

Unfortunately, as always with load tests, the problem was generating enough traffic. I used Tsung 1.3 for testing, which did a really good job simulating user connections from 10 other machines; however, it just couldn't do more than that.

Test environment

I had 21 identical machines at my disposal for the duration of the tests: 2 x Xeon Quad 2.0GHz, 16GB RAM, 750GB SATA HDD, 1Gb Ethernet.

One machine running Ubuntu Server 9.04 was used as the database server, with MySQL 5.1 installed and tuned for the test.

10 machines ran Ubuntu Server 9.04 with the Tigase server installed in cluster mode, with Linux kernel and GC settings tuned for the test. The Tigase server was a version from SVN with some not yet committed changes.

10 machines ran Proxmox 1.3 with Debian 5 in virtual machines. Tsung 1.3 on Erlang R13B01 was used as the traffic and load generator.

Results

As we can see on the attached charts, both tests were quite successful.

Of course nobody wants to run a service for 1mln 600k online users with idle connections. The second test was executed only to check the limits of the installation. As we can see on the memory chart, the server used up its memory completely, so with 16GB of RAM not much more is possible. Traffic stayed at a quite stable level, as it was generated only by new user connections in the first phase, then by both new and closing connections in the second phase (hence the jump in CPU load), and by closing connections only in the third phase.

Much more interesting are the charts for the 1mln online users test with traffic on each connection. We can clearly see "steps" on the cluster traffic chart and less clear steps on the session manager traffic chart. They are related to the presence update "wave" which started every 2800 seconds. CPU usage stayed at about 60% at peak time, leaving plenty of room for more traffic. Memory consumption was quite high, at about 70% at the peak number of connections.
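The memory figures can be turned into a very rough per-connection estimate. The sketch below is only an upper bound under loud assumptions: it reads the percentages as fractions of the 16GB of physical RAM on each node (the charts may actually show JVM heap), treats 16GB as GiB, and attributes all used memory to user connections. The peak connection counts are the ones quoted at the start of the article.

```python
NODES = 10
RAM_PER_NODE_BYTES = 16 * 1024**3  # 16GB per machine, read here as GiB (assumption)


def bytes_per_connection(connections: int, used_fraction: float) -> float:
    """Upper-bound estimate of cluster memory used per online connection.

    Assumption: used_fraction applies to physical RAM on every node and
    all of that memory is attributed to connections.
    """
    used = NODES * RAM_PER_NODE_BYTES * used_fraction
    return used / connections


if __name__ == "__main__":
    # Moderate-traffic test: about 70% memory at the 1mln 128k peak.
    moderate = bytes_per_connection(1_128_000, 0.70)
    # Idle test: memory completely used at the 1mln 685k peak.
    idle = bytes_per_connection(1_685_000, 1.00)
    print(f"moderate traffic: {moderate / 1024:.0f} KiB per connection")
    print(f"idle connections: {idle / 1024:.0f} KiB per connection")
```

Both tests land in the same order of magnitude per connection, which fits the observation that traffic creates mostly temporary objects and that the number of connections is what drives memory use.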

Other tests

As I mentioned before, I ran several tests to see how the server works under different conditions. There is certainly no room here to present all the charts; however, I could post them if there is interest. Please send me a message or comment on the article if you want to see more charts.

The server was tested under different loads:

  1. A message every 100 seconds and presence broadcast every 700 seconds on each connection.
  2. A message every 200 seconds and presence broadcast every 1400 seconds on each connection.
  3. A message every 400 seconds and presence broadcast every 2800 seconds on each connection.
  4. A message every 800 seconds and presence broadcast every 5600 seconds on each connection.
  5. No traffic except packets related to user login, roster retrieval, initial presence broadcast and offline presence broadcast.
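To compare the load profiles, the per-connection intervals can be converted into cluster-wide rates. The sketch below does this for loads 1 to 4, assuming every presence broadcast is delivered once to each of the 30 online contacts mentioned earlier and that traffic is spread evenly over time (the charts show presence arriving in waves, so real peaks are higher). `LoadProfile` and `cluster_rates` are names invented for this example.

```python
from dataclasses import dataclass

ONLINE_CONTACTS = 30  # 20% of a 150-element roster, as stated earlier in the article


@dataclass(frozen=True)
class LoadProfile:
    number: int
    message_interval_s: int   # seconds between messages on one connection
    presence_interval_s: int  # seconds between presence broadcasts on one connection


PROFILES = [
    LoadProfile(1, 100, 700),
    LoadProfile(2, 200, 1400),
    LoadProfile(3, 400, 2800),
    LoadProfile(4, 800, 5600),
]


def cluster_rates(profile: LoadProfile, connections: int) -> tuple[float, float]:
    """Return (messages/sec, presence stanzas delivered/sec) for the whole cluster.

    Assumption: each presence broadcast is delivered once to every online contact.
    """
    messages = connections / profile.message_interval_s
    presences = connections / profile.presence_interval_s * ONLINE_CONTACTS
    return messages, presences


if __name__ == "__main__":
    for p in PROFILES:
        msgs, pres = cluster_rates(p, 1_000_000)
        print(f"load {p.number}: {msgs:8.0f} msg/s, {pres:8.0f} presence/s")
```

In every profile the presence interval is seven times the message interval, so each step down the list halves both rates.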

Other tests I have run are listed below:

  1. 250k connections over plain TCP with load 1
  2. 250k connections over SSL with load 1
  3. 500k connections over plain TCP with load 1 and 2
  4. 500k connections over SSL with load 1, 2 and 3
  5. 750k connections over plain TCP with load 2 and 3
  6. 1mln connections over plain TCP with load 2 and 3
  7. 1mln 500k connections over plain TCP with load 5

Please note that the given number of connections is a target number; actual tests usually reached more.

Charts

All charts display plots for all 10 cluster nodes with a different colour for each node. In most cases only one plot (blue) is visible, as user distribution was very even and hence the load was the same. This is especially confusing on the connections chart, where all 10 plots look like a single blue line.

While chart plots display values for a particular node, the chart title displays the sum for all nodes; the max is the maximum total registered by the monitor.
