Exploring Message Brokers: RabbitMQ, Kafka, ActiveMQ, and Kestrel--reference

[This article was originally written by Yves Trudeau.]

http://java.dzone.com/articles/exploring-message-brokers

Message brokers are not regularly covered here but are, nonetheless, important web-related technologies. Some time ago, I was asked by one of our customer to review a selection of OSS message brokers and propose a couple of good candidates. The requirements were fairly simple: behave well when there’s a large backlog of messages, be able to create a cluster and in case of the failure of a node in a cluster, try to protect the data but never blocks the publishers even though that might imply data lost. Nothing fancy regarding queues and topics management. I decided to write my findings here, before I forget…

I don’t consider myself a message broker specialist and I spent only about a day or two on each so, I may have done some big mistakes configuration wise. I’ll take the blame if something is misconfigured or not used correctly.

RabbitMQ

RabbitMQ is well known and popular message broker and it has many powerful features. The documentation on the RabbitMQ web site is excellent and there are many books available. RabbitMQ is written in Erlang, not a widely used programming language but well adapted to such tasks. The company Pivotal develops and maintains RabbitMQ. I reviewed version 3.2.2 on CentOS 6 servers.

The installation was easy, I installed Erlang version R14B from epel and the RabbitMQ rpm. The only small issue I had was that the server is expecting “127.0.0.1″ to be resolved in /etc/hosts and the openstack VMs I used were missing that. Easy to fix. I also installed and enabled the management plugin.

The RabbitMQ configuration is set in the rabbitmq.config file and it has tons of adjustable parameters. I used the defaults. In term of client API, RabbitMQ support a long list of languages and some standards protocols, like STOMP are available with a plugin. Queues and topics can be created either by the web interface or through the client API directly. If you have more than one node, they can be clustered and then, queues and topics, can be replicated to other servers.

I created 4 queues, wrote a ruby client and started inserting messages. I got a publishing rate of about 20k/s using multiple threads but I got a few stalls caused by the vm_memory_high_watermark, from my understanding during those stalls it writing to disk. Not exactly awesome given my requirements. Also, some part is always kept in memory even if a queue is durable so, even though I had plenty of disk space, the memory usage grew and eventually hit the vm_memory_high_watermark setting. The cpu load was pretty high during the load, between 40% and 50% on an 8 cores VM.

Even though my requirements were not met, I setup a replicated queue on 2 nodes and inserted a few millions objects. I killed one of the two nodes and insert were even faster but then… I did a mistake. I restarted the node and asked for a resync. Either I didn’t set it correctly or the resync is poorly implemented but it took forever to resync and it was slowing down as it progressed. At 58% done, it has been running for 17h, one thread at 100%. My patience was exhausted.

So, lots of feature, decent performance but behavior not compatible with the requirements.

Kafka

Kafka has been designed originally by LinkedIn, it is written in Java and it is now under the Apache project umbrella. Sometimes you look at a technology and you just say: wow, this is really done the way it should be. At least I could say that for the purpose I had. What is so special about Kafka is the architecture, it stores the messages in flat files and consumers ask messages based on an offset. Think of it like a MySQL server (producer) saving messages (updates SQL) to its binlogs and slaves (consumers) ask messages based on an offset. The server is pretty simple and just don’t care about the consumers much. That simplicity makes it super fast and low on resource. Old messages can be retained on a time base (like expire_logs_days) and/or on a storage usage base.

So, if the server doesn’t keep track of what has been consumed on each topics, how do can you have multiple consumer. The missing element here is Zookeeper. The Kafka server uses Zookeeper for cluster membership and routing while the consumers can also use Zookeeper or something else for synchronization. The sample consumer provided with the server uses Zookeeper so you can launch many instances and they’ll synchronize automatically. For the ones that doesn’t know Zookeeper, it is a highly-available synchronous distributed storage system. If you know Corosync, it provides somewhat the same functionality.

Feature wise Kafka, isn’t that great. There’s no web frontend builtin although a few are available in the ecosystem. Routing and rules are inexistent and stats are just with JMX. But, the performance… I reached a publishing speed of 165k messages/s over a single thread, I didn’t bother tuning for more. Consuming was essentially disk bound on the server, 3M messages/s… amazing. That was without Zookeeker coordination. Memory and cpu usage were modest.

To test clustering, I created a replicated queue, inserted a few messages, stopped a replica, inserted a few millions more messages and restarted the replica. I took only a few seconds to resync.

So, Kafka is very good fit for the requirements, stellar performance, low resource usage and nice fit with the requirements.

ActiveMQ

ActiveMQ is another big player in the field with an impressive feature set. ActiveMQ is more in the RabbitMQ league than Kafka and like Kafka, it is written in Java. HA can be provided by the storage backend, levelDB supports replication but I got some issues with it. My requirements are not for full HA, just to make sure the publishers are never blocked so I dropped the storage backend replication in favor of a mesh of brokers.

My understanding of the mesh of brokers is that you connect to one of the members and you publish or consume a message. You don’t know on which node(s) the queue is located, the broker you connect to knows and routes your request. To further help, you can specify all the brokers on the connection string and the client library will just reconnect to another if the one you are connected to goes down. That looks pretty good for the requirements.

With the mesh of brokers setup, I got an insert rate of about 5000 msg/s over 15 threads and a single consumer was able to read 2000 msg/s. I let it run for a while and got 150M messages. At this point though, I lost the web interface and the publishing rate was much slower.

So, a big beast, lot of features, decent performance, on the edge with the requirements.

Kestrel

Kestrel is another interesting broker, this time, more like Kafka. Written in scala, the Kestrel broker speaks the memcached protocol. Basically, the key becomes the queue name and the object is the message. Kestrel is very simple, queues are defined in a configuration file but you can specify, per queue, storage limits, expiration and behavior when limits are reached. With a setting like “discardOldWhenFull = true”, my requirement of never blocking the publishers is easily met.

In term of clustering Kestrel is a bit limited but each can publish its availability to Zookeeper so that publishers and consumers can be informed of a missing server and adjust. Of course, if you have many Kestrel servers with the same queue defined, the consumers will need to query all of the broker to get the message back and strict ordering can be a bit hard.

In term of performance, a few simple bash scripts using nc to publish messages easily reached 10k messages/s which is very good. The rate is static over time and likely limited by the reconnection for each message. The presence of consumers slightly reduces the publishing rate but nothing drastic. The only issue I had was when a large number of messages expired, the server froze for some time but that was because I forgot to set maxExpireSweep to something like 100 and all the messages were removed in one pass.

So, fairly good impression on Kestrel, simple but works well.

Conclusion

For the requirements given by the customer, Kafka was like a natural fit. It offers a high guarantee that the service will be available and non-blocking under any circumstances. In addition, messages can easily be replicated for higher data availability. Kafka performance is just great and resource usage modest.

Published at DZone with permission of Peter Zaitsev, author and DZone MVB. (source)

时间: 2024-10-29 13:51:30

Exploring Message Brokers: RabbitMQ, Kafka, ActiveMQ, and Kestrel--reference的相关文章

rabbitMQ、activeMQ、zeroMQ、Kafka、Redis 比较

Kafka作为时下最流行的开源消息系统,被广泛地应用在数据缓冲.异步通信.汇集日志.系统解耦等方面.相比较于RocketMQ等其他常见消息系统,Kafka在保障了大部分功能特性的同时,还提供了超一流的读写性能. 针对Kafka性能方面进行简单分析,相关数据请参考:https://segmentfault.com/a/1190000003985468,下面介绍一下Kafka的架构和涉及到的名词: Topic:用于划分Message的逻辑概念,一个Topic可以分布在多个Broker上. Parti

RabbitMQ 与 ActiveMQ、ZeroMQ以及Kafka的比较

之前写了一篇文章关于Active以及消息队列推拉模式的文章,可以参考:link 关于 Active 与 RabbitMQ以及其他的比较,有如下记录: 这篇文章 link 提到: 基本介绍RabbitMQ:基于AMQP协议(Advanced Message Queue Protocol)ActiveMQ:基于STOMP协议(注:我只知道是基于JMS) rabbitMQ 是 AMQP 用 Erlang 实现的 MQ .用 Erlang的原因(Erlang消息机制与AMQP极度吻合): AMQP 主要

RabbitMq、ActiveMq、ZeroMq、kafka之间的比较,资料汇总

MQ框架非常之多,比较流行的有RabbitMq.ActiveMq.ZeroMq.kafka.这几种MQ到底应该选择哪个?要根据自己项目的业务场景和需求.下面我列出这些MQ之间的对比数据和资料. 第一部分:RabbitMQ,ActiveMq,ZeroMq比较 1. TPS比较 一 ZeroMq 最好,RabbitMq 次之, ActiveMq 最差.这个结论来自于以下这篇文章. http://blog.x-aeon.com/2013/04/10/a-quick-message-queue-benc

RabbitMq、ActiveMq、ZeroMq、kafka之间的比较

MQ框架非常之多,比较流行的有RabbitMq.ActiveMq.ZeroMq.kafka.这几种MQ到底应该选择哪个?要根据自己项目的业务场景和需求.下面我列出这些MQ之间的对比数据和资料. 第一部分:RabbitMQ,ActiveMq,ZeroMq比较 1. TPS比较 一 ZeroMq 最好,RabbitMq 次之, ActiveMq 最差.这个结论来自于以下这篇文章. http://blog.x-aeon.com/2013/04/10/a-quick-message-queue-benc

kafka,activemq rabbitmq.rocketmq的优点和缺点

kafka,activemq rabbitmq.rocketmq的优点和缺点: 特性 ActiveMQ RabbitMQ RocketMQ Kafka 单机吞吐量 万级,吞吐量比RocketMQ和Kafka要低了一个数量级 万级,吞吐量比RocketMQ和Kafka要低了一个数量级 10万级,RocketMQ也是可以支撑高吞吐的一种MQ 10万级别,这是kafka最大的优点,就是吞吐量高. 一般配合大数据类的系统来进行实时数据计算.日志采集等场景 topic数量对吞吐量的影响 topic可以达到

Kafka与RabbitMQ、ActiveMQ协议区别

对于Kafka与RabbitMQ.ActiveMQ协议,它们具体的区别如下: activemq:        activemq支持主从复制.集群.但是集群功能看起来很弱,只有failover功能,即我连一个失败了,可以切换到其他的broker上.这一点貌似不太科学.假设有三个broker,其中一个上面没有consumer,但另外两个挂了,消息会转到这个上面来,堆积起来.看样子activemq还在升级中.        activemq工作模型比较简单.只有两种模式 queue,topics r

消息中间件的技术选型心得-RabbitMQ、ActiveMQ和ZeroMQ

RabbitMQ.ActiveMQ和ZeroMQ都是极好的消息中间件,但是我们在项目中该选择哪个更适合呢?很多开发者面临这个烦恼.下面我会对这三个消息中间件做一个比较,看了后你们就心中有数了. RabbitMQ是AMQP协议领先的一个实现,它实现了代理(Broker)架构,意味着消息在发送到客户端之前可以在中央节点上排队.此特性使得RabbitMQ易于使用和部署,适宜于很多场景如路由.负载均衡或消息持久化等,用消息队列只需几行代码即可搞定.但是,这使得它的可扩展性差,速度较慢,因为中央节点增加了

(转)消息中间件的技术选型心得-RabbitMQ、ActiveMQ和ZeroMQ

RabbitMQ.ActiveMQ和ZeroMQ都是极好的消息中间件,但是我们在项目中该选择哪个更适合呢?很多开发者面临这个烦恼.下面我会对这三个消息中间件做一个比较,看了后你们就心中有数了. RabbitMQ是AMQP协议领先的一个实现,它实现了代理(Broker)架构,意味着消息在发送到客户端之前可以在中央节点上排队.此特性使得RabbitMQ易于使用和部署,适宜于很多场景如路由.负载均衡或消息持久化等,用消息队列只需几行代码即可搞定.但是,这使得它的可扩展性差,速度较慢,因为中央节点增加了

RabbitMQ、ActiveMQ和ZeroMQ

消息中间件的技术选型心得-RabbitMQ.ActiveMQ和ZeroMQ 作者:chszs,转载需注明.博客主页:http://blog.csdn.net/chszs RabbitMQ.ActiveMQ和ZeroMQ都是极好的消息中间件,但是我们在项目中该选择哪个更适合呢?很多开发者面临这个烦恼.下面我会对这三个消息中间件做一个比较,看了后你们就心中有数了. RabbitMQ是AMQP协议领先的一个实现,它实现了代理(Broker)架构,意味着消息在发送到客户端之前可以在中央节点上排队.此特性