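OpenVPN has no multiprocessing support. Everyone knows it, and I feel I am getting long-winded, going on about it every day. So why is there no multiprocessing? Let's see how OpenVPN's author, James Yonan (JY for short), a heavyweight who long ago moved beyond mere code, explains it.
It is all on the mailing list. Someone asked why OpenVPN does not implement multithreading, and backed the question with actual test data. JY answered: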
OpenVPN 2.0 has no multithreading support, this is the only feature present in
1.x which has been removed from 2.0.
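Fine. That states plainly that OpenVPN 2.0 no longer supports multithreading. The earlier 1.x series did have multithreading, but not for moving data, i.e. not for the data channel. Keep this in mind, because the discussion is confined to the CPU cost of processing. It is just as I had thought: in the 1.x era OpenVPN simply set up an encrypted tunnel. CPU cost arises only when there is data in the tunnel, yet nobody knows when data will arrive, so relying on the kernel's scheduler is unwise (the kernel's task scheduling is based on a series of predictions). The only CPU cost that can be quantified is therefore the TLS handshake on the control channel (the same holds for the non-SSL cases: pre-shared keys and username/password verification are just weaker than SSL). So OpenVPN used the extra thread only for this negotiation phase. In the data-transfer phase it used a single thread and implemented its own packet scheduling mechanism internally.
Do not conclude that OpenVPN is bad because it lacks multithreading (that was my own earlier misconception; as for other people, they either flame me or do not care about the matter at all). In fact, I was won over. Single-threaded OpenVPN keeps the resource utilization of its one thread so high that it commands admiration, and the key is its own packet scheduling mechanism. In OpenVPN 2.0 even the separate thread for the control-channel negotiation phase was removed. Here is JY's reasoning: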
The original rationale for having the TLS thread optimization was to improve
latency during the TLS key negotiation which is very CPU intensive. The 1.x
pthread implementation uses pthreads only for this very special case, which
does not improve overall efficiency on multiprocessor machines, but helps to
keep tunnel-forwarding latency down during the TLS negotiation.
I did some testing on 2.0 to determine the worst-case latency caused by the
TLS negotiation in single threaded mode. On a 2GHz x86, the worst-case
latency was about 160 milliseconds for a 2048 bit key and 40 milliseconds for
a 1024 bit key. Even with 100 users hitting a TLS renegotiate once per hour,
the probability that two or more of these 160 millisecond latency periods
would overlap to make a bigger latency is still quite small.
I think these latency numbers are too small to justify the extra level of
complexity entailed by multithreading. Not to mention whole classes of
potential bugs which arise when you attempt to multithread code, and
incompatibilities that exist between multithread implementations on different
OSes. Bottom line is that I don't think multithreading in OpenVPN is worth
the trouble.
The gain does not cover the cost; it is that simple. I suspect that had OpenVPN been built for Linux alone, things would have been much easier.
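To get a feel for the latency figures JY quotes, here is a rough back-of-the-envelope sketch in Python. It is not JY's calculation. It assumes each client's renegotiation starts at an independent, uniformly random moment within the hour, and it asks how likely it is that one given negotiation window collides with at least one other. The function names are made up for this illustration.

```python
# Rough estimate only. Assumption (not from the mailing-list thread): each
# client's renegotiation starts at an independent, uniformly random moment
# within the period, and the period is treated as circular.

def busy_fraction(clients, window_s, period_s):
    """Fraction of wall-clock time the single thread spends in TLS negotiation."""
    return clients * window_s / period_s


def overlap_probability(clients, window_s, period_s):
    """Probability that one given negotiation window overlaps at least one other."""
    # Two windows of equal length overlap when their start times differ by
    # less than window_s, which happens with probability 2 * window_s / period_s.
    pair = 2 * window_s / period_s
    return 1 - (1 - pair) ** (clients - 1)


if __name__ == "__main__":
    # Worst-case latencies quoted by JY: 40 ms (1024-bit) and 160 ms (2048-bit),
    # with 100 users renegotiating once per hour.
    for bits, window in ((1024, 0.040), (2048, 0.160)):
        busy = busy_fraction(100, window, 3600.0)
        p = overlap_probability(100, window, 3600.0)
        print(f"{bits}-bit key: busy {busy:.3%} of the time, overlap chance {p:.3%}")
```

Under these assumptions, a given 160 ms negotiation has a bit less than a 1% chance of running into another one, which is in line with JY calling the probability quite small.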
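It seems good software is not only about being the most efficient on one particular platform; more than that, it is about running on every platform. So how did JY settle the matter? First, his framing: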
Keep in mind that people use multithreading to:
(1) improve latency, or
(2) improve performance on multithreaded machines
OpenVPN 1.x only tried to hit (1).
With OpenVPN 2.0, my decision was basically that (1) didn't justify the
complexification that pthread support would entail and that (2) is satisfied
by different means.
So how do you improve performance on multithreaded machines, to take advantage
of all processors, i.e. if (1) is not worth the effort, then how to
accomplish (2)?
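The thinking is perfectly clear (or maybe it simply resonates with me). He refuses to treat specific, special-case procedures as the bottleneck: the SSL handshake that costs 100+ ms and whose result is good for an hour, username/password verification, and so on. Meanwhile, the efficiency of symmetric encryption on the data channel is a constant. So the key to better efficiency comes down to one thing: how to improve transfer performance. This line of thinking is unbiased, objective and fair. Why do I say so? Someone focused on the SSL protocol looks first at SSL performance, because that is where his optimization experience and skill lie, and in practice that preference may already have led him astray. Someone focused on networking always looks at things like multithreading over multi-queue NICs, because that is the material he follows every day, and that preference is surely not the right path either.
JY analyzed both cases objectively. SSL is an action that takes up only a small fraction of run time; a special case like that cannot become the bottleneck, so there is no need to add complexity by giving it a thread of its own. Likewise, packet scheduling in the transfer phase is outside OpenVPN's sphere of control, so multiprocessing is not something OpenVPN itself needs to consider either. He then gave his conclusion: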
Answer: Run multiple server mode daemons on different ports, and have the
client load balance between them by using multiple "remote" entries in the
client side config. This is actually more efficient than multithreading
because each OpenVPN daemon gets its own private virtual memory address
space, so there is no bus contention from multiple processors over the same
address space, as would occur with a multi-threaded execution model.
Yes: leave it to something external!
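As a concrete illustration of JY's Answer, the following Python sketch prints a minimal config for several server daemons on different ports plus a client config with one "remote" entry per daemon. The host name, ports and subnets are hypothetical, giving each daemon its own address pool is my assumption rather than something stated in the thread, and the certificate, key and cipher settings a real deployment needs are left out. The remote-random directive is used here so that clients do not all start with the first entry.

```python
# Hypothetical values throughout: the host name, ports, and subnets are
# illustrative and do not come from the mailing-list thread.

def server_config(port, subnet):
    """Minimal config text for one single-threaded server daemon."""
    return "\n".join([
        f"port {port}",
        "proto udp",
        "dev tun",
        # Assumption: each daemon hands out addresses from its own pool.
        f"server {subnet} 255.255.255.0",
    ]) + "\n"


def client_config(host, ports):
    """Client config listing every daemon as a separate 'remote' entry."""
    lines = ["client", "proto udp", "dev tun"]
    lines += [f"remote {host} {port}" for port in ports]
    lines.append("remote-random")  # randomize the order of the remote list
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ports = [1194, 1195, 1196, 1197]  # e.g. one daemon per CPU core
    for i, port in enumerate(ports):
        print(f"# server{i}.conf")
        print(server_config(port, f"10.8.{i}.0"))
    print("# client.conf")
    print(client_config("vpn.example.com", ports))
```

Each generated server file would be run by its own OpenVPN process, so the operating system spreads the daemons across processors without any threading inside OpenVPN itself.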
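I think JY was clear-headed from the very beginning, which is why he separated the data channel from the control channel. That separation makes the single-process, single-thread processing extremely compact, and it lets the optimization of the special SSL procedure (forgive me, I am an SSL follower too; unfortunately I follow both, not just SSL but data transfer as well) proceed separately from the optimization of data transfer.
Stop suspecting OpenVPN of being inefficient. As a single-process, single-thread program it is very compact, and within that one thread its packet scheduling algorithm is about as optimized as it gets. If you want to optimize it, heed JY's Answer, and keep an eye on my blog too. I would not dare call JY a buddy, but the facts show that he and I think alike.
My stubbornness is that I really did not want multiple OpenVPN instances listening on multiple ports, so I built a multithreaded version. One look at that version, though, shows that I changed nothing at all about packet transfer; I only shared the multi_instance list and the IP address pool. All the work I did was peripheral. I did not modify the OpenVPN source code, because I know it is already compact enough, so I only wrapped it from the outside. I did alter the protocol slightly, but that part is just a pile of garbage.
A perfect example of complexity giving way to simplicity.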