




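  • The checkpatch.pl script highlights departures from the accepted coding style. This encourages people who use the script to fix style problems. So, by increasing the visibility of the style guide, we increase the uniformity of the code's appearance and, to some extent, improve its quality.
  • The built-in "lockdep" system dynamically measures the dependencies between locks and related states (such as whether interrupts are enabled). It then reports anything that looks anomalous. These anomalies do not always mean that a deadlock or similar problem is possible, but in many cases they do, and the possible deadlock can then be removed. So by increasing the visibility of the lock dependency graph, quality can be improved.
  • The kernel contains various other visibility enhancements, such as poisoning unused areas of memory so that invalid accesses become more apparent, or using symbolic names rather than hexadecimal addresses in stack traces so that bug reports are more useful.
  • At a higher level, using git to track changes makes it easy to see who did what and when. The fact that it encourages a comment on each patch makes it easier to answer the question of why the code is written the way it is. This visibility can improve understanding of the code, and that in turn improves quality as other developers are better informed.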











Andrew Morton said:


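This kref gets a tick for being a clear design pattern that is supported by code in the Linux kernel. A tick means that kref cleanly encapsulates an important design pattern. There are, however, some uses of reference counting for which the kref model is not particularly well suited. A reference count that does not provide the desired functionality can actually lead to errors, and people sometimes use kref where they should not, in places where it does not in fact work.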






    assert(obj->refcount > 0) ; increment(obj->refcount);


    BUG_ON(atomic_read(&obj->refcnt) == 0) ; atomic_inc(&obj->refcnt);



   1      atomic_dec(&obj->refcnt);

   2      if (atomic_dec_and_test(&obj->refcnt)) { ... do stuff ... }

   3      if (atomic_dec_and_lock(&obj->refcnt, &subsystem_lock)) {
                 ..... do stuff ....
                 spin_unlock(&subsystem_lock);
          }


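Style 2 is the kref style. It is appropriate when an object has no useful life beyond its last external reference. When the reference count reaches zero, the object needs to be freed or otherwise dealt with, hence the need to test for zero.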
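The kref style (style 2 above) can be sketched in userspace with C11 atomics. This is only an illustration of the pattern, not the kernel's kref API: `struct ref`, `ref_get()`, `ref_put()`, `struct widget` and the `widgets_released` counter are all hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a kref; not the kernel API. */
struct ref {
    atomic_int count;
};

struct widget {
    struct ref ref;   /* must stay the first member, see widget_release() */
    int payload;
};

static int widgets_released;   /* demonstration only: counts release calls */

static void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);  /* the creator holds the first reference */
}

/* Taking a reference is only legal while the caller already holds one. */
static void ref_get(struct ref *r)
{
    int old = atomic_fetch_add(&r->count, 1);

    assert(old > 0);
    (void)old;
}

/* Style 2: decrement and test. Returns 1 if this put released the object. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
    if (atomic_fetch_sub(&r->count, 1) == 1) {
        release(r);
        return 1;
    }
    return 0;
}

static void widget_release(struct ref *r)
{
    /* ref is the first member of struct widget, so the cast is valid. */
    struct widget *w = (struct widget *)r;

    widgets_released++;
    free(w);
}

static struct widget *widget_new(int payload)
{
    struct widget *w = malloc(sizeof(*w));

    if (!w)
        return NULL;
    ref_init(&w->ref);
    w->payload = payload;
    return w;
}
```

A caller would create a widget with widget_new(), hand out extra references with ref_get(), and have every holder call ref_put(&w->ref, widget_release) when it is done; only the final put frees the object.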




An interesting example of this style of reference that does not use kref, and does not even use atomic_dec_and_test() (though it could and arguably should), is the pair of refcounts in struct super_block: s_count and s_active.





  • S_BIAS is set to 1
  • grab_super() uses atomic_inc_not_zero() rather than testing against S_BIAS
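For readers unfamiliar with atomic_inc_not_zero(), its behaviour can be approximated in userspace with a compare-and-exchange loop. The sketch below is an assumption-laden analogue using C11 atomics, not the kernel implementation; `ref_get_unless_zero()` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace analogue of atomic_inc_not_zero(): take a reference only if
 * the count has not already dropped to zero.
 */
static bool ref_get_unless_zero(atomic_int *count)
{
    int old = atomic_load(count);

    while (old != 0) {
        /* On failure 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;   /* the object is already on its way to being freed */
}
```

The point of the loop is that the test for zero and the increment happen as one atomic step, so a reference can never be taken on an object whose last external reference has already gone.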



The Linux kernel doesn't have a "kcref" object, but that is a name that seems suitable to propose for the next style of reference count. The "c" stands for "cached" as this style is very often used in
caches. So it is a Kernel Cached REFerence.

A kcref uses atomic_dec_and_lock() as given in option 3 above. It does this because, on the last put, the object needs to be either freed or checked to see if any other special handling is needed. This needs
to be done under a lock to ensure no new reference is taken while the current state is being evaluated.
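The decrement-and-lock step can be sketched in userspace as follows. This is a rough analogue under stated assumptions rather than the kernel's code: `struct toy_lock` is a hypothetical spinlock built on a C11 atomic_flag, and `ref_dec_and_lock()` is an invented name standing in for atomic_dec_and_lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock, standing in for the subsystem lock. */
struct toy_lock {
    atomic_flag held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;   /* spin until the flag was previously clear */
}

static void toy_lock_release(struct toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

/*
 * Drop one reference. Returns true, with the lock held, only when the
 * count reached zero; otherwise returns false with the lock not held.
 */
static bool ref_dec_and_lock(atomic_int *count, struct toy_lock *lock)
{
    int old = atomic_load(count);

    /* Fast path: not the last reference, so no lock is needed. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
    }

    /* Slow path: this may be the last reference; decide under the lock. */
    toy_lock_acquire(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;    /* count is now zero and the lock stays held */
    toy_lock_release(lock);
    return false;
}
```

When the function returns true, the caller can examine the object knowing that nobody who respects the lock can take a new reference, and must release the lock once it has decided whether to free the object or leave it cached.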

A simple example here is the i_count reference counter in struct inode. The important part of iput() reads:

    if (atomic_dec_and_lock(&inode->i_count, &inode_lock))
            iput_final(inode);

where iput_final() examines the state of the inode and decides if it can be destroyed, or left in the cache in case it could get reused soon.

Among other things, the inode_lock prevents new external references being created from the internal references of the inode hash table. For this reason converting internal references to external
references is only permitted while the inode_lock is held. It is no accident that the function supporting this is called iget_locked() (or iget5_locked()).

A slightly more complex example is in struct dentry, where d_count is managed like a kcref. It is more complex because two locks need to be taken before we can be sure no new reference
can be taken - both dcache_lock and de->d_lock. This requires that either we hold one lock, then atomic_dec_and_lock() the other (as in prune_one_dentry()), or that we atomic_dec_and_lock() the first, then claim
the second and retest the refcount - as in dput(). This is a good example of the fact that you can never assume you have encapsulated all possible reference counting styles. Needing two locks could hardly be foreseen.

An even more complex kcref-style refcount is mnt_count in struct vfsmount. The complexity here is the interplay of the two refcounts that this structure has: mnt_count, which
is a fairly straightforward count of external references, and mnt_pinned, which counts internal references from the process accounting module. In particular it counts the number of accounting files that are open on the filesystem (and as such could
use a more meaningful name). The complexity comes from the fact that when there are only internal references remaining, they are all converted to external references. Exploring the details of this is again left as an exercise for the interested reader.

The "plain" style

The final style for refcounting involves just decrementing the reference count (atomic_dec()) and not doing anything else. This style is relatively uncommon in the kernel, and for good reason.
Leaving unreferenced objects just lying around isn't a good idea.

One use of this style is in struct buffer_head, managed by fs/buffer.c and <linux/buffer_head.h>. The put_bh() function is simply:

    static inline void put_bh(struct buffer_head *bh)
    {
            smp_mb__before_atomic_dec();
            atomic_dec(&bh->b_count);
    }

This is OK because buffer_heads have lifetime rules that are closely tied to a page. One or more buffer_heads get allocated to a page to chop it up into smaller pieces (buffers). They tend to remain there
until the page is freed at which point all the buffer_heads will be purged (by drop_buffers() called from try_to_free_buffers()).

In general, the "plain" style is suitable if it is known that there will always be an internal reference so that the object doesn't get lost, and if there is some process whereby this internal reference
will eventually get used to find and free the object.
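A userspace sketch of the "plain" style might look like the following. All names here (`struct buf`, `struct page_like`, `page_try_free_bufs()` and so on) are hypothetical and only loosely modelled on the buffer_head arrangement described above. The sketch also assumes that whatever locking protects the parent prevents a new reference being taken between the check and the free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define BUFS_PER_PAGE 4

struct buf {
    atomic_int count;   /* "plain" refcount: a put never frees anything */
};

struct page_like {
    struct buf *bufs;   /* internal reference that keeps the bufs findable */
};

static void buf_get(struct buf *b)
{
    atomic_fetch_add(&b->count, 1);
}

/* Style 1: just decrement, with no test and no cleanup. */
static void buf_put(struct buf *b)
{
    atomic_fetch_sub(&b->count, 1);
}

static bool page_attach_bufs(struct page_like *p)
{
    p->bufs = malloc(BUFS_PER_PAGE * sizeof(*p->bufs));
    if (!p->bufs)
        return false;
    for (int i = 0; i < BUFS_PER_PAGE; i++)
        atomic_init(&p->bufs[i].count, 0);
    return true;
}

/*
 * Cleanup is driven by the owner: the bufs are freed together, and only
 * if none of them is currently referenced. Assumes the caller holds
 * whatever lock stops buf_get() from racing with this check.
 */
static bool page_try_free_bufs(struct page_like *p)
{
    for (int i = 0; i < BUFS_PER_PAGE; i++)
        if (atomic_load(&p->bufs[i].count) != 0)
            return false;
    free(p->bufs);
    p->bufs = NULL;
    return true;
}
```

The count here never triggers freeing by itself; it only tells the owner whether freeing is currently allowed, which is why an unreferenced buf is not lost.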


To wrap up this little review of reference counting as an introduction to design patterns, we will discuss the related concept of an anti-pattern. While design patterns are approaches that have been shown
to work and should be encouraged, anti-patterns are approaches that history shows us do not work well and should be discouraged.

Your author would like to suggest that the use of a "bias" in a refcount is an example of an anti-pattern. A bias in this context is a large value that is added to, or subtracted from, the reference count
and is used to effectively store one bit of information. We have already glimpsed the idea of a bias in the management of s_count for superblocks. In this case the presence of the bias indicates that the value of s_active is non-zero, which
is easy enough to test directly. So the bias adds no value here and only obscures the true purpose of the code.

Another example of a bias is in the management of struct sysfs_dirent, in fs/sysfs/sysfs.h and fs/sysfs/dir.c. Interestingly, sysfs_dirent has two refcounts just like
superblocks, also called s_count and s_active. In this case s_active has a large negative bias when the entry is being deactivated. The same bit of information could be stored just as effectively and much more clearly in the flag
word s_flags. Storing single bits of information in flags is much easier to understand than storing them as a bias in a counter, and should be preferred.

In general, using a bias does not add any clarity as it is not a common pattern. It cannot add more functionality than a single flag bit can provide, and it would be extremely rare that memory is so tight
that one bit cannot be found to record whatever would otherwise be denoted by the presence of the bias. For these reasons, biases in refcounts should be considered anti-patterns and avoided if at all possible.
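As a rough illustration of the flag-based alternative, the sketch below keeps the "being deactivated" state in a separate flag word instead of folding it into the counter. The names (`struct node`, `NODE_DEACTIVATED`, `node_get_active()` and friends) are hypothetical and do not come from sysfs. The sketch also assumes that the caller holds a lock covering both fields, since the flag test and the increment are two separate steps.

```c
#include <assert.h>
#include <stdbool.h>

#define NODE_DEACTIVATED 0x1u   /* hypothetical flag bit */

/* Both fields are assumed to be protected by a lock the caller holds. */
struct node {
    int      active;   /* plain count of active users, never biased */
    unsigned flags;    /* state bits, kept apart from the count */
};

/* New active references are refused once deactivation has begun. */
static bool node_get_active(struct node *n)
{
    if (n->flags & NODE_DEACTIVATED)
        return false;
    n->active++;
    return true;
}

static void node_put_active(struct node *n)
{
    assert(n->active > 0);
    n->active--;
}

static void node_deactivate(struct node *n)
{
    n->flags |= NODE_DEACTIVATED;
}

/* Deactivation is complete once the last active user has gone. */
static bool node_is_idle(const struct node *n)
{
    return (n->flags & NODE_DEACTIVATED) && n->active == 0;
}
```

With this layout the counter always means "number of active users" and the flag always means "deactivation requested", so neither value needs to be decoded before it can be read.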


This brings to a close our exploration of the various design patterns surrounding reference counts. Simply having terminology such as "kref" versus "kcref" and "external" versus "internal" references can
be very helpful in increasing the visibility of the behaviour of different references and counts. Having code to embody this as we do with kref and could with kcref, and using this code at every opportunity, would be a great help both to developers who might
find it easy to choose the right model first time, and to reviewers who can see more clearly what is intended.

The design patterns we have covered in this article are:

  • kref: When the lifetime of an object extends only to the moment that the last external reference is dropped, a kref is appropriate. If there are any internal references to the object, they can only be promoted to external references with atomic_inc_not_zero().
    Examples: s_active and s_count in struct super_block.
  • kcref: When the lifetime of an object can extend beyond the dropping of the last external reference, the kcref with its atomic_dec_and_lock() is appropriate. An internal reference can only be converted to an external reference
    while the subsystem lock is held. Examples: i_count in struct inode.
  • plain: When the lifetime of an object is subordinate to some other object, the plain reference pattern is appropriate. Non-zero reference counts on the object must be treated as internal references to the parent object, and converting internal
    references to external references must follow the same rules as for the parent object. Examples: b_count in struct buffer_head.
  • biased-reference: When you feel the need to add a large bias to the value in a reference count to indicate some particular state, don't. Use a flag bit elsewhere. This is an anti-pattern.

Next week we will move on to another area where the Linux kernel has proved some successful design patterns and explore the slightly richer area of complex data structures. (Part 2 and part 3 of
this series are now available).


As your author has been reminded while preparing this series, there is nothing like a directed study of code to clarify understanding of these sorts of issues. With that in mind, here are some exercises
for the interested reader.

  1. Replace s_active and s_count in struct super_block with krefs, discarding S_BIAS in the process. Compare the result with the original using the trifecta of Correctness, Maintainability, and Performance.
  2. Choose a more meaningful name for mnt_pinned and related functions that manipulate it.
  3. Add a function to the kref library that makes use of atomic_inc_not_zero(), and using it (or otherwise) remove the use of atomic_dec_and_lock() on a kref in net/sunrpc/svcauth.c - a usage which violates the kref abstraction.
  4. Examine the _count reference count in struct page (see mm_types.h for example) and determine whether it behaves most like a kref or a kcref (hint: it is not "plain"). This should involve identifying any and all internal references
    and related locking rules. Identify why the page cache (struct address_space.page_tree) owns a counted reference or explain why it should not. This will involve understanding page_freeze_refs() and its usage in __remove_mapping(),
    as well as page_cache_{get,add}_speculative().

Bonus credit: provide a series of minimal self-contained patches to implement any changes that the above investigations proved useful.


Time: 2024-08-29 03:32:18



