What is RCU, Fundamentally?
https://lwn.net/Articles/262464/
If you can fill the unforgiving second
with sixty minutes worth of distance run,
“Highly scalable” your code will be reckoned,
And—which is more—you’ll have parallel fun!
With apologies to Rudyard Kipling.
SMP Scalability Papers
- Linux-Kernel Memory Ordering: Help Arrives At Last!, with Jade Alglave, Luc Maranget, Andrea Parri, and Alan Stern, Linux Kernel Summit Track. (Additional litmus tests here.) November 2016.
- Linux-Kernel Memory Ordering: Help Arrives At Last!, with Jade Alglave, Luc Maranget, Andrea Parri, and Alan Stern, LinuxCon EU. October 2016.
- High-Performance and Scalable Updates: The Issaquah Challenge, guest lecture to the Distributed Operating Systems class at TU Dresden (video), June 2016.
- Practical Experience With Formal Verification Tools at Beaver BarCamp, Corvallis, Oregon USA, April 2016.
- Practical Experience With Formal Verification Tools, Verified Trustworthy Software Systems Specialist Meeting, April 2016.
- Linux-Kernel Community Validation Practices, The Royal Society Verified Trustworthy Software Systems Meeting, “Verification in Industry” discussion, April 2016.
- Formal Verification and Linux-Kernel Concurrency, guest lecture to the CS569 class at Oregon State University, June 2015.
- Formal Verification and Linux-Kernel Concurrency, guest lecture to the CS362 class at Oregon State University, June 2015. (AKA “what would have to happen for me to add formal verification to Linux-kernel RCU’s regression test suite?”)
- High-Performance and Scalable Updates: The Issaquah Challenge, guest lecture to the Distributed Operating Systems class at TU Dresden (video), June 2015.
- Formal Verification and Linux-Kernel Concurrency at Beaver BarCamp, Corvallis, Oregon USA, April 2015.
- Creating scalable APIs, in Linux Weekly News, August 2014.
- High-Performance and Scalable Updates: The Issaquah Challenge at linux.conf.au in Auckland, January 2015.
- Bare-Metal Multicore Performance in a General-Purpose Operating System (Adventures in Ubiquity) at linux.conf.au in Auckland, January 2015.
- Use Cases for Thread-Local Storage ISO SC22 WG21 (C++ Language), November 2014. (revised N4376 2015-02-06).
- Linux-Kernel Memory Model ISO SC22 WG21 (C++ Language), November 2014. Official version: N4216 (revised N4374 2015-02-06).
- Axiomatic validation of memory barriers and atomic instructions, in Linux Weekly News, August 2014.
- Out-of-Thin-Air Execution is Vacuous ISO SC22 WG21 (C++ Language), May 2014. Official version: N4216 (revised N4323 2014-11-20, revised N4375 2015-02-06).
- Reordering and Verification at the Linux Kernel REORDER workshop in Vienna Summer of Logic, July 2014.
- But What About Updates? Guest lecture to Portland State University CSE510 (Concurrency), Prof. Jonathan Walpole, June 2014.
- N4037: Non-Transactional Implementation of Atomic Tree Move ISO SC22 WG21 (C++ Language), May 2014.
- Bare-Metal Multicore Performance in a General-Purpose Operating System (Now With Added Energy Efficiency!) at Beaver BarCamp, Corvallis, OR, USA, April 2014.
- Bare-Metal Multicore Performance in a General-Purpose Operating System (Now With Added Energy Efficiency!) at Linux Collaboration Summit, Napa, CA, USA, March 2014.
- But What About Updates? at Linux Collaboration Summit, Napa, CA, USA, March 2014.
- Bare-Metal Multicore Performance in a General-Purpose Operating System (Now With Added Energy Efficiency!) at linux.conf.au in Perth, January 2014.
- Advances in Validation of Concurrent Software at linux.conf.au in Perth, January 2014.
- Scaling Talks at Linux Kernel Summit Scaling microconference October 2013.
- But What About Updates? at Linux Plumbers Conference Scaling microconference, New Orleans, LA, USA. September 2013.
- Bare-Metal Multicore Performance in a General-Purpose Operating System (Now With Added Energy Efficiency!) at Linux Plumbers Conference, New Orleans, LA, USA. September 2013.
- Advances in Validation of Concurrent Software at Linux Plumbers Conference, New Orleans, LA, USA. September 2013.
- Beyond Expert-Only Parallel Programming? at LinuxCon North America, New Orleans, LA, USA. September 2013.
- Bare-Metal Multicore Performance in a General-Purpose Operating System at Linux Foundation Enterprise End User Summit, May 2013.
- Bare-Metal Multicore Performance in a General-Purpose Operating System at Multicore World, February 2013. (Updated for Oregon State University BarCamp, April 2013.)
- January 2013 Validating Core Parallel Software? at linux.conf.au Open Programming Miniconference.
- Beyond Expert-Only Parallel Programming? (presentation), at the Workshop on Relaxing Synchronization for Multicore and Manycore Software (RACES’12), October 2012.
- Scheduling and big.LITTLE Architecture, at Scheduling Microconference, Linux Plumbers Conference, August 2012.
- Signed overflow optimization hazards in the kernel, in Linux Weekly News, August 2012.
- Validating Core Parallel Software, at Linux Collaboration Summit, San Francisco, CA, USA, April 2012.
- Validating Memory Barriers and Atomic Instructions, in Linux Weekly News, December 2011.
- Validating Core Parallel Software, at TU Dresden, Germany, October 2011.
- Validating Core Parallel Software, at the 2011 China Linux Kernel Developer Conference, Nanjing, China, October 2011. (Invited)
- Is Parallel Programming Hard, And If So, What Can You Do About It?, at the 2011 Android System Developer Forum, Taipei, Taiwan, April 2011. (Invited)
- Verifying Parallel Software: Can Theory Meet Practice?, at Verification of Concurrent Data Structures (Verico), Austin, TX, USA, January 2011. (Invited)
- Concurrent code and expensive instructions, Linux Weekly News, January 2011.
- Is Parallel Programming Hard, And, If So, Why?, linux.conf.au January 2011.
- Verifying Parallel software: Can Theory Meet Practice?, linux.conf.au January 2011.
- Multi-Core Memory Models and Concurrency Theory: A View from the Linux Community, Dagstuhl workshop January 2011.
- N1525: Memory-Order Rationale with Blaine Garst (revised). ISO SC22 WG14 (C Language), November 2010.
- Omnibus Memory Model and Atomics Paper, ISO SC22 WG21 (C++ Language), with Mark Batty, Clark Nelson, Hans Boehm, Anthony Williams, Scott Owens, Susmit Sarkar, Peter Sewell, Tjark Weber, Michael Wong, Lawrence Crowl, and Benjamin Kosnik. August 2010. Updated November 2010.
- Scalable concurrent hash tables via relativistic programming, August 2010, with Josh Triplett and Jonathan Walpole.
- Why the grass may not be greener on the other side: a comparison of locking vs. transactional memory, August 2010, with Maged M. Michael, Josh Triplett, and Jonathan Walpole.
- Synchronization and Scalability in the Macho Multicore Era, Scuola Superiore Sant’Anna, Pisa, Italy, July 2010.
- Additional Atomics Errata. ISO SC22 WG14 (C Language), May 2010.
- Additional Atomics Errata, complete with typo in title. ISO SC22 WG14 (C Language), May 2010.
- Rationale for C-Language Dependency Ordering. ISO SC22 WG14 (C Language), May 2010.
- Updates to C++ Memory Model Based on Formalization. ISO SC22 WG14 (C Language), April 2010. Updated May 2010.
- Explicit Initializers for Atomics. ISO SC22 WG14 (C Language), April 2010. Updated May 2010.
- Dependency Ordering for C Memory Model. ISO SC22 WG14 (C Language), April 2010.
- Explicit Initializers for Atomics. ISO SC22 WG21 (C++ Language) March 2010.
- Updates to C++ Memory Model Based on Formalization. ISO SC22 WG21 (C++ Language) February 2010. Updated March 2010.
- Dependency Ordering for C Memory Model. ISO SC22 WG14 (C Language) November 2009.
- Updates to C++ Memory Model Based on Formalization. ISO SC22 WG14 (C Language) October 2009.
- Performance, Scalability, and Real-Time Response From the Linux Kernel short course for ACACES 2009.
- Is Parallel Programming Hard, and If So, Why?, presented at January 2009 linux.conf.au, along with corresponding Portland State University technical report.
- Example POWER Implementation for C/C++ Memory Model, revision of ISO WG21 N2745. ISO SC22 WG21 (C++ Language) September 2008. This mapping was proven to be pointwise locally optimal in 2012 by Batty, Memarian, Owens, Sarkar, and Sewell of the University of Cambridge. In other words, to improve on this mapping, it is necessary to consider successive atomic operations: Taken one at a time, each is optimal.
- Concurrency and Race Conditions at Linux Plumbers Conference Student Day, September 2008.
- After 25 Years, C/C++ Understands Concurrency at linux.conf.au 2008 Mel8ourne. February 2008.
- Comparison of locking and transactional memory and presentation at PLOS 2007 with Maged Michael and Jon Walpole. October 2007. (revised presentation.) (Official version of paper.)
- C++0x memory model user FAQ with Hans Boehm, August 2007.
- C++ Data-Dependency Ordering: Atomics (Updated), C++ Data-Dependency Ordering: Memory Model (Updated), and C++ Data-Dependency Ordering: Function Annotation (Updated). August 2007. (Updated version of the May 2007 paper.)
- C++ Data-Dependency Ordering. May 2007.
- A simple and efficient memory model for weakly ordered architectures. Makes case for weakly ordered primitives in programming languages. Updated May 2007.
- Overview of Linux-Kernel Reference Counting. January 2007.
- Memory Ordering in Modern Microprocessors, appearing in two parts in the August and September 2005 Linux Journal (revised April 2009).
- Storage Improvements for 2.6 and 2.7 in August 2004 Linux Journal.
- Linux Kernel Scalability: Using the Right Tool for the Job. Presentation on scalability given at the 2004 Ottawa Linux Symposium and revised for the 2005 linux.conf.au.
- Issues with Selected Scalability Features of the 2.6 Kernel. OLS paper describing scalability, DoS, and realtime limitations of the Linux kernel at that time. With Dipankar Sarma.
- Fairlocks--a High-Performance Fair Locking Scheme. Bit-vector fair locking scheme for NUMA systems. Revision of paper that appeared in 2002 Parallel and Distributed Computing and Systems, with Swaminathan Sivasubramanian, Jack F. Vogel, and John Stultz. Of course, it is even better to design your software so that lock contention is low enough that fancy locking techniques don’t help! We implemented a number of variations on this theme.
- Practical Performance Estimation On Shared-Memory Multiprocessors (bibtex). The silver lining of the memory-latency dark cloud--programs whose run time is dominated by memory latency are often amenable to simple performance-estimation methods. Some of these methods are applicable at design time. Revision of PDCS’99 paper.
- Differential Profiling (bibtex). Revised version of the MASCOTS’95 and the ’99 SP&E papers.
- Experience With an Efficient Parallel Kernel Memory Allocator (bibtex). Revised version of the W’93 USENIX and 2001 SP&E papers.
- Selecting Locking Designs for Parallel Programs (bibtex). Revised version of the PLoPD-II paper.
- Selecting Locking Primitives for Parallel Programs (bibtex). Revised version of the October ‘96 CACM paper.
- Efficient Demultiplexing of Incoming TCP Packets (bibtex). Analytic comparison of a number of demultiplexing techniques. The winner is hashing.
- Stochastic Fairness Queueing (bibtex). High-speed approximate implementation of Fair Queueing.
- High-Speed Event-Counting and -Classification Using a Dictionary Hash Technique (bibtex). Revised version of the ICPP’89 paper.
- Bibtex for other papers
Introduction to RCU
The best introduction to RCU is my Linux Weekly News three-part series, with update:
- What is RCU, Fundamentally? with Jonathan Walpole (bibtex).
- What is RCU? Part 2: Usage (bibtex).
- RCU part 3: the RCU API (bibtex).
- The RCU API, 2010 Edition.
These expand on the older “What is RCU?” introduction. The Wikipedia article also has some good information, as does the ACM Queue article. In addition, Linux Weekly News has a long list of RCU-related articles.
There is also some research on the general family of algorithms of which RCU is a member (bibtex) and an annotated bibliography. Alexey Gotsman, Noam Rinetzky, and Hongseok Yang have produced a formalization of RCU based on separation logic.
How much is RCU used in the Linux kernel?
Implementing RCU
The following papers describe how to implement RCU, in roughly increasing order of accessibility:
- Lockdep-RCU.
- RCU: The Bloatwatch Edition (optimized for uniprocessor operation) (bibtex).
- Sleepable Read-Copy Update (SRCU), revision of Linux Weekly News article (bibtex).
- The classic PDF revision of PDCS’98 paper on DYNIX/ptx’s RCU implementation (bibtex).
- The February 2012 IEEE TPDS paper (bibtex) is the best source of information on what RCU is, how to implement it in userspace, and how it performs. The pre-publication accepted version of this paper may be found here (main paper) and here (supplementary materials). Some of the material in this paper came from Mathieu Desnoyers’s Ph.D. dissertation (bibtex).
- Using Promela and Spin to verify parallel algorithms at Linux Weekly News (bibtex). Includes description of QRCU implementation.
- My Ph.D. dissertation on RCU, which includes descriptions of a number of early implementations (bibtex).
- The design of preemptable read-copy update (Linux Weekly News article) (bibtex). Please be warned: this is a detailed design document of the most complex known RCU implementation. This implementation has since been replaced by a faster, simpler, and more scalable implementation, and an update of the documentation is pending.
There is an RCU to-do list that is updated sporadically.
Read-Copy Update (RCU) Papers
A more-complete list in reverse chronological order:
- October 2016 Tracing and Linux-Kernel RCU at Tracing Summit.
- September 2016 A lock-free concurrency toolkit for deferred reclamation and optimistic speculation at CPPCON, with Michael Wong and Maged Michael (video).
- September 2016 RCU and C++ at CPPCON (video).
- September 2016 Beyond the Issaquah Challenge: High-Performance Scalable Complex Updates at CPPCON.
- June 2016 High-Performance and Scalable Updates: The Issaquah Challenge, at ACM Applicative Conference.
- February 2016 What Happens When 4096 Cores All Do synchronize_rcu_expedited()?, at linux.conf.au.
- February 2016 Mutation Testing and RCU, at linux.conf.au Kernel Miniconf.
- September 2015 C++ Atomics: The Sad Story of memory_order_consume A Happy Ending At Last? at CPPCON.
- July-August 2015 Requirements for RCU part 1: the fundamentals, RCU requirements part 2 — parallelism and software engineering, and RCU requirements part 3, Linux Weekly News.
- May 2015 Dagstuhl Seminar 15191 “Compositional Verification Methods for Next-Generation Concurrency”:
- November 2014 Recent read-mostly research, Linux Weekly News.
- November 2014 Read-Copy Update (RCU) Validation and Verification for Linux Galois Tech Talk.
- September 2014 C++ Memory Model Meets High-Update-Rate Data Structures CPPCON.
- September 2014 The RCU API, 2014 Edition, Linux Weekly News.
- May 2014 Towards Implementation and Use of memory_order_consume ISO SC22 WG21 (C++ Language) Official version: (N4036) (revised N4215 2014-10-05, revised N4321 2014-11-20).
- May 2014 Non-Transactional Implementation of Atomic Tree Move (N4037) ISO SC22 WG21 (C++ Language).
- May 2014 What Is RCU?, presented to TU Dresden Distributed OS class (Instructor Carsten Weinhold).
- November 2013 User-space RCU, Linux Weekly News, with Mathieu Desnoyers, Lai Jiangshan, and Josh Triplett. Subparts of this article are: URCU-protected hash tables, The URCU hash table API, URCU-protected queues and stacks, The URCU stack/queue API, User-space RCU: Atomic-operation and utility API, User-space RCU: Memory-barrier menagerie, The user-space RCU API, The RCU-protected list API, The RCU-barrier menagerie.
- November 2013 What Is RCU?, guest lecture to University of Cambridge (Prof. Peter Sewell).
- October 2013 Introduction to RCU Concepts: Liberal application of procrastination for accommodation of the laws of physics — for more than two decades, LinuxCon Europe 2013 (part of Mathieu Desnoyers’s Hands-On Tutorial on Scalability with Userspace RCU).
- May 2013 What Is RCU?, presented to TU Dresden Distributed OS class (Prof. Hermann Härtig).
- May 2013 What Is RCU? (video), presented to Indian Institute of Science (IISc) (Prof. K. Gopinath).
- May 2013 Structured Deferral: Synchronization via Procrastination, ACM Queue.
- January 2013 What is RCU?, The SIGPLAN Programming Languages Mentoring Workshop.
- August 2012 Real-Time Response on Multicore Systems: It Is Bigger Than You Think, Scaling Microconference, Linux Plumbers Conference.
- May 2012 What Is RCU? presented to TU Dresden Distributed OS class (Prof. Hermann Härtig).
- February 2012 Making RCU Safe For Battery-Powered Devices presented to the Embedded Linux Conference.
- February 2012 User-Level Implementations of Read-Copy Update (bibtex) covering what RCU is, how to implement it in userspace, and how it performs. The pre-publication accepted version of this paper may be found here (main paper) and here (supplementary materials).
- July 2011 3.0 and RCU: what went wrong.
- December 2010 The RCU API, 2010 Edition.
- August 2010 Scalable Concurrent Hash Tables via Relativistic Programming (bibtex).
- February 2010 Lockdep-RCU describing software-engineering enhancements to the Linux-kernel RCU implementations (bibtex).
- January 2010 Simplicity Through Optimization (presentation) (bibtex).
- January 2009 Using a Malicious User-Level RCU to Torture RCU-Based Algorithms, at linux.conf.au (bibtex). Describes several user-level RCU implementations, and describes how they can be used to validate kernel-level code using RCU.
- November 2008 Hierarchical RCU, in Linux Weekly News (bibtex). Describes a Linux-kernel RCU implementation designed to scale to thousands of CPUs.
- July 2008 Introducing technology into the Linux kernel: a case study in ACM SIGOPS Operating System Review, with Jon Walpole (updated to include RCU changes through the 2.6.36 Linux kernel) (bibtex).
- May 2008 The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with Linux in IBM Systems Journal, with Dinakar Guniguntala, Josh Triplett, and Jon Walpole (bibtex).
- February 2008 Introducing Technology into Linux (bibtex), or "Introducing your technology into Linux will require introducing a LOT of Linux into your technology!!!" at the 2008 Linux Developer Symposium - China (revised). (Chinese translation of original.)
- January 2008 RCU part 3: the RCU API at Linux Weekly News (bibtex).
- December 2007 What is RCU? Part 2: Usage at Linux Weekly News (bibtex).
- December 2007 What is RCU, Fundamentally? at Linux Weekly News with Jonathan Walpole (bibtex).
- December 2007 Performance of Memory Reclamation for Lockless Synchronization in the Journal of Parallel and Distributed Computing, with Tom Hart, Angela Demke Brown, and Jonathan Walpole (bibtex). (Journal version of the IPDPS‘06 paper.)
- October 2007 The design of preemptable read-copy update at Linux Weekly News (bibtex).
- August 2007 Using Promela and Spin to verify parallel algorithms at Linux Weekly News (bibtex). Includes proof of correctness for QRCU.
- February 2007 "Priority-Boosting RCU Read-Side Critical Sections", revision of earlier Linux Weekly News version (bibtex).
- October 2006 "Sleepable Read-Copy Update", revision of earlier Linux Weekly News version (bibtex).
- July 2006 "Extending RCU for Realtime and Embedded Workloads" with Dipankar Sarma, Ingo Molnar, and Suparna Bhattacharya at OLS’2006 (bibtex), and corresponding presentation.
- April 2006 "Making Lockless Synchronization Fast: Performance Implications of Memory Reclamation", with Tom Hart and Angela Demke. IPDPS 2006 Best Paper. Paper (bibtex). Presentation.
- July 2005 Abstraction, Reality Checks, and RCU presented at University of Toronto’s "Cider Seminar" series (abstract).
- April 2005 paper (revised) and presentation describing Linux realtime and yet more modifications to RCU to enable even more aggressive realtime response (bibtex). Presented at the 2005 linux.conf.au.
- January 2005 RCU Semantics: A First Attempt with Jon Walpole (bibtex). Technical report: engineering math meets RCU semantics.
- December 2004 James Morris’s Recent Developments in SELinux Kernel Performance paper describes how RCU helped scalability of the SELinux audit vector cache (AVC). (I didn’t have any involvement in creating this paper, but believe that it is well worth bringing to your attention.)
- June 2004 paper describing modifications to the Linux RCU implementation to make it safe for realtime use (bibtex).
- May 2004 dissertation and presentation from Ph.D. defense (bibtex). Also some advice for others who are embarking on a part-time Ph.D. program, and the announcement.
- January 2004 paper and presentation for RCU performance on different CPUs at linux.conf.au in Adelaide, Australia (bibtex).
- January 2004 Scaling dcache with RCU (bibtex).
- October 2003 Linux Journal introduction to RCU (bibtex).
- PDF revision of FREENIX’03 paper (focusing on use of RCU in Linux’s System-V IPC implementation) and corresponding presentation (bibtex).
- Enabling Autonomic Behavior in Systems Software With Hot Swapping (bibtex): describes how RCU (AKA "generations") is used in K42 to enable hot-swapping of implementations of kernel algorithms.
- PDF revision of OLS’02 paper (focusing on Linux-kernel infrastructure) and corresponding presentation (bibtex).
- PDF revision of OLS’01 paper (oriented to Linux kernel) and corresponding presentation (bibtex).
- PDF revision of RPE paper (more theoretical).
- PDF revision of PDCS’98 paper (DYNIX/ptx) (bibtex).
- Read-Copy Update: Using Execution History to Implement Low-Overhead Solutions to Concurrency Problems. Introduction to read-copy update.
- (slightly outdated) HTML version.
Linux RCU Work
The best summary of Linux RCU work is graphical, and may be found here.
Some selected RCU patches:
- RCU was accepted into the Linux 2.5.43 kernel. Patches to RCU were applied to the 2.5.44 and 2.5.45 kernels. RCU was thus fully functional in 2.5.45 and later Linux kernels, just in time for the Halloween functionality freeze. ;-)
- Patch to the System V IPC implementation using RCU was accepted into the Linux 2.5.46 kernel.
- Patch providing a lock-free IPv4 route cache was accepted into the Linux 2.5.53 kernel.
- Patch providing lock-free handler traversal for IPMI handling, added to the Linux 2.5.58 kernel.
- Patch providing lock-free lookup of directory entries in the dcache subsystem, added to the Linux 2.5.62 kernel.
- Patches to replace many uses of brlock with RCU in the 2.5.69 kernel, with brlock being entirely eliminated in the 2.5.70 kernel.
- NMI handling for oprofile uses RCU in the 2.5.73 kernel.
- Fix ppc64 {pte,pmd}_free vs. hash_page race with RCU in the 2.6.2 kernel.
- Additional patches to the Linux kernel apply RCU to FD-set management, task-list traversal, and i_shared_sem contention reduction.
- Yet more patches change RCU’s API to conserve memory and stack space.
- Another patch to monitor RCU grace periods.
- Another patch to apply RCU to fasync_lock, perhaps for the 2.7 timeframe.
- Another set of patches apply modifications to the RCU infrastructure to make it safe for soft-realtime use (0/2, 1/2, 2/2).
- The Reiser4 filesystem uses RCU to defer freeing of jnodes.
- An auditing patch uses RCU to guard the lists of auditing rules.
- An SELinux scalability patch uses RCU to guard the audit vector cache, with 500x improvement in write() throughput on 32 CPUs, and about 50% improvement on 2 CPUs.
K42 RCU Work
- K42 is a research OS at IBM that uses RCU pervasively as an existence lock. K42 developed RCU independently, as described in the Gamsa paper.
- K42 also uses RCU as a basis for hot-swapping: overview and infrastructure, and implementation details and results.