The Myths about Transactions (ACID) and NoSQL

There has been widespread characterization of one of the major distinctions between NoSQL and traditional DBMSs by saying that the former don’t care for ACID semantics or that transactions aren’t needed. This is an oversimplification to say the least. As long as the NoSQL system supports incremental updates by concurrent set of users (as opposed to only single-threaded bulk or batch updates), even if multi-API-calls transactions are not supported, at least within the internals of such a system some notion of transaction is essential to retain a certain level of sanity of the internal design and keep things consistent. This is even more important if the system supports replication and/or the updating of multiple data structures within the system even in a single API call (e.g., if there are multiple access paths which have to be updated). Similar points apply to locking and recovery semantics and functionality.

The above sorts of issues are real and were quite tricky to handle in Lotus Notes, which used very ad hoc ways of dealing with the associated complications, until log-based recovery and transaction support were added in R5 (http://bit.ly/LNotes). From Day 1 in 1989, Notes has supported replication and disconnected operations with the consequent issues of potentially conflicting parallel updates having to be dealt with. Even RDBMSs were late in dealing with that kind of functionality.

Even if at the individual object level, high concurrency isn’t important given the nature of a NoSQL application, it might still be important from the viewpoint of the internal data structures of the NoSQL system to support high concurrency or fine granularity locking/latching (e.g., for dealing with concurrent accesses to the space management related data structures - see http://bit.ly/CMSpMg).

Vague discussions about NoSQL systems and ACID semantics make many people think that RDBMSs enforce strong ACID semantics all the time. This is completely wrong if by that people imply serializability as the correctness property for handling concurrent execution of transactions. Even from the very beginning, RDBMSs (System R and products that came from it) have supported different degrees of isolation, in some cases even the option of of being able to read uncommitted data, and different granularities of locking (http://bit.ly/CMQuCC). Even with respect to durability, in-memory RDBMSs like TimeTen and SolidDB which came much later, allowed soft commits, etc., trading off durability guarantees for improved performance.

In my last 2 posts on NoSQL (http://bit.ly/NoSQLt http://bit.ly/NoSQL2), I gave a lot of information on my background to make it clear to the readers that this whole space of data management is a tricky business. The devil is in the details and it isn’t for the faint hearted :-) I wanted to make it clear that I don’t believe in quick and dirty approaches to handling intrinsically complicated issues and that I am not somebody who takes frequent elevator rides with VCs :-) At the same time, I am not an ivory tower researcher either! When I hear many presentations on “my kind of topics” at various conferences and meetings like the Hadoop User Group (HUG), I have a tough time making sense of what is going on given the high level nature of what is being presented with no serious attempts being made to compare what is proposed with what has been done before and about which more is known.

Of course, NoSQL systems aren’t the only context in which such things have happened in the past. A great number of people have talked about optimistic concurrency control and recovery without much of the details really being worked out (see my discussions on this topic in http://bit.ly/CMOpCC). Even now some of the NewSQL people make some tall claims about how traditional recovery isn’t needed and that they can get away without logging while still supporting SQL, etc. One has to quiz them quite a bit to discover that they do in fact do some bookkeeping that they choose not to describe as logging and/or that they don’t support statement-level atomicity even though they support SQL and SQL requires it!

For some people, it might be very tempting to think that the NoSQL applications are so much different from traditional database applications that simple things are sufficient (“good enough” being the often used phrase to describe such things) and that overnight mastery of the relevant material is possible. Even in the Web 2.0 space, if the application programmers are not to go crazy, more of the burden has to be taken up by the designers of the NoSQL systems. A case in point is how the Facebook messaging system designers decided eventual consistency semantics is too painful to deal with. To begin with, if the NoSQL systems have vague semantics of what they support and subsequently, as they evolve, if such things keep changing, users will be in big trouble! Also, with no standards in place for these systems, if users want to change systems for any number of reasons, applications might require significant rewriting to keep end user semantics consistent over time.

时间: 2024-11-05 11:50:10

The Myths about Transactions (ACID) and NoSQL的相关文章

Nosql里典型的数据库

Nosql里典型的数据库 Redis 对网站服务器进行写入 传统关系式数据库无法过多的写入 对数据库要求: 数据库高并发读写需求 解决方案: (1:读写分离 两台主如果同时写入会发生冲突 (2:水平分割: 关系式数据库 数据之间有操作 海量数据的高效率存储和访问的需求 用户如果在海量数据中查询某一条数据 记录 数据库的高扩展性和高可用性 ############################################### 任何一个领域,如果不能通过自己的努力 去获取或者超出其他人的竞争

NOSQL基础概念

NoSql是一个很老的概念了,但对自己来说,仍然是一个短板,果断补上. 首先通过几个简单的例子来了解NOSQL在国内的情况(2013年左右的数据,有些过时),比如新浪微博,其就有200多台物理机运行着Redis,其结合NOSQL和MySQL一起使用,关系型数据,通过索引保存在MYSQL中,K/V数据保存在Redis中.淘宝的Oceanbase用于处理线上事务,Tair用于K/V存储,于2010上线(自己落后时代不少啊).优酷的现在评论业务使用mongoDB存储,运营数据分析及挖掘处理使用Hado

Redis 中 5 种数据结构的使用场景介绍

这篇文章主要介绍了Redis中5种数据结构的使用场景介绍,本文对Redis中的5种数据类型String.Hash.List.Set.Sorted Set做了讲解,需要的朋友可以参考下 一.redis 数据结构使用场景 原来看过 redisbook 这本书,对 redis 的基本功能都已经熟悉了,从上周开始看 redis 的源码.目前目标是吃透 redis 的数据结构.我们都知道,在 redis 中一共有5种数据结构,那每种数据结构的使用场景都是什么呢? String——字符串 Hash——字典

redis 数据类型详解 以及 redis适用场景场合

1.  MySql+Memcached架构的问题 实际MySQL是适合进行海量数据存储的,通过Memcached将热点数据加载到cache,加速访问,很多公司都曾经使用过这样的架构,但随着业务数据量的不断增加,和访问量的持续增长,我们遇到了很多问题: 1.MySQL需要不断进行拆库拆表,Memcached也需不断跟着扩容,扩容和维护工作占据大量开发时间. 2.Memcached与MySQL数据库数据一致性问题. 3.Memcached数据命中率低或down机,大量访问直接穿透到DB,MySQL无

Redis 数据结构使用场景

Redis 数据结构使用场景 redis共有5种数据结构,每种的使用场景都是什么? 一.redis 数据结构使用场景 原来看过 redisbook 这本书,对 redis 的基本功能都已经熟悉了,从上周开始看 redis 的源码.目前目标是吃透 redis 的数据结构.我们都知道,在 redis 中一共有5种数据结构,那每种数据结构的使用场景都是什么呢? String——字符串 Hash——字典 List——列表 Set——集合 Sorted Set——有序集合 下面我们就来简单说明一下它们各自

redis中5种数据结构的使用

一.redis 数据结构使用场景 原来看过 redisbook 这本书,对 redis 的基本功能都已经熟悉了,从上周开始看 redis 的源码.目前目标是吃透 redis 的数据结构.我们都知道,在 redis 中一共有5种数据结构,那每种数据结构的使用场景都是什么呢? String——字符串Hash——字典List——列表Set——集合Sorted Set——有序集合 下面我们就来简单说明一下它们各自的使用场景: 1. String——字符串 String 数据结构是简单的 key-valu

redis 体系结构

程序strings   key-value 类型 ,value不仅是String,也可以是数字.使用strings 类型可以完全实现目前 Memcache 的功能,并且效率更高,还可以享受redis的定时持久化,操作日志及replication 等功能. 除了提供与 Memcache一样的get ,set,incr,decr 等操作外,redis还提供下面的操作 1.获取字符串的长度 2.往字符串append的内容 3.设置和获取字符串的一段内容 4.设置及获取字符串的某一位(bit) 5.批量

Redis中5种数据结构的使用场景

一.redis 数据结构使用场景 原来看过 redisbook 这本书,对 redis 的基本功能都已经熟悉了,从上周开始看 redis 的源码.目前目标是吃透 redis 的数据结构.我们都知道,在 redis 中一共有5种数据结构,那每种数据结构的使用场景都是什么呢? String--字符串Hash--字典List--列表Set--集合Sorted Set--有序集合 下面我们就来简单说明一下它们各自的使用场景: 1. String--字符串 String 数据结构是简单的 key-valu

redis数据类型及使用场景

Redis数据类型  String: Strings 数据结构是简单的key-value类型,value其实不仅是String,也可以是数字. 常用命令:  set,get,decr,incr,mget 等. 应用场景:String是最常用的一种数据类型,普通的key/ value 存储都可以归为此类.即可以完全实现目前 Memcached 的功能,并且效率更高.还可以享受Redis的定时持久化,操作日志及 Replication等功能.除了提供与 Memcached 一样的get.set.in