How to store time-series data in Cassandra

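Cassandra is a very good fit for time-series data. This article uses a weather station as the example: the station needs to store one temperature reading per minute.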

I. Approach 1: one row per device

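The idea is to create one row per data source: here, all temperature readings of one weather station occupy a single row. Since a temperature is collected every minute, the timestamp of each reading becomes the column name and the temperature becomes the column value.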

(1) The table is created as follows:

CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
);

(2) Then insert the following data:

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');

(3) To query all the data for this weather station:

SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

(4) To query the data within a time range:

SELECT temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
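Range queries on event_time are cheap in this layout because the rows of one partition are kept sorted by the clustering column. The Python sketch below is only a toy model of that idea, not Cassandra's actual storage engine; the class StationPartition and its method names are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class StationPartition:
    """Toy stand-in for one weatherstation_id partition (not Cassandra's real storage)."""

    def __init__(self):
        self._times = []   # event_time values, always kept sorted
        self._temps = []   # temperature values, parallel to _times

    def insert(self, event_time, temperature):
        i = bisect_left(self._times, event_time)
        if i < len(self._times) and self._times[i] == event_time:
            self._temps[i] = temperature      # same key: overwrite, like an upsert
        else:
            self._times.insert(i, event_time)
            self._temps.insert(i, temperature)

    def slice_between(self, after, before):
        """Rows with after < event_time < before, located by binary search."""
        lo = bisect_right(self._times, after)
        hi = bisect_left(self._times, before)
        return list(zip(self._times[lo:hi], self._temps[lo:hi]))


partition = StationPartition()
# Insert out of order on purpose; the partition keeps itself sorted.
for minute, temp in [(3, "73F"), (1, "72F"), (4, "74F"), (2, "73F")]:
    partition.insert(datetime(2013, 4, 3, 7, minute), temp)

rows = partition.slice_between(datetime(2013, 4, 3, 7, 1), datetime(2013, 4, 3, 7, 4))
```

Because both bounds are exclusive, rows holds only the 07:02 and 07:03 readings.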

II. Approach 2: one row per device per day

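Sometimes storing all of a device's data in a single row is impractical, for example because it will not fit (this should be rare). In that case we can split the previous approach by adding an extra component to the row key. For example, we can put each day of each device's data in its own row, which keeps the size of any one row under control.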
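To see why running out of room is rare at one reading per minute, here is a rough back-of-the-envelope calculation in Python. It assumes each reading costs one column and uses the 2 billion columns-per-row figure quoted from the original tutorial at the end of this article; the function name is illustrative only.

```python
READINGS_PER_DAY = 24 * 60               # one reading per minute
MAX_COLUMNS_PER_ROW = 2_000_000_000      # figure quoted from the original tutorial


def days_until_limit(readings_per_day=READINGS_PER_DAY, limit=MAX_COLUMNS_PER_ROW):
    """Whole days of data one row can hold, assuming one column per reading."""
    return limit // readings_per_day


days = days_until_limit()
years = days / 365
print(f"{days} days, roughly {years:.0f} years")
```

A sensor that reports far more often, say every millisecond, would reach the same ceiling in about 23 days, which is where the per-day split starts to matter.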

(1) Create the table

CREATE TABLE temperature_by_day (
    weatherstation_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weatherstation_id, date), event_time)
);

(2) Insert data

INSERT INTO
temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');

INSERT INTO
temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:02:00','73F');

INSERT INTO
temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 07:01:00','73F');

INSERT INTO
temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 07:02:00','74F');

(3) Query one device's data for one day

SELECT *
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date='2013-04-03';
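With this layout the application has to derive the date value itself, both when writing a reading and when deciding which rows to read for a span of several days. A minimal Python sketch of that bookkeeping follows; the helper names day_bucket and buckets_between are hypothetical, and it assumes all timestamps are naive datetimes in a single time zone.

```python
from datetime import datetime, timedelta


def day_bucket(event_time: datetime) -> str:
    """Value for the `date` column: the calendar day of the reading."""
    return event_time.strftime("%Y-%m-%d")


def buckets_between(start: datetime, end: datetime) -> list:
    """All `date` values whose rows may hold readings in [start, end]."""
    day = start.date()
    buckets = []
    while day <= end.date():
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


bucket = day_bucket(datetime(2013, 4, 3, 7, 1))
span = buckets_between(datetime(2013, 4, 3, 23, 0), datetime(2013, 4, 5, 1, 0))
```

Each entry in span corresponds to one (weatherstation_id, date) row, so a multi-day read turns into one query per entry.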

III. Approach 3: store data with a limited lifetime that is deleted automatically when it expires

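Another typical use of time-series data is rolling storage. Suppose a dashboard shows only the latest 10 temperature readings; older data is no longer useful and can be ignored. With other databases we usually have to set up a background job that periodically purges historical data, which is what we currently do with PostgreSQL. With Cassandra we can instead use a feature called expiring columns: once the specified time has elapsed, the column disappears automatically.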

(1) Create the table

CREATE TABLE latest_temperatures (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

(2) Insert data

INSERT INTO
latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','72F') USING TTL 20;

INSERT INTO
latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F') USING TTL 20;

INSERT INTO
latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','73F') USING TTL 20;

INSERT INTO
latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F') USING TTL 20;

(3) Observe

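After inserting the data, run a query repeatedly to watch the rows. Because each row was written with a TTL of 20 seconds, the rows disappear one by one until none are left.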
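The one-by-one disappearance follows from each row carrying its own expiry, counted from the moment that row was written. The Python sketch below models this with plain tuples; it is a simulation rather than driver code, the function visible_rows is hypothetical, and the write times (5 seconds apart, starting at t0) are assumed for illustration.

```python
from datetime import datetime, timedelta


def visible_rows(rows, now):
    """Return the rows that have not expired at `now`, newest event_time first.

    Each row is (event_time, temperature, written_at, ttl_seconds).
    """
    live = [r for r in rows if now < r[2] + timedelta(seconds=r[3])]
    # Mirror CLUSTERING ORDER BY (event_time DESC).
    return sorted(live, key=lambda r: r[0], reverse=True)


t0 = datetime(2013, 4, 3, 7, 5, 0)  # assumed time of the first INSERT
rows = [
    (datetime(2013, 4, 3, 7, 3), "72F", t0, 20),
    (datetime(2013, 4, 3, 7, 2), "73F", t0 + timedelta(seconds=5), 20),
    (datetime(2013, 4, 3, 7, 1), "73F", t0 + timedelta(seconds=10), 20),
    (datetime(2013, 4, 3, 7, 4), "74F", t0 + timedelta(seconds=15), 20),
]

after_22s = visible_rows(rows, t0 + timedelta(seconds=22))  # first row has expired
after_40s = visible_rows(rows, t0 + timedelta(seconds=40))  # every row has expired
```

In this model the row written first is the first to vanish, regardless of its event_time, because the TTL clock starts at write time.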

Summary:

Time series is one of the data models where Cassandra is most competitive.

Excerpt from the original:

1) Cassandra can store up to 2 billion columns per row

References:

See the attachment: http://docs.datastax.com/en/tutorials/Time_Series.pdf


Time: 2024-08-26 17:20:14
