Suppose there are 30 tables, each with its own custom fields plus a set of shared common fields, and the common fields need to be pulled together so they can be queried in one place. Broadly there are two ways to do this (sketched below):
1. A physical common-field table: on insert/update, each APP writes to its own table and also writes the common fields into the shared table.
2. A view: on insert/update, each APP touches only its own table; since a view is a logical table, queries through it see the updates automatically.
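As a minimal sketch of the two approaches, assuming two hypothetical APP tables `tb_a` and `tb_b` that share the columns `starttime` and `endtime` (the names `tb_common` and `v_common` here are illustrative, not the ones used in the benchmark):

```sql
-- Approach 1: a physical common table; every write goes to both tables,
-- typically inside one transaction so they stay consistent
BEGIN;
INSERT INTO tb_a (starttime, endtime, a_extra) VALUES (1463932800, 1463936400, 'x');
INSERT INTO tb_common (starttime, endtime) VALUES (1463932800, 1463936400);
COMMIT;

-- Approach 2: a UNION ALL view over the APP tables; only tb_a is written,
-- and reads on the common fields go through the view
CREATE VIEW v_common AS
SELECT starttime, endtime FROM tb_a
UNION ALL
SELECT starttime, endtime FROM tb_b;
```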
The two approaches each have strengths and weaknesses:
1. The common-field table
Pros: it wins when querying the common fields. The SQL is simple, so PostgreSQL spends little time parsing and planning it, and because the data lives in one table there is no need to scan several tables.
Cons: the APP table and the common-field table must be updated together, which requires a transaction (transactions, with their write-ahead logging and so on, cost write performance). If the tables carry indexes, every insert/update/delete must maintain them too, hurting writes further, especially as the table keeps growing.
2. The view
Pros: only the individual APP table needs updating, and each APP table holds far less data than the total, so writes win.
Cons: as the number of tables grows, queries turn into a UNION ALL over many tables and slow down.
-- Of course, all of the above concerns queries that touch the common fields. Within an APP only its own table is queried, so those queries are unaffected.
So the two approaches are a bit like southern fists versus northern kicks: each dominates its own territory, and a crude head-to-head comparison is unfair.
Still, we were quite curious just how far behind the view's query performance really is.
Environment: a single physical machine, 8 logical cores, 32 GB RAM, CentOS 7.0, PostgreSQL 9.4.
Tables:
APP tables: tb_test_1 ... tb_test_30 (30 tables in total), each with starttime and endtime plus col[table id]_1 through col[table id]_45, i.e. 47 columns per table.
Common-field table: tb_test_common, with starttime and endtime plus col0_1 ... col0_40 (matching the first 40 payload columns of the APP tables), 42 columns in total.
View: v_evt, with starttime and endtime plus col0_1 ... col0_40 (matching the first 40 payload columns of the APP tables), 42 columns in total.
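For reference, the view is a UNION ALL over the 30 APP tables, with each table's payload columns aliased to the common names (abbreviated here; the full 30-branch definition is generated by `create_view` in the script at the end):

```sql
CREATE VIEW v_evt AS
SELECT starttime, endtime, col1_1 AS col0_1, /* ... */ col1_40 AS col0_40 FROM tb_test_1
UNION ALL
SELECT starttime, endtime, col2_1 AS col0_1, /* ... */ col2_40 AS col0_40 FROM tb_test_2
UNION ALL
/* ... tb_test_3 through tb_test_29 follow the same pattern ... */
SELECT starttime, endtime, col30_1 AS col0_1, /* ... */ col30_40 AS col0_40 FROM tb_test_30;
```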
Each APP table holds 1,000,000 rows; across the 30 APP tables that makes 30,000,000 rows in tb_test_common.
Indexes are built on starttime and endtime of every table to speed up WHERE queries on those columns (as noted earlier, indexes cost storage and must be maintained on every insert/update, slowing writes down); the DDL is sketched below.
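The index DDL as generated by the script (one APP table shown; the same pair is created for each of the 30):

```sql
CREATE INDEX index_tb_test_1_starttime ON tb_test_1 (starttime);
CREATE INDEX index_tb_test_1_endtime   ON tb_test_1 (endtime);
-- ... repeated for tb_test_2 through tb_test_30 ...
CREATE INDEX index_tb_test_common_starttime ON tb_test_common (starttime);
CREATE INDEX index_tb_test_common_endtime   ON tb_test_common (endtime);
```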
Checking the execution plan confirms that such a SELECT uses a bitmap index scan.
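For example (the plan below illustrates the expected shape only, not captured output; actual costs and row estimates will differ):

```sql
EXPLAIN SELECT count(*) FROM tb_test_common WHERE starttime = 1463932800;
-- Expected plan shape (illustrative):
-- Aggregate
--   ->  Bitmap Heap Scan on tb_test_common
--         Recheck Cond: (starttime = 1463932800)
--         ->  Bitmap Index Scan on index_tb_test_common_starttime
--               Index Cond: (starttime = 1463932800)
```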
With the data in place, we designed five comparison tests, each run against both tb_test_common and v_evt:
0. a WHERE-filtered count;
1. a range filter plus ORDER BY ("limit 10" in the labels);
2. an aggregate with GROUP BY;
3. a full-table ORDER BY with LIMIT and no filter;
4. a plain COUNT.
The corresponding SQL is shown below.
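The queries, lifted from the benchmark script at the end (written here against tb_test_common; the v_evt variants are identical apart from the table name):

```sql
-- 0. WHERE filter
SELECT count(*) FROM tb_test_common WHERE starttime = 1463932800;

-- 1. Filter + ORDER BY (note: despite the label, the script's query carries no LIMIT)
SELECT * FROM tb_test_common
WHERE starttime >= 1463932800 AND starttime <= 1463936400
ORDER BY starttime DESC;

-- 2. Aggregate function
SELECT to_char(to_timestamp(starttime), 'YYYY-MM-DD') AS mydate, count(*) AS num
FROM tb_test_common
WHERE starttime >= 1463932800 AND starttime <= 1463940000
GROUP BY mydate ORDER BY mydate DESC;

-- 3. No filter, full-table ORDER BY with LIMIT
SELECT * FROM tb_test_common ORDER BY starttime DESC LIMIT 100;

-- 4. COUNT
SELECT count(*) FROM tb_test_common;
```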
Results (first-run numbers; repeat runs, of the view queries in particular, come back noticeably faster once the caches are warm after the first pass).
A table makes this clearer:
| Test | Common-field table | View |
| --- | --- | --- |
| WHERE filter | 0.0371432304382 | 0.379799127579 |
| Filter + ORDER BY LIMIT 10 | 11.1989729404 | 44.5428109169 |
| Aggregate function | 50.8744649887 | 50.5313501358 |
| No filter, full-table ORDER BY with LIMIT | 0.122601985931 | 0.368887901306 |
| COUNT | 43.4130840302 | 67.3434300423 |
All times above are in seconds; they should give a rough, intuitive sense of the performance gap.
==== Below is the script that builds the data and runs the benchmark ======================
```python
#coding:utf-8
import psycopg2
import random
import time

HOST = "localhost"
DBNAME = ""
USERNAME = ""
PASSWORD = ""

# All timestamps fall on 2016-05-23
starttime = 1463932800
endtime = 1464019199

def create_tables(nums):
    arr = list()
    for i in range(0, nums):
        tablename = "tb_test_" + str(i)
        if i == 0:
            tablename = "tb_test_common"
        sql = "create table " + tablename + "(starttime integer,endtime integer,"
        colnum = 46  # APP tables: the 2 time columns + 45 payload columns
        if i == 0:
            colnum = 41  # common table: the 2 time columns + 40 payload columns
        for j in range(1, colnum):
            sql += "col" + str(i) + "_" + str(j) + " varchar(1024)"
            if j != colnum - 1:
                sql += ","
        sql += ")"
        arr.append(sql)
    return ";".join(arr)

# Write a one-million-row data file
def write_million_data(filepath):
    print "create data file start ..."
    f = open(filepath, "w")
    for i in range(0, 1000000):
        curstart = random.randint(starttime, endtime)
        f.write(str(curstart) + ",")
        f.write(str(random.randint(curstart, endtime)) + ",")
        for j in range(0, 45):
            f.write("hello")
            if j != 44:
                f.write(",")
        f.write("\n")
    f.close()
    print "create data file end ..."

def create_view(nums):
    sql = "create view v_evt as "
    for i in range(1, nums):
        sql += "select starttime,endtime,"
        for j in range(1, 41):
            sql += "col" + str(i) + "_" + str(j) + " as col0_" + str(j)
            if j != 40:
                sql += ","
        sql += " from tb_test_" + str(i) + " "
        if i != nums - 1:
            sql += " union all "
    return sql

def init_all():
    filepath = "/tmp/view_data"
    write_million_data(filepath)
    conn = psycopg2.connect(host=HOST, dbname=DBNAME, user=USERNAME, password=PASSWORD)
    cur = conn.cursor()
    # Create 31 tables: the first is the common-field table, the other 30 are APP tables
    cur.execute(create_tables(31))
    conn.commit()
    print "create tables"
    # COPY the data file into APP tables 1 through 30
    for i in range(1, 31):
        cur.copy_from(open(filepath, "r"), "tb_test_" + str(i), sep=",")
        print "copy data into table"
        conn.commit()
    # Fill the common table from the first 40 payload columns of each APP table
    for i in range(1, 31):
        insql = "insert into tb_test_common select starttime,endtime,"
        for j in range(1, 41):
            insql += "col" + str(i) + "_" + str(j)
            if j != 40:
                insql += ","
        insql += " from tb_test_" + str(i)
        print insql
        cur.execute(insql)
        conn.commit()
    # Index starttime and endtime on every table
    for i in range(1, 31):
        cur.execute("create index index_tb_test_" + str(i) + "_starttime on tb_test_" + str(i) + "(starttime)")
        cur.execute("create index index_tb_test_" + str(i) + "_endtime on tb_test_" + str(i) + "(endtime)")
        conn.commit()
        print "create index on index_tb_test_" + str(i)
    cur.execute("create index index_tb_test_common_starttime on tb_test_common(starttime)")
    cur.execute("create index index_tb_test_common_endtime on tb_test_common(endtime)")
    conn.commit()
    print "create index on tb_test_common"
    # Build the UNION ALL view
    vsql = create_view(31)
    cur.execute(vsql)
    conn.commit()
    print "create view"
    conn.close()

if __name__ == '__main__':
    #init_all()
    # The redundant table's strength is reads; its weakness is writes (a transaction writing the data twice).
    # The view's strength is writes (only the original physical table changes); its weakness is reads.
    # So the tests below pit the redundant table's strength against the view's weakness.
    # To make reads as fast as possible, starttime and endtime are indexed everywhere and used in the filters,
    # though, as noted, every index must be maintained on insert/update, so write speed pays for it.
    # Run the benchmark
    conn = psycopg2.connect(host=HOST, dbname=DBNAME, user=USERNAME, password=PASSWORD)
    cur = conn.cursor()
    # Test 0: WHERE filter
    stime = time.time()
    cur.execute("select count(*) from tb_test_common where starttime=1463932800")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s;count:", cur.fetchone()
    stime = time.time()
    cur.execute("select count(*) from v_evt where starttime=1463932800")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s;count:", cur.fetchone()
    print "*************************************************************"
    # Test 1: filter + order by (labelled "limit 10", though the query as written has no LIMIT)
    stime = time.time()
    cur.execute("select * from tb_test_common where starttime>=1463932800 and starttime<=1463936400 order by starttime desc")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s;count:", len(cur.fetchall())
    stime = time.time()
    cur.execute("select * from v_evt where starttime>=1463932800 and starttime<=1463936400 order by starttime desc")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s;count:", len(cur.fetchall())
    print "*************************************************************"
    # Test 2: aggregate function
    stime = time.time()
    cur.execute("select to_char(to_timestamp(starttime),'YYYY-MM-DD') as mydate,count(*) as num from tb_test_common where starttime>=1463932800 and starttime<=1463940000 group by mydate order by mydate desc")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s;count:", len(cur.fetchall())
    stime = time.time()
    cur.execute("select to_char(to_timestamp(starttime),'YYYY-MM-DD') as mydate,count(*) as num from v_evt where starttime>=1463932800 and starttime<=1463940000 group by mydate order by mydate desc")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s;count:", len(cur.fetchall())
    print "*************************************************************"
    # Test 3: no filter, full-table order by with limit
    stime = time.time()
    cur.execute("select * from tb_test_common order by starttime desc limit 100")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s;count:", len(cur.fetchall())
    stime = time.time()
    cur.execute("select * from v_evt order by starttime desc limit 100")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s;count:", len(cur.fetchall())
    print "*************************************************************"
    # Test 4: count
    stime = time.time()
    cur.execute("select count(*) from tb_test_common")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s;count:", cur.fetchone()
    stime = time.time()
    cur.execute("select count(*) from v_evt")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s;count:", cur.fetchone()
    cur.close()
    conn.close()
```
Attached is also a second run of the same queries: the view's query performance improves markedly, since the relevant data is already cached after the first run.