Implementing SQL Queries and Statistics with Solr (Repost)

Original article: http://shiyanjun.cn/archives/78.html

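Cloudera has released Impala, a query and analytics tool built on the Hadoop platform. Anyone familiar with SQL can use Impala to run queries and analyses, although Impala's SQL differs subtly from the SQL of relational databases.
Below, we design a table and, using its data, map SQL query and aggregation statements to their Solr query equivalents. The translation is an interesting exercise, and it shows off some of Solr's more useful features.
The structure of the example table is shown in the figure: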

Next, we give several SQL query and aggregation statements, translate each into a Solr query, and compare the results.

Query comparison

  • Combined-condition query

SQL query:

SELECT log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1 AND area_id = 10304 AND time_type = 1 AND time_id >= 20130801 AND time_id <= 20130815
ORDER BY log_id LIMIT 10;

Query result, as shown in the figure:

Solr query URL:

http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=prov_id:1 AND net_type:1 AND area_id:10304 AND time_type:1 AND time_id:[20130801 TO 20130815]&sort=log_id asc&start=0&rows=10

Query result:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">4</int>
    </lst>
    <result name="response" numFound="77" start="0">
        <doc>
            <int name="log_id">6827</int>
            <long name="start_time">1375072117</long>
            <long name="end_time">1375081683</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">11002</int>
            <int name="cnt">0</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6827</int>
            <long name="start_time">1375072117</long>
            <long name="end_time">1375081683</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">11000</int>
            <int name="cnt">0</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6851</int>
            <long name="start_time">1375142158</long>
            <long name="end_time">1375146391</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">14001</int>
            <int name="cnt">5</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6851</int>
            <long name="start_time">1375142158</long>
            <long name="end_time">1375146391</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">11002</int>
            <int name="cnt">23</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6851</int>
            <long name="start_time">1375142158</long>
            <long name="end_time">1375146391</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">10200</int>
            <int name="cnt">55</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6851</int>
            <long name="start_time">1375142158</long>
            <long name="end_time">1375146391</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">14000</int>
            <int name="cnt">4</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6851</int>
            <long name="start_time">1375142158</long>
            <long name="end_time">1375146391</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">11000</int>
            <int name="cnt">1</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6851</int>
            <long name="start_time">1375142158</long>
            <long name="end_time">1375146391</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">10201</int>
            <int name="cnt">31</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6851</int>
            <long name="start_time">1375142158</long>
            <long name="end_time">1375146391</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">8002</int>
            <int name="cnt">8</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6851</int>
            <long name="start_time">1375142158</long>
            <long name="end_time">1375146391</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10304</int>
            <int name="idt_id">8000</int>
            <int name="cnt">30</int>
            <int name="net_type">1</int>
        </doc>
    </result>
</response>

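Comparing the two results, they are identical except for the ordering by idt_id (ascending in Impala, descending in Solr). Neither query sorts on idt_id, so the relative order of rows that share the same log_id is not guaranteed.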
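
The mapping used above (SELECT list to fl, WHERE to fq, ORDER BY to sort, LIMIT to start and rows) can be sketched in Python. This is a minimal sketch, not part of the original article: `build_select_url` is a hypothetical helper, and the base URL is simply the one quoted above. The helper also percent-encodes the query string, which the hand-written URL leaves to the browser.

```python
from urllib.parse import urlencode

# Base URL copied from the article; host, port, and collection are site-specific.
BASE_URL = "http://slave1:8888/solr-cloud/i_event/select"


def build_select_url(fields, filters, sort, start=0, rows=10):
    """Build a Solr select URL (hypothetical helper, not a Solr API)."""
    params = [
        ("q", "*:*"),                   # match all documents; filtering happens in fq
        ("fl", ",".join(fields)),       # SELECT column list
        ("fq", " AND ".join(filters)),  # WHERE clause
        ("sort", sort),                 # ORDER BY
        ("start", start),               # offset of the first row
        ("rows", rows),                 # LIMIT
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_select_url(
    fields=["log_id", "start_time", "end_time", "prov_id", "city_id",
            "area_id", "idt_id", "cnt", "net_type"],
    filters=["prov_id:1", "net_type:1", "area_id:10304", "time_type:1",
             "time_id:[20130801 TO 20130815]"],
    sort="log_id asc",
)
print(url)
```

The SQL range `time_id >= 20130801 AND time_id <= 20130815` becomes the inclusive range `[20130801 TO 20130815]`.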

  • Single-field grouped statistics

SQL query:

SELECT prov_id, SUM(cnt) AS sum_cnt, AVG(cnt) AS avg_cnt, MAX(cnt) AS max_cnt, MIN(cnt) AS min_cnt, COUNT(cnt) AS count_cnt
FROM v_i_event
GROUP BY prov_id;

Query result, as shown in the figure:

Solr query URL:

http://slave1:8888/solr-cloud/i_event/select?q=*:*&stats=true&stats.field=cnt&rows=0&indent=true

Query result:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
    </lst>
    <result name="response" numFound="4088" start="0"></result>
    <lst name="stats">
        <lst name="stats_fields">
            <lst name="cnt">
                <double name="min">0.0</double>
                <double name="max">1258.0</double>
                <long name="count">4088</long>
                <long name="missing">0</long>
                <double name="sum">32587.0</double>
                <double name="sumOfSquares">9170559.0</double>
                <double name="mean">7.971379647749511</double>
                <double name="stddev">46.69344567709268</double>
                <lst name="facets" />
            </lst>
        </lst>
    </lst>
</response>

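Comparing the results, Solr provides additional statistics, such as the standard deviation (stddev), and the values it shares with the SQL result are consistent. Note that this URL computes the statistics over all matching documents rather than per prov_id; a per-group breakdown would need a stats facet, such as the f.cnt.stats.facet parameter used later in this article.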
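
As a cross-check on the response above, the mean and stddev can be re-derived from count, sum, and sumOfSquares. The sketch below assumes a sample standard deviation (dividing by count - 1); that assumption is inferred from the numbers in the response, which it reproduces, and is not stated in the original article. `derive_stats` is a hypothetical helper name.

```python
import math


def derive_stats(count, total, sum_of_squares):
    """Return (mean, sample standard deviation); requires count > 1."""
    mean = total / count
    # Sample variance computed from the running sums reported by Solr.
    variance = (sum_of_squares - total * total / count) / (count - 1)
    return mean, math.sqrt(variance)


# Values copied from the stats response above.
mean, stddev = derive_stats(count=4088, total=32587.0, sum_of_squares=9170559.0)
print(mean, stddev)
```

The derived mean corresponds to AVG(cnt) in the SQL statement, and count, sum, max, and min map directly to COUNT, SUM, MAX, and MIN.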

  • IN-condition query

SQL query:

SELECT log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1 AND city_id IN(106,103) AND idt_id IN(12011,5004,6051,6056,8002) AND time_type = 1 AND time_id >= 20130801 AND time_id <= 20130815
ORDER BY log_id, start_time DESC LIMIT 10;

Query result, as shown in the figure:

Solr query URL:

http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=prov_id:1 AND net_type:1 AND (city_id:106 OR city_id:103) AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002) AND time_type:1 AND time_id:[20130801 TO 20130815]&sort=log_id asc,start_time desc&start=0&rows=10

Or, with one fq parameter per condition:

http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=prov_id:1&fq=net_type:1&fq=(city_id:106 OR city_id:103)&fq=(idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002)&fq=time_type:1&fq=time_id:[20130801 TO 20130815]&sort=log_id asc,start_time desc&start=0&rows=10

Query result:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">6</int>
    </lst>
    <result name="response" numFound="63" start="0">
        <doc>
            <int name="log_id">6553</int>
            <long name="start_time">1374054184</long>
            <long name="end_time">1374054254</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">12011</int>
            <int name="cnt">0</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6553</int>
            <long name="start_time">1374054184</long>
            <long name="end_time">1374054254</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">5004</int>
            <int name="cnt">2</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6555</int>
            <long name="start_time">1374055060</long>
            <long name="end_time">1374055158</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">70104</int>
            <int name="idt_id">5004</int>
            <int name="cnt">3</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6555</int>
            <long name="start_time">1374055060</long>
            <long name="end_time">1374055158</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">70104</int>
            <int name="idt_id">12011</int>
            <int name="cnt">0</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6595</int>
            <long name="start_time">1374292508</long>
            <long name="end_time">1374292639</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">5004</int>
            <int name="cnt">4</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6611</int>
            <long name="start_time">1374461233</long>
            <long name="end_time">1374461245</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">5004</int>
            <int name="cnt">1</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6612</int>
            <long name="start_time">1374461261</long>
            <long name="end_time">1374461269</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">5004</int>
            <int name="cnt">1</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6612</int>
            <long name="start_time">1374461261</long>
            <long name="end_time">1374461269</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">12011</int>
            <int name="cnt">0</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6613</int>
            <long name="start_time">1374461422</long>
            <long name="end_time">1374461489</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">6056</int>
            <int name="cnt">1</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6613</int>
            <long name="start_time">1374461422</long>
            <long name="end_time">1374461489</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">6051</int>
            <int name="cnt">1</int>
            <int name="net_type">1</int>
        </doc>
    </result>
</response>

Comparing the query results, they are consistent.
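
The IN-to-OR rewrite above is mechanical, so it can be generated. A minimal sketch follows; `in_clause` is a hypothetical helper, not a Solr API, and it assumes plain numeric values that need no escaping.

```python
def in_clause(field, values):
    """Render SQL `field IN (...)` as a parenthesized OR group for a Solr fq."""
    if not values:
        raise ValueError("IN list must not be empty")
    return "(" + " OR ".join(f"{field}:{v}" for v in values) + ")"


filters = [
    "prov_id:1",
    "net_type:1",
    in_clause("city_id", [106, 103]),
    in_clause("idt_id", [12011, 5004, 6051, 6056, 8002]),
    "time_type:1",
    "time_id:[20130801 TO 20130815]",
]
fq = " AND ".join(filters)  # single-fq form, as in the first URL above
print(fq)
```

Passing each element of `filters` as its own fq parameter gives the second form of the URL. The Lucene query syntax also accepts the shorter field grouping `city_id:(106 OR 103)`.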

  • Open-interval range query

SQL query:

SELECT log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type
FROM v_i_event
WHERE net_type = 1 AND idt_id IN(12011,5004,6051,6056,8002) AND time_type = 1 AND start_time >= 1373598465 AND end_time < 1374055254
ORDER BY log_id, start_time, idt_id DESC LIMIT 30;

Query result, as shown in the figure:

Solr query URL (any of the following three forms). Each uses the inclusive range [1373598465 TO 1374055254] and then excludes the upper endpoint with -start_time:1374055254, which makes the range half-open. Note that these URLs bound start_time on both sides, while the SQL statement above places the upper bound on end_time.

http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1 AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002) AND time_type:1 AND start_time:[1373598465 TO 1374055254]&fq=-start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30

http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1 AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002) AND time_type:1 AND start_time:[1373598465 TO 1374055254] AND -start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30

http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1&fq=idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002&fq=time_type:1&fq=start_time:[1373598465 TO 1374055254]&fq=-start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30

Query result:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">5</int>
    </lst>
    <result name="response" numFound="4" start="0">
        <doc>
            <int name="log_id">6553</int>
            <long name="start_time">1374054184</long>
            <long name="end_time">1374054254</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">12011</int>
            <int name="cnt">0</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6553</int>
            <long name="start_time">1374054184</long>
            <long name="end_time">1374054254</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">10307</int>
            <int name="idt_id">5004</int>
            <int name="cnt">2</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6555</int>
            <long name="start_time">1374055060</long>
            <long name="end_time">1374055158</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">70104</int>
            <int name="idt_id">12011</int>
            <int name="cnt">0</int>
            <int name="net_type">1</int>
        </doc>
        <doc>
            <int name="log_id">6555</int>
            <long name="start_time">1374055060</long>
            <long name="end_time">1374055158</long>
            <int name="prov_id">1</int>
            <int name="city_id">103</int>
            <int name="area_id">70104</int>
            <int name="idt_id">5004</int>
            <int name="cnt">3</int>
            <int name="net_type">1</int>
        </doc>
    </result>
</response>
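
The half-open range trick in the URLs above (an inclusive range plus a negated match on the upper endpoint) can be generated as follows. This is a minimal sketch with hypothetical helper names. The second helper is an alternative not taken from the article: it assumes an integer-valued field, so that `x < high` is the same as `x <= high - 1`.

```python
def half_open_filters(field, low, high):
    """Return fq values for low <= field < high, mirroring the article's URLs."""
    return [
        f"{field}:[{low} TO {high}]",  # inclusive on both ends
        f"-{field}:{high}",            # then exclude the upper endpoint
    ]


def half_open_int_filter(field, low, high):
    """Single-clause alternative for integer fields only (an assumption)."""
    return f"{field}:[{low} TO {high - 1}]"


print(half_open_filters("start_time", 1373598465, 1374055254))
print(half_open_int_filter("start_time", 1373598465, 1374055254))
```

Solr 4.0 and later are also understood to accept mixed brackets such as `start_time:[1373598465 TO 1374055254}` for the same purpose; this is worth checking against the version actually in use.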
  • Multi-field grouped statistics (count only)

SQL query:

SELECT city_id, area_id, COUNT(cnt) AS count_cnt
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1
GROUP BY city_id, area_id;

Query result, as shown in the figure:

Solr query URL:

http://slave1:8888/solr-cloud/i_event/select?q=*:*&facet=true&facet.pivot=city_id,area_id&fq=prov_id:1 AND net_type:1&rows=0&indent=true

Query result:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">72</int>
    </lst>
    <result name="response" numFound="1171" start="0"></result>
    <lst name="facet_counts">
        <lst name="facet_queries" />
        <lst name="facet_fields" />
        <lst name="facet_dates" />
        <lst name="facet_ranges" />
        <lst name="facet_pivot">
            <arr name="city_id,area_id">
                <lst>
                    <str name="field">city_id</str>
                    <int name="value">103</int>
                    <int name="count">678</int>
                    <arr name="pivot">
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10307</int>
                            <int name="count">298</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10315</int>
                            <int name="count">120</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10317</int>
                            <int name="count">86</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10304</int>
                            <int name="count">67</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10310</int>
                            <int name="count">49</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">70104</int>
                            <int name="count">48</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10308</int>
                            <int name="count">6</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">0</int>
                            <int name="count">2</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10311</int>
                            <int name="count">2</int>
                        </lst>
                    </arr>
                </lst>
                <lst>
                    <str name="field">city_id</str>
                    <int name="value">0</int>
                    <int name="count">463</int>
                    <arr name="pivot">
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">0</int>
                            <int name="count">395</int>
                        </lst>
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10307</int>
                            <int name="count">68</int>
                        </lst>
                    </arr>
                </lst>
                <lst>
                    <str name="field">city_id</str>
                    <int name="value">106</int>
                    <int name="count">10</int>
                    <arr name="pivot">
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10304</int>
                            <int name="count">10</int>
                        </lst>
                    </arr>
                </lst>
                <lst>
                    <str name="field">city_id</str>
                    <int name="value">110</int>
                    <int name="count">8</int>
                    <arr name="pivot">
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">0</int>
                            <int name="count">8</int>
                        </lst>
                    </arr>
                </lst>
                <lst>
                    <str name="field">city_id</str>
                    <int name="value">118</int>
                    <int name="count">8</int>
                    <arr name="pivot">
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">10316</int>
                            <int name="count">8</int>
                        </lst>
                    </arr>
                </lst>
                <lst>
                    <str name="field">city_id</str>
                    <int name="value">105</int>
                    <int name="count">4</int>
                    <arr name="pivot">
                        <lst>
                            <str name="field">area_id</str>
                            <int name="value">0</int>
                            <int name="count">4</int>
                        </lst>
                    </arr>
                </lst>
            </arr>
        </lst>
    </lst>
</response>

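Comparing the results: the Solr output is nested, so the groups above must be merged into (city_id, area_id, count) rows to obtain the final statistics. Once merged, the result is consistent with the SQL result.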
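
The merge step can be sketched with Python's standard XML parser. The sample below is a trimmed copy of the facet_pivot element from the response above (only two cities and three areas are kept, so the city-level counts no longer equal the sum of their children), and `flatten_pivot` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the facet_pivot response shown above.
SAMPLE = """
<arr name="city_id,area_id">
  <lst>
    <str name="field">city_id</str>
    <int name="value">103</int>
    <int name="count">678</int>
    <arr name="pivot">
      <lst><str name="field">area_id</str><int name="value">10307</int><int name="count">298</int></lst>
      <lst><str name="field">area_id</str><int name="value">10304</int><int name="count">67</int></lst>
    </arr>
  </lst>
  <lst>
    <str name="field">city_id</str>
    <int name="value">106</int>
    <int name="count">10</int>
    <arr name="pivot">
      <lst><str name="field">area_id</str><int name="value">10304</int><int name="count">10</int></lst>
    </arr>
  </lst>
</arr>
"""


def flatten_pivot(pivot_arr):
    """Turn a two-level pivot into (city_id, area_id, count) rows."""
    rows = []
    for city in pivot_arr.findall("lst"):
        city_id = int(city.find("int[@name='value']").text)
        for area in city.findall("arr[@name='pivot']/lst"):
            area_id = int(area.find("int[@name='value']").text)
            count = int(area.find("int[@name='count']").text)
            rows.append((city_id, area_id, count))
    return rows


rows = flatten_pivot(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

Each row corresponds to one line of the `GROUP BY city_id, area_id` result. In the full response, the two rows for area_id 10304 (67 and 10) add up to 77.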

  • Multi-field grouped statistics (supports count, sum, max, min, and other functions)

Solr handles independent grouped statistics over several fields in a single request well. This is equivalent to running two SQL statements, each with a GROUP BY clause that aggregates over a single field.
SQL queries:

SELECT city_id, COUNT(cnt) AS count_cnt
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1
GROUP BY city_id;

SELECT area_id, COUNT(cnt) AS count_cnt
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1
GROUP BY area_id;

The query results are not shown here.
Solr query URL:

http://slave1:8888/solr-cloud/i_event/select?q=*:*&stats=true&stats.field=cnt&f.cnt.stats.facet=city_id&f.cnt.stats.facet=area_id&fq=prov_id:1 AND net_type:1&rows=0&indent=true

Query result:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">6</int>
    </lst>
    <result name="response" numFound="1171" start="0"></result>
    <lst name="stats">
        <lst name="stats_fields">
            <lst name="cnt">
                <double name="min">0.0</double>
                <double name="max">167.0</double>
                <long name="count">1171</long>
                <long name="missing">0</long>
                <double name="sum">3701.0</double>
                <double name="sumOfSquares">249641.0</double>
                <double name="mean">3.1605465414175917</double>
                <double name="stddev">14.260812879164407</double>
                <lst name="facets">
                    <lst name="city_id">
                        <lst name="0">
                            <double name="min">0.0</double>
                            <double name="max">167.0</double>
                            <long name="count">463</long>
                            <long name="missing">0</long>
                            <double name="sum">2783.0</double>
                            <double name="sumOfSquares">238819.0</double>
                            <double name="mean">6.010799136069115</double>
                            <double name="stddev">21.92524420257807</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="110">
                            <double name="min">0.0</double>
                            <double name="max">1.0</double>
                            <long name="count">8</long>
                            <long name="missing">0</long>
                            <double name="sum">3.0</double>
                            <double name="sumOfSquares">3.0</double>
                            <double name="mean">0.375</double>
                            <double name="stddev">0.5175491695067657</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="106">
                            <double name="min">0.0</double>
                            <double name="max">0.0</double>
                            <long name="count">10</long>
                            <long name="missing">0</long>
                            <double name="sum">0.0</double>
                            <double name="sumOfSquares">0.0</double>
                            <double name="mean">0.0</double>
                            <double name="stddev">0.0</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="105">
                            <double name="min">0.0</double>
                            <double name="max">0.0</double>
                            <long name="count">4</long>
                            <long name="missing">0</long>
                            <double name="sum">0.0</double>
                            <double name="sumOfSquares">0.0</double>
                            <double name="mean">0.0</double>
                            <double name="stddev">0.0</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="103">
                            <double name="min">0.0</double>
                            <double name="max">55.0</double>
                            <long name="count">678</long>
                            <long name="missing">0</long>
                            <double name="sum">915.0</double>
                            <double name="sumOfSquares">10819.0</double>
                            <double name="mean">1.3495575221238938</double>
                            <double name="stddev">3.7625525739676986</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="118">
                            <double name="min">0.0</double>
                            <double name="max">0.0</double>
                            <long name="count">8</long>
                            <long name="missing">0</long>
                            <double name="sum">0.0</double>
                            <double name="sumOfSquares">0.0</double>
                            <double name="mean">0.0</double>
                            <double name="stddev">0.0</double>
                            <lst name="facets" />
                        </lst>
                    </lst>
                    <lst name="area_id">
                        <lst name="10308">
                            <double name="min">0.0</double>
                            <double name="max">1.0</double>
                            <long name="count">6</long>
                            <long name="missing">0</long>
                            <double name="sum">1.0</double>
                            <double name="sumOfSquares">1.0</double>
                            <double name="mean">0.16666666666666666</double>
                            <double name="stddev">0.408248290463863</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="10310">
                            <double name="min">0.0</double>
                            <double name="max">5.0</double>
                            <long name="count">49</long>
                            <long name="missing">0</long>
                            <double name="sum">40.0</double>
                            <double name="sumOfSquares">108.0</double>
                            <double name="mean">0.8163265306122449</double>
                            <double name="stddev">1.2528878206593208</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="0">
                            <double name="min">0.0</double>
                            <double name="max">167.0</double>
                            <long name="count">409</long>
                            <long name="missing">0</long>
                            <double name="sum">2722.0</double>
                            <double name="sumOfSquares">238550.0</double>
                            <double name="mean">6.6552567237163816</double>
                            <double name="stddev">23.243931908854</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="10311">
                            <double name="min">0.0</double>
                            <double name="max">0.0</double>
                            <long name="count">2</long>
                            <long name="missing">0</long>
                            <double name="sum">0.0</double>
                            <double name="sumOfSquares">0.0</double>
                            <double name="mean">0.0</double>
                            <double name="stddev">0.0</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="10304">
                            <double name="min">0.0</double>
                            <double name="max">55.0</double>
                            <long name="count">77</long>
                            <long name="missing">0</long>
                            <double name="sum">370.0</double>
                            <double name="sumOfSquares">9476.0</double>
                            <double name="mean">4.805194805194805</double>
                            <double name="stddev">10.064318107786017</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="70104">
                            <double name="min">0.0</double>
                            <double name="max">3.0</double>
                            <long name="count">48</long>
                            <long name="missing">0</long>
                            <double name="sum">51.0</double>
                            <double name="sumOfSquares">117.0</double>
                            <double name="mean">1.0625</double>
                            <double name="stddev">1.1560433254047038</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="10307">
                            <double name="min">0.0</double>
                            <double name="max">12.0</double>
                            <long name="count">366</long>
                            <long name="missing">0</long>
                            <double name="sum">274.0</double>
                            <double name="sumOfSquares">768.0</double>
                            <double name="mean">0.7486338797814208</double>
                            <double name="stddev">1.2418218134151426</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="10315">
                            <double name="min">0.0</double>
                            <double name="max">4.0</double>
                            <long name="count">120</long>
                            <long name="missing">0</long>
                            <double name="sum">143.0</double>
                            <double name="sumOfSquares">359.0</double>
                            <double name="mean">1.1916666666666667</double>
                            <double name="stddev">1.2588899560996694</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="10316">
                            <double name="min">0.0</double>
                            <double name="max">0.0</double>
                            <long name="count">8</long>
                            <long name="missing">0</long>
                            <double name="sum">0.0</double>
                            <double name="sumOfSquares">0.0</double>
                            <double name="mean">0.0</double>
                            <double name="stddev">0.0</double>
                            <lst name="facets" />
                        </lst>
                        <lst name="10317">
                            <double name="min">0.0</double>
                            <double name="max">5.0</double>
                            <long name="count">86</long>
                            <long name="missing">0</long>
                            <double name="sum">100.0</double>
                            <double name="sumOfSquares">262.0</double>
                            <double name="mean">1.1627906976744187</double>
                            <double name="stddev">1.3093371930442208</double>
                            <lst name="facets" />
                        </lst>
                    </lst>
                </lst>
            </lst>
        </lst>
    </lst>
</response>
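
To read the response above as GROUP BY rows, the per-value statistics under each stats facet can be collected into a dictionary. The sketch below parses a trimmed excerpt of that response (two city_id groups, with a subset of the statistics); `facet_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the <lst name="facets"> element shown above.
SAMPLE = """
<lst name="facets">
  <lst name="city_id">
    <lst name="110">
      <double name="min">0.0</double>
      <double name="max">1.0</double>
      <long name="count">8</long>
      <double name="sum">3.0</double>
      <double name="mean">0.375</double>
    </lst>
    <lst name="103">
      <double name="min">0.0</double>
      <double name="max">55.0</double>
      <long name="count">678</long>
      <double name="sum">915.0</double>
      <double name="mean">1.3495575221238938</double>
    </lst>
  </lst>
</lst>
"""


def facet_rows(facets, field):
    """Map each value of `field` to its statistics, one entry per GROUP BY row."""
    rows = {}
    for group in facets.findall(f"lst[@name='{field}']/lst"):
        rows[group.get("name")] = {
            stat.get("name"): float(stat.text)
            for stat in group
            if stat.tag in ("double", "long")  # skip the nested empty facets list
        }
    return rows


rows = facet_rows(ET.fromstring(SAMPLE), "city_id")
for value, stats in rows.items():
    print(value, stats)
```

Calling `facet_rows` with "area_id" on the full response would give the rows of the second SQL statement.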
  • Multi-field joint grouped statistics (supports count, sum, max, min, and other functions)

SQL query:

SELECT city_id, area_id, SUM(cnt) AS sum_cnt, AVG(cnt) AS avg_cnt, MAX(cnt) AS max_cnt, MIN(cnt) AS min_cnt, COUNT(cnt) AS count_cnt
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1
GROUP BY city_id, area_id;

Query result, as shown in the figure:

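Solr currently has no simple way to support this kind of query. To support it, the schema must be designed so that one field is multi-valued, and the grouped statistics are then computed over those values. If the application's query and analysis patterns are fairly fixed, and it is known in advance which fields will be grouped jointly, multi-valued fields can be planned into the schema design to meet this need.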
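
When the schema cannot be changed, one fallback is to fetch the matching documents and aggregate them on the client. This is a different technique from the multi-valued field described above, is not taken from the original article, and is only practical for modest result sizes. The sketch below uses a few (city_id, area_id, cnt) values copied from the query results earlier in this article; `group_stats` is a hypothetical helper name.

```python
from collections import defaultdict

# A few documents copied from the query results shown earlier.
DOCS = [
    {"city_id": 103, "area_id": 10307, "cnt": 0},
    {"city_id": 103, "area_id": 10307, "cnt": 2},
    {"city_id": 103, "area_id": 70104, "cnt": 3},
    {"city_id": 103, "area_id": 70104, "cnt": 0},
    {"city_id": 103, "area_id": 10304, "cnt": 23},
]


def group_stats(docs, keys, value_field):
    """Compute SUM, AVG, MAX, MIN, and COUNT of value_field per key tuple."""
    groups = defaultdict(list)
    for doc in docs:
        groups[tuple(doc[k] for k in keys)].append(doc[value_field])
    return {
        key: {
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "max": max(values),
            "min": min(values),
            "count": len(values),
        }
        for key, values in groups.items()
    }


stats = group_stats(DOCS, keys=("city_id", "area_id"), value_field="cnt")
for key, row in stats.items():
    print(key, row)
```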
