Removing duplicate rows within a group using a self-join

Database environment: SQL Server 2005

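  Consider a table in which id is the primary key and p1 and p2 are string columns (the sample data appears in the data-preparation step of the script below). If the p1 and p2 values of one row equal the p2 and p1 values of another row, respectively, the two rows are treated as one group. For example, the rows with id=1 and id=5 belong to the same group. For each group, only the row with the smallest id is displayed; rows that belong to no group are all displayed.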

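Approach:

  Exact duplicates of (p1, p2) are first collapsed to the row with the smallest id. The table is then left joined to itself with the aliases a and b, matching rows where b.p1 = a.p2 and b.p2 = a.p1, and the joined result set is filtered. If b.id is NULL, the row has no partner and is kept; if b.id is not NULL, the row is kept only when a.id is less than b.id.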

SQL script:

/* 1. Data preparation */
WITH    x0
          AS ( SELECT   1 AS id ,
                        'A' AS p1 ,
                        'B' AS p2
               /*UNION ALL
               SELECT   0 AS id ,
                        'A' AS p1 ,
                        'B' AS p2*/
               UNION ALL
               SELECT   2 AS id ,
                        'C' AS p1 ,
                        'D' AS p2
               UNION ALL
               SELECT   3 AS id ,
                        'E' AS p1 ,
                        'F' AS p2
               UNION ALL
               SELECT   4 AS id ,
                        'D' AS p1 ,
                        'C' AS p2
               UNION ALL
               SELECT   5 AS id ,
                        'B' AS p1 ,
                        'A' AS p2
               UNION ALL
               SELECT   6 AS id ,
                        'H' AS p1 ,
                        'J' AS p2
               UNION ALL
               SELECT   7 AS id ,
                        'T' AS p1 ,
                        'U' AS p2
               UNION ALL
               SELECT   8 AS id ,
                        'J' AS p1 ,
                        'H' AS p2
               /*UNION ALL
               SELECT   9 AS id ,
                        'I' AS p1 ,
                        'L' AS p2
               UNION ALL
               SELECT   10 AS id ,
                        'J' AS p1 ,
                        'K' AS p2*/
             ),/* 2. Remove exact duplicates: keep the smallest id per (p1, p2) */
        x1
          AS ( SELECT   id ,
                        p1 ,
                        p2
               FROM     ( SELECT    id ,
                                    p1 ,
                                    p2 ,
                                    ROW_NUMBER() OVER ( PARTITION BY p1, p2 ORDER BY id ) AS rn
                          FROM      x0
                        ) t
               WHERE    rn = 1
             )
    /* 3. Compute the result: keep unpaired rows and the smaller id of each pair */
    SELECT  a.id ,
            a.p1 ,
            a.p2
    FROM    x1 a
            LEFT JOIN x1 b ON b.p1 = a.p2
                              AND b.p2 = a.p1
    WHERE   b.id IS NULL
            OR a.id < b.id

The final result (shown as a screenshot in the original post) contains the rows with id 1, 2, 3, 6 and 7:
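
As a cross-check of the join logic, here is a minimal sketch that runs the same self left join against an in-memory SQLite database from Python. It is not the SQL Server script itself: the table name t, the constant names and the function surviving_rows are hypothetical, and the ROW_NUMBER() de-duplication step is skipped because the sample data contains no exact duplicates.

```python
import sqlite3

# Sample rows from the data-preparation step: (id, p1, p2).
ROWS = [
    (1, "A", "B"), (2, "C", "D"), (3, "E", "F"), (4, "D", "C"),
    (5, "B", "A"), (6, "H", "J"), (7, "T", "U"), (8, "J", "H"),
]

# Same self left join and filter as step 3 of the script.
# The table name t is an assumption made for this sketch.
QUERY = """
SELECT a.id, a.p1, a.p2
FROM t AS a
LEFT JOIN t AS b ON b.p1 = a.p2 AND b.p2 = a.p1
WHERE b.id IS NULL OR a.id < b.id
ORDER BY a.id
"""


def surviving_rows(rows):
    """Load rows into an in-memory SQLite table and apply the self-join filter."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, p1 TEXT, p2 TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in surviving_rows(ROWS):
        print(row)
```

One edge case is worth knowing about: a row whose p1 equals p2 pairs with itself under this join condition, so b.id is not NULL and a.id < b.id is false, and the row is filtered out. The sample data contains no such row.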

Another user suggested doing this with ASCII. His SQL script is as follows:

WITH    c1
          AS ( SELECT   1 AS id ,
                        'A' AS p1 ,
                        'B' AS p2
               /*UNION ALL
               SELECT   0 AS id ,
                        'A' AS p1 ,
                        'B' AS p2*/
               UNION ALL
               SELECT   2 AS id ,
                        'C' AS p1 ,
                        'D' AS p2
               UNION ALL
               SELECT   3 AS id ,
                        'E' AS p1 ,
                        'F' AS p2
               UNION ALL
               SELECT   4 AS id ,
                        'D' AS p1 ,
                        'C' AS p2
               UNION ALL
               SELECT   5 AS id ,
                        'B' AS p1 ,
                        'A' AS p2
               UNION ALL
               SELECT   6 AS id ,
                        'H' AS p1 ,
                        'J' AS p2
               UNION ALL
               SELECT   7 AS id ,
                        'T' AS p1 ,
                        'U' AS p2
               UNION ALL
               SELECT   8 AS id ,
                        'J' AS p1 ,
                        'H' AS p2
               /*UNION ALL
               SELECT   9 AS id ,
                        'I' AS p1 ,
                        'L' AS p2
               UNION ALL
               SELECT   10 AS id ,
                        'J' AS p1 ,
                        'K' AS p2*/
             ),
        c2
          AS ( SELECT   MIN(id) AS min_id
               FROM     c1
               GROUP BY ASCII(p1) + ASCII(p2)
             )
    SELECT  c1.*
    FROM    c1
            JOIN c2 ON id = min_id

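At first glance this also seems to satisfy the requirement. In fact, this approach has two problems:

  1. If p1 and p2 contain more than one character, ASCII() only returns the code of the first character.

  2. ASCII('A') + ASCII('D') = ASCII('B') + ASCII('C'), so the ASCII approach cannot tell such rows apart.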
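
Both problems can be reproduced with a few lines of Python. This is only an illustrative sketch: tsql_ascii mimics the documented behaviour of T-SQL ASCII() (the code of the leftmost character) for non-empty strings, and ascii_key and pair_key are hypothetical helper names. pair_key is included purely for contrast, to show that comparing the actual values distinguishes rows that the ASCII sum merges; it is not part of either script above.

```python
def tsql_ascii(s):
    """Mimic T-SQL ASCII(): the code of the leftmost character (s must be non-empty)."""
    return ord(s[0])


def ascii_key(p1, p2):
    """Grouping key used by the ASCII-based query: ASCII(p1) + ASCII(p2)."""
    return tsql_ascii(p1) + tsql_ascii(p2)


def pair_key(p1, p2):
    """For contrast: an order-insensitive key built from the full values."""
    return tuple(sorted((p1, p2)))


if __name__ == "__main__":
    # Problem 1: only the first character of each value contributes to the key.
    print(ascii_key("AB", "CD") == ascii_key("AX", "CY"))  # True

    # Problem 2: different pairs can add up to the same sum (65 + 68 == 66 + 67).
    print(ascii_key("A", "D") == ascii_key("B", "C"))  # True

    # Comparing the values themselves keeps these pairs apart,
    # while still treating (A, B) and (B, A) as the same group.
    print(pair_key("A", "D") == pair_key("B", "C"))  # False
    print(pair_key("A", "B") == pair_key("B", "A"))  # True
```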

(End of article)

Time: 2024-10-25 10:31:36
