Sorting Data in SSIS

SSIS offers two ways to sort data: the Sort component, or an ORDER BY clause in the source's SQL command.

1. Sorting with the Sort component

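SortType: ascending or descending.

SortOrder: the position of the column in the sort key, numbered from 1 upward.

Remove rows with duplicate sort values: whether to drop rows whose sort-column values are duplicated. This is not the same as DISTINCT. DISTINCT removes rows that repeat across all output columns, whereas this option only guarantees that the sort columns (a subset of the output columns) are unique.

This property can be viewed and set in the Sort Transformation Advanced Editor.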
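The difference between DISTINCT and removing duplicate sort values can be sketched in a few lines of Python. This is only an illustrative model, not how SSIS implements the Sort transformation; the sample rows and both helper functions are made up for the example, and the sketch makes no claim about which of the duplicate rows SSIS actually keeps.

```python
# Illustrative model only; the rows and helper names are hypothetical.
rows = [
    {"cid": 1, "score": 90},
    {"cid": 1, "score": 85},
    {"cid": 1, "score": 90},
    {"cid": 2, "score": 70},
]

def distinct(rows):
    """Drop rows that repeat an earlier row in every column."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def remove_duplicate_sort_values(rows, sort_columns):
    """Sort by sort_columns, then keep one row per sort-key value."""
    seen, out = set(), []
    for row in sorted(rows, key=lambda r: tuple(r[c] for c in sort_columns)):
        key = tuple(row[c] for c in sort_columns)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

distinct_rows = distinct(rows)                            # compares all columns
dedup_rows = remove_duplicate_sort_values(rows, ["cid"])  # compares cid only

print(len(distinct_rows))  # 3
print(len(dedup_rows))     # 2
```

With cid as the only sort column, the second approach leaves one row per cid even though the dropped rows differ in score, which DISTINCT would have kept.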

2. Sorting with an ORDER BY clause in the SQL command

Step 1: Have the OLE DB Source supply the data, using a query that already returns it sorted.

select *
from dbo.course c with(nolock)
order by c.cid asc,c.score desc

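Step 2: Open the Advanced Editor of the OLE DB Source and go to the Input and Output Properties tab.

1. Click OLE DB Source Output and set the IsSorted property to True. Setting it to True does not sort the data; it only tells downstream components that the output is already sorted.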

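If IsSorted is set to True but the data is not actually sorted, a downstream Merge or Merge Join transformation can miss rows or make bad comparisons when the package runs, so the source must supply data that is already sorted (by using ORDER BY in the SQL statement).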

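2. Click Output Columns and set the SortKeyPosition property of each sort column (each column in the ORDER BY list), one by one.

The SortKeyPosition property carries two pieces of metadata, the sort position and the direction:

A positive integer means ascending order, a negative integer means descending order, and 0 means the column is not a sort column. The absolute value is the column's position in the sort key.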

Take the following SQL statement as an example:

select Col_1,Col_2,Col_3,Col_4
from dbo.TableName
order by Col_1 asc, Col_2 desc, Col_3 desc

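In Output Columns, set SortKeyPosition for each of Col_1, Col_2, Col_3 and Col_4.
Col_1, Col_2 and Col_3 are sort columns, numbered from 1 upward, while Col_4 is not a sort column, so the settings are as follows:

Col_1: SortKeyPosition is 1, the first sort column, sorted in ascending order

Col_2: SortKeyPosition is -2, the second sort column, sorted in descending order

Col_3: SortKeyPosition is -3, the third sort column, sorted in descending order

Col_4: SortKeyPosition is 0, not a sort column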
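The rule for turning an ORDER BY list into SortKeyPosition values can be written down as a small Python function. This is a sketch for checking the values by hand, not an SSIS API; the function name `sort_key_positions` and its argument shapes are assumptions made for the example.

```python
def sort_key_positions(output_columns, order_by):
    """Map each output column to its SortKeyPosition value.

    order_by is a list of (column, direction) pairs in ORDER BY order,
    where direction is "asc" or "desc". (Hypothetical helper, not an SSIS API.)
    """
    positions = {column: 0 for column in output_columns}  # 0 = not a sort column
    for index, (column, direction) in enumerate(order_by, start=1):
        # Positive = ascending, negative = descending, magnitude = position.
        positions[column] = index if direction.lower() == "asc" else -index
    return positions

positions = sort_key_positions(
    ["Col_1", "Col_2", "Col_3", "Col_4"],
    [("Col_1", "asc"), ("Col_2", "desc"), ("Col_3", "desc")],
)
print(positions)
# {'Col_1': 1, 'Col_2': -2, 'Col_3': -3, 'Col_4': 0}
```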

Official MSDN documentation

Sort Data for the Merge and Merge Join Transformations

In Integration Services, the Merge and Merge Join transformations require sorted data for their inputs. The input data must be sorted physically, and sort options must be set on the outputs and the output columns in the source or in the upstream transformation. If the sort options indicate that the data is sorted, but the data is not actually sorted, the results of the merge or merge join operation are unpredictable.
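Why unsorted input leads to wrong results can be seen in a textbook sort-merge join, sketched below in Python. This is not the SSIS Merge Join implementation; it is a minimal single-pass inner join that assumes unique keys and ascending order, with made-up sample data, to show how rows silently go missing when that assumption is violated.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs in one forward pass.

    Assumes both inputs are sorted ascending by key and keys are unique.
    """
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key == right_key:
            out.append((left_key, left[i][1], right[j][1]))
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # left row has no partner; move on
        else:
            j += 1  # right row has no partner; move on
    return out

right = [(1, "x"), (2, "y"), (3, "z")]
sorted_left = [(1, "a"), (2, "b"), (3, "c")]
unsorted_left = [(3, "c"), (1, "a"), (2, "b")]  # same rows, wrong order

good = merge_join(sorted_left, right)
bad = merge_join(unsorted_left, right)

print(good)  # [(1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z')]
print(bad)   # [(3, 'c', 'z')]
```

No error is raised for the unsorted input; two of the three matches are simply skipped.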

You can sort this data by using one of the following methods:

  • In the source, use an ORDER BY clause in the statement that is used to load the data.
  • In the data flow, insert a Sort transformation before the Merge or Merge Join transformation.

If the data is string data, both the Merge and Merge Join transformations expect the string values to have been sorted by using Windows collation. To provide string values to the Merge and Merge Join transformations that are sorted by using Windows collation, use the following procedure.

To provide string values that are sorted by using Windows collation

  • Use a Sort transformation to sort the data.

    The Sort transformation uses Windows collation to sort string values.

    —or—

  • Use the Transact-SQL CAST operator to first cast varchar values to nvarchar values, and then use the Transact-SQL ORDER BY clause to sort the data.
    Important

    You cannot use the ORDER BY clause alone because the ORDER BY clause uses a SQL Server collation to sort string values. The use of the SQL Server collation might result in a different sort order than Windows collation, which can cause the Merge or Merge Join transformation to produce unexpected results.
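The underlying problem is that two different comparison rules can put the same strings in different orders. The Python sketch below shows this with two simple rules; neither is a Windows collation or a SQL Server collation, and the word list is made up. It only illustrates why data sorted under one rule may look unsorted to a component that compares under another.

```python
words = ["apple", "Banana", "cherry"]

# Rule A: compare by Unicode code point (uppercase letters sort before lowercase).
by_code_point = sorted(words)

# Rule B: compare case-insensitively.
by_casefold = sorted(words, key=str.casefold)

print(by_code_point)  # ['Banana', 'apple', 'cherry']
print(by_casefold)    # ['apple', 'Banana', 'cherry']
```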

Setting Sort Options on the Data

There are two important sort properties that must be set for the source or upstream transformation that supplies data to the Merge and Merge Join transformations:

  • The IsSorted property of the output that indicates whether the data has been sorted. This property must be set to True.

    Important

    Setting the value of the IsSorted property to True does not sort the data. This property only provides a hint to downstream components that the data has been previously sorted.

  • The SortKeyPosition property of output columns that indicates whether a column is sorted, the column's sort order, and the sequence in which multiple columns are sorted. This property must be set for each column of sorted data.

If you use a Sort transformation to sort the data, the Sort transformation sets both of these properties as required by the Merge or Merge Join transformation. That is, the Sort transformation sets the IsSorted property of its output to True, and sets the SortKeyPosition properties of its output columns.

However, if you do not use a Sort transformation to sort the data, you must set these sort properties manually on the source or the upstream transformation. To manually set the sort properties on the source or upstream transformation, use the following procedure.

To manually set sort attributes on a source or transformation component

  1. In SQL Server Data Tools (SSDT), open the Integration Services project that contains the package you want.
  2. In Solution Explorer, double-click the package to open it.
  3. On the Data Flow tab, locate the appropriate source or upstream transformation, or drag it from the Toolbox to the design surface.
  4. Right-click the component and click Show Advanced Editor.
  5. Click the Input and Output Properties tab.
  6. Click <component name> Output, and set the IsSorted property to True.
    Note

    If you manually set the IsSorted property of the output to True and the data is not sorted, there might be missing data or bad data comparisons in the downstream Merge or Merge Join transformation when you run the package.

  7. Expand Output Columns.
  8. Click the column that you want to indicate is sorted and set its SortKeyPosition property to a nonzero integer value by following these guidelines:
    • The integer value must represent a numeric sequence, starting with 1 and incremented by 1.
    • A positive integer value indicates an ascending sort order.
    • A negative integer value indicates a descending sort order. (If set to a negative number, the absolute value of the number determines the column's position in the sort sequence.)
    • The default value of 0 indicates that the column is not sorted. Leave the value of 0 for output columns that do not participate in the sort.

    As an example of how to set the SortKeyPosition property, consider the following Transact-SQL statement that loads data in a source:

    SELECT * FROM MyTable ORDER BY ColumnA, ColumnB DESC, ColumnC

    For this statement, you would set the SortKeyPosition property for each column as follows:

    • Set the SortKeyPosition property of ColumnA to 1. This indicates that ColumnA is the first column to be sorted and is sorted in ascending order.
    • Set the SortKeyPosition property of ColumnB to -2. This indicates that ColumnB is the second column to be sorted and is sorted in descending order.
    • Set the SortKeyPosition property of ColumnC to 3. This indicates that ColumnC is the third column to be sorted and is sorted in ascending order.
  9. Repeat step 8 for each sorted column.
  10. Click OK.
  11. To save the updated package, click Save Selected Items on the File menu.
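Because IsSorted and SortKeyPosition are declarations rather than checks, it can be useful to confirm on a sample of the source data that the rows really follow the declared order. The Python sketch below is one way to do that outside of SSIS; the function `is_sorted_as_declared`, the dictionary shape, and the sample rows are all assumptions made for the example.

```python
def is_sorted_as_declared(rows, positions):
    """Return True if rows (a list of dicts) follow the declared SortKeyPosition values.

    positions maps column name -> SortKeyPosition (0 = not a sort column).
    """
    # Sort columns in key order: position 1 first, then 2, and so on.
    keys = sorted((c for c, p in positions.items() if p != 0),
                  key=lambda c: abs(positions[c]))

    def compare(a, b):
        """Return -1 if a may precede b, 1 if it may not, 0 if the keys are equal."""
        for column in keys:
            if a[column] != b[column]:
                ascending = positions[column] > 0
                less = a[column] < b[column]
                return -1 if less == ascending else 1
        return 0

    return all(compare(rows[k], rows[k + 1]) <= 0 for k in range(len(rows) - 1))

# Matches: SELECT * FROM MyTable ORDER BY ColumnA, ColumnB DESC, ColumnC
positions = {"ColumnA": 1, "ColumnB": -2, "ColumnC": 3}
rows_ok = [
    {"ColumnA": 1, "ColumnB": 9, "ColumnC": 1},
    {"ColumnA": 1, "ColumnB": 5, "ColumnC": 1},
    {"ColumnA": 1, "ColumnB": 5, "ColumnC": 2},
    {"ColumnA": 2, "ColumnB": 7, "ColumnC": 0},
]
rows_bad = [rows_ok[1], rows_ok[0], rows_ok[2], rows_ok[3]]

print(is_sorted_as_declared(rows_ok, positions))   # True
print(is_sorted_as_declared(rows_bad, positions))  # False
```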
Time: 2024-10-09 21:45:37
