ETL definition

ETL stands for Extract, Transform and Load. It is a process that collects data from various sources, transforms the data according to business rules and needs, and loads it into a destination database. The need for ETL arises because, in modern computing, business data resides in multiple locations and in many incompatible formats. For example, business data might be stored on the file system in various formats (Word documents, PDFs, spreadsheets, plain text, etc.), as email files, or in various database servers such as MS SQL Server, Oracle and MySQL. Handling all this business information efficiently is a great challenge, and ETL plays an important role in solving it. In short: create the source and target repositories, then map the source repositories to the target repositories.

  • Extract – The first step in the ETL process is extracting the data from the various sources. Each source system may store its data in a completely different format from the rest. The sources are usually flat files or an RDBMS, but almost any data store can serve as a source for an ETL process.
  • Transform – Once the data has been extracted and converted into the expected format, the next step is transforming it according to a set of business rules. The transformation may include, among other operations, filtering, sorting, aggregating, joining, cleaning, validating, and generating calculated values from existing ones.
  • Load – The final ETL step is loading the transformed data into the destination target, which might be a database or a data warehouse.
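
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the CSV text, the column names, the `customer_totals` table and all function names are hypothetical, and an in-memory SQLite database stands in for the destination. The transform step shows two of the operations mentioned above, cleaning (dropping rows with a missing amount) and aggregating (summing amounts per customer).

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; a real source could be a file, a mailbox or another database.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.50
4,carol,7.25
"""


def extract(text):
    """Extract: read the CSV source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows without an amount, then total the amounts per customer."""
    totals = {}
    for row in rows:
        if not row["amount"]:  # cleaning: ignore incomplete records
            continue
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(records, conn):
    """Load: write the transformed records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", records
    )
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order and return the contents of the target table."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(RAW_CSV, connection))
```

Keeping each step in its own function mirrors the Extract, Transform, Load separation: a different source or destination can be swapped in without touching the business rules in `transform`.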

References:

https://en.wikipedia.org/wiki/Extract,_transform,_load

http://www.sql-tutorial.net/ETL.asp

Time: 2024-10-13 11:40:49
