ETL definition

ETL stands for Extract, Transform and Load: a process that collects data from various sources, transforms it according to business rules and needs, and loads it into a destination database. The need for ETL arises because business data in modern computing resides in multiple locations and in many incompatible formats. For example, business data might be stored on the file system in various formats (Word documents, PDFs, spreadsheets, plain text, etc.), as email files, or in various database servers such as MS SQL Server, Oracle and MySQL. Handling all this business information efficiently is a great challenge, and ETL plays an important role in solving it. In short, an ETL job comes down to two tasks: creating the source and target repositories, and mapping the source repositories to the target repositories.

Extract – The first step in the ETL process is extracting the data from the various sources. Each source system may store its data in a completely different format from the rest. The sources are usually flat files or relational databases (RDBMS), but almost any data store can serve as a source for an ETL process.
Transform – Once the data has been extracted and converted into the expected format, the next step is transforming it according to a set of business rules. The transformation may include, among other operations, filtering, sorting, aggregating, joining, cleaning and validating data, and generating calculated values from existing ones.
Load – The final ETL step is loading the transformed data into the target, which might be a database or a data warehouse.
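
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a production pipeline: the two sources (`CSV_SOURCE`, `API_SOURCE`), the table name `customer_totals` and the function names are all hypothetical, and an in-memory SQLite database stands in for the destination database.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a flat file (CSV text stands in for a file on disk).
CSV_SOURCE = """customer,amount
alice,10.50
bob,not_a_number
alice,4.50
carol,20.00
"""

# Hypothetical source 2: records from another system that uses
# different field names than the CSV file.
API_SOURCE = [
    {"name": "Bob", "total": 7.25},
    {"name": "Carol ", "total": 5.00},
]


def extract():
    """Extract: read every source and convert its rows to one common shape."""
    rows = []
    for rec in csv.DictReader(io.StringIO(CSV_SOURCE)):
        rows.append({"customer": rec["customer"], "amount": rec["amount"]})
    for rec in API_SOURCE:
        rows.append({"customer": rec["name"], "amount": rec["total"]})
    return rows


def transform(rows):
    """Transform: clean, validate, filter, aggregate and sort the rows."""
    totals = {}
    for row in rows:
        name = row["customer"].strip().lower()  # cleaning
        try:
            amount = float(row["amount"])  # validation
        except (TypeError, ValueError):
            continue  # filtering: skip rows with an invalid amount
        totals[name] = totals.get(name, 0.0) + amount  # aggregation
    return sorted(totals.items())  # sorting by customer name


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", records
    )
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return the loaded rows."""
    load(transform(extract()), conn)
    query = "SELECT customer, total FROM customer_totals ORDER BY customer"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for customer, total in run_etl(connection):
        print(customer, total)
    connection.close()
```

Keeping each step in its own function mirrors the Extract, Transform and Load split: a new source only touches `extract`, and a new business rule only touches `transform`.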

References:

https://en.wikipedia.org/wiki/Extract,_transform,_load

http://www.sql-tutorial.net/ETL.asp

Time: 2024-10-13 11:40:49
