Building the Unstructured Data Warehouse: Architecture, Analysis, and Design

Learn essential techniques from data warehouse legend Bill
Inmon on how to build the reporting environment your business needs now!

Answers for many valuable business questions hide in text.
How well can your existing reporting environment extract the necessary text from
email, spreadsheets, and documents, and put it in a useful format for analytics
and reporting? Transforming the traditional data warehouse into an efficient
unstructured data warehouse requires additional skills from the analyst,
architect, designer, and developer. This book prepares you to implement an
unstructured data warehouse. Through clear explanations, examples, and case
studies, you will learn new techniques and tips for obtaining and analyzing
text.

Master these ten objectives:

  • Build an unstructured data warehouse using the 11-step approach

  • Integrate text and describe it in terms of homogeneity, relevance, medium,
    volume, and structure

  • Overcome challenges including blather, the Tower of Babel, and lack of
    natural relationships

  • Avoid the Data Junkyard and combat the Spider's Web

  • Reuse techniques perfected in the traditional data warehouse and Data
    Warehouse 2.0, including iterative development

  • Apply essential techniques for textual Extract, Transform, and Load (ETL)
    such as phrase recognition, stop word filtering, and synonym replacement

  • Design the Document Inventory system and link unstructured text to
    structured data

  • Leverage indexes for efficient text analysis and taxonomies for useful
    external categorization

  • Manage large volumes of data using advanced techniques such as backward
    pointers

  • Evaluate technology choices suitable for unstructured data processing,
    such as data warehouse appliances
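
The three textual ETL techniques named in the objectives above (phrase recognition, stop word filtering, and synonym replacement) can be illustrated with a minimal Python sketch. This is not the book's implementation: the word lists, the phrase table, and the function name textual_etl are hypothetical and exist only to show the order in which the steps might run.

```python
import re

# Hypothetical illustration data; a real project would load curated lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "was"}
SYNONYMS = {"automobile": "car", "vehicle": "car"}
PHRASES = {("data", "warehouse"): "data_warehouse"}


def textual_etl(text):
    """Return a list of normalized tokens extracted from raw text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())

    # Phrase recognition: merge known two-word phrases into a single token.
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    # Stop word filtering: drop words that carry little analytic value.
    kept = [t for t in merged if t not in STOP_WORDS]

    # Synonym replacement: map variant terms onto one preferred term.
    return [SYNONYMS.get(t, t) for t in kept]


print(textual_etl("The automobile is in the data warehouse"))
```

Phrase recognition runs first in this sketch so that stop word filtering cannot split a phrase before it is recognized. That ordering is a design choice of the example, not a claim about the book's method.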

The following outline briefly describes each chapter's content:

  • Chapter 1 defines unstructured data and explains why text is the main
    focus of this book.

  • Chapter 2 addresses the challenges one faces when managing unstructured
    data.

  • Chapter 3 discusses the DW 2.0 architecture, which leads into the role of
    the unstructured data warehouse. It defines the unstructured data warehouse
    and lists its benefits. Several features of the conventional data warehouse
    can be leveraged for the unstructured data warehouse, including ETL
    processing, textual integration, and iterative development.

  • Chapter 4 focuses on the heart of the unstructured data warehouse: Textual
    Extract, Transform, and Load (ETL).

  • Chapter 5 describes the 11 steps required to develop the unstructured data
    warehouse.

  • Chapter 6 describes how to inventory documents for maximum analysis value,
    as well as link the unstructured text to structured data for even greater
    value.

  • Chapter 7 goes through each of the different types of indexes necessary to
    make text analysis efficient. Indexes range from simple indexes, which are
    fast to create and are good if the analyst really knows what needs to be
    analyzed before the indexing process begins, to complex combined indexes,
    which can be made up of any and all of the other kinds of indexes.

  • Chapter 8 explains taxonomies and how they can be used within the
    unstructured data warehouse.

  • Chapter 9 explains ways of coping with large amounts of unstructured data.
    Techniques such as keeping the unstructured data at its source and using
    backward pointers are discussed. The chapter explains why iterative
    development is so important.

  • Chapter 10 focuses on challenges and some technology choices that are
    suitable for unstructured data processing. In addition, the data warehouse
    appliance is discussed.

  • Chapters 11, 12, and 13 put all of the previously discussed techniques and
    approaches in context through three case studies.
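
As a rough illustration of the simple index from Chapter 7 combined with the backward pointers from Chapter 9, the sketch below indexes a list of terms chosen in advance and stores, for each hit, a pointer back to the source document instead of a copy of the text. It is a sketch under assumptions: the BackwardPointer class, the build_index function, and the sample file path are hypothetical and are not taken from the book.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BackwardPointer:
    source: str  # hypothetical location of the document left at its source
    offset: int  # character offset of the term within that document


def build_index(documents, terms):
    """Map each wanted term to pointers into the documents that contain it."""
    wanted = {t.lower() for t in terms}
    index = defaultdict(list)
    for source, text in documents.items():
        for match in re.finditer(r"\w+", text.lower()):
            if match.group() in wanted:
                index[match.group()].append(
                    BackwardPointer(source, match.start())
                )
    return dict(index)


docs = {"mail/001.txt": "Claim denied. Refund the claim."}
index = build_index(docs, ["claim"])
print(index["claim"])
```

Because the index keeps only a location and an offset, the full text can stay where it already lives, and the analyst follows the pointer only when the surrounding context is needed.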

Time: 2024-09-29 20:26:15
