Building the Unstructured Data Warehouse: Architecture, Analysis, and Design

Learn essential techniques from data warehouse legend Bill
Inmon on how to build the reporting environment your business needs now!

Answers for many valuable business questions hide in text.
How well can your existing reporting environment extract the necessary text from
email, spreadsheets, and documents, and put it in a useful format for analytics
and reporting? Transforming the traditional data warehouse into an efficient
unstructured data warehouse requires additional skills from the analyst,
architect, designer, and developer. This book prepares you to implement an
unstructured data warehouse. Through clear explanations, examples, and case
studies, you will learn new techniques and tips for obtaining and analyzing
text.

Master these ten objectives:

Build an unstructured data warehouse using the 11-step approach

Integrate text and describe it in terms of homogeneity, relevance, medium,
volume, and structure

Overcome challenges including blather, the Tower of Babel, and lack of
natural relationships

Avoid the Data Junkyard and combat the Spider's Web

Reuse techniques perfected in the traditional data warehouse and Data
Warehouse 2.0, including iterative development

Apply essential techniques for textual Extract, Transform, and Load (ETL)
such as phrase recognition, stop word filtering, and synonym replacement

Design the Document Inventory system and link unstructured text to
structured data

Leverage indexes for efficient text analysis and taxonomies for useful
external categorization

Manage large volumes of data using advanced techniques such as backward
pointers

Evaluate technology choices suitable for unstructured data processing,
such as data warehouse appliances

The following outline briefly describes each chapter's content:

Chapter 1 defines unstructured data and explains why text is the main
focus of this book.

Chapter 2 addresses the challenges one faces when managing unstructured
data.

Chapter 3 discusses the DW 2.0 architecture, which leads into the role of
the unstructured data warehouse. The unstructured data warehouse is defined
and benefits are given. There are several features of the conventional data
warehouse that can be leveraged for the unstructured data warehouse, including
ETL processing, textual integration, and iterative development.

Chapter 4 focuses on the heart of the unstructured data warehouse: Textual
Extract, Transform, and Load (ETL).

Chapter 5 describes the 11 steps required to develop the unstructured data
warehouse.

Chapter 6 describes how to inventory documents for maximum analysis value,
as well as link the unstructured text to structured data for even greater
value.

Chapter 7 goes through each of the different types of indexes necessary to
make text analysis efficient. Indexes range from simple indexes, which are
fast to create and are good if the analyst really knows what needs to be
analyzed before the indexing process begins, to complex combined indexes,
which can be made up of any and all of the other kinds of indexes.

Chapter 8 explains taxonomies and how they can be used within the
unstructured data warehouse.

Chapter 9 explains ways of coping with large amounts of unstructured data.
Techniques such as keeping the unstructured data at its source and using
backward pointers are discussed. The chapter explains why iterative
development is so important.

Chapter 10 focuses on challenges and some technology choices that are
suitable for unstructured data processing. In addition, the data warehouse
appliance is discussed.

Chapters 11, 12, and 13 put all of the previously discussed techniques and
approaches in context through three case studies.
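
To make the textual ETL techniques named above more concrete (phrase recognition, stop word filtering, and synonym replacement), here is a minimal Python sketch. It is not taken from the book: the function name textual_etl, the word lists, and the order of the steps are all assumptions chosen for illustration, and a real implementation would likely draw its lists from curated, domain-specific sources.

```python
import re

# Hypothetical word lists, chosen only for illustration (not from the book).
STOP_WORDS = {"a", "an", "the", "of", "to", "is", "and", "in"}
SYNONYMS = {"client": "customer", "buyer": "customer"}
PHRASES = {("data", "warehouse"): "data_warehouse"}


def textual_etl(text):
    """Return normalized tokens: phrases merged, stop words dropped, synonyms replaced."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())

    # Phrase recognition: merge known two-word phrases into a single token.
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    # Stop word filtering, then synonym replacement.
    kept = [t for t in merged if t not in STOP_WORDS]
    return [SYNONYMS.get(t, t) for t in kept]


if __name__ == "__main__":
    print(textual_etl("The client is in the data warehouse"))
```

Phrases are merged before stop words are removed so that a phrase containing a stop word could still be recognized; that ordering is a design choice of this sketch, not a claim about the book's method.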

Time: 2024-09-29 20:26:15
