scikit-learn: 4.4. Unsupervised dimensionality reduction

Reference: http://scikit-learn.org/stable/modules/unsupervised_reduction.html

When the number of features is high, it is often useful to apply an unsupervised dimensionality reduction step before the supervised learning stage.

Translations of the three sections below will be added later.

4.4.1. PCA: principal component analysis

decomposition.PCA looks for a combination of features that captures the variance of the original features well. See Decomposing signals in components (matrix factorization problems). Translated article for reference: http://blog.csdn.net/mmc2015/article/details/46867597
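
As a rough illustration (my own sketch, not part of the original documentation), reducing synthetic 64-dimensional data to 10 principal components with decomposition.PCA could look like this:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 64)                       # 200 samples, 64 features

pca = PCA(n_components=10)                  # keep the 10 directions of largest variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 10)
print(pca.explained_variance_ratio_.sum())  # fraction of the original variance retained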

Examples

4.4.2. Random projections

The random_projection module provides several tools for data reduction by random projections. See the relevant section of the documentation: Random Projection.
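
A minimal sketch (my own illustration, not from the scikit-learn docs) using GaussianRandomProjection from the random_projection module; with n_components='auto' the target dimension is derived from the Johnson-Lindenstrauss bound for the requested distortion eps:

import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.RandomState(0)
X = rng.rand(100, 10000)                    # high-dimensional input

# n_components='auto' picks the output dimension from the
# Johnson-Lindenstrauss lemma for the given eps tolerance
transformer = GaussianRandomProjection(n_components='auto', eps=0.5, random_state=0)
X_new = transformer.fit_transform(X)

print(X_new.shape)                          # (100, automatically chosen dimension)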

Examples

4.4.3. Feature agglomeration

cluster.FeatureAgglomeration applies Hierarchical clustering to group together features that behave similarly.
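
For illustration only (my own sketch, not from the original text), merging the 64 pixel features of the digits dataset into 16 feature clusters; by default each new feature is the mean of one cluster:

from sklearn.datasets import load_digits
from sklearn.cluster import FeatureAgglomeration

X, _ = load_digits(return_X_y=True)         # 1797 samples, 64 pixel features

agglo = FeatureAgglomeration(n_clusters=16)
X_reduced = agglo.fit_transform(X)          # each new feature is a cluster mean

print(X_reduced.shape)                      # (1797, 16)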

Examples

Feature scaling

Note that if features have very different scaling or statistical properties, cluster.FeatureAgglomeration may not be able to capture the links between related features. Using a preprocessing.StandardScaler can be useful in these settings.
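
A hedged sketch of that point (my own example; the data is synthetic), standardizing with preprocessing.StandardScaler before cluster.FeatureAgglomeration when feature scales differ by orders of magnitude:

import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(100, 20) * np.logspace(0, 6, 20)    # feature scales span 1 to 1e6

reducer = make_pipeline(StandardScaler(), FeatureAgglomeration(n_clusters=5))
X_reduced = reducer.fit_transform(X)

print(X_reduced.shape)                      # (100, 5)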

Pipelining: the unsupervised data reduction and the supervised estimator can be chained in one step. See Pipeline: chaining estimators.
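
A minimal sketch of such a chain (my own illustration; the choice of PCA plus LogisticRegression is arbitrary), using sklearn.pipeline.Pipeline so the reduction and the classifier are fit as a single estimator:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)

clf = Pipeline([
    ('reduce_dim', PCA(n_components=20)),             # unsupervised reduction
    ('classify', LogisticRegression(max_iter=1000)),  # supervised estimator
])
clf.fit(X, y)                               # both steps are fit in one call

print(clf.score(X, y))                      # accuracy on the training data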

