Case Studies: Finding Similar Documents
Learning Outcomes: By the end of this course, you will be able to:
-Create a document retrieval system using k-nearest neighbors.
-Identify various similarity metrics for text data.
-Reduce computations in k-nearest neighbor search by using KD-trees.
-Produce approximate nearest neighbors using locality sensitive hashing.
-Compare and contrast supervised and unsupervised learning tasks.
-Cluster documents by topic using k-means.
-Describe how to parallelize k-means using MapReduce.
-Examine probabilistic clustering approaches using mixture models.
-Fit a mixture of Gaussians model using expectation maximization (EM).
-Perform mixed membership modeling using latent Dirichlet allocation (LDA).
-Describe the steps of a Gibbs sampler and how to use its output to draw inferences.
-Compare and contrast initialization techniques for non-convex optimization objectives.
-Implement these techniques in Python.
========================================================================================================
############ Chapter 2: Nearest Neighbor Search ############
========================================================================================================
Introduction to nearest neighbor search and algorithms
The importance of data representations and distance metrics
Programming Assignment 1
Scaling up k-NN search using KD-trees
Locality sensitive hashing for approximate NN search
Programming Assignment 2
Summarizing nearest neighbor search
========================================================================================================
############ Chapter 3: Clustering with k-means ############
========================================================================================================
Introduction to clustering
Clustering via k-means
Programming Assignment
MapReduce for scaling k-means
Summarizing clustering with k-means
========================================================================================================
############ Chapter 4: Mixture Models ############
========================================================================================================
Motivating and setting the foundation for mixture models
Mixtures of Gaussians for clustering
Expectation Maximization (EM) building blocks
The EM algorithm
Summarizing mixture models
Programming Assignment 1
Programming Assignment 2
========================================================================================================
############ Chapter 5: Mixed Membership Modeling via Latent Dirichlet Allocation ############
========================================================================================================
Introduction to latent Dirichlet allocation
Bayesian inference via Gibbs sampling
Collapsed Gibbs sampling for LDA
Summarizing latent Dirichlet allocation
Programming Assignment
========================================================================================================
############ Chapter 6: Hierarchical Clustering & Closing Remarks ############
========================================================================================================
What we've learned
Hierarchical clustering and clustering for time series segmentation
Programming Assignment
Summary and what's ahead in the specialization