用10张图来看机器学习Machine learning in 10 pictures

I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is a list I find most illuminating.

1. Test and training error: Why lower training error is not always a good thing: ESL Figure 2.11. Test and training error as a function of model complexity.

2. Under and overfitting: PRML Figure 1.4. Plots of polynomials having various orders M, shown as red curves, fitted to the data set generated by the green curve.

3. Occam‘s razor: ITILA Figure 28.3. Why Bayesian inference embodies Occam’s razor. This figure gives the basic intuition for why complex models can turn out to be less probable. The horizontal axis represents the space of possible data sets D. Bayes’ theorem rewards models in proportion to how much they predicted the data that occurred. These predictions are quantified by a normalized probability distribution on D. This probability of the data given model Hi, P (D | Hi), is called the evidence for Hi. A simple model H1 makes only a limited range of predictions, shown by P(D|H1); a more powerful model H2, that has, for example, more free parameters than H1, is able to predict a greater variety of data sets. This means, however, that H2 does not predict the data sets in region C1 as strongly as H1. Suppose that equal prior probabilities have been assigned to the two models. Then, if the data set falls in region C1, the less powerful model H1 will be the more probable model.

4. Feature combinations: (1) Why collectively relevant features may look individually irrelevant, and also (2) Why linear methods may fail. From Isabelle Guyon‘s feature extraction slides.

5. Irrelevant features: Why irrelevant features hurt kNN, clustering, and other similarity based methods. The figure on the left shows two classes well separated on the vertical axis. The figure on the right adds an irrelevant horizontal axis which destroys the grouping and makes many points nearest neighbors of the opposite class.

6. Basis functions: How non-linear basis functions turn a low dimensional classification problem without a linear boundary into a high dimensional problem with a linear boundary. From SVM tutorial slides by Andrew Moore: a one dimensional non-linear classification problem with input x is turned into a 2-D problem z=(x, x^2) that is linearly separable.

7. Discriminative vs. Generative: Why discriminative learning may be easier than generative: PRML Figure 1.27. Example of the class-conditional densities for two classes having a single input variable x (left plot) together with the corresponding posterior probabilities (right plot). Note that the left-hand mode of the class-conditional density p(x|C1), shown in blue on the left plot, has no effect on the posterior probabilities. The vertical green line in the right plot shows the decision boundary in x that gives the minimum misclassification rate.

8. Loss functions: Learning algorithms can be viewed as optimizing different loss functions: PRML Figure 7.5. Plot of the ‘hinge’ error function used in support vector machines, shown in blue, along with the error function for logistic regression, rescaled by a factor of 1/ln(2) so that it passes through the point (0, 1), shown in red. Also shown are the misclassification error in black and the squared error in green.

9. Geometry of least squares: ESL Figure 3.2. The N-dimensional geometry of least squares regression with two predictors. The outcome vector y is orthogonally projected onto the hyperplane spanned by the input vectors x1 and x2. The projection yˆ represents the vector of the least squares predictions.

10. Sparsity: Why Lasso (L1 regularization or Laplacian prior) gives sparse solutions (i.e. weight vectors with more zeros): ESL Figure 3.11. Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions |β1| + |β2| ≤ t and β12 + β22 ≤ t2, respectively, while the red ellipses are the contours of the least squares error function.

from: http://www.denizyuret.com/2014/02/machine-learning-in-5-pictures.html

时间: 2024-10-13 16:53:30

用10张图来看机器学习Machine learning in 10 pictures的相关文章

机器学习(Machine Learning)&深度学习(Deep Learning)资料

机器学习(Machine Learning)&深度学习(Deep Learning)资料 <Brief History of Machine Learning> 介绍:这是一篇介绍机器学习历史的文章,介绍很全面,从感知机.神经网络.决策树.SVM.Adaboost到随机森林.Deep Learning. <Deep Learning in Neural Networks: An Overview> 介绍:这是瑞士人工智能实验室Jurgen Schmidhuber写的最新版本

机器学习(Machine Learning)&amp;深入学习(Deep Learning)资料

<Brief History of Machine Learning> 介绍:这是一篇介绍机器学习历史的文章,介绍很全面,从感知机.神经网络.决策树.SVM.Adaboost 到随机森林.Deep Learning. <Deep Learning in Neural Networks: An Overview> 介绍:这是瑞士人工智能实验室 Jurgen Schmidhuber 写的最新版本<神经网络与深度学习综述>本综述的特点是以时间排序,从 1940 年开始讲起,到

机器学习(Machine Learning)&amp;amp;深度学习(Deep Learning)资料

机器学习(Machine Learning)&深度学习(Deep Learning)资料 機器學習.深度學習方面不錯的資料,轉載. 原作:https://github.com/ty4z2008/Qix/blob/master/dl.md 原作作者會不斷更新.本文更新至2014-12-21 <Brief History of Machine Learning> 介绍:这是一篇介绍机器学习历史的文章,介绍非常全面.从感知机.神经网络.决策树.SVM.Adaboost到随机森林.Deep L

基于Windows 机器学习(Machine Learning)的图像分类(Image classification)实现

原文:基于Windows 机器学习(Machine Learning)的图像分类(Image classification)实现 今天看到一篇文章  Google’s Image Classification Model is now Free to Learn  说是狗狗的机器学习速成课程(Machine Learning Crash Course)现在可以免费学习啦,因为一开始年初的时候是内部使用的,后来开放给大众了.大家有谁对不作恶家的机器学习感兴趣的话,可以点击连接去看看. 但是以上不是

轻松看懂机器学习十大常用算法 (Machine Learning Top 10 Commonly Used Algorithms)

原文出处: 不会停的蜗牛 通过本篇文章可以对ML的常用算法有个常识性的认识,没有代码,没有复杂的理论推导,就是图解一下,知道这些算法是什么,它们是怎么应用的,例子主要是分类问题. 每个算法都看了好几个视频,挑出讲的最清晰明了有趣的,便于科普.以后有时间再对单个算法做深入地解析. 今天的算法如下: 决策树 随机森林算法 逻辑回归 SVM 朴素贝叶斯 K最近邻算法 K均值算法 Adaboost 算法 神经网络 马尔可夫 1. 决策树 根据一些 feature 进行分类,每个节点提一个问题,通过判断,

机器学习 Machine Learning(by Andrew Ng)----第二章 单变量线性回归(Linear Regression with One Variable)

第二章 单变量线性回归(Linear Regression with One Variable) <模型表示(Model Representation)>                                                             <代价函数(Cost Function)>                                                          <梯度下降(Gradient Descent)

Day1 机器学习(Machine Learning, ML)基础

一.机器学习的简介 定义 Tom Mitchell给出的机器学习定义: 对于某类任务T和性能度量P,如果计算机程序在T上以P衡量的性能随着经验E而自我完善,那么就称这个计算机程序从经验E学习. 百度百科给出的机器学习定义:机器学习是多领域交叉学科,涉及概率论.统计学.逼近论.凸分析.算法复杂度理论等多门学科.专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能. 分类 监督学习(supervised learning):数据集是有标签的,

10 张图帮你搞定 TensorFlow 数据读取机制

导读 在学习tensorflow的过程中,有很多小伙伴反映读取数据这一块很难理解.确实这一块官方的教程比较简略,网上也找不到什么合适的学习材料.今天这篇文章就以图片的形式,用最简单的语言,为大家详细解释一下tensorflow的数据读取机制,文章的最后还会给出实战代码以供参考. 一.tensorflow读取机制图解 首先需要思考的一个问题是,什么是数据读取?以图像数据为例,读取数据的过程可以用下图来表示: 假设我们的硬盘中有一个图片数据集0001.jpg,0002.jpg,0003.jpg--我

机器学习(Machine Learning)

从wiki开始:http://en.wikipedia.org/wiki/Machine_learning 今天看机器学习相关的文章, 了解了一下opencv中机器学习功能比较多了 (http://docs.opencv.org/modules/ml/doc/ml.html)和KNIME Analytics 软件,使用的是拖拽功能,组成流程运行的(http://www.knime.org/downloads/overview) 其中主要的数据挖掘算法下图: