Machine Learning Notes (Washington University) - Regression Specialization - Week Five

1. Feature selection

Sometimes we need to reduce the number of features in our model, mainly for two reasons.

Efficiency:

With fewer features, predictions can be computed quickly.

Interpretability:

We can see which features are actually relevant for the prediction task.

2.  All subsets algorithm

We consider every possible combination of features to include in our model,

and we evaluate each candidate using a validation set or cross-validation. The problem is complexity:

if the number of features D is large, the number of subsets grows as O(2^D). A sketch is given below.
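
A minimal sketch of all-subsets selection, assuming numpy and scikit-learn are available; X_train, y_train, X_valid, y_valid are hypothetical pre-split arrays used only for illustration:

from itertools import combinations
import numpy as np
from sklearn.linear_model import LinearRegression

def all_subsets(X_train, y_train, X_valid, y_valid):
    # Try every non-empty subset of the D features and keep the one
    # with the lowest validation error -- O(2^D) models in total.
    D = X_train.shape[1]
    best_err, best_subset = np.inf, None
    for k in range(1, D + 1):
        for subset in combinations(range(D), k):
            cols = list(subset)
            model = LinearRegression().fit(X_train[:, cols], y_train)
            pred = model.predict(X_valid[:, cols])
            err = np.mean((y_valid - pred) ** 2)
            if err < best_err:
                best_err, best_subset = err, cols
    return best_subset, best_err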

3. Greedy algorithm

We fit a model with the current feature set (possibly empty), then repeatedly add the feature that gives

the lowest training error. This reduces the number of models fit to O(D^2). A sketch is given below.
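
A minimal sketch of greedy forward selection, reusing the hypothetical X_train and y_train arrays from above; training error decides which feature is added at each step:

import numpy as np
from sklearn.linear_model import LinearRegression

def forward_selection(X_train, y_train):
    # Greedily add, at each step, the feature that most lowers training error.
    D = X_train.shape[1]
    selected = []
    remaining = set(range(D))
    while remaining:
        best_err, best_j = np.inf, None
        for j in remaining:
            cols = selected + [j]
            model = LinearRegression().fit(X_train[:, cols], y_train)
            err = np.mean((y_train - model.predict(X_train[:, cols])) ** 2)
            if err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected  # features in the order they were added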

4. Lasso

Ridge regression shrinks the weights, but they are not driven exactly to zero. Now we want to get some coefficients

exactly to zero. We cannot simply set small ridge coefficients to zero, because correlated features

all receive small weights even when, as a group, they are relevant to the prediction. Like ridge, lasso adds bias to reduce

variance, and within a group of highly correlated features lasso tends to select one of them arbitrarily.

Lasso regression (L1 regularized regression)

Lambda is used to balance the fit of the model against the sparsity of the weights.

As lambda increases from 0 toward infinity, the lasso solution W(lasso) shrinks from W(least squares) toward 0.

The derivative of |wj| does not exist when wj = 0, and there is no closed-form solution for lasso,

so we use subgradients instead of gradients.
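
A quick way to see lambda's effect is scikit-learn's Lasso, whose alpha parameter plays the role of lambda here; this is only an illustrative sketch on synthetic data, not the course's own implementation:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)  # only 2 relevant features

for alpha in [0.01, 0.1, 1.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(alpha, np.sum(coef != 0))  # larger alpha -> more coefficients exactly zero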

5. Coordinate descent

At each iteration we update only one coordinate instead of all coordinates, so the algorithm makes axis-aligned moves.

And we do not need to choose a step size.

6. Normalize Features

We need to normalize both the training set and the test set, and the normalizers computed on the training columns must be applied to the test columns as well.
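
A minimal numpy sketch, assuming X_train and X_test are hypothetical feature matrices; each column is scaled to unit 2-norm using the training-set norms, which are then reused for the test set:

import numpy as np

def normalize_features(X_train, X_test):
    # Column-wise 2-norms computed on the training set only.
    norms = np.linalg.norm(X_train, axis=0)
    return X_train / norms, X_test / norms, norms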

7. Coordinate descent for least squares regression

Suppose the features are normalized. Setting the partial derivative with respect to wj to zero gives the update below, where ρj is the correlation between feature j and the residual of the prediction made without feature j:

while not converged:
  for j in [0, 1, 2, ..., D]:
    compute ρj
    set wj = ρj

In the case of lasso, lambda is our tuning parameter, and the update becomes a soft-thresholding of ρj:

while not converged:
  for j in [0, 1, 2, ..., D]:
    compute ρj
    set wj = ρj + λ/2   (if ρj < -λ/2)
             0          (if -λ/2 ≤ ρj ≤ λ/2)
             ρj - λ/2   (if ρj > λ/2)
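
A minimal numpy sketch of this lasso coordinate descent, assuming the feature columns have already been normalized to unit 2-norm; setting l1_penalty = 0 recovers the plain least squares update above, and the stopping rule is the one described next:

import numpy as np

def lasso_coordinate_descent(X, y, l1_penalty, tol=1e-6):
    # X: (N, D) feature matrix with unit-norm columns; y: (N,) targets.
    D = X.shape[1]
    w = np.zeros(D)
    converged = False
    while not converged:
        max_step = 0.0
        for j in range(D):
            # Residual of the prediction made without feature j's contribution.
            prediction_without_j = X @ w - X[:, j] * w[j]
            rho_j = X[:, j] @ (y - prediction_without_j)
            # Soft-thresholding update (intercept handling omitted for simplicity).
            if rho_j < -l1_penalty / 2:
                new_wj = rho_j + l1_penalty / 2
            elif rho_j > l1_penalty / 2:
                new_wj = rho_j - l1_penalty / 2
            else:
                new_wj = 0.0
            max_step = max(max_step, abs(new_wj - w[j]))
            w[j] = new_wj
        converged = max_step < tol  # largest step in a full sweep below tolerance
    return w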

In coordinate descent, convergence is detected when, over an entire sweep of all the coordinates, the

maximum step taken is smaller than your tolerance.
