Machine Learning Techniques -6-Support Vector Regression

For regression with the squared error, we start with kernel ridge regression.

Now that we have the kernel function, can we find an analytic solution for kernel ridge regression?
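By the representer theorem, the optimal $\mathbf{w}$ is a linear combination of the transformed inputs, $\mathbf{w}_* = \sum_{n=1}^{N}\beta_n\mathbf{z}_n$. Substituting it into the ridge regression objective, with $K$ the $N \times N$ kernel matrix $K_{nm} = K(\mathbf{x}_n, \mathbf{x}_m)$, gives a problem in $\boldsymbol\beta$ alone. Below is a sketch of the derivation; the $\lambda/N$ scaling follows the usual ridge-regression convention and is an assumption about the exact notation:

```latex
\begin{aligned}
E_{\text{aug}}(\boldsymbol\beta)
&= \frac{\lambda}{N}\,\boldsymbol\beta^{T}K\boldsymbol\beta
 + \frac{1}{N}\left(\boldsymbol\beta^{T}K^{T}K\boldsymbol\beta
 - 2\,\boldsymbol\beta^{T}K^{T}\mathbf{y} + \mathbf{y}^{T}\mathbf{y}\right) \\
\nabla E_{\text{aug}}(\boldsymbol\beta)
&= \frac{2}{N}\,K^{T}\left((\lambda I + K)\,\boldsymbol\beta - \mathbf{y}\right)
\end{aligned}
```

Since $K$ is symmetric, any $\boldsymbol\beta$ with $(\lambda I + K)\boldsymbol\beta = \mathbf{y}$ makes the gradient vanish.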

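Since we want the best $\boldsymbol\beta$, we set the gradient of the objective with respect to $\boldsymbol\beta$ to zero, which yields the analytic solution $\boldsymbol\beta = (\lambda I + K)^{-1}\mathbf{y}$. The inverse always exists for $\lambda > 0$, because the kernel matrix $K$ is positive semi-definite.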
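A minimal NumPy sketch of this analytic solution, assuming a Gaussian kernel $K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^2)$. The helper names, the demo data and the values of `lam` and `gamma` are hypothetical choices for illustration, not taken from the course:

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    """Gaussian kernel matrix: K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)."""
    sq_dists = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    """Return beta solving (lam * I + K) beta = y."""
    K = gaussian_kernel(X, X, gamma)
    N = X.shape[0]
    # Solve the N x N linear system rather than forming the inverse explicitly.
    return np.linalg.solve(lam * np.eye(N) + K, y)

def kernel_ridge_predict(X_train, beta, X_new, gamma=1.0):
    """g(x) = sum_n beta_n * K(x_n, x) for every row x of X_new."""
    return gaussian_kernel(X_new, X_train, gamma) @ beta

# Hypothetical demo data: a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

beta = kernel_ridge_fit(X, y, lam=0.1, gamma=1.0)
pred = kernel_ridge_predict(X, beta, X, gamma=1.0)
```

Using `np.linalg.solve` avoids forming the inverse explicitly, but it is still an $O(N^3)$ operation on the full $N \times N$ kernel matrix.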

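However, compared with the linear case, this solution scales poorly with the amount of data: linear ridge regression solves a $d \times d$ system, roughly $O(d^3 + d^2N)$, whereas kernel ridge regression inverts an $N \times N$ matrix, roughly $O(N^3)$.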
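To make this trade-off concrete, here is a small sketch (assuming NumPy and synthetic data; `N`, `d` and `lam` are arbitrary). It checks that with the linear kernel $K = XX^T$ both routes give the same predictions, while the linear route solves a $d \times d$ system and the kernel route solves an $N \times N$ one:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, lam = 200, 3, 0.5
X = rng.normal(size=(N, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

# Linear ridge regression (no bias term): a d x d system,
# w = (lam * I + X^T X)^{-1} X^T y.
w = np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# Kernel ridge regression with the linear kernel K = X X^T: an N x N system,
# beta = (lam * I + K)^{-1} y.
K = X @ X.T
beta = np.linalg.solve(lam * np.eye(N) + K, y)

x_new = rng.normal(size=(5, d))
pred_linear = x_new @ w
pred_kernel = (x_new @ X.T) @ beta  # g(x) = sum_n beta_n * <x_n, x>
```

Both models agree because $\mathbf{w} = X^T\boldsymbol\beta$; the choice between them is purely a matter of which system is cheaper to solve.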

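Compared with the soft-margin Gaussian SVM, kernel ridge regression also suffers from a dense $\boldsymbol\beta$: in general every $\beta_n$ is nonzero, so the prediction $g(\mathbf{x}) = \sum_{n=1}^{N} \beta_n K(\mathbf{x}_n, \mathbf{x})$ has to sum over all $N$ training points.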

In SVM terms, nearly every training point becomes a support vector, which slows down prediction. What we want now is a sparse $\boldsymbol\beta$.
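A quick check of this density, assuming NumPy, one-dimensional synthetic data, a Gaussian kernel with $\gamma = 1$ and $\lambda = 0.1$ (all arbitrary demo choices). It counts the training points whose $\beta_n$ is not numerically zero, that is, the points that would act as support vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
X = rng.uniform(-3.0, 3.0, size=(N, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

# Gaussian kernel with gamma = 1 on 1-D inputs: K[i, j] = exp(-(x_i - x_j)^2).
K = np.exp(-((X - X.T) ** 2))
beta = np.linalg.solve(0.1 * np.eye(N) + K, y)

# Training points whose coefficient is not (numerically) zero act like support vectors.
n_sv = int(np.sum(np.abs(beta) > 1e-8))
print(f"{n_sv} of {N} coefficients are nonzero")
```

On data like this, essentially all $N$ coefficients should come out nonzero, which is exactly the lack of sparsity that SVR sets out to fix.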

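Thus we introduce a tube of width $\epsilon$ around the regression function. Using the familiar max function, the error becomes $\text{err}(y, s) = \max(0, |s - y| - \epsilon)$, so points with a small $|s - y|$ (inside the tube) contribute no error and are pruned.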
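A tiny sketch comparing the squared error with this tube ($\epsilon$-insensitive) error; the function names and `eps = 0.5` are illustrative assumptions:

```python
def squared_error(s, y):
    """Error used by kernel ridge regression."""
    return (s - y) ** 2

def eps_insensitive_error(s, y, eps=0.5):
    """Tube error: zero when |s - y| <= eps, grows linearly outside the tube."""
    return max(0.0, abs(s - y) - eps)

# Compare both errors for a few scores s against the label y = 0.
for s in [0.0, 0.3, 0.5, 1.0, 2.0]:
    print(s, squared_error(s, 0.0), eps_insensitive_error(s, 0.0))
```

Inside the tube the tube error is exactly zero, and outside it grows linearly rather than quadratically with $|s - y|$.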

The max function is not differentiable everywhere, so we need to rewrite the problem in another form.

These steps reshape the problem to look like the standard SVM, so that we can solve it with a quadratic programming (QP) solver.
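Written in the SVM style, with a $\frac{1}{2}\mathbf{w}^T\mathbf{w}$ regularizer and a trade-off parameter $C$ in place of $\lambda$, the unconstrained tube regression problem reads as follows (a sketch in the usual soft-margin notation):

```latex
\min_{b,\,\mathbf{w}}\;\; \frac{1}{2}\,\mathbf{w}^{T}\mathbf{w}
+ C\sum_{n=1}^{N}\max\!\left(0,\; \left|\mathbf{w}^{T}\mathbf{z}_n + b - y_n\right| - \epsilon\right)
```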

In the score $\mathbf{w}^T\mathbf{z}_n + b$, the bias $b$ plays the role of $w_0$: it is pulled out of $\mathbf{w}$ and kept as a separate constant.

We then add slack variables to measure how far each point violates the tube, using separate upper and lower slacks $\xi_n^{\wedge}$ and $\xi_n^{\vee}$ so that every constraint stays linear.
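A sketch of the resulting SVR primal: $\min_{b,\mathbf{w},\boldsymbol\xi^{\vee},\boldsymbol\xi^{\wedge}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^{N}(\xi_n^{\vee} + \xi_n^{\wedge})$ subject to $-\epsilon - \xi_n^{\vee} \le y_n - \mathbf{w}^T\mathbf{z}_n - b \le \epsilon + \xi_n^{\wedge}$, $\xi_n^{\vee} \ge 0$ and $\xi_n^{\wedge} \ge 0$. For fixed scores $s_n = \mathbf{w}^T\mathbf{z}_n + b$, the smallest feasible slacks recover the tube error exactly, which is why the slack form and the max form share the same optimum. A minimal NumPy check with made-up scores and labels (the function names are hypothetical):

```python
import numpy as np

def tube_error(s, y, eps):
    """Element-wise epsilon-insensitive error max(0, |s - y| - eps)."""
    return np.maximum(0.0, np.abs(s - y) - eps)

def minimal_slacks(s, y, eps):
    """Smallest slacks with -eps - xi_low <= y - s <= eps + xi_up and xi >= 0."""
    xi_up = np.maximum(0.0, (y - s) - eps)   # label lies above the tube
    xi_low = np.maximum(0.0, (s - y) - eps)  # label lies below the tube
    return xi_up, xi_low

s = np.array([0.0, 0.2, 1.0, -1.5])  # made-up scores w^T z_n + b
y = np.zeros(4)                      # made-up labels
eps = 0.5

xi_up, xi_low = minimal_slacks(s, y, eps)
print(xi_up + xi_low)          # sum of slacks per point
print(tube_error(s, y, eps))   # same values as the line above
```

At most one of the two slacks is nonzero for each point, since a label cannot lie both above and below the tube.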

Our next task: SVR primal -> dual.
