Probabilistic SVM and Kernel Logistic Regression (KLR)

This post covers the relationship between the SVM and logistic regression.

(1) Overview of the SVM algorithm

First, let's walk through the SVM algorithm from the beginning. (Unless stated otherwise, "SVM" here means the soft-margin SVM.)

What optimization objective does this algorithm pursue? We know this objective must be tied to an error measurement.

So how does the SVM measure error? In other words, what exactly does the slack variable $\xi_n$ represent in the SVM?

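Recall the soft-margin SVM primal problem:

$$\min_{b,\mathbf{w},\boldsymbol{\xi}} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^{N}\xi_n \quad \text{s.t. } y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1-\xi_n,\ \xi_n \ge 0.$$

For a point that violates the margin, the smallest feasible slack is $\xi_n = 1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)$. For a point that does not, it is $\xi_n = 0$. Hence $\xi_n = \max\big(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b),\ 0\big)$, and the problem can be rewritten in unconstrained form:

$$\min_{b,\mathbf{w}} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^{N}\max\big(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b),\ 0\big).$$

The SVM minimizes this expression. It measures error with the hinge error $\widehat{\mathrm{err}}_{\mathrm{SVM}}(s, y) = \max(1 - ys,\ 0)$, where $s = \mathbf{w}^T\mathbf{z} + b$. Does this look familiar? In the post on regularization, the objective we minimized had the same form, $\frac{\lambda}{N}\mathbf{w}^T\mathbf{w} + \frac{1}{N}\sum_n \mathrm{err}$. The two viewpoints differ, though. With regularization, the goal is to minimize the error while also keeping the length $\|\mathbf{w}\|$ under control.

With the SVM, the goal is to minimize $\|\mathbf{w}\|$ (that is, to maximize the margin) while also keeping the error under control.

Which side carries more weight is tuned by $\lambda$ for regularization and by $C$ for the SVM. A large $C$ corresponds to a small $\lambda$.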
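To make the correspondence between $C$ and $\lambda$ concrete, here is a small sketch that evaluates both objectives with the hinge error. The toy data, the values of `w`, `b`, `C`, and the helper names `svm_objective` and `regularized_objective` are made up for illustration. With $\lambda = 1/(2C)$, the SVM objective equals $CN$ times the regularized objective. Because a positive constant factor does not change the minimizer, both problems have the same solution. It follows that a larger $C$ matches a smaller $\lambda$.

```python
# Toy data and parameter values are made up for illustration.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hinge(s, y):
    # SVM error: max(1 - y*s, 0)
    return max(1.0 - y * s, 0.0)

def svm_objective(w, b, Z, Y, C):
    # (1/2) w^T w + C * sum_n max(1 - y_n (w^T z_n + b), 0)
    return 0.5 * dot(w, w) + C * sum(hinge(dot(w, z) + b, y) for z, y in zip(Z, Y))

def regularized_objective(w, b, Z, Y, lam):
    # (lambda/N) w^T w + (1/N) sum_n err_SVM(w^T z_n + b, y_n)
    N = len(Z)
    return lam / N * dot(w, w) + sum(hinge(dot(w, z) + b, y) for z, y in zip(Z, Y)) / N

Z = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-0.5, 0.5]]
Y = [1, 1, -1, -1]
w, b, C = [0.8, 0.3], -0.1, 2.0
N = len(Z)
lam = 1.0 / (2.0 * C)  # the lambda that matches this C

svm_val = svm_objective(w, b, Z, Y, C)
reg_val = regularized_objective(w, b, Z, Y, lam)
print(svm_val, C * N * reg_val)  # the two agree up to rounding
```

Here the bias $b$ is left unregularized in both forms, as in the SVM objective.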

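Overall, the two approaches arrive at the same place by different routes. The SVM formulation has two advantages. First, even with the nonlinear, non-differentiable error measure above, the problem in its constrained form can be solved with QP tools. Second, we can use kernel functions.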

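Concretely, compare its error measure with the 0/1 error, writing $s = \mathbf{w}^T\mathbf{z} + b$:

$$\mathrm{err}_{0/1}(s, y) = [\![\, ys \le 0 \,]\!], \qquad \widehat{\mathrm{err}}_{\mathrm{SVM}}(s, y) = \max(1 - ys,\ 0).$$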

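This error measure is also a (convex) upper bound of the 0/1 error. Where have we seen this before? With the squared error and the scaled cross-entropy error $\mathrm{err}_{\mathrm{SCE}}(s, y) = \log_2(1 + \exp(-ys))$.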

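We can also see that the SVM error measure behaves much like the cross-entropy error. As $ys \to +\infty$, both go to 0. As $ys \to -\infty$, both grow linearly: $\widehat{\mathrm{err}}_{\mathrm{SVM}}(s,y) \approx -ys$ and $(\ln 2)\,\mathrm{err}_{\mathrm{SCE}}(s,y) \approx -ys$. That is why we say SVM ≈ L2-regularized logistic regression.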
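A quick numerical check of these claims. The sketch evaluates the 0/1 error, the hinge error, and the scaled cross-entropy error on a grid of $ys$ values from -5 to 5. It confirms that both surrogates upper-bound the 0/1 error, and it shows that $(\ln 2)\,\mathrm{err}_{\mathrm{SCE}}$ tracks $-ys$ for very negative $ys$. The grid range and the convention that $ys = 0$ counts as an error are choices made for this illustration.

```python
import math

def err_01(ys):
    # 0/1 error: 1 when the sign is wrong (ys <= 0 counted as an error here)
    return 1.0 if ys <= 0 else 0.0

def err_svm(ys):
    # hinge error used by the soft-margin SVM
    return max(1.0 - ys, 0.0)

def err_sce(ys):
    # scaled cross-entropy: log2(1 + exp(-ys)), equal to 1 at ys = 0
    return math.log2(1.0 + math.exp(-ys))

grid = [k / 10.0 for k in range(-50, 51)]  # ys from -5 to 5

# both surrogates upper-bound the 0/1 error on the whole grid
assert all(err_svm(t) >= err_01(t) for t in grid)
assert all(err_sce(t) >= err_01(t) for t in grid)

for t in (-5.0, -1.0, 0.0, 1.0, 5.0):
    # ln(2) * err_sce approaches -ys for very negative ys, like the hinge error
    print(f"ys={t:+.1f}  0/1={err_01(t):.0f}  svm={err_svm(t):.3f}  "
          f"ln2*sce={math.log(2) * err_sce(t):.3f}")
```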

(2) Probabilistic SVM

How can we combine the SVM and logistic regression?

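I don't really know why we would want to connect the SVM with logistic regression. What does logistic regression offer that the SVM does not? Its maximum-likelihood view? Isn't the SVM fine on its own? (The usual motivation is that logistic regression outputs an estimate of $P(y = +1 \mid \mathbf{x})$, i.e. soft binary classification. The SVM only outputs a score whose sign gives the class.)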

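Neither of the two straightforward approaches is good. The first plugs the SVM score directly into the logistic function, $\theta(\mathbf{w}_{\mathrm{SVM}}^T\Phi(\mathbf{x}) + b_{\mathrm{SVM}})$, which skips the maximum-likelihood fitting of logistic regression. The second uses the SVM solution only as the starting point for logistic regression, which ends up no different from plain logistic regression and loses the kernel benefits of the SVM. Neither absorbs the strengths of both methods.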
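A common way to get the best of both is Platt scaling (Platt's model), which is a two-level approach. First, run the (kernel) SVM to get $\mathbf{w}_{\mathrm{SVM}}$ and $b_{\mathrm{SVM}}$. Then fit

$$g(\mathbf{x}) = \theta\big(A\,(\mathbf{w}_{\mathrm{SVM}}^T\Phi(\mathbf{x}) + b_{\mathrm{SVM}}) + B\big)$$

by logistic regression on the one-dimensional SVM score, with only the two parameters $A$ and $B$. If $A > 0$, the SVM direction is sensible. If $B \approx 0$, $b_{\mathrm{SVM}}$ is roughly right.

The sketch below covers only the second level. The `scores` list is made-up stand-in data for SVM outputs. The helper `fit_platt`, plain gradient descent, and the values of `lr` and `iters` are my own assumptions. Practical implementations typically use a more robust solver.

```python
import math

def sigmoid(t):
    # logistic function theta(t), computed without overflow for large |t|
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_platt(scores, labels, lr=0.5, iters=2000):
    """Fit A, B in theta(A * s + B) by gradient descent on the
    cross-entropy error (1/N) sum_n log(1 + exp(-y_n (A s_n + B)))."""
    A, B = 1.0, 0.0  # start from "use the SVM score as-is"
    N = len(scores)
    for _ in range(iters):
        gA = gB = 0.0
        for s, y in zip(scores, labels):
            # d/dt log(1 + exp(-y t)) = -y * theta(-y t), with t = A s + B
            g = -y * sigmoid(-y * (A * s + B))
            gA += g * s
            gB += g
        A -= lr * gA / N
        B -= lr * gB / N
    return A, B

# Hypothetical SVM scores w_svm^T Phi(x_n) + b_svm (made-up numbers). The two
# classes overlap slightly, so the optimal A stays finite.
scores = [-2.0, -1.2, -0.5, 0.3, -0.1, 0.6, 1.1, 2.2]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

A, B = fit_platt(scores, labels)

def prob(s):
    # estimated P(y = +1 | x) given the SVM score s
    return sigmoid(A * s + B)

print(A, B, prob(2.2), prob(-2.0))
```

Because this level has only two parameters, it is cheap. In practice it is usually fit on held-out scores rather than on the SVM's own training scores.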

(3) Kernel logistic regression

Suppose the main reason for combining logistic regression with the SVM is to use the SVM's kernel trick inside logistic regression. Then the question becomes: can we do kernel logistic regression directly?

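First, note one thing. To use the kernel trick, the optimal $\mathbf{w}$ must be expressible in terms of the $N$ data points. In other words, the optimal $\mathbf{w}$ can be represented by the $\mathbf{z}_n$: $\mathbf{w}_* = \sum_{n=1}^{N}\beta_n\mathbf{z}_n$. Only then does $\mathbf{w}_*^T\mathbf{z} = \sum_n \beta_n\mathbf{z}_n^T\mathbf{z} = \sum_n \beta_n K(\mathbf{x}_n, \mathbf{x})$ require nothing but kernel evaluations.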
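If $\mathbf{w} = \sum_n \beta_n\mathbf{z}_n$, then both $\mathbf{w}^T\mathbf{z}$ and the regularizer $\mathbf{w}^T\mathbf{w} = \sum_n\sum_m \beta_n\beta_m K(\mathbf{x}_n,\mathbf{x}_m)$ can be computed from kernel values alone. The sketch checks this for the polynomial kernel $K(\mathbf{x},\mathbf{x}') = (1 + \mathbf{x}^T\mathbf{x}')^2$ on 2-D inputs, whose explicit 6-D feature map is known. The data points and the coefficients `beta` are arbitrary values chosen for illustration.

```python
import math

def phi(x):
    # explicit feature map for K(x, x') = (1 + x^T x')^2 with 2-D inputs
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2]

def kernel(x, xp):
    return (1.0 + x[0] * xp[0] + x[1] * xp[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [(1.0, 0.5), (-0.3, 2.0), (0.7, -1.1)]
beta = [0.4, -1.2, 0.9]  # arbitrary coefficients, for illustration only

# w = sum_n beta_n z_n, built explicitly in the 6-D Z space
w = [sum(b * phi(xn)[i] for b, xn in zip(beta, X)) for i in range(6)]

x_new = (0.2, -0.8)
explicit = dot(w, phi(x_new))                                      # uses w and z
via_kernel = sum(b * kernel(xn, x_new) for b, xn in zip(beta, X))  # kernel only

ww = dot(w, w)                                                     # w^T w
bKb = sum(beta[n] * beta[m] * kernel(X[n], X[m])                   # beta^T K beta
          for n in range(len(X)) for m in range(len(X)))
print(explicit, via_kernel, ww, bKb)  # pairs agree up to rounding
```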

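When is this condition satisfied? The representer theorem answers this question. For any L2-regularized linear model

$$\min_{\mathbf{w}} \ \frac{\lambda}{N}\mathbf{w}^T\mathbf{w} + \frac{1}{N}\sum_{n=1}^{N}\mathrm{err}(y_n, \mathbf{w}^T\mathbf{z}_n),$$

the optimal $\mathbf{w}_*$ can be written as $\mathbf{w}_* = \sum_{n=1}^{N}\beta_n\mathbf{z}_n$. The proof goes as follows. Decompose $\mathbf{w}_* = \mathbf{w}_\parallel + \mathbf{w}_\perp$ with $\mathbf{w}_\parallel \in \mathrm{span}(\mathbf{z}_1,\dots,\mathbf{z}_N)$. Since $\mathbf{w}_\perp^T\mathbf{z}_n = 0$, $\mathbf{w}_*$ and $\mathbf{w}_\parallel$ have the same error. However, $\mathbf{w}_\parallel^T\mathbf{w}_\parallel < \mathbf{w}_*^T\mathbf{w}_*$ unless $\mathbf{w}_\perp = \mathbf{0}$, so $\mathbf{w}_\perp$ must be zero. L2-regularized logistic regression is exactly of this form.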
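The representer theorem can also be observed numerically. This sketch runs plain gradient descent on L2-regularized logistic regression in a 3-D Z space with only two points, starting from a $\mathbf{w}$ that has a component outside $\mathrm{span}(\mathbf{z}_1, \mathbf{z}_2)$. The data, `lam`, `lr`, and the iteration count are made up for illustration. The logistic-loss gradient always lies in the span, so only the regularizer acts on $\mathbf{w}_\perp$, shrinking it by a factor $1 - 2\,\mathrm{lr}\,\lambda/N = 0.75$ per step. The final $\mathbf{w}$ is therefore, numerically, a combination of $\mathbf{z}_1$ and $\mathbf{z}_2$.

```python
import math

def sigmoid(t):
    # logistic function theta(t), computed without overflow
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(a, b):
    # 3-D cross product: a vector orthogonal to both a and b
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

Z = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]  # two points in a 3-D Z space
Y = [1, -1]
N, lam, lr = len(Z), 0.5, 0.5

normal = cross(Z[0], Z[1])  # direction orthogonal to span(z_1, z_2)
w = [1.0, 1.0, 1.0]         # deliberately start with w_perp != 0

for _ in range(3000):
    # gradient of (lam/N) w^T w + (1/N) sum_n log(1 + exp(-y_n w^T z_n))
    grad = [2.0 * lam / N * wi for wi in w]
    for z, y in zip(Z, Y):
        c = -y * sigmoid(-y * dot(w, z)) / N
        grad = [gi + c * zi for gi, zi in zip(grad, z)]
    w = [wi - lr * gi for wi, gi in zip(w, grad)]

# signed length of the component of w orthogonal to span(z_1, z_2)
perp = dot(w, normal) / math.sqrt(dot(normal, normal))
print(w, perp)
```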

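We can therefore substitute $\mathbf{w} = \sum_n \beta_n\mathbf{z}_n$ into L2-regularized logistic regression to obtain kernel logistic regression, an unconstrained problem in $\boldsymbol\beta$:

$$\min_{\boldsymbol{\beta}} \ \frac{\lambda}{N}\sum_{n=1}^{N}\sum_{m=1}^{N}\beta_n\beta_m K(\mathbf{x}_n,\mathbf{x}_m) + \frac{1}{N}\sum_{n=1}^{N}\log\Big(1+\exp\Big(-y_n\sum_{m=1}^{N}\beta_m K(\mathbf{x}_m,\mathbf{x}_n)\Big)\Big)$$

It can be solved with gradient descent or SGD. The resulting hypothesis is $g(\mathbf{x}) = \theta\big(\sum_n \beta_n K(\mathbf{x}_n, \mathbf{x})\big)$. Unlike the $\alpha_n$ in the SVM dual, the $\beta_n$ are usually all nonzero.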
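Below is a minimal sketch of kernel logistic regression trained by plain gradient descent on $\boldsymbol\beta$. It minimizes $\frac{\lambda}{N}\boldsymbol\beta^T K\boldsymbol\beta + \frac{1}{N}\sum_n\log(1+\exp(-y_n s_n))$ with $s_n = \sum_m \beta_m K(\mathbf{x}_m,\mathbf{x}_n)$. The gradient with respect to $\beta_i$ is $\frac{2\lambda}{N}(K\boldsymbol\beta)_i + \frac{1}{N}\sum_n\big(-y_n\theta(-y_n s_n)\big)K(\mathbf{x}_i,\mathbf{x}_n)$.

The following are my own choices for illustration, not part of the original notes:
- a Gaussian (RBF) kernel with `gamma=2.0`;
- `lam`, `lr`, and `iters`;
- no bias term;
- an XOR-style toy dataset;
- the helper names `train_klr` and `predict_proba`.

```python
import math

def sigmoid(t):
    # logistic function theta(t), computed without overflow
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def rbf(x, xp, gamma=2.0):
    # Gaussian kernel exp(-gamma * ||x - x'||^2); gamma is an arbitrary choice
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xp)))

def train_klr(X, Y, lam=0.01, lr=1.0, iters=500):
    """Gradient descent on the KLR objective in beta (no bias term)."""
    N = len(X)
    K = [[rbf(X[n], X[m]) for m in range(N)] for n in range(N)]
    beta = [0.0] * N
    for _ in range(iters):
        # scores s_n = sum_m beta_m K(x_m, x_n)
        s = [sum(beta[m] * K[m][n] for m in range(N)) for n in range(N)]
        # derivative of log(1 + exp(-y_n s_n)) with respect to s_n
        g = [-Y[n] * sigmoid(-Y[n] * s[n]) for n in range(N)]
        grad = [
            2.0 * lam / N * sum(K[i][m] * beta[m] for m in range(N))
            + sum(g[n] * K[i][n] for n in range(N)) / N
            for i in range(N)
        ]
        beta = [b - lr * gi for b, gi in zip(beta, grad)]
    return beta

def predict_proba(X, beta, x):
    # g(x) = theta(sum_n beta_n K(x_n, x)), an estimate of P(y = +1 | x)
    return sigmoid(sum(b * rbf(xn, x) for b, xn in zip(beta, X)))

# XOR-style toy data: not linearly separable in the original 2-D space
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
Y = [-1, -1, 1, 1]
beta = train_klr(X, Y)
print([round(predict_proba(X, beta, x), 3) for x in X])
```

Gradient descent is used here only because it is the simplest option. Since every $\beta_n$ is typically nonzero, prediction cost grows with $N$, which is one practical difference from the sparse SVM solution.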

Date: 2024-11-09 03:58:53
