Multiclass SVM with hinge loss in MATLAB

Introduction
hinge loss
code

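1 Introduction

This post introduces the hinge loss E(w) and its gradient ∇E(w), and uses batch gradient descent to minimise the hinge loss and train a multiclass SVM. On a handwritten digit dataset, the hinge loss reaches a recognition accuracy of 87.040%.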

2 hinge loss

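Starting from the objective function of the binary SVM, we can define the objective function of the multiclass SVM:

$$E(w_1,\dots,w_k)=\sum_{j=1}^{k}\frac{1}{2}\|w_j\|^2+C\sum_{i=1}^{n}L\big((w_1,\dots,w_k),(x_i,y_i)\big).$$

Here $T=\{(x_1,y_1),\dots,(x_n,y_n)\}$ is the training set, and the per-sample loss is

$$L\big((w_1,\dots,w_k),(x,y)\big)=\max\Big(0,\ \max_{y'\neq y} w_{y'}^{T}x+1-w_{y}^{T}x\Big).$$

For background on extending the binary SVM to the multiclass case, and for the derivation of these formulas, see other references.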
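As a quick sanity check of the definition, the sketch below evaluates L for a single sample with k = 3 classes. The weight matrix `W`, the sample `x`, and the label `y` are made-up values chosen for illustration; they are not taken from the dataset.

```matlab
% Per-sample multiclass hinge loss:
%   L = max(0, max_{y' ~= y} w_{y'}'*x + 1 - w_y'*x)
% W, x and y are hypothetical values used only for illustration.
W = [ 1  0;      % w_1'
      0  1;      % w_2'
     -1 -1];     % w_3'  (one row per class)
x = [2; 1];      % one sample
y = 2;           % its true class

scores = W * x;                  % class scores: [2; 1; -3]
wrong = scores;
wrong(y) = -inf;                 % exclude the true class from the max
[bestWrong, yhat] = max(wrong);  % strongest competing class and its score
L = max(0, bestWrong + 1 - scores(y));

fprintf('yhat = %d, L = %g\n', yhat, L);   % prints: yhat = 1, L = 2
```

Class 1 scores 2 while the true class 2 scores only 1, so the margin is violated and the loss is 2 + 1 - 1 = 2.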

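Next we compute the gradient of E(w). Let $\hat{y}=\arg\max_{y'\neq y} w_{y'}^{T}x$ be the highest-scoring wrong class, and let $w_{j,l}$ denote the $l$-th component of $w_j$.

(a) If $w_{y}^{T}x \ge w_{\hat{y}}^{T}x+1$, then

$$\frac{\partial L\big((w_1,w_2,\dots,w_k),(x,y)\big)}{\partial w_{j,l}}=0$$

(b) If $w_{y}^{T}x < w_{\hat{y}}^{T}x+1$ and $j=y$, then

$$\frac{\partial L\big((w_1,w_2,\dots,w_k),(x,y)\big)}{\partial w_{j,l}}=-x_l$$

(c) If $w_{y}^{T}x < w_{\hat{y}}^{T}x+1$ and $j=\hat{y}$, then

$$\frac{\partial L\big((w_1,w_2,\dots,w_k),(x,y)\big)}{\partial w_{j,l}}=x_l$$

(d) If $w_{y}^{T}x < w_{\hat{y}}^{T}x+1$, $j\neq y$ and $j\neq\hat{y}$, then

$$\frac{\partial L\big((w_1,w_2,\dots,w_k),(x,y)\big)}{\partial w_{j,l}}=0$$

The weights $W=\{w_1,\dots,w_k\}$ are then updated by gradient descent:

$$W_t=W_{t-1}-r\,\nabla E(W_{t-1}),$$

where $r$ is the learning rate.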
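The gradient cases and the update rule can be sketched for a single sample as follows. This is a minimal illustration, assuming a toy weight matrix `W`, sample `x`, label `y`, and step size `r` picked only for the example. It leaves out the regularisation term and is not the batch implementation listed in the code section.

```matlab
% Subgradient of the per-sample hinge loss and one gradient-descent step.
% W, x, y and r are hypothetical values used only for illustration.
W = [ 1  0;
      0  1;
     -1 -1];                     % k x d, row j is w_j'
x = [2; 1];
y = 2;                           % true class
r = 0.1;                         % learning rate

scores = W * x;
wrong = scores;
wrong(y) = -inf;
[bestWrong, yhat] = max(wrong);  % yhat: highest-scoring wrong class
lossBefore = max(0, bestWrong + 1 - scores(y));

G = zeros(size(W));              % margin satisfied, or any other row: 0
if scores(y) < bestWrong + 1     % margin violated
    G(y, :)    = -x';            % row j = y gets -x
    G(yhat, :) =  x';            % row j = yhat gets +x
end

W = W - r * G;                   % W_t = W_{t-1} - r * gradient

scores = W * x;
wrong = scores;
wrong(y) = -inf;
lossAfter = max(0, max(wrong) + 1 - scores(y));
fprintf('loss before = %g, after = %g\n', lossBefore, lossAfter);
```

With these numbers the step raises the true-class score and lowers the competing one, so the loss drops from 2 to 1. Only two rows of `W` change per violated sample, which is why the batch code can build the whole gradient from a sparse indicator matrix multiplied by the data.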

3 code

Muliticlass_svm.m

% Author: 何凌霄 (He Lingxiao)
% Institute of Automation, Chinese Academy of Sciences
% 15 March 2017
clear all
clc
%% STEP 0: Initialise constants and parameters
inputSize = 28 * 28; % Size of input vector (MNIST images are 28x28)
numClasses = 10;     % Number of classes (MNIST images fall into 10 classes)
lambda = 1e-2; % Weight decay parameter
learning_rate = 0.1;
iteration=400;
%%======================================================================
%% STEP 1: Load data
load('digits.mat')
images = [train1; train2; train3; train4; train5; train6; train7; train8; train9;train0];
images = images';
labels = [ones(500,1);2*ones(500,1);3*ones(500,1);4*ones(500,1);5*ones(500,1);6*ones(500,1);7*ones(500,1);8*ones(500,1);9*ones(500,1);10*ones(500,1)];
index = randperm(500*10);
images = images(:,index);
labels = labels(index);
inputData = images;
%% STEP 2: Train multiclass svm
[cost, grad, svmOptTheta] = multisvmtrain(numClasses, inputSize, lambda, inputData, labels, iteration, learning_rate);
%% STEP 3: Test
images = [test1; test2; test3; test4; test5; test6; test7; test8; test9;test0];
images = images';
labels = [ones(500,1);2*ones(500,1);3*ones(500,1);4*ones(500,1);5*ones(500,1);6*ones(500,1);7*ones(500,1);8*ones(500,1);9*ones(500,1);10*ones(500,1)];

inputData = images;
svmModel.optTheta = reshape(svmOptTheta, numClasses, inputSize);
svmModel.inputSize = inputSize;
svmModel.numClasses = numClasses;

% Predict labels with the trained model (see Multi_SVMPredict.m)
[pred] = Multi_SVMPredict(svmModel, inputData);
acc = mean(labels(:) == pred(:));
num_in_class = 500*ones(10,1)';
for i=1:10
    name_class{i}=num2str(i);
end
[confusion_matrix]=compute_confusion_matrix(pred,num_in_class,name_class);
figure; visualize(svmOptTheta');
fprintf('Accuracy: %0.3f%%\n', acc * 100);

multisvmtrain.m

% Author: 何凌霄 (He Lingxiao)
% Institute of Automation, Chinese Academy of Sciences
% 15 March 2017
function [lcost, grad, theta] = multisvmtrain(numClasses, inputSize, lambda, data, labels, iteration, learning_rate)
theta = 0.005 * randn(numClasses * inputSize, 1);
theta = reshape(theta, numClasses, inputSize); % reshape the parameter column vector into a numClasses x inputSize matrix
numCases = size(data, 2); % number of input samples
groundTruth = full(sparse(labels, 1:numCases, 1)); % one-hot label matrix: entry (labels(i), i) is 1, all others are 0
cost = 0;
thetagrad = zeros(numClasses, inputSize);
for i = 1:iteration
    [Q, X, cost] = multi_hingeloss_cost(theta, data, groundTruth,lambda);
    [thetagrad] = multi_hingeloss_grad(data,theta, Q, groundTruth, lambda, labels);
    theta = theta - learning_rate*thetagrad;
    lcost(i) = cost;
    grad(i) = sum(sum(thetagrad));
    fprintf('%d, %f\n', i, cost);
end
end

multi_hingeloss_cost.m

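% Author: 何凌霄 (He Lingxiao)
% Institute of Automation, Chinese Academy of Sciences
% 15 March 2017
function [Q, X, cost] = multi_hingeloss_cost(theta, data, groundTruth, lambda)
% groundTruth is the numClasses x numCases one-hot label matrix
isTrue = logical(groundTruth);
X = theta*data;                  % class scores, numClasses x numCases
Q = X;
Q(isTrue) = -inf;                % mask the true class so that max(Q) is the best wrong class
trueScore = X(isTrue)';          % true-class score of each sample, 1 x numCases
t = max(0, 1 - trueScore + max(Q, [], 1));   % per-sample hinge loss
% lambda/2 makes the regulariser's gradient lambda*theta, as used in multi_hingeloss_grad
cost = 1/size(data,2)*sum(t) + lambda/2*sum(theta(:).^2);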

multi_hingeloss_grad.m

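% Author: 何凌霄 (He Lingxiao)
% Institute of Automation, Chinese Academy of Sciences
% 15 March 2017
function [thetagrad] = multi_hingeloss_grad(data, theta, Q, groundTruth, lambda, labels)
% Q holds the class scores with the true class masked to -inf;
% groundTruth is the numClasses x numCases one-hot label matrix
X = theta*data;                          % class scores, numClasses x numCases
numCases = size(X,2);
[maxWrong, q] = max(Q, [], 1);           % best wrong class per sample and its score
trueScore = X(logical(groundTruth))';    % true-class score per sample, 1 x numCases
W = (trueScore - maxWrong) < 1;          % 1 where the margin is violated
Y = zeros(size(X));

for i = 1:numCases
    Y(labels(i),i) = -W(i);              % row of the true class gets -x
    Y(q(i),i) = W(i);                    % row of the best wrong class gets +x
end
thetagrad = 1/numCases*Y*data' + lambda * theta;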
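
One way to gain confidence in a batch gradient like the one in multi_hingeloss_grad.m is a finite-difference check. The sketch below is a hedged illustration: it re-implements the objective and its gradient on made-up toy data instead of calling the functions listed here, and it assumes the regulariser is (lambda/2)·‖θ‖², so that its gradient is `lambda * theta`. The names `isTrue`, `maskNeg`, `J` and `viol` are introduced only for this example.

```matlab
% Finite-difference check of a batch hinge-loss gradient on toy data.
% Assumed objective: J = mean_i L_i + lambda/2 * sum(theta(:).^2).
% All numbers below are made-up values for illustration.
data   = [1 0 2 -3;          % d x n, one column per sample
          0 1 1  3];
labels = [1 2 3 2];          % true class of each sample
theta  = [ 0.5 -0.2;         % k x d, one row per class
          -0.1  0.4;
           0.3  0.1];
lambda = 1e-2;
k = size(theta, 1);
n = size(data, 2);

isTrue  = full(sparse(labels, 1:n, 1, k, n)) == 1;  % one-hot mask, k x n
maskNeg = zeros(k, n);
maskNeg(isTrue) = -inf;      % added to the scores to hide the true class

trueScore = @(th) sum((th*data) .* isTrue, 1);
bestWrong = @(th) max(th*data + maskNeg, [], 1);
J = @(th) mean(max(0, 1 - trueScore(th) + bestWrong(th))) ...
          + lambda/2 * sum(th(:).^2);

% Analytic (sub)gradient
[~, q] = max(theta*data + maskNeg, [], 1);               % strongest wrong class
viol = double(trueScore(theta) - bestWrong(theta) < 1);  % 1 if margin violated
Y = full(sparse(q, 1:n, viol, k, n)) - full(sparse(labels, 1:n, viol, k, n));
grad = Y*data'/n + lambda*theta;

% Central finite differences, one parameter at a time
h = 1e-6;
num = zeros(size(theta));
for idx = 1:numel(theta)
    e = zeros(size(theta));
    e(idx) = h;
    num(idx) = (J(theta + e) - J(theta - e)) / (2*h);
end
maxErr = max(abs(num(:) - grad(:)));
fprintf('max abs difference = %g\n', maxErr);
```

The toy data are chosen so that no sample sits exactly on the margin, where the hinge loss is not differentiable. A small difference suggests that the analytic gradient agrees with the objective; a large one usually points to a sign or indexing error.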

Multi_SVMPredict.m

% Author: 何凌霄 (He Lingxiao)
% Institute of Automation, Chinese Academy of Sciences
% 15 March 2017
function [pred] = Multi_SVMPredict(svmModel, data)
theta = svmModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));
[nop, pred] = max(theta * data);
end

compute_confusion_matrix.m

function [confusion_matrix]=compute_confusion_matrix(predict_label,num_in_class,name_class)
% predict_label: predicted labels as a row vector, grouped by true class
% num_in_class: number of samples in each class
% name_class: class names
num_class=length(num_in_class);
num_in_class=[0 num_in_class];
confusion_matrix=zeros(num_class,num_class);

for ci=1:num_class
    for cj=1:num_class
        summer=0; % number of samples of class ci predicted as class cj
        c_start=sum(num_in_class(1:ci))+1;
        c_end=sum(num_in_class(1:ci+1));
        summer=size(find(predict_label(c_start:c_end)==cj),2);
        confusion_matrix(ci,cj)=summer/num_in_class(ci+1);
    end
end  

draw_cm(confusion_matrix,name_class,num_class);  

end

draw_cm.m

function draw_cm(mat,tick,num_class)  

imagesc(1:num_class,1:num_class,mat);            %# in color
colormap(flipud(gray));  %# for gray; black for large value.  

textStrings = num2str(mat(:),'%0.2f');
textStrings = strtrim(cellstr(textStrings));
[x,y] = meshgrid(1:num_class);
hStrings = text(x(:),y(:),textStrings(:), 'HorizontalAlignment','center');
midValue = mean(get(gca,'CLim'));
textColors = repmat(mat(:) > midValue,1,3);
set(hStrings,{'Color'},num2cell(textColors,2));  %# Change the text colors  

set(gca,'xticklabel',tick,'XAxisLocation','top');
set(gca, 'XTick', 1:num_class, 'YTick', 1:num_class);
set(gca,'yticklabel',tick);
rotateXLabels(gca, 315 );% rotate the x tick

visualize.m

function r=visualize(X, mm, s1, s2)
%FROM RBMLIB http://code.google.com/p/matrbm/
%Visualize weights X. If the function is called as a void method,
%it does the plotting. But if the function is assigned to a variable
%outside of this code, the formed image is returned instead.
if ~exist('mm','var')
    mm = [min(X(:)) max(X(:))];
end
if ~exist('s1','var')
    s1 = 0;
end
if ~exist('s2','var')
    s2 = 0;
end

[D,N]= size(X);
s=sqrt(D);
if s==floor(s) || (s1 ~=0 && s2 ~=0)
    if (s1 ==0 || s2 ==0)
        s1 = s; s2 = s;
    end
    %its a square, so data is probably an image
    num=ceil(sqrt(N));
    a=mm(2)*ones(num*s2+num-1,num*s1+num-1);
    x=0;
    y=0;
    for i=1:N
        im = reshape(X(:,i),s1,s2)';
        a(x*s2+1+x : x*s2+s2+x, y*s1+1+y : y*s1+s1+y)=im;
        x=x+1;
        if(x>=num)
            x=0;
            y=y+1;
        end
    end
    d=true;
else
    %there is not much we can do
    a=X;
end

%return the image, or plot the image
if nargout==1
    r=a;
else

    imagesc(a, [mm(1) mm(2)]);
    axis equal
    colormap gray

end

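The resulting recognition accuracy is 87.040%. The hinge loss can be combined with any deep network to perform classification tasks.

[Figure: the final confusion matrix]

[Figure: loss curve]

The dataset is available in the resources. If you use this code, please cite the source.

Time: 2024-12-15 23:43:20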
