SVM in Practice

SVM practice using libsvm on Ubuntu (see the official site for the download link and installation steps):

1. Code walkthrough (taken from a text-classification script):

# encoding=utf8
__author__ = 'wang'
# set the default encoding for input files to utf-8 (Python 2)
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import os
from svmutil import *
import subprocess

# directory holding the corpus files in libsvm format
featureStringCorpusDir = './svmFormatFeatureStringCorpus/'
fileList = os.listdir(featureStringCorpusDir)
# get the number of files
numFile = len(fileList)
trainX = []
trainY = []
testX = []
testY = []
# enumerate the files
for i in xrange(numFile):
    fileName = fileList[i]
    if fileName.endswith(".libsvm"):
        # check the data format with libsvm's checkdata.py tool
        checkdata_py = r"./tool/checkdata.py"
        svm_file = featureStringCorpusDir + fileName
        cmd = 'python {0} {1}'.format(checkdata_py, svm_file)
        # print("execute '{}'".format(cmd))
        check_data = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
        (stdout, stderr) = check_data.communicate()
        returncode = check_data.returncode
        if returncode == 1:
            print stdout
            exit(1)

        # read labels (y) and feature dicts (x) from the libsvm-format file
        y, x = svm_read_problem(svm_file)
        # first 100 instances of each file go to the training set
        trainX += x[:100]
        trainY += y[:100]
        # last 5 instances (in reverse order) go to the test set
        testX += x[-1:-6:-1]
        testY += y[-1:-6:-1]

print testX
# train a C-SVM model with an RBF kernel
m = svm_train(trainY, trainX, '-s 0 -c 4 -t 2 -g 0.1 -e 0.1')
# predict on the test data
p_label, p_acc, p_val = svm_predict(testY, testX, m)
# write the results to a file
result_file = open("./result.txt", "w")
result_file.write("testY p_label p_val\n")
for c in xrange(len(p_label)):
    result_file.write(str(testY[c]) + " " + str(p_label[c]) + " " + str(p_val[c]) + "\n")
result_file.write("accuracy: " + str(p_acc) + "\n")
result_file.close()
# print p_val
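The script keeps the first 100 instances of each file for training and the last five for testing; the negative-step slice `x[-1:-6:-1]` is worth spelling out. A minimal sketch of how those slices behave on a plain list (the numbers are made-up stand-ins for the instances `svm_read_problem` would return):

```python
# Illustrates the slicing used above; pretend these are 10 instances from one file.
x = list(range(10))

train_part = x[:100]       # first 100 items (here: all 10, since len(x) < 100)
test_part = x[-1:-6:-1]    # last 5 items, walked backwards

print(train_part)          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(test_part)           # [9, 8, 7, 6, 5]
```

Note that because the step is -1, the test instances come out in reverse file order.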

(1) On the format of the training data:

The format of training and testing data file is:

<label> <index1>:<value1> <index2>:<value2> ..
.
.
.

Each line contains an instance and is ended by a '\n' character. For classification, <label> is an integer indicating the class label (multi-class is supported; if your labels are non-numeric, convert them to numeric values first). For regression, <label> is the target value, which can be any real number. For one-class SVM it is not used, so it can be any number. The pair <index>:<value> gives a feature (attribute) value: <index> is an integer starting from 1 and <value> is a real number. The only exception is the precomputed kernel, where <index> starts from 0; see the section on precomputed kernels. Indices must be in ASCENDING order. Labels in the testing file are only used to calculate accuracy or errors. If they are unknown, just fill the first column with any numbers.

A sample classification data included in this package is `heart_scale'. To check if your data is in a correct form, use `tools/checkdata.py' (details in `tools/README').

Type `svm-train heart_scale', and the program will read the training
data and output the model file `heart_scale.model'. If you have a test
set called heart_scale.t, then type `svm-predict heart_scale.t
heart_scale.model output' to see the prediction accuracy. The `output'
file contains the predicted class labels.

For classification, if training data are in only one class (i.e., all
labels are the same), then `svm-train' issues a warning message:
`Warning: training data in only one class. See README for details,'
which means the training data is very unbalanced. The label in the
training data is directly returned when testing.
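To make the format concrete, here is a small sketch (not libsvm's actual `checkdata.py`, just an illustration) that parses one line of this format and enforces the ascending-index rule; the sample line is made up:

```python
def parse_libsvm_line(line):
    """Parse one '<label> <index>:<value> ...' line into (label, features)."""
    parts = line.strip().split()
    label = float(parts[0])
    features = {}
    prev_index = 0
    for pair in parts[1:]:
        index_str, value_str = pair.split(":")
        index = int(index_str)
        # indices must be in ascending order, starting from 1
        if index <= prev_index:
            raise ValueError("indices not in ascending order: " + line)
        features[index] = float(value_str)
        prev_index = index
    return label, features

# a made-up sample instance in libsvm format
label, feats = parse_libsvm_line("+1 1:0.5 3:-1.2 7:0.8")
print(label)   # 1.0
print(feats)   # {1: 0.5, 3: -1.2, 7: 0.8}
```

Note the sparse representation: feature indices 2, 4, 5, and 6 are simply absent, which libsvm treats as zero values.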

  

(2) Explanation of the svm-train parameters, from the official GitHub repository: https://github.com/cjlin1/libsvm

`svm-train' Usage
=================

Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
	0 -- C-SVC		(multi-class classification)
	1 -- nu-SVC		(multi-class classification)
	2 -- one-class SVM
	3 -- epsilon-SVR	(regression)
	4 -- nu-SVR		(regression)
-t kernel_type : set type of kernel function (default 2)
	0 -- linear: u'*v
	1 -- polynomial: (gamma*u'*v + coef0)^degree
	2 -- radial basis function: exp(-gamma*|u-v|^2)
	3 -- sigmoid: tanh(gamma*u'*v + coef0)
	4 -- precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)  (when the data are imbalanced, this lets you assign different penalties to different classes)
-v n: n-fold cross validation mode
-q : quiet mode (no outputs)

The num_features in the -g option's default means the number of attributes in the input data.

option -v randomly splits the data into n parts and calculates cross
validation accuracy/mean squared error on them.

See libsvm FAQ for the meaning of outputs.
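The four built-in kernels listed under -t can be written out directly from the formulas above; here is a small sketch (gamma, coef0, and degree stand in for the -g, -r, and -d options, and the sample vectors are made up):

```python
import math

def linear(u, v):
    # u'*v: plain dot product
    return sum(ui * vi for ui, vi in zip(u, v))

def polynomial(u, v, gamma=0.1, coef0=0.0, degree=3):
    # (gamma*u'*v + coef0)^degree
    return (gamma * linear(u, v) + coef0) ** degree

def rbf(u, v, gamma=0.1):
    # exp(-gamma*|u-v|^2)
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid(u, v, gamma=0.1, coef0=0.0):
    # tanh(gamma*u'*v + coef0)
    return math.tanh(gamma * linear(u, v) + coef0)

u, v = [1.0, 2.0], [3.0, 4.0]
print(linear(u, v))   # 11.0
print(rbf(u, u))      # 1.0 -- a vector has distance 0 to itself
```

This is why the script above passes `-t 2 -g 0.1`: it selects the RBF kernel with gamma fixed at 0.1 instead of the 1/num_features default.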

`svm-predict' Usage
===================

Usage: svm-predict [options] test_file model_file output_file
options:
-b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); for one-class SVM only 0 is supported

model_file is the model file generated by svm-train.
test_file is the test data you want to predict.
svm-predict will produce output in the output_file.

  

2. Sample run output, with an explanation of some of the fields:

*
optimization finished, #iter = 119
nu = 0.272950
obj = -122.758206, rho = -1.384603
nSV = 127, nBSV = 21
*

.............. (part of the output omitted here)

*
optimization finished, #iter = 129
nu = 0.214275
obj = -89.691907, rho = -1.105131
nSV = 103, nBSV = 8
*
optimization finished, #iter = 77
nu = 0.147922
obj = -67.825431, rho = 0.984237
nSV = 80, nBSV = 10
Total nSV = 1246
Accuracy = 51.4286% (36/70) (classification)

Official explanation: obj is the optimal objective value of the dual SVM problem (see equation (2) in the figure below). rho is the bias term in the decision function sgn(w^Tx - rho). nSV and nBSV are the numbers of support vectors and bounded support vectors (i.e., alpha_i = C). nu-SVM is a somewhat equivalent form of C-SVM where C is replaced by nu; nu simply shows the corresponding parameter. More details are in the libsvm documentation.
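The reported "Accuracy = 51.4286% (36/70)" is simply the fraction of test labels predicted correctly, expressed as a percentage. A quick sketch of that calculation (the 36/70 counts are taken from the run above; the short label lists are made up):

```python
def accuracy_percent(true_labels, predicted_labels):
    # fraction of matching labels, as a percentage (what svm_predict reports)
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)

print(accuracy_percent([1, 1, 0, 0], [1, 0, 0, 0]))  # 75.0
# with 36 correct out of 70, this reproduces the figure from the run above
print(round(100.0 * 36 / 70, 4))                     # 51.4286
```

An accuracy near 50% on a two-class problem is barely better than guessing, so the model above would likely need parameter tuning (e.g. grid search over -c and -g) or better features.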

Date: 2024-10-11 15:58:46
