Learn LIBSVM---A Practical Guide to SVM Classification

I wanted to learn SVM, so I found LIBSVM--A Library for Support Vector Machines, and started by reading the site's A Practical Guide to Support Vector Classification.

Below are the points I consider the essence of the guide.

SVM is: a technique for data classification

Goal is: to produce a model (based on the training data) which predicts the target values of the test data, given only the test data attributes.

Kernels: four basic kernels (linear, polynomial, radial basis function (RBF), and sigmoid)

Proposed Procedure:

1. Transform data to the format of an SVM package

  First, categorical attributes have to be converted into numeric data. We recommend using m numbers to represent an m-category attribute, with only one of the m numbers being one and the others zero. For example, {red, green, blue} can be represented as (0,0,1), (0,1,0), and (1,0,0).
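
  As a quick illustration (not from the guide itself), a minimal Python sketch of this m-number encoding could look like the following; the category list and helper name are made up for the example.

      # Represent an m-category attribute with m binary numbers, exactly one
      # of which is 1 (one-hot encoding). Names here are illustrative.
      def one_hot(value, categories):
          return [1 if value == c else 0 for c in categories]

      colors = ["red", "green", "blue"]
      print(one_hot("red", colors))    # [1, 0, 0]
      print(one_hot("green", colors))  # [0, 1, 0]
      print(one_hot("blue", colors))   # [0, 0, 1]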

2. Conduct simple scaling on the data

  Note: It is important to use the same scaling factors for the training and testing sets.
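
  A minimal sketch of this point, assuming plain min-max scaling to [-1, 1] in Python; the scaling factors are computed on the training data only and then reused for the test data (LIBSVM's svm-scale tool, with its -s and -r options, saves and restores the scaling parameters for the same purpose).

      import numpy as np

      # Fit the per-feature scaling factors on the TRAINING data only,
      # then apply the same factors to the test data.
      def fit_scaler(X, lo=-1.0, hi=1.0):
          return X.min(axis=0), X.max(axis=0), lo, hi

      def apply_scaler(X, scaler):
          xmin, xmax, lo, hi = scaler
          span = np.where(xmax - xmin == 0, 1.0, xmax - xmin)  # avoid division by zero
          return lo + (X - xmin) * (hi - lo) / span

      X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
      X_test  = np.array([[1.5, 500.0]])

      scaler    = fit_scaler(X_train)
      X_train_s = apply_scaler(X_train, scaler)
      X_test_s  = apply_scaler(X_test, scaler)   # same factors as for training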

3. Consider the RBF kernel K(x, y) = exp(-γ||x - y||^2), γ > 0
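
  For reference, the kernel value for two feature vectors is easy to compute directly; gamma below corresponds to LIBSVM's -g parameter (a tiny sketch, not LIBSVM code).

      import numpy as np

      # K(x, y) = exp(-gamma * ||x - y||^2)
      def rbf_kernel(x, y, gamma):
          return np.exp(-gamma * np.sum((x - y) ** 2))

      x = np.array([1.0, 2.0])
      y = np.array([2.0, 0.0])
      print(rbf_kernel(x, y, gamma=0.5))  # exp(-0.5 * 5) ~= 0.0821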

4. Use cross-validation to find the best parameters C and γ

  The cross-validation procedure can prevent the overfitting problem. We recommend a "grid search" on C and γ using cross-validation: various pairs of (C, γ) values are tried, and the one with the best cross-validation accuracy is picked. First use a coarse grid to identify a promising region; a finer grid search on that region can then be conducted (a rough sketch of the coarse search follows this step).

  For very large data sets, a feasible approach is to randomly choose a subset of the data, conduct the grid search on it, and then do a grid search restricted to the better region on the complete data set.
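
  A rough sketch of the coarse grid search, assuming LIBSVM's Python interface (svmutil) is installed and "train.txt" is a placeholder for a file already in LIBSVM format; the exponent ranges are the ones suggested in the guide.

      from libsvm.svmutil import svm_read_problem, svm_train
      # Note: in older installs the import may be just "from svmutil import ...".

      y, x = svm_read_problem('train.txt')

      best = (None, None, -1.0)
      for log2c in range(-5, 16, 2):        # C = 2^-5, 2^-3, ..., 2^15
          for log2g in range(-15, 4, 2):    # gamma = 2^-15, 2^-13, ..., 2^3
              c, g = 2.0 ** log2c, 2.0 ** log2g
              # '-v 5' makes svm_train return the 5-fold cross-validation accuracy
              acc = svm_train(y, x, f'-c {c} -g {g} -v 5 -q')
              if acc > best[2]:
                  best = (c, g, acc)

      print('best C = %g, gamma = %g, CV accuracy = %.2f%%' % best)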

5. Use the best parameters C and γ to train on the whole training set

6. Test
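
  Steps 5 and 6 in the same sketch: retrain on the whole (scaled) training set with the best (C, γ) found above, then predict on the test set. The file names and parameter values below are placeholders.

      from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

      y_train, x_train = svm_read_problem('train.scaled')
      y_test,  x_test  = svm_read_problem('test.scaled')   # scaled with the same factors

      best_c, best_g = 8.0, 0.125   # illustrative values from the grid search
      model = svm_train(y_train, x_train, f'-c {best_c} -g {best_g} -q')

      # p_acc holds (accuracy, mean squared error, squared correlation coefficient)
      p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)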

When to use the linear instead of the RBF kernel?

  If the number of features is large, one may not need to map the data to a higher dimensional space; that is, the nonlinear mapping does not improve the performance. Using the linear kernel is good enough, and one only searches for the parameter C.
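
  In LIBSVM terms this just means selecting the linear kernel with -t 0 and cross-validating over C alone; a minimal sketch (placeholder file name again).

      from libsvm.svmutil import svm_read_problem, svm_train

      y, x = svm_read_problem('train.txt')

      best_c, best_acc = None, -1.0
      for log2c in range(-5, 16, 2):
          c = 2.0 ** log2c
          acc = svm_train(y, x, f'-t 0 -c {c} -v 5 -q')  # linear kernel, 5-fold CV
          if acc > best_acc:
              best_c, best_acc = c, acc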

  C.1 Number of instances ≪ number of features

    When the number of features is very large, one may not need to map the data.

  C.2 Both numbers of instances and features are large

    Such data often occur in document classification. LIBLINEAR is much faster than LIBSVM at obtaining a model with comparable accuracy, and it is efficient for large-scale document classification.
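
    A rough sketch of the corresponding LIBLINEAR workflow, assuming its Python interface (liblinearutil) is installed; the exact import path depends on how the bindings were installed, and "docs.txt" is a placeholder for sparse document data in LIBSVM/LIBLINEAR format.

        from liblinear.liblinearutil import svm_read_problem, train, predict
        # Note: in older installs the import may be "from liblinearutil import ...".

        y, x = svm_read_problem('docs.txt')
        model = train(y, x, '-c 1')                    # linear model, only C to tune
        p_labels, p_acc, p_vals = predict(y, x, model)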

  C.3 Number of instances ≫ number of features

    As the number of features is small, one often maps data to higher dimensional spaces (i.e., using nonlinear kernels).

