如何选择分类器?LR、SVM、Ensemble、Deep learning

转自:https://www.quora.com/What-are-the-advantages-of-different-classification-algorithms

There are a number of dimensions you can look at to give you a sense of what will be a reasonable algorithm to start with, namely:

  • Number of training examples
  • Dimensionality of the feature space
  • Do I expect the problem to be linearly separable?
  • Are features independent?
  • Are features expected to linearly dependent with the target variable? *EDIT: see mycomment on what I mean by this
  • Is overfitting expected to be a problem?
  • What are the system‘s requirement in terms of speed/performance/memory usage...?
  • ...

This list may seem a bit daunting because there are many issues that are not straightforward to answer. The good news though is, that as many problems in life, you can address this question by following the Occam‘s Razor principle: use the least complicated algorithm that can address your needs and only go for something more complicated if strictly necessary.
Logistic Regression
As a general rule of thumb, I would recommend to start with Logistic Regression. Logistic regression is a pretty well-behaved classification algorithm that can be trained as long as you expect your features to be roughly linear and the problem to be linearly separable. You can do some feature engineering to turn most non-linear features into linear pretty easily. It is also pretty robust to noise and you can avoid overfitting and even do feature selection by using l2 or l1 regularization. Logistic regression can also be used in Big Data scenarios since it is pretty efficient and can be distributed using, for example, ADMM (see logreg). A final advantage of LR is that the output can be interpreted as a probability. This is something that comes as a nice side effect since you can use it, for example, for ranking instead of classification.
Even in a case where you would not expect Logistic Regression to work 100%, do yourself a favor and run a simple l2-regularized LR to come up with a baseline before you go into using "fancier" approaches.
Ok, so now that you have set your baseline with Logistic Regression, what should be your next step. I would basically recommend two possible directions: (1) SVM‘s, or (2) Tree Ensembles. If I knew nothing about your problem, I would definitely go for (2), but I will start with describing why SVM‘s might be something worth considering.
Support Vector Machines
Support Vector Machines (SVMs) use a different loss function (Hinge) from LR. They are also interpreted differently (maximum-margin). However, in practice, an SVM with a linear kernel is not very different from a Logistic Regression (If you are curious, you can see how Andrew Ng derives SVMs from Logistic Regression in his Coursera Machine Learning Course). The main reason you would want to use an SVM instead of a Logistic Regression is because your problem might not be linearly separable. In that case, you will have to use an SVM with a non linear kernel (e.g. RBF). The truth is that a Logistic Regression can also be used with a different kernel, but at that point you might be better off going for SVMs for practical reasons. Another related reason to use SVMs is if you are in a highly dimensional space. For example, SVMs have been reported to work better for text classification.
Unfortunately, the major downside of SVMs is that they can be painfully inefficient to train. So, I would not recommend them for any problem where you have many training examples. I would actually go even further and say that I would not recommend SVMs for most "industry scale" applications. Anything beyond a toy/lab problem might be better approached with a different algorithm.
Tree Ensembles
This gets me to the third family of algorithms: Tree Ensembles. This basically covers two distinct algorithms: Random Forests and Gradient Boosted Trees. I will talk about the differences later, but for now let me treat them as one for the purpose of comparing them to Logistic Regression.
Tree Ensembles have different advantages over LR. One main advantage is that they do not expect linear features or even features that interact linearly. Something I did not mention in LR is that it can hardly handle categorical (binary) features. Tree Ensembles, because they are nothing more than a bunch of Decision Trees combined, can handle this very well. The other main advantage is that, because of how they are constructed (using bagging or boosting) these algorithms handle very well high dimensional spaces as well as large number of training examples.
As for the difference between Random Forests (RF) and Gradient Boosted Decision Trees (GBDT), I won‘t go into many details, but one easy way to understand it is that GBDTs will usually perform better, but they are harder to get right. More concretely, GBDTs have more hyper-parameters to tune and are also more prone to overfitting. RFs can almost work "out of the box" and that is one reason why they are very popular.
Deep Learning
Last but not least, this answer would not be complete without at least a minor reference toDeep Learning. I would definitely not recommend this approach as a general-purpose technique for classification. But, you might probably have heard how well these methods perform in some cases such as image classification. If you have gone through the previous steps and still feel you can squeeze something out of your problem, you might want to use a Deep Learning approach. The truth is that if you use an open source implementation such as Theano, you can get an idea of how some of these approaches perform in your dataset pretty quickly.
Summary
So, recapping, start with something simple like Logistic Regression to set a baseline and only make it more complicated if you need to. At that point, tree ensembles, and in particular Random Forests since they are easy to tune, might be the right way to go. If you feel there is still room for improvement, try GBDT or get even fancier and go for Deep Learning.
You can also take a look at the Kaggle Competitions. If you search for the keyword "classification" and select those that are completed, you will get a good sense of what people used to win competitions that might be similar to your problem at hand. At that point you will probably realize that using an ensemble is always likely to make things better. The only problem with ensembles, of course, is that they require to maintain all the independent methods working in parallel. That might be your final step to get as fancy as it gets.

时间: 2024-10-31 12:16:16

如何选择分类器?LR、SVM、Ensemble、Deep learning的相关文章

Deep Learning(深度学习)学习笔记整理

申明:本文非笔者原创,原文转载自:http://www.sigvc.org/bbs/thread-2187-1-3.html 4.2.初级(浅层)特征表示 既然像素级的特征表示方法没有作用,那怎样的表示才有用呢? 1995 年前后,Bruno Olshausen和 David Field 两位学者任职 Cornell University,他们试图同时用生理学和计算机的手段,双管齐下,研究视觉问题. 他们收集了很多黑白风景照片,从这些照片中,提取出400个小碎片,每个照片碎片的尺寸均为 16x1

Deep Learning(深度学习)学习笔记整理系列 | @Get社区

body { font-family: Microsoft YaHei UI,"Microsoft YaHei", Georgia,Helvetica,Arial,sans-serif,宋体, PMingLiU,serif; font-size: 10.5pt; line-height: 1.5; } html, body { } h1 { font-size:1.5em; font-weight:bold; } h2 { font-size:1.4em; font-weight:bo

(转)大牛的《深度学习》笔记,60分钟带你学会Deep Learning。

大牛的<深度学习>笔记,60分钟带你学会Deep Learning. 2016-08-01 Zouxy 阅面科技 上期:<从特征描述到深度学习:计算机视觉发展20年> 回复“01”回顾全文   本期:大牛的<深度学习>笔记,60分钟带你学会Deep Learning. 深度学习,即Deep Learning,是一种学习算法(Learning algorithm),亦是人工智能领域的一个重要分支.从快速发展到实际应用,短短几年时间里,深度学习颠覆了语音识别.图像分类.文本

【转载】Deep Learning(深度学习)学习笔记整理

目录: 一.概述 二.背景 三.人脑视觉机理 四.关于特征 4.1.特征表示的粒度 4.2.初级(浅层)特征表示 4.3.结构性特征表示 4.4.需要有多少个特征? 五.Deep Learning的基本思想 六.浅层学习(Shallow Learning)和深度学习(Deep Learning) 七.Deep learning与Neural Network 八.Deep learning训练过程 8.1.传统神经网络的训练方法 8.2.deep learning训练过程 九.Deep Learn

[转载]Deep Learning(深度学习)学习笔记整理

转载自:http://blog.csdn.net/zouxy09/article/details/8775360 感谢原作者:[email protected] 八.Deep learning训练过程 8.1.传统神经网络的训练方法为什么不能用在深度神经网络 BP算法作为传统训练多层网络的典型算法,实际上对仅含几层网络,该训练方法就已经很不理想.深度结构(涉及多个非线性处理单元层)非凸目标代价函数中普遍存在的局部最小是训练困难的主要来源. BP算法存在的问题: (1)梯度越来越稀疏:从顶层越往下

Deep Learning(深度学习)之(三)Deep Learning的常用模型或者方法

九.Deep Learning的常用模型或者方法 9.1.AutoEncoder自动编码器 Deep Learning最简单的一种方法是利用人工神经网络的特点,人工神经网络(ANN)本身就是具有层次结构的系统,如果给定一个神经网络,我们假设其输出与输入是相同的,然后训练调整其参数,得到每一层中的权重.自然地,我们就得到了输入I的几种不同表示(每一层代表一种表示),这些表示就是特征.自动编码器就是一种尽可能复现输入信号的神经网络.为了实现这种复现,自动编码器就必须捕捉可以代表输入数据的最重要的因素

Deep Learning(深度学习)之(四)Deep Learning学习资源

十一.参考文献和Deep Learning学习资源 先是机器学习领域大牛的微博:@余凯_西二旗民工:@老师木:@梁斌penny:@张栋_机器学习:@邓侃:@大数据皮东:@djvu9-- (1)Deep Learning http://deeplearning.net/ (2)Deep Learning Methods for Vision http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/ (3)Neural Network for

【Deep Learning】Using Structured Events to Predict Stock Price Movement:An Empirical Investigation

时间:2014 发表于:EMNLP 原文件:http://pan.baidu.com/s/1i3phG49 主要内容: 利用新闻事件来预测:1. 美股大盘走势:2. 挑选的15个个股的走势. 详细内容: 主要工作步骤: 1. 抽取财经新闻 2. 对新闻title进行parser,并进行事件抽取. 其中事件抽取是open information extraction,即不限定事件的模版,而是进行开放式抽取.抽取的结果是:(主语.动词.宾语.时间).其中,各个元素都会做stemming,动词会做聚类

Deep Learning(深度学习)学习笔记整理系列之(八)

Deep Learning(深度学习)学习笔记整理系列 [email protected] http://blog.csdn.net/zouxy09 作者:Zouxy version 1.0 2013-04-08 声明: 1)该Deep Learning的学习系列是整理自网上很大牛和机器学习专家所无私奉献的资料的.具体引用的资料请看参考文献.具体的版本声明也参考原文献. 2)本文仅供学术交流,非商用.所以每一部分具体的参考资料并没有详细对应.如果某部分不小心侵犯了大家的利益,还望海涵,并联系博主

Deep Learning(深度学习)学习系列之(八)

Deep Learning(深度学习)学习笔记整理系列 声明: 1)该Deep Learning的学习系列是整理自网上很大牛和机器学习专家所无私奉献的资料的.具体引用的资料请看参考文献.具体的版本声明也参考原文献. 2)本文仅供学术交流,非商用.所以每一部分具体的参考资料并没有详细对应.如果某部分不小心侵犯了大家的利益,还望海涵,并联系博主删除. 3)本人才疏学浅,整理总结的时候难免出错,还望各位前辈不吝指正,谢谢. 4)阅读本文需要机器学习.计算机视觉.神经网络等等基础(如果没有也没关系了,没