感知机(perceptron)由Rosenblatt于1957年提出,是神经网络与支持向量机的基础。
感知机是最早被设计并被实现的人工神经网络。感知机是一种非常特殊的神经网络,它在人工神经网络的发展史上有着非常重要的地位,尽管它的能力非常有限,主要用于线性分类。
感知机还包括多层感知机,简单的线性感知机用于线性分类器,多层感知机(含有隐层的网络)可用于非线性分类器。本文中介绍的均是简单的线性感知机。
图 1
感知机工作方式:
(1)、学习阶段:修改权值和偏置,根据”已知的样本”对权值和偏置不断修改----有监督学习。当给定某个样本的输入/输出模式对时,感知机输出单元会产生一个实际输出向量,用期望输出(样本输出)与实际输出之差来修正网络连接权值和偏置。
(2)、工作阶段:计算单元变化,由响应函数给出新输入下的输出。
感知机学习策略:
感知机学习的目标就是求得一个能够将训练数据集中正负实例完全分开的分类超平面,为了找到分类超平面,即确定感知机模型中的参数w和b,需要定义一个基于误分类的损失函数,并通过将损失函数最小化来求w和b。
(1)、数据集线性可分性:在二维平面中,可以用一条直线将+1类和-1类完美分开,那么这个样本空间就是线性可分的。因此,感知机都基于一个前提,即问题空间线性可分;
(2)、定义损失函数,找到参数w和b,使得损失函数最小。
损失函数的选取:
(1)、损失函数的一个自然选择就是误分类点的总数,但是这样的点不是参数w,b的连续可导函数,不易优化;
(2)、损失函数的另一个选择就是误分类点到划分超平面S(w*x+b=0)的总距离。
以上理论部分主要来自: http://staff.ustc.edu.cn/~qiliuql/files/DM2013/2013SVM.pdf
以下代码根据上面的描述实现:
#include <iostream> #include <vector> #include <assert.h> #include <time.h> typedef std::vector<float> feature; typedef int label; class Perceptron { private: std::vector<feature> feature_set; std::vector<label> label_set; int iterates; float learn_rate; std::vector<float> weight; int size_weight; float bias; void initWeight(); float calDotProduct(const feature feature_, const std::vector<float> weight_); void updateWeight(const feature feature_, int label_); public: Perceptron(int iterates_, float learn_rate_, int size_weight_, float bias_); void getDataset(const std::vector<feature> feature_set_, const std::vector<label> label_set_); bool train(); label predict(const feature feature_); }; void Perceptron::updateWeight(const feature feature_, int label_) { for (int i = 0; i < size_weight; i++) { weight[i] += learn_rate * feature_[i] * label_; // formula 5 } bias += learn_rate * label_; // formula 5 } float Perceptron::calDotProduct(const feature feature_, const std::vector<float> weight_) { assert(feature_.size() == weight_.size()); float ret = 0.; for (int i = 0; i < feature_.size(); i++) { ret += feature_[i] * weight_[i]; } return ret; } void Perceptron::initWeight() { srand(time(0)); float range = 100.0; for (int i = 0; i < size_weight; i++) { float tmp = range * rand() / (RAND_MAX + 1.0); weight.push_back(tmp); } } Perceptron::Perceptron(int iterates_, float learn_rate_, int size_weight_, float bias_) { iterates = iterates_; learn_rate = learn_rate_; size_weight = size_weight_; bias = bias_; weight.resize(0); feature_set.resize(0); label_set.resize(0); } void Perceptron::getDataset(const std::vector<feature> feature_set_, const std::vector<label> label_set_) { assert(feature_set_.size() == label_set_.size()); feature_set.resize(0); label_set.resize(0); for (int i = 0; i < feature_set_.size(); i++) { feature_set.push_back(feature_set_[i]); label_set.push_back(label_set_[i]); } } bool Perceptron::train() { initWeight(); for (int i = 0; i < iterates; i++) { bool flag = true; for (int j = 0; j < feature_set.size(); j++) { float tmp = calDotProduct(feature_set[j], weight) + bias; if (tmp * label_set[j] <= 0) { updateWeight(feature_set[j], label_set[j]); flag = false; } } if (flag) { std::cout << "iterate: " << i << std::endl; std::cout << "weight: "; for (int m = 0; m < size_weight; m++) { std::cout << weight[m] << " "; } std::cout << std::endl; std::cout << "bias: " << bias << std::endl; return true; } } return false; } label Perceptron::predict(const feature feature_) { assert(feature_.size() == size_weight); return calDotProduct(feature_, weight) + bias >= 0 ? 1 : -1; //formula 2 } int main(int argc, char* argv[]) { // prepare data const int len_data = 20; const int feature_dimension = 2; float data[len_data][feature_dimension] = { { 10.3, 10.7 }, { 20.1, 100.8 }, { 44.9, 8.0 }, { -2.2, 15.3 }, { -33.3, 77.7 }, { -10.4, 111.1 }, { 99.3, -2.2 }, { 222.2, -5.5 }, { 10.1, 10.1 }, {66.6, 30.2}, { 0.1, 0.2 }, { 1.2, 0.03 }, { 0.5, 4.6 }, { -22.3, -11.1 }, {-88.9, -12.3}, { -333.3, -444.4 }, { -111.2, 0.5 }, { -6.6, 2.9 }, { 3.3, -100.2 }, {5.6, -88.8} }; int label_[len_data] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }; std::vector<feature> set_feature; std::vector<label> set_label; for (int i = 0; i < len_data; i++) { feature feature_single; for (int j = 0; j < feature_dimension; j++) { feature_single.push_back(data[i][j]); } set_feature.push_back(feature_single); set_label.push_back(label_[i]); feature_single.resize(0); } // train int iterates = 1000; float learn_rate = 0.5; int size_weight = feature_dimension; float bias = 2.5; Perceptron perceptron(iterates, learn_rate, size_weight, bias); perceptron.getDataset(set_feature, set_label); bool flag = perceptron.train(); if (flag) { std::cout << "data set is linearly separable" << std::endl; } else { std::cout << "data set is linearly inseparable" << std::endl; return -1; } // predict feature feature1; feature1.push_back(636.6); feature1.push_back(881.8); std::cout << "the correct result label is 1, " << "the real result label is: " << perceptron.predict(feature1) << std::endl; feature feature2; feature2.push_back(-26.32); feature2.push_back(-255.95); std::cout << "the correct result label is -1, " << "the real result label is: " << perceptron.predict(feature2) << std::endl; std::cout << "ok" << std::endl; return 0; }
运行结果如下图: