[深度之眼 Machine Learning Training Camp, Session 4] Logistic Regression

Basic Concepts

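Logistic regression (also called log-odds regression) can be used to solve binary and multi-class classification problems. In a classification problem the output is no longer a continuous value but a discrete one, that is, the output set is \(\mathcal{Y} = \{0,1,2,\cdots\}\). For binary classification, the output set is usually \(\mathcal{Y} = \{0,1\}\).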

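To solve the binary classification problem, logistic regression builds on linear regression by introducing the Sigmoid function (also called the Logistic function), where \(\exp(\cdot)\) is the natural exponential:
\[
g(z) = \dfrac{1}{1 +\exp(-z)}
\]
The range of this function is the open interval \((0,1)\): it approaches 0 as \(z \to -\infty\) and 1 as \(z \to +\infty\) without ever reaching either value, as shown in the figure below:

[Figure: the Sigmoid function]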

The hypothesis set of logistic regression is therefore defined as:
\[
h_\theta (x) = g ( \theta^T x )
\]
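The two definitions above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `sigmoid` and `hypothesis`, the parameter values, and the sample are all made up for the example, and the branch on the sign of `z` is an added safeguard against overflow rather than part of the definition.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # for negative z, exp(z) is small, so nothing overflows
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for plain Python lists theta and x."""
    z = sum(t * v for t, v in zip(theta, x))
    return sigmoid(z)

theta = [0.5, -1.0, 2.0]  # hypothetical parameters
x = [1.0, 2.0, 0.5]       # hypothetical sample; x[0] = 1 plays the role of a bias term
p = hypothesis(theta, x)  # theta^T x = 0.5 - 2.0 + 1.0 = -0.5
print(p)
```

Because \(\theta^T x\) is negative for this sample, the resulting value is below 0.5.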

In fact, \(h_{\theta}(x)\) gives the probability that the label is \(y=1\), given the parameters \(\theta\) and the sample \(x\):
\[
\begin{aligned}& h_\theta(x) = P(y=1 | x ; \theta) = 1 - P(y=0 | x ; \theta) \\& P(y = 0 | x;\theta) + P(y = 1 | x ; \theta) = 1\end{aligned}
\]
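As a short aside worked out from the definitions above, the name "log-odds regression" can be made concrete. Writing \(z = \theta^T x\), we have \(h_\theta(x) = 1/(1+\exp(-z))\) and \(1 - h_\theta(x) = \exp(-z)/(1+\exp(-z))\), so the ratio of the two probabilities (the odds) is \(\exp(z)\). Taking the logarithm gives:
\[
\log \frac{P(y=1 | x ; \theta)}{P(y=0 | x ; \theta)} = \log \frac{h_\theta(x)}{1 - h_\theta(x)} = \theta^T x
\]
In other words, the model is linear in the log-odds of the positive class.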

Loss Function

The loss function of logistic regression is:
\[
\begin{aligned}
J(\theta) &= \dfrac{1}{n} \sum_{i=1}^n \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) \\
\mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) &=\left\{
\begin{aligned}
&-\log(h_\theta(x^{(i)})) \; & \text{if }y^{(i)} = 1\\
&-\log(1-h_\theta(x^{(i)})) \; & \text{if } y^{(i)} = 0
\end{aligned}
\right.
\end{aligned}
\]
This loss function is derived by maximum likelihood estimation. For a given set of \(n\) inputs \(x^{(i)}\) with labels \(y^{(i)}\), the likelihood function is:
\[
\prod _{i = 1}^n \left[h_\theta(x^{(i)})\right]^{y^{(i)}}\left[1 - h_\theta(x^{(i)})\right]^{1 - y^{(i)}}
\]

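A product is awkward to optimize, so we take the logarithm of the expression above to turn it into a sum. Scaling by the constant \(\frac{1}{n}\), which does not change the maximizer, gives the (averaged) log-likelihood function: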
\[
L(\theta)=\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1 - h_\theta(x^{(i)})) \right ]
\]
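Maximizing this log-likelihood yields the optimal parameters \(\theta\). Since maximizing \(L(\theta)\) is equivalent to minimizing \(-L(\theta)\), we obtain the loss function in the following form: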
\[
J(\theta) = -\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1 - h_\theta(x^{(i)})) \right ]
\]
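The loss in this form translates directly into code. The sketch below is a hedged illustration only: the function name `loss`, the toy data set, and the clipping constant `eps` (added so the logarithm stays finite) are assumptions made for the example and are not part of the original derivation.

```python
import math

def sigmoid(z):
    """Logistic function g(z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def loss(theta, X, y, eps=1e-12):
    """J(theta) = -(1/n) * sum[y*log(h) + (1-y)*log(1-h)]."""
    n = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        h = min(max(h, eps), 1.0 - eps)  # added safeguard: keep log() finite
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / n

# Hypothetical toy data: first column is a constant 1 (bias), second is the feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]

loss_at_zero = loss([0.0, 0.0], X, y)   # every h is 0.5, so J equals log 2
loss_better = loss([-3.0, 2.0], X, y)   # parameters that separate the classes
print(loss_at_zero, loss_better)
```

With \(\theta = 0\) every prediction is 0.5, so the loss equals \(\log 2\); parameters that rank the positive samples above the negative ones give a smaller value.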

Parameter Learning

With the loss function in hand, we use gradient descent to find its minimum. First, simplify the loss function. Here \(\theta\cdot x^{(i)}\) denotes the inner product \(\theta^T x^{(i)}\), and the last step uses \(\log(1 - h_\theta(x^{(i)})) = -\log(1 + \exp(\theta\cdot x^{(i)}))\):
\[
\begin{aligned}
J(\theta) &=-\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1 - h_\theta(x^{(i)})) \right ] \\
&=-\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)}\log \frac {h_\theta(x^{(i)})} {1 - h_\theta(x^{(i)})} + \log(1 - h_\theta(x^{(i)})) \right ] \\
&=-\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} \log \frac { {\exp(\theta\cdot x^{(i)})} / (1 + \exp(\theta\cdot x^{(i)}))} {{1} /(1 + \exp(\theta\cdot x^{(i)}))} + \log(1 - h_\theta(x^{(i)})) \right ] \\
&=-\frac{1}{n} \sum _{i=1}^n \left[ y^{(i)} (\theta\cdot x^{(i)}) - \log(1 + \exp (\theta\cdot x^{(i)})) \right ]
\end{aligned}
\]

Next, compute the partial derivative of the loss function \(J(\theta)\) with respect to the parameters \(\theta\):
\[
\begin{aligned}
\frac{\partial}{\partial \theta}J(\theta) &=-\frac{1}{n} \sum _{i=1}^n \left [y^{(i)} \cdot x^{(i)} - \frac {1} {1 + \exp(\theta \cdot x^{(i)})} \cdot \exp(\theta \cdot x^{(i)}) \cdot x^{(i)}\right ] \\
&=-\frac{1}{n} \sum _{i=1}^n \left [y^{(i)} \cdot x^{(i)} - \frac {\exp(\theta \cdot x^{(i)})} {1 + \exp(\theta \cdot x^{(i)})} \cdot x^{(i)}\right ] \\
&=-\frac{1}{n} \sum _{i=1}^n \left (y^{(i)} - \frac {\exp(\theta \cdot x^{(i)})} {1 + \exp(\theta \cdot x^{(i)})} \right ) x^{(i)}\\
&=\frac{1}{n} \sum _{i=1}^n \left (h_\theta(x^{(i)})-y^{(i)} \right )x^{(i)}
\end{aligned}
\]
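One way to gain confidence in a gradient formula like this is to compare it with a finite-difference approximation of the loss. The sketch below does that on a small made-up data set; the names `gradient` and `numerical_gradient`, the data, the test point `theta`, and the step size `step` are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic function g(z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def loss(theta, X, y):
    """J(theta) = -(1/n) * sum[y*log(h) + (1-y)*log(1-h)]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / len(X)

def gradient(theta, X, y):
    """Analytic gradient: (1/n) * sum (h_theta(x_i) - y_i) * x_i."""
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = sigmoid(sum(t * v for t, v in zip(theta, x_i))) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij
    return [g / len(X) for g in grad]

def numerical_gradient(theta, X, y, step=1e-6):
    """Central finite differences of the loss, one coordinate at a time."""
    approx = []
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += step
        minus[j] -= step
        approx.append((loss(plus, X, y) - loss(minus, X, y)) / (2 * step))
    return approx

# Hypothetical toy data and an arbitrary point at which to compare gradients.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 1, 0, 1]
theta = [0.3, -0.2]

analytic = gradient(theta, X, y)
numeric = numerical_gradient(theta, X, y)
max_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
print(analytic, numeric, max_diff)
```

If the two gradients agree to several decimal places, the analytic expression and the loss implementation are consistent with each other.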

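Gradient descent then updates each parameter \(\theta_j\) (all components simultaneously in every iteration), where \(\alpha\) is the learning rate: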
\[
\theta_j := \theta_j - \frac{\alpha}{n} \sum_{i=1}^n \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}
\]
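Putting the update rule into a loop gives a basic batch gradient descent trainer. The following is a minimal sketch under stated assumptions, not a tuned implementation: the function name `fit`, the learning rate `alpha=0.1`, the iteration count, the zero initialization, and the toy data are all choices made for this example.

```python
import math

def sigmoid(z):
    """Logistic function g(z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def loss(theta, X, y):
    """J(theta) = -(1/n) * sum[y*log(h) + (1-y)*log(1-h)]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = hypothesis(theta, x_i)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / len(X)

def gradient(theta, X, y):
    """(1/n) * sum (h_theta(x_i) - y_i) * x_i."""
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = hypothesis(theta, x_i) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij
    return [g / len(X) for g in grad]

def fit(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent starting from theta = 0."""
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = gradient(theta, X, y)  # computed once, before any component changes
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
    return theta

# Hypothetical, non-separable toy data: a bias column plus one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]

initial_loss = loss([0.0, 0.0], X, y)
theta = fit(X, y)
final_loss = loss(theta, X, y)
print(theta, initial_loss, final_loss)
```

The gradient is computed in full before any component of `theta` is changed, which matches the simultaneous update described above. The toy labels are deliberately not perfectly separable, so the minimizer is finite and the parameters do not grow without bound.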

Original post: https://www.cnblogs.com/littleorange/p/12231329.html

Date: 2024-10-08 18:06:49