The standard form of the SVM is\begin{align*} \min_{\boldsymbol{w}, b} \ \ \ \frac{\lambda}{2} \|\boldsymbol{w}\|^2 + \frac{1}{M} \sum_{i=1}^M \max \{0, 1 - y_i (\boldsymbol{w}^\top \boldsymbol{x}_i + b) \} \end{align*}where $\boldsymbol{x}_i \in \mathbb{R}^d$ and $y_i \in \{1,-1\}$. The first term is a regularizer that controls model complexity. The second is the hinge loss, which measures the model's error.
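For reference, the standard objective can be evaluated directly. This is a minimal sketch assuming NumPy, with the $M$ samples stored as the rows of an array `X` and the labels in `y` taking values in $\{1,-1\}$. The function name `svm_objective` and the toy numbers are illustrative and not part of the original derivation.

```python
import numpy as np

def svm_objective(w, b, X, y, lam):
    """lam/2 * ||w||^2 + (1/M) * sum_i max{0, 1 - y_i (w^T x_i + b)}."""
    margins = y * (X @ w + b)                 # y_i (w^T x_i + b) for every row x_i of X
    hinge = np.maximum(0.0, 1.0 - margins)    # per-sample hinge loss
    return 0.5 * lam * (w @ w) + hinge.mean()

# Hypothetical toy data: the first point lies outside the margin, the second is misclassified.
X = np.array([[2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0
print(svm_objective(w, b, X, y, lam=0.1))
```

Here the two hinge terms are $0$ and $1.5$, so the objective is $0.05 + 0.75 = 0.8$.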
Now suppose that the input of each sample $(\boldsymbol{x}_i, y_i)$ is uncertain. The label $y_i$ is fixed, while the input is drawn from a Gaussian distribution $\mathcal{N}(\boldsymbol{x}_i, \boldsymbol{\Sigma}_i)$, whose covariance matrix $\boldsymbol{\Sigma}_i \in \mathbb{S}_{++}^{d}$ describes the uncertainty in the position of $\boldsymbol{x}_i$. The problem can then be reformulated as\begin{align} \label{equ: svm_gaussian_1} \min_{\boldsymbol{w}, b} \ \ \ \frac{\lambda}{2} \|\boldsymbol{w}\|^2 + \frac{1}{M} \sum_{i=1}^M \int_{\mathbb{R}^d} \max \{0, 1 - y_i (\boldsymbol{w}^\top \boldsymbol{x} + b) \} p_i(\boldsymbol{x}) \mbox{d} \boldsymbol{x}\end{align}where\begin{align*} p_i(\boldsymbol{x}) = \frac{1}{(2 \pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp \left( -\frac{1}{2} (\boldsymbol{x} - \boldsymbol{x}_i)^\top \boldsymbol{\Sigma}_i^{-1} (\boldsymbol{x} - \boldsymbol{x}_i) \right) \end{align*}That is, the hinge loss of each individual sample becomes its expectation under a Gaussian distribution. Since the integrand vanishes wherever $y_i (\boldsymbol{w}^\top \boldsymbol{x} + b) > 1$, Eq. (\ref{equ: svm_gaussian_1}) can be rewritten as\begin{align*} \min_{\boldsymbol{w}, b} \ \ \ \frac{\lambda}{2} \|\boldsymbol{w}\|^2 + \frac{1}{M} \sum_{i=1}^M \int_{\Omega_i} (1 - y_i \boldsymbol{w}^\top \boldsymbol{x} - y_i b) p_i(\boldsymbol{x}) \mbox{d} \boldsymbol{x} \end{align*}where $\Omega_i = \{ \boldsymbol{x} | y_i (\boldsymbol{w}^\top \boldsymbol{x} + b) \leq 1 \}$. The key step is therefore to compute integrals of the form\begin{align*} L(\boldsymbol{g}, h, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \int_{\Omega} \frac{ \boldsymbol{g}^\top \boldsymbol{x} + h }{(2 \pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp \left( - \frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu}) \right) \mbox{d} \boldsymbol{x} \end{align*}where $\Omega = \{ \boldsymbol{x} | \boldsymbol{g}^\top \boldsymbol{x} + h \geq 0 \}$. Setting $\boldsymbol{g} = -y_i \boldsymbol{w}$, $h = 1 - y_i b$, $\boldsymbol{\mu} = \boldsymbol{x}_i$ and $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_i$ gives $\boldsymbol{g}^\top \boldsymbol{x} + h = 1 - y_i (\boldsymbol{w}^\top \boldsymbol{x} + b)$ and $\Omega = \Omega_i$, so each summand is exactly one such integral.
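Before any closed form is available, the expectation in Eq. (\ref{equ: svm_gaussian_1}) can be estimated by sampling. The sketch below assumes NumPy; `expected_hinge_mc`, the sample size, and the toy parameters are hypothetical. It computes the estimate twice: once with the $\max\{0,\cdot\}$ integrand, and once as the integral of $\boldsymbol{g}^\top \boldsymbol{x} + h$ restricted to $\Omega$ with $\boldsymbol{g} = -y_i \boldsymbol{w}$ and $h = 1 - y_i b$. The two formulations agree.

```python
import numpy as np

def expected_hinge_mc(w, b, xi, yi, Sigma, n=200_000, seed=0):
    """Monte Carlo estimate of E_{x ~ N(xi, Sigma)}[max{0, 1 - yi (w^T x + b)}].

    Returns the estimate computed two ways: with the max{0, .} integrand, and as
    the integral of g^T x + h over Omega = {g^T x + h >= 0}.
    """
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(xi, Sigma, size=n)       # samples x ~ p_i
    hinge = np.maximum(0.0, 1.0 - yi * (xs @ w + b)).mean()

    g, h = -yi * w, 1.0 - yi * b                          # g = -y_i w, h = 1 - y_i b
    lin = xs @ g + h                                      # g^T x + h
    restricted = np.where(lin >= 0.0, lin, 0.0).mean()    # integrand is zero outside Omega
    return hinge, restricted

w = np.array([1.0, -0.5])
b = 0.2
xi = np.array([0.3, 0.4])
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
print(expected_hinge_mc(w, b, xi, 1.0, Sigma))
```

Shrinking $\boldsymbol{\Sigma}_i$ toward zero makes the estimate approach the ordinary hinge loss $\max\{0, 1 - y_i(\boldsymbol{w}^\top \boldsymbol{x}_i + b)\}$, as expected.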
First substitute $\boldsymbol{u} = \boldsymbol{x} - \boldsymbol{\mu}$. This is a pure translation, so $\mbox{d} \boldsymbol{x} = \mbox{d} \boldsymbol{u}$, and\begin{align*} L(\boldsymbol{g}, h, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \int_{\Omega_1} \frac{\boldsymbol{g}^\top \boldsymbol{u} + \boldsymbol{g}^\top \boldsymbol{\mu} + h }{(2 \pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp \left( -\frac{1}{2} \boldsymbol{u}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{u} \right) \mbox{d} \boldsymbol{u} \end{align*}where $\Omega_1 = \{ \boldsymbol{u} | \boldsymbol{g}^\top \boldsymbol{u} + \boldsymbol{g}^\top \boldsymbol{\mu} + h \geq 0 \}$.
Since $\boldsymbol{\Sigma}$ is symmetric positive definite, it has an eigendecomposition $\boldsymbol{\Sigma} = \boldsymbol{U} \boldsymbol{D} \boldsymbol{U}^\top$. The columns of the orthogonal matrix $\boldsymbol{U}$ are eigenvectors of $\boldsymbol{\Sigma}$, and $\boldsymbol{D}$ is the diagonal matrix of the corresponding (positive) eigenvalues, so $\boldsymbol{\Sigma}^{-1} = \boldsymbol{U} \boldsymbol{D}^{-1} \boldsymbol{U}^\top$. Let $\boldsymbol{z} = \boldsymbol{U}^\top \boldsymbol{u}$ and $\boldsymbol{g}_1 = \boldsymbol{U}^\top \boldsymbol{g}$. Then\begin{align*} \boldsymbol{g}^\top \boldsymbol{u} & = \boldsymbol{g}^\top (\boldsymbol{U} \boldsymbol{U}^\top) \boldsymbol{u} = (\boldsymbol{U}^\top \boldsymbol{g})^\top \boldsymbol{U}^\top \boldsymbol{u} = \boldsymbol{g}_1^\top \boldsymbol{z} \\ \boldsymbol{u}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{u} & = \boldsymbol{u}^\top \boldsymbol{U} \boldsymbol{D}^{-1} \boldsymbol{U}^\top \boldsymbol{u} = \boldsymbol{z}^\top \boldsymbol{D}^{-1} \boldsymbol{z} \\ \mbox{d} \boldsymbol{u} & = |\det \boldsymbol{U}| \mbox{d} \boldsymbol{z} = \mbox{d} \boldsymbol{z} \end{align*}so\begin{align*} L(\boldsymbol{g}, h, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \int_{\Omega_2} \frac{\boldsymbol{g}_1^\top \boldsymbol{z} + \boldsymbol{g}^\top \boldsymbol{\mu} + h }{(2 \pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp \left( -\frac{1}{2} \boldsymbol{z}^\top \boldsymbol{D}^{-1} \boldsymbol{z} \right) \mbox{d} \boldsymbol{z} \end{align*}where $\Omega_2 = \{ \boldsymbol{z} | \boldsymbol{g}_1^\top \boldsymbol{z} + \boldsymbol{g}^\top \boldsymbol{\mu} + h \geq 0 \}$.
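The identities in this step are easy to confirm numerically. This is a minimal sketch assuming NumPy. `np.linalg.eigh` returns the eigenvalues of a symmetric matrix together with an orthogonal matrix whose columns are the eigenvectors, i.e. the diagonal of $\boldsymbol{D}$ and the matrix $\boldsymbol{U}$. The random $\boldsymbol{\Sigma}$, $\boldsymbol{g}$ and $\boldsymbol{u}$ are arbitrary test inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)          # a random symmetric positive definite matrix
g = rng.standard_normal(d)
u = rng.standard_normal(d)

evals, U = np.linalg.eigh(Sigma)         # Sigma = U diag(evals) U^T, columns of U orthonormal
D = np.diag(evals)
z = U.T @ u                              # z = U^T u
g1 = U.T @ g                             # g_1 = U^T g

checks = {
    "Sigma = U D U^T": np.allclose(Sigma, U @ D @ U.T),
    "Sigma^{-1} = U D^{-1} U^T": np.allclose(np.linalg.inv(Sigma), U @ np.diag(1 / evals) @ U.T),
    "g^T u = g1^T z": np.isclose(g @ u, g1 @ z),
    "u^T Sigma^{-1} u = z^T D^{-1} z": np.isclose(u @ np.linalg.solve(Sigma, u), z @ (z / evals)),
    "|det U| = 1": np.isclose(abs(np.linalg.det(U)), 1.0),
}
print(checks)
```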
Now let $\boldsymbol{v} = \boldsymbol{D}^{-1/2} \boldsymbol{z}$ and $\boldsymbol{g}_2 = \boldsymbol{D}^{1/2} \boldsymbol{g}_1$. Since $\boldsymbol{D}^{\pm 1/2}$ is diagonal (hence symmetric), we have
\begin{align*} \boldsymbol{z}^\top \boldsymbol{D}^{-1} \boldsymbol{z} & = \boldsymbol{z}^\top {\boldsymbol{D}^{-1/2}}^\top \boldsymbol{D}^{-1/2} \boldsymbol{z} = \boldsymbol{v}^\top \boldsymbol{v} \\ \boldsymbol{g}_1^\top \boldsymbol{z} & = \boldsymbol{g}_1^\top \boldsymbol{D}^{1/2} \boldsymbol{v} = \boldsymbol{g}_2^\top \boldsymbol{v} \\ \mbox{d} \boldsymbol{z} & = |\boldsymbol{D}^{1/2}| \mbox{d} \boldsymbol{v} = |\boldsymbol{\Sigma}|^{1/2} \mbox{d} \boldsymbol{v} \end{align*}where the last equality uses $|\boldsymbol{D}| = |\boldsymbol{\Sigma}|$. The normalizing factor $|\boldsymbol{\Sigma}|^{1/2}$ therefore cancels, and\begin{align*} L(\boldsymbol{g}, h, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \int_{\Omega_3} \frac{\boldsymbol{g}_2^\top \boldsymbol{v} + \boldsymbol{g}^\top \boldsymbol{\mu} + h }{(2 \pi)^{d/2}} \exp \left( -\frac{1}{2} \boldsymbol{v}^\top \boldsymbol{v} \right) \mbox{d} \boldsymbol{v} \end{align*}where $\Omega_3 = \{ \boldsymbol{v} | \boldsymbol{g}_2^\top \boldsymbol{v} + \boldsymbol{g}^\top \boldsymbol{\mu} + h \geq 0 \}$.
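The whitening step can be checked in the same way. Because $\boldsymbol{D}$ is diagonal, $\boldsymbol{D}^{\pm 1/2}$ act elementwise, and the sketch below (assuming NumPy, with arbitrary random inputs) exploits this. It also checks $\|\boldsymbol{g}_2\|^2 = \boldsymbol{g}^\top \boldsymbol{\Sigma} \boldsymbol{g}$, which follows from the same definitions.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)      # random symmetric positive definite covariance
g = rng.standard_normal(d)
z = rng.standard_normal(d)

evals, U = np.linalg.eigh(Sigma)     # Sigma = U D U^T with D = diag(evals)
g1 = U.T @ g                         # g_1 = U^T g
v = z / np.sqrt(evals)               # v = D^{-1/2} z (elementwise, D is diagonal)
g2 = np.sqrt(evals) * g1             # g_2 = D^{1/2} g_1

checks = {
    "z^T D^{-1} z = v^T v": np.isclose(z @ (z / evals), v @ v),
    "g1^T z = g2^T v": np.isclose(g1 @ z, g2 @ v),
    "|D^{1/2}| = |Sigma|^{1/2}": np.isclose(np.prod(np.sqrt(evals)), np.sqrt(np.linalg.det(Sigma))),
    "||g2||^2 = g^T Sigma g": np.isclose(g2 @ g2, g @ Sigma @ g),
}
print(checks)
```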
Note that $\boldsymbol{g}_2 = \boldsymbol{D}^{1/2} \boldsymbol{U}^\top \boldsymbol{g}$ is a $d$-dimensional vector. It is nonzero whenever $\boldsymbol{w} \neq \boldsymbol{0}$, because $\boldsymbol{D}^{1/2} \boldsymbol{U}^\top$ is invertible, and we assume this from now on. Then $\boldsymbol{g}_2/\|\boldsymbol{g}_2\|$ can be completed with $d-1$ further vectors (e.g. by Gram-Schmidt orthogonalization) to an orthogonal matrix $\boldsymbol{B}$. Without loss of generality, let the $j$-th column of $\boldsymbol{B}$ be $\boldsymbol{g}_2/\|\boldsymbol{g}_2\|$. Then $\boldsymbol{B}^\top \boldsymbol{g}_2 = \|\boldsymbol{g}_2\| \boldsymbol{e}_j$, or equivalently $\boldsymbol{g}_2 = \|\boldsymbol{g}_2\| \boldsymbol{B} \boldsymbol{e}_j$, where $\boldsymbol{e}_j$ is the standard basis vector whose $j$-th entry is $1$ and whose other entries are $0$. Let $\boldsymbol{m} = \boldsymbol{B}^\top \boldsymbol{v}$. Then\begin{align*} \boldsymbol{g}_2^\top \boldsymbol{v} & = (\|\boldsymbol{g}_2\| \boldsymbol{B} \boldsymbol{e}_j)^\top \boldsymbol{v} = \|\boldsymbol{g}_2\| \boldsymbol{e}_j^\top \boldsymbol{B}^\top \boldsymbol{v} = \|\boldsymbol{g}_2\| \boldsymbol{e}_j^\top \boldsymbol{m} = \|\boldsymbol{g}_2\| m_j \\ \boldsymbol{v}^\top \boldsymbol{v} & = \boldsymbol{v}^\top \boldsymbol{B} \boldsymbol{B}^\top \boldsymbol{v} = \boldsymbol{m}^\top \boldsymbol{m} \\ \mbox{d} \boldsymbol{v} & = |\det \boldsymbol{B}| \mbox{d} \boldsymbol{m} = \mbox{d} \boldsymbol{m}\end{align*}where $m_j$ is the $j$-th entry of $\boldsymbol{m}$, so\begin{align*} L(\boldsymbol{g}, h, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \int_{\Omega_4} \frac{\|\boldsymbol{g}_2\| m_j + \boldsymbol{g}^\top \boldsymbol{\mu} + h }{(2 \pi)^{d/2}} \exp \left( -\frac{1}{2} \boldsymbol{m}^\top \boldsymbol{m} \right) \mbox{d} \boldsymbol{m} \end{align*}where $\Omega_4 = \{ \boldsymbol{m} | \|\boldsymbol{g}_2\| m_j + \boldsymbol{g}^\top \boldsymbol{\mu} + h \geq 0 \}$.
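The derivation only needs some orthogonal $\boldsymbol{B}$ whose $j$-th column is $\boldsymbol{g}_2/\|\boldsymbol{g}_2\|$. One concrete construction, swapped in here for illustration rather than taken from the text, is the Householder reflection $\boldsymbol{B} = \boldsymbol{I} - 2\boldsymbol{r}\boldsymbol{r}^\top/(\boldsymbol{r}^\top\boldsymbol{r})$ with $\boldsymbol{r} = \boldsymbol{e}_j - \boldsymbol{g}_2/\|\boldsymbol{g}_2\|$, which maps $\boldsymbol{e}_j$ to $\boldsymbol{g}_2/\|\boldsymbol{g}_2\|$. The sketch assumes NumPy and $\boldsymbol{g}_2 \neq \boldsymbol{0}$. `basis_with_column` is a hypothetical helper name.

```python
import numpy as np

def basis_with_column(direction, j):
    """Orthogonal B whose j-th column is direction/||direction|| (one of many valid choices)."""
    q = direction / np.linalg.norm(direction)
    e = np.zeros_like(q)
    e[j] = 1.0
    r = e - q
    rr = r @ r
    if rr < 1e-24:                       # q already equals e_j, so the identity works
        return np.eye(q.size)
    return np.eye(q.size) - 2.0 * np.outer(r, r) / rr   # Householder reflection: e_j <-> q

rng = np.random.default_rng(3)
d, j = 5, 2
g2 = rng.standard_normal(d)
v = rng.standard_normal(d)
B = basis_with_column(g2, j)
m = B.T @ v                              # m = B^T v
n2 = np.linalg.norm(g2)

checks = {
    "B orthogonal": np.allclose(B.T @ B, np.eye(d)),
    "j-th column is g2/||g2||": np.allclose(B[:, j], g2 / n2),
    "B^T g2 = ||g2|| e_j": np.allclose(B.T @ g2, n2 * np.eye(d)[j]),
    "g2^T v = ||g2|| m_j": np.isclose(g2 @ v, n2 * m[j]),
    "v^T v = m^T m": np.isclose(v @ v, m @ m),
    "|det B| = 1": np.isclose(abs(np.linalg.det(B)), 1.0),
}
print(checks)
```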
This is a $d$-fold integral over $m_1, \cdots, m_d$. The region $\Omega_4$ constrains only $m_j$, and for every $m_k$ with $k \neq j$,\begin{align*} \int_{- \infty}^{\infty} \exp \left( -\frac{1}{2} m_k^2 \right) \mbox{d} m_k = \sqrt{2 \pi} \end{align*}Integrating out all $m_k$ with $k \neq j$ therefore yields a factor $(2\pi)^{(d-1)/2}$, which leaves\begin{align} \label{equ: svm_gaussian_4} L(\boldsymbol{g}, h, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \int_c^{\infty} \frac{\|\boldsymbol{g}_2\| m_j + \boldsymbol{g}^\top \boldsymbol{\mu} + h }{\sqrt{2 \pi}} \exp \left( -\frac{1}{2} m_j^2 \right) \mbox{d} m_j \end{align}where $c = - \frac{\boldsymbol{g}^\top \boldsymbol{\mu} + h}{\|\boldsymbol{g}_2\|}$ is the value at which $\|\boldsymbol{g}_2\| m_j + \boldsymbol{g}^\top \boldsymbol{\mu} + h = 0$. For any real $c$, including $c < 0$,\begin{align} \label{equ: svm_gaussian_5} \int_c^{\infty} m_j \exp \left( -\frac{1}{2} m_j^2 \right) \mbox{d} m_j = \left[ -\exp \left( -\frac{1}{2} m_j^2 \right) \right]_c^{\infty} = \exp \left( -\frac{1}{2} c^2 \right)\end{align}and, substituting $m_j = \sqrt{2} t$ in the integral between $c$ and $0$,\begin{align} \label{equ: svm_gaussian_6} \int_c^{\infty} \exp \left( -\frac{1}{2} m_j^2 \right) \mbox{d} m_j = \int_c^0 \exp \left( -\frac{1}{2} m_j^2 \right) \mbox{d} m_j + \int_0^{\infty} \exp \left( -\frac{1}{2} m_j^2 \right) \mbox{d} m_j = \frac{\sqrt{2 \pi}}{2} - \sqrt{2} \int_0^{\frac{c}{\sqrt{2}}} \exp \left( - t^2 \right) \mbox{d} t = \frac{\sqrt{2 \pi}}{2} \left(1 - \mbox{erf} \left(\frac{c}{\sqrt{2}} \right)\right) \end{align}where $\mbox{erf}: \mathbb{R} \to (-1,1)$ is the error function\begin{align*} \mbox{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \mbox{d} t \end{align*}
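The two one-dimensional integrals can be checked against a simple quadrature for several values of $c$, including negative ones. This sketch assumes NumPy plus the standard library's `math.erf`. The trapezoid helper, the truncation of $[c, \infty)$ at `UPPER = 40`, and the grid size are illustrative choices.

```python
import math
import numpy as np

def trapezoid(f, lo, hi, n=200_001):
    """Composite trapezoid rule for f on [lo, hi] with n grid points."""
    x = np.linspace(lo, hi, n)
    fx = f(x)
    dx = (hi - lo) / (n - 1)
    return dx * (fx.sum() - 0.5 * (fx[0] + fx[-1]))

UPPER = 40.0  # exp(-m^2/2) is numerically zero here, so [c, UPPER] stands in for [c, inf)

def check(c):
    """Return (quadrature, closed form) for Eq. (5), then for Eq. (6), at lower limit c."""
    eq5_num = trapezoid(lambda m: m * np.exp(-0.5 * m**2), c, UPPER)
    eq5_closed = math.exp(-0.5 * c**2)
    eq6_num = trapezoid(lambda m: np.exp(-0.5 * m**2), c, UPPER)
    eq6_closed = math.sqrt(2 * math.pi) / 2 * (1 - math.erf(c / math.sqrt(2)))
    return eq5_num, eq5_closed, eq6_num, eq6_closed

for c in (-2.5, -0.4, 0.0, 0.9, 3.0):
    print(c, check(c))

# Each of the d-1 integrated-out coordinates contributes a factor sqrt(2*pi).
full_line = trapezoid(lambda m: np.exp(-0.5 * m**2), -UPPER, UPPER)
print(full_line, math.sqrt(2 * math.pi))
```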
Substituting Eqs. (\ref{equ: svm_gaussian_5}) and (\ref{equ: svm_gaussian_6}) into Eq. (\ref{equ: svm_gaussian_4}) gives\begin{align*} L(\boldsymbol{g}, h, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{\|\boldsymbol{g}_2\|}{\sqrt{2 \pi}} \exp \left( -\frac{1}{2} c^2 \right) + \frac{\boldsymbol{g}^\top \boldsymbol{\mu} + h}{2} \left(1 - \mbox{erf} \left(\frac{c}{\sqrt{2}} \right)\right) \end{align*}Moreover, using $y_i^2 = 1$,\begin{align*} \|\boldsymbol{g}_2\|^2 & = (\boldsymbol{D}^{1/2} \boldsymbol{g}_1)^\top \boldsymbol{D}^{1/2} \boldsymbol{g}_1 = \boldsymbol{g}_1^\top \boldsymbol{D} \boldsymbol{g}_1 = (\boldsymbol{U}^\top \boldsymbol{g})^\top \boldsymbol{D} \boldsymbol{U}^\top \boldsymbol{g} = \boldsymbol{g}^\top \boldsymbol{\Sigma} \boldsymbol{g} = \boldsymbol{w}^\top \boldsymbol{\Sigma}_i \boldsymbol{w} \\ c & = - \frac{\boldsymbol{g}^\top \boldsymbol{\mu} + h}{\|\boldsymbol{g}_2\|} = \frac{y_i(\boldsymbol{w}^\top \boldsymbol{x}_i + b) - 1}{\sqrt{\boldsymbol{w}^\top \boldsymbol{\Sigma}_i \boldsymbol{w}}}\\ \boldsymbol{g}^\top \boldsymbol{\mu} + h & = 1 - y_i(\boldsymbol{w}^\top \boldsymbol{x}_i + b) \end{align*}and, again because $y_i^2 = 1$, $c^2 = (\boldsymbol{w}^\top \boldsymbol{x}_i + b - y_i)^2 / (\boldsymbol{w}^\top \boldsymbol{\Sigma}_i \boldsymbol{w})$. Putting everything together, the expected loss on the sample $(\boldsymbol{x}_i, y_i, \boldsymbol{\Sigma}_i)$ is\begin{align*} L(\boldsymbol{w}, b, \boldsymbol{x}_i, y_i, \boldsymbol{\Sigma}_i) = \sqrt{\frac{\boldsymbol{w}^\top \boldsymbol{\Sigma}_i \boldsymbol{w}}{2 \pi}} \exp \left( -\frac{(\boldsymbol{w}^\top \boldsymbol{x}_i + b - y_i)^2}{2\boldsymbol{w}^\top \boldsymbol{\Sigma}_i \boldsymbol{w}} \right) + \frac{1 - y_i(\boldsymbol{w}^\top \boldsymbol{x}_i + b) }{2} \left(1 - \mbox{erf} \left(\frac{y_i(\boldsymbol{w}^\top \boldsymbol{x}_i + b) - 1}{\sqrt{2\boldsymbol{w}^\top \boldsymbol{\Sigma}_i \boldsymbol{w}}} \right)\right) \end{align*}
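As a final check, the closed-form expected loss can be compared with a Monte Carlo estimate of the original integral. This sketch assumes NumPy and `math.erf`. The function names and the toy values of $\boldsymbol{w}, b, \boldsymbol{x}_i, \boldsymbol{\Sigma}_i$ are hypothetical, and $\boldsymbol{w} \neq \boldsymbol{0}$ is assumed so that $\boldsymbol{w}^\top \boldsymbol{\Sigma}_i \boldsymbol{w} > 0$.

```python
import math
import numpy as np

def expected_hinge(w, b, xi, yi, Sigma):
    """Closed-form expected hinge loss on the sample (x_i, y_i, Sigma_i); assumes w != 0."""
    s2 = w @ Sigma @ w                  # w^T Sigma_i w
    f = w @ xi + b                      # w^T x_i + b
    first = math.sqrt(s2 / (2 * math.pi)) * math.exp(-(f - yi) ** 2 / (2 * s2))
    second = (1 - yi * f) / 2 * (1 - math.erf((yi * f - 1) / math.sqrt(2 * s2)))
    return first + second

def expected_hinge_mc(w, b, xi, yi, Sigma, n=400_000, seed=0):
    """Monte Carlo estimate of E_{x ~ N(x_i, Sigma_i)}[max{0, 1 - y_i (w^T x + b)}]."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(xi, Sigma, size=n)
    return np.maximum(0.0, 1.0 - yi * (xs @ w + b)).mean()

w = np.array([1.0, -0.5])
b = 0.2
xi = np.array([0.3, 0.4])
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
for yi in (1.0, -1.0):
    print(yi, expected_hinge(w, b, xi, yi, Sigma), expected_hinge_mc(w, b, xi, yi, Sigma))
```

With a nearly zero covariance, the closed form collapses to the ordinary hinge loss, here $\max\{0, 1 - 0.3\} = 0.7$ for $y_i = 1$.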
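This loss is the expectation of the original hinge loss, i.e. a non-negatively weighted integral of the functions $\max\{0, 1 - y_i(\boldsymbol{w}^\top \boldsymbol{x} + b)\}$, each of which is convex in $(\boldsymbol{w}, b)$. It is therefore still convex, and so is the whole objective. It is also differentiable wherever $\boldsymbol{w} \neq \boldsymbol{0}$, and its gradient is easy to compute, so standard convex optimization methods can be applied directly.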
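To make the gradient explicit, write $a_i = 1 - y_i(\boldsymbol{w}^\top \boldsymbol{x}_i + b)$, $\sigma_i = \sqrt{\boldsymbol{w}^\top \boldsymbol{\Sigma}_i \boldsymbol{w}}$ and $t_i = a_i/\sigma_i$. Because $\frac{1}{2}(1 - \mbox{erf}(c/\sqrt{2})) = \Phi(-c)$ and $c = -t_i$, the per-sample loss equals $a_i \Phi(t_i) + \sigma_i \phi(t_i)$, where $\Phi$ and $\phi$ are the standard normal CDF and density. Differentiating gives $\partial L/\partial a_i = \Phi(t_i)$ and $\partial L/\partial \sigma_i = \phi(t_i)$. Hence $\nabla_{\boldsymbol{w}} L = -y_i \Phi(t_i) \boldsymbol{x}_i + \phi(t_i) \boldsymbol{\Sigma}_i \boldsymbol{w} / \sigma_i$ and $\partial L / \partial b = -y_i \Phi(t_i)$. The sketch below plugs these into plain fixed-step gradient descent, one possible choice among the standard convex methods, on a made-up two-cluster toy dataset. It assumes NumPy, and the step size, iteration count, data, and function names are illustrative assumptions. A finite-difference comparison is the natural sanity check for the analytic gradient.

```python
import math
import numpy as np

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def objective_and_grad(w, b, X, y, Sigmas, lam):
    """lam/2 ||w||^2 + mean expected hinge loss, and its gradient in (w, b); assumes w != 0."""
    M = len(y)
    obj = 0.5 * lam * (w @ w)
    gw = lam * w                           # gradient of the regularizer
    gb = 0.0
    for xi, yi, S in zip(X, y, Sigmas):
        Sw = S @ w
        sigma = math.sqrt(w @ Sw)          # sigma_i = sqrt(w^T Sigma_i w)
        a = 1.0 - yi * (w @ xi + b)        # a_i = 1 - y_i (w^T x_i + b)
        t = a / sigma
        obj += (a * Phi(t) + sigma * phi(t)) / M
        gw = gw + (-yi * Phi(t) * xi + phi(t) * Sw / sigma) / M
        gb += -yi * Phi(t) / M
    return obj, gw, gb

# Hypothetical toy data: two 2-D clusters, every input with covariance 0.1 * I.
rng = np.random.default_rng(0)
M = 20
X = np.vstack([rng.normal([2.0, 2.0], 0.5, size=(M // 2, 2)),
               rng.normal([-2.0, -2.0], 0.5, size=(M // 2, 2))])
y = np.array([1.0] * (M // 2) + [-1.0] * (M // 2))
Sigmas = [0.1 * np.eye(2) for _ in range(M)]
lam = 0.1

# Plain fixed-step gradient descent, started from a non-zero w.
w, b = np.array([0.1, 0.1]), 0.0
start = objective_and_grad(w, b, X, y, Sigmas, lam)[0]
for _ in range(500):
    _, gw, gb = objective_and_grad(w, b, X, y, Sigmas, lam)
    w, b = w - 0.1 * gw, b - 0.1 * gb
end = objective_and_grad(w, b, X, y, Sigmas, lam)[0]
print(start, end, w, b)
```

Any other first-order or quasi-Newton convex solver could replace the fixed-step loop. The only requirement this sketch relies on is that $\boldsymbol{w}$ stays away from $\boldsymbol{0}$, where $\sigma_i = 0$ and the gradient formula above is undefined.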