Proximity Gradient Descent 用来解决 $L_1$ 正则中 0 点不可导的问题,先引入一个 proximity operator :
\[\mbox {prox} _{h}(x) = \arg \min_u (h(u) + \frac{1}{2}||u - x||_2^2)\]
\[\mbox {prox} _{\lambda h}(x) = \arg \min_u (h(u) + \frac{1}{2 \lambda}||u - x||_2^2)\]
对于一个最优化问题,形式如:
\[\min_x h(x) + g(x)\]
可以用如下方法来迭代求解:
\[x^{k+1} = \mbox{prox}_{\gamma h}(x^k-\gamma \nabla g(x^k)) \]
因为 如此迭代下去,是会朝着极小方向前进的:
\begin{aligned}x^{k+1} &= \mbox{prox}_{\gamma h}(x^k-\gamma \nabla g(x^k)) \\
&= \mbox{arg}\min_x \left(h(x)+\frac{1}{2\gamma}\mid\mid x-x^k+ \gamma \nabla g(x^k) \mid\mid_2^2\right) \\
&= \mbox{arg}\min_x \left(h(x)+ \frac{\gamma}{2}\mid\mid \nabla g(x)\mid\mid_2^2 + \gamma \nabla g(x^k)^T(x-x^k)+ \frac{1}{2\gamma}\mid\mid x-x^k \mid\mid_2^2\right) \\
&= \mbox{arg}\min_x \left(h(x)+ g(x^k) + \gamma \nabla g(x^k)^T(x-x^k) +\frac{1}{2\gamma}\mid\mid x-x^k \mid\mid_2^2\right) \\
& \approx \mbox{arm}\min_x \ h(x)+g(x)
\end{aligned}
后两式的变化是因为,变化的两项都与 x 无关,然后通过二阶泰勒展开近似得到目标函数。
当 $g(x)$ 为 $L_1$ 正则时:
\[ x^{k+1} = \mbox{prox} _{\lambda^k y} (x^k - \lambda^k \nabla f(x^k))\]
参考:
http://breezedeus.github.io/2013/11/16/breezedeus-proximal-gd.html
http://blog.csdn.net/lanyanchenxi/article/details/50448640
https://www.zhihu.com/question/38426074