^ is the square root of epsilon
a simplified version of the hard version
a smoother way to find the correct solution
the first term is the REINFORCE term, and the second term is the gradient of the log probability of our loss
b is a stochastic node
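As a small illustration of a REINFORCE-style gradient through a stochastic node b, here is a minimal sketch. It assumes b is a single Bernoulli node with parameter θ and uses a hypothetical loss f(b, θ) = (b + θ)^2 that also depends on θ directly. It uses the standard score-function identity, which splits the gradient into a term weighted by ∇θ log p(b; θ) and a term that differentiates the loss itself:

$$\nabla_\theta\, \mathbb{E}_{b \sim p(b;\theta)}[f(b,\theta)] = \mathbb{E}_{b \sim p(b;\theta)}\big[f(b,\theta)\,\nabla_\theta \log p(b;\theta) + \nabla_\theta f(b,\theta)\big]$$

The names `loss`, `per_sample_grad`, and so on are illustrative only. They are not taken from the original notes.

```python
import random


def loss(b: int, theta: float) -> float:
    """Hypothetical loss f(b, theta) = (b + theta)^2 (an assumption for illustration)."""
    return (b + theta) ** 2


def dloss_dtheta(b: int, theta: float) -> float:
    """Partial derivative of f w.r.t. theta with b held fixed (direct dependence)."""
    return 2.0 * (b + theta)


def dlogp_dtheta(b: int, theta: float) -> float:
    """Score of Bernoulli(theta): d/dtheta [b*log(theta) + (1-b)*log(1-theta)]."""
    return b / theta - (1 - b) / (1 - theta)


def per_sample_grad(b: int, theta: float) -> float:
    """Single-sample estimator: REINFORCE term plus direct gradient of the loss."""
    reinforce = loss(b, theta) * dlogp_dtheta(b, theta)
    direct = dloss_dtheta(b, theta)
    return reinforce + direct


def exact_estimator_mean(theta: float) -> float:
    """Expected value of the estimator, computed by enumerating b in {0, 1}."""
    return theta * per_sample_grad(1, theta) + (1 - theta) * per_sample_grad(0, theta)


def mc_grad(theta: float, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo average of the single-sample estimator."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        b = 1 if rng.random() < theta else 0  # sample the stochastic node
        total += per_sample_grad(b, theta)
    return total / n_samples


if __name__ == "__main__":
    theta = 0.3
    # Analytic check: E[f] = theta + 3*theta^2, so dE[f]/dtheta = 1 + 6*theta
    print("analytic :", 1 + 6 * theta)
    print("enumerated:", exact_estimator_mean(theta))
    print("monte carlo:", mc_grad(theta, 200_000))
```

Enumerating b confirms that the estimator is unbiased for this toy loss. The Monte Carlo average only approaches the true value as the number of samples grows, which reflects the high variance REINFORCE is known for.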
further formula derivations are omitted.
Original article: https://www.cnblogs.com/ecoflex/p/8977893.html
Time: 2024-11-02 09:34:46