What are the advantages of ReLU over sigmoid function in deep neural network?

The state of the art of non-linearity is to use ReLU instead of sigmoid function in deep neural network, what are the advantages?

I know that training a network when ReLU is used would be faster, and it is more biological inspired, what are the other advantages? (That is, any disadvantages of using sigmoid)?

Best answer in stackexchange:

Two additional major benefits of ReLUs are sparsity and a reduced likelihood of vanishing gradient. But first recall the definition of a ReLU is h=max(0,a)h=max(0,a) where a=Wx+ba=Wx+b.

One major benefit is the reduced likelihood of the gradient to vanish. This arises when a>0a>0. In this regime the gradient has a constant value. In contrast, the gradient of sigmoids becomes increasingly small as the absolute value of x increases. The constant gradient of ReLUs results in faster learning.

The other benefit of ReLUs is sparsity. Sparsity arises when a≤0a≤0. The more such units that exist in a layer the more sparse the resulting representation. Sigmoids on the other hand are always likely to generate some non-zero value resulting in dense representations. Sparse representations seem to be more beneficial than dense representations.

Reference: http://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-network

ReLU

ReLU的全称是rectified linear unit。上面的回答基本上涵盖了它胜过sigmoid function的几个方面：

faster
more biological inspired
sparsity
less chance of vanishing gradient (梯度消失问题)

早期使用sigmoid或tanh激活函数的DL在做unsupervised learning时因为 gradient vanishing problem 的问题会无法收敛。ReLU则这没有这个问题。

时间： 2024-12-20 14:30:14

What are the advantages of ReLU over sigmoid function in deep neural network?

The state of the art of non-linearity is to use ReLU instead of sigmoid function in deep neural network, what are the advantages?

I know that training a network when ReLU is used would be faster, and it is more biological inspired, what are the other advantages? (That is, any disadvantages of using sigmoid)?

ReLU

What are the advantages of ReLU over sigmoid function in deep neural network?的相关文章

Sigmoid function in NN

S函数 Sigmoid Function or Logistic Function

sigmoid function的直观解释

ReLU 和sigmoid 函数对比

神经网络中的激活函数具体是什么？为什么Relu要好过与tanh和sigmoid function

小白学习之pytorch框架(5)-多层感知机(MLP)-(tensor、variable、计算图、ReLU()、sigmoid()、tanh())

sigmoid function和softmax function

[C4] Andrew Ng - Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

[C3] Andrew Ng - Neural Networks and Deep Learning