A neural network is more powerful than a linear function because it uses non-linear functions to map data that is hard to separate into a space where it becomes separable. So we can say that every neuron in a neural network (including a CNN) uses an activation function. The activation function can take many forms, such as the logistic (sigmoid), tanh, rectifier (ReLU), and softplus functions. The reason you see a separate ReLU (activation function) layer after the convolution layer in Caffe and other frameworks is convenience in defining the network: the activation is deliberately pulled out into its own layer, so that the activation function can be changed in one place to whichever one is appropriate.
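The role of the non-linearity can be checked numerically. The following is only a minimal sketch using NumPy, with made-up random weights (`W1`, `W2`) and no bias terms: it shows that two stacked linear layers collapse into a single linear layer, while putting a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # 5 samples, 3 features (illustrative data)
W1 = rng.normal(size=(3, 4))   # hypothetical first-layer weights
W2 = rng.normal(size=(4, 2))   # hypothetical second-layer weights

# Two stacked linear layers without an activation function...
two_layers = (x @ W1) @ W2
# ...compute the same thing as one linear layer with weights W1 @ W2.
one_layer = x @ (W1 @ W2)
collapses = np.allclose(two_layers, one_layer)

# With a ReLU between the layers, the result is no longer a single linear map
# of the input with weights W1 @ W2.
with_relu = np.maximum(x @ W1, 0.0) @ W2
differs = not np.allclose(with_relu, one_layer)

print(collapses, differs)
```

Adding bias terms would not change the conclusion: a stack of affine layers is still a single affine map.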
1). Why do we use an activation function? Without one, the network is no different from a perceptron (a linear combination of its inputs), no matter how many layers it has.
2). Why use the ReLU activation function?
1>. Sigmoid and tanh are expensive to compute: evaluating them requires exponentials, and their derivatives cost more to compute than ReLU's.
2>. Sigmoid and tanh suffer from vanishing gradients: when the input saturates, the derivative tends to 0.
3>. ReLU sets some outputs to 0, which makes the network sparse and helps reduce overfitting.
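The gradient and sparsity points above can be illustrated with a few numbers. This is a rough sketch with NumPy, not how Caffe implements its layers; the helper names and the sample inputs in `z` are chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s); it needs the exponential from sigmoid().
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at z == 0).
    return (z > 0).astype(float)

z = np.array([-10.0, -1.0, 0.5, 10.0])  # illustrative pre-activations

sig_g = sigmoid_grad(z)   # tiny at both ends: the sigmoid saturates
relu_g = relu_grad(z)     # exactly 1 wherever the unit is active
sparsity = np.mean(relu(z) == 0)  # fraction of outputs that are exactly 0

print(sig_g, relu_g, sparsity)
```

For large positive or negative inputs the sigmoid gradient shrinks toward 0, while the ReLU gradient stays at 1 for every active unit; the negative inputs are mapped to exactly 0, which is where the sparsity comes from.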