I found this really puzzling. A deeper NN is supposed to be at least as powerful as a shallower NN. I have already used dropout to prevent overfitting. How can the performance degrade?
Yoshua's Answer
Yoshua Bengio, My lab has been one of the three that started the deep learning approach, bac...
If you do not change the size of the layers and just add more layers, capacity should increase, so you could be overfitting. To tell, check whether the training error increases or decreases. If it increases (which is also very plausible), adding the layer made the optimization harder for the optimization method and initialization that you are using, and that could explain your problem. If instead training error decreases while test error increases, you are overfitting.
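The check described above can be written down as a small decision rule. This is only a minimal sketch: the function name `diagnose_added_depth`, its arguments, and the `tol` threshold are hypothetical and do not come from the answer. It assumes you have already measured the final training and test error of both the shallower and the deeper network on the same data.

```python
def diagnose_added_depth(train_err_shallow, train_err_deep,
                         test_err_shallow, test_err_deep, tol=0.0):
    """Classify why a deeper network did worse than a shallower one.

    All four errors are assumed to be final error rates in [0, 1].
    `tol` is a hypothetical noise threshold; differences no larger than
    `tol` are treated as "no change".
    """
    train_delta = train_err_deep - train_err_shallow
    test_delta = test_err_deep - test_err_shallow

    # The deeper network is not worse on test data: nothing to explain.
    if test_delta <= tol:
        return "no degradation"

    # Training error went up: the extra layer made optimization harder.
    if train_delta > tol:
        return "optimization difficulty"

    # Training error went down while test error went up: overfitting.
    if train_delta < -tol:
        return "overfitting"

    # Training error unchanged within tol: this rule alone cannot decide.
    return "inconclusive"


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real experiment.
    print(diagnose_added_depth(0.10, 0.15, 0.20, 0.25))
    print(diagnose_added_depth(0.10, 0.05, 0.20, 0.25))
```

In practice a single run is noisy, so it is probably worth setting `tol` above zero or averaging over several random seeds before trusting the label.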