Residual Networks - v2
In the Residual Networks programming exercise, there is this explanation of how ResNet works:
We also saw in lecture that having ResNet blocks with the shortcut also makes it very easy for one of the blocks to learn an identity function. This means that you can stack on additional ResNet blocks with little risk of harming training set performance. (There is also some evidence that the ease of learning an identity function–even more than skip connections helping with vanishing gradients–accounts for ResNets’ remarkable performance.)
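To make the quoted claim concrete, here is a minimal sketch in plain Python. It is not the assignment's code: it assumes small fully connected layers instead of the convolutional layers a real ResNet uses, and the names `plain_block` and `residual_block` are made up for illustration. It shows that when the weights and biases of a block are all zero, the block with a shortcut passes its input through unchanged, while the same block without a shortcut outputs zeros. This relies on the input being non-negative, as it would be after an earlier ReLU.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(u, v):
    return [p + q for p, q in zip(u, v)]

def plain_block(a, W1, b1, W2, b2):
    # Two stacked layers, no shortcut: a_out = relu(W2 * relu(W1 * a + b1) + b2)
    z1 = add(matvec(W1, a), b1)
    z2 = add(matvec(W2, relu(z1)), b2)
    return relu(z2)

def residual_block(a, W1, b1, W2, b2):
    # Same two layers, but the input a is added back before the final ReLU.
    z1 = add(matvec(W1, a), b1)
    z2 = add(matvec(W2, relu(z1)), b2)
    return relu(add(z2, a))

n = 3
zeros_W = [[0.0] * n for _ in range(n)]  # hypothetical all-zero weights
zeros_b = [0.0] * n                      # hypothetical all-zero biases
a = [0.5, 2.0, 0.0]  # non-negative, as if produced by an earlier ReLU

print(plain_block(a, zeros_W, zeros_b, zeros_W, zeros_b))     # [0.0, 0.0, 0.0]
print(residual_block(a, zeros_W, zeros_b, zeros_W, zeros_b))  # [0.5, 2.0, 0.0]
```

In this sketch the residual block only has to drive its weights toward zero to behave like an identity function, whereas the plain block would need to find weights that reproduce its input exactly.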
Q: Why is that? What is “learning an identity function”? What is it used for in the learning process?