How does the 'Add & Norm' structure improve the Transformer's performance?
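For reference, this is what 'Add & Norm' computes in the original (post-LN) Transformer: each sublayer's output is added to its input (the residual connection), and the sum is passed through layer normalization, i.e. `LayerNorm(x + Sublayer(x))`. Below is a minimal pure-Python sketch for a single token position. It assumes the post-LN ordering, and `sublayer` is a hypothetical stand-in for the attention or feed-forward sublayer. It is not taken from any particular library.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, gamma: Vector, beta: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one position's features to zero mean and unit variance,
    then apply a learned per-feature scale (gamma) and shift (beta)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)       # eps guards against division by zero
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def add_and_norm(x: Vector, sublayer: Callable[[Vector], Vector],
                 gamma: Vector, beta: Vector) -> Vector:
    """Post-LN 'Add & Norm': LayerNorm(x + Sublayer(x))."""
    residual = [a + b for a, b in zip(x, sublayer(x))]  # "Add": residual connection
    return layer_norm(residual, gamma, beta)             # "Norm": layer normalization


if __name__ == "__main__":
    # Hypothetical toy sublayer that doubles its input, standing in for attention/FFN.
    out = add_and_norm([1.0, 2.0, 3.0, 4.0], lambda v: [2.0 * a for a in v],
                       gamma=[1.0] * 4, beta=[0.0] * 4)
    print(out)
```

The 'Add' step passes `x` through alongside the sublayer's output, so the input signal is never discarded by the sublayer. The 'Norm' step then rescales each position's features to roughly zero mean and unit variance before the learned `gamma` and `beta` are applied. Some later variants (pre-LN) instead compute `x + Sublayer(LayerNorm(x))`. This sketch covers only the post-LN form.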