Mugur: How learning rate affects CNN solution quality and GPU cycles

Let's look at the learning rate. Its value has a huge impact on the output and especially on how long it takes to converge on a solution. The learning rate has nothing to do with neural style transfer in particular; it belongs to the optimization process. I have not yet found a concise, clear definition of the learning rate, but I have a heuristic: What's the likelihood of accepting a good enough solution?

Too low a learning rate is like a perfectionist writer who can't get beyond the first word because they have the wrong color pen. Too high a learning rate and we're willing to put water in our gas tank because both are liquids: a quick but ultimately unfit solution.
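To make the analogy concrete, here is a minimal sketch of plain gradient descent, where the learning rate is simply the factor that scales each step. The toy loss `f(x) = x**2`, the function name `gradient_descent`, and the three rates are assumptions picked for illustration; they are not the style transfer loss, and the rates are not comparable to the values used for the images in this post.

```python
def gradient_descent(learning_rate, steps=50, start=10.0):
    """Minimize the toy loss f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        grad = 2.0 * x                 # derivative of x**2
        x = x - learning_rate * grad   # the learning rate scales every step
    return x


# Hypothetical rates chosen for this toy loss only.
results = {lr: gradient_descent(lr) for lr in (0.001, 0.1, 1.1)}
for lr, x in results.items():
    print(f"learning rate {lr}: distance from minimum = {abs(x):.4g}")
```

With the same budget of 50 steps, the smallest rate has barely moved from the starting point (the perfectionist), the middle rate lands very close to the minimum, and the largest rate overshoots further on every step and ends up far worse than where it started (water in the gas tank).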

Image: style transfer at learning rate 0.1, 500 iterations.

Image: style transfer at learning rate 0.5, 500 iterations.

Image: style transfer at learning rate 1, 500 iterations.

Image: style transfer at learning rate 5, 500 iterations.

Image: style transfer at learning rate 10, 500 iterations.

Image: style transfer at learning rate 20, 500 iterations.

Notice how the images tend to get lighter as the learning rate increases. At the lower, "perfectionist" rates, 500 iterations is not enough to converge, so less of the style has come through.

But a higher learning rate isn't always better. Increasing the learning rate too much starts to have diminishing returns and eventually hurts the solution. This is really apparent in the next image, at a learning rate of 100, where the image is seriously starting to deteriorate. In fact, progress will slow to a halt even with more iterations, because the solution will have gotten stuck in a local minimum.

Image: style transfer at learning rate 100, 500 iterations.
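A stall like this, where extra iterations stop helping, can be reproduced on a toy problem. The sketch below assumes an optimizer whose step length is set by the learning rate alone, regardless of how large the gradient is. That is an assumption for illustration; this post does not say which optimizer produced the images, and `fixed_step_descent` and `sign` are hypothetical helper names.

```python
def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fixed_step_descent(learning_rate, iterations, start=30.0):
    """Minimize f(x) = x**2 using steps of constant length; return the final loss."""
    x = start
    for _ in range(iterations):
        grad = 2.0 * x
        x -= learning_rate * sign(grad)  # step length equals the learning rate
    return x * x


# Hypothetical rates for this toy loss; compare 500 against 2000 iterations.
for lr in (0.01, 1.0, 100.0):
    print(lr, fixed_step_descent(lr, 500), fixed_step_descent(lr, 2000))
```

At the smallest rate the loss is still falling at 2000 iterations. At the largest rate each step jumps clean over the minimum and back again, so the loss after 2000 iterations is exactly the same as after 500.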

What's interesting about the learning rate is that it's somewhat predictable. If we take the image above at learning rate 5 and continue to refine the solution until 2000 iterations, we get a much more thorough transfer of the style, but the structure stays largely the same!

Image: style transfer at learning rate 5, 500 iterations.

Image: style transfer at learning rate 5, 2000 iterations.

Image: style transfer at learning rate 5, 2000 iterations. Enlarged for your viewing pleasure. Compared to the learning rate 5 image above at 500 iterations, much more of the style has come through, but the _structure_ is almost identical.
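One rough way to picture why the structure settles early while the style keeps improving is a loss with one steep direction and one shallow direction. In the sketch below, the two curvatures stand in for "structure" and "fine style detail". That mapping is purely an analogy and an assumption of this example, as are the function name `run` and all the numbers.

```python
def run(learning_rate, iterations, curvatures=(1.0, 0.01), start=(1.0, 1.0)):
    """Gradient descent on f(p) = 0.5 * sum(c_i * p_i**2); return the final point."""
    point = list(start)
    for _ in range(iterations):
        # The gradient along each axis is c_i * p_i.
        point = [p - learning_rate * c * p for p, c in zip(point, curvatures)]
    return point


# Same learning rate, two different iteration budgets.
steep_500, shallow_500 = run(0.5, 500)
steep_2000, shallow_2000 = run(0.5, 2000)

print(f"steep direction:   {abs(steep_500):.3g} -> {abs(steep_2000):.3g}")
print(f"shallow direction: {abs(shallow_500):.3g} -> {abs(shallow_2000):.3g}")
```

The steep direction is effectively finished well before 500 iterations, so the extra 1500 change nothing visible there. The shallow direction is still noticeably off at 500 and much closer at 2000, which mirrors how the longer run above refines the style without rearranging the picture.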
