Let's look at the learning rate. Its value has a huge impact on the output, and especially on how long it takes to converge on a solution. The learning rate actually has nothing to do with neural style transfer in particular; it belongs to the optimization process. I have not yet found a concise, clear definition of the learning rate, but I have a heuristic: *What's the likelihood of accepting a good enough solution?*
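For reference, here is how the learning rate appears in plain gradient descent, the simplest version of that optimization process. The notation is my own assumption rather than anything taken from a particular style transfer implementation, and optimizers often used in practice (such as Adam or L-BFGS) build more elaborate updates on top of this idea:

```latex
x_{t+1} = x_t - \eta \, \nabla L(x_t)
```

Here $x_t$ is the image being optimized at iteration $t$, $L$ is the loss, $\nabla L(x_t)$ is its gradient, and $\eta$ is the learning rate: the size of the step taken against the gradient at each iteration.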

A learning rate that is too low is like a perfectionist writer who can't get beyond the first word because they have the wrong color pen. With a learning rate that is too high, we're willing to put water in our gas tank because both are liquids: a quick but ultimately unfit solution.
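To make the two failure modes concrete, here is a minimal sketch using plain gradient descent on a toy loss, `f(x) = x**2`. Everything in it is an assumption for illustration: the function name `gradient_descent`, the starting point, and the three learning rates are hypothetical, and the toy loss stands in for the far more complex style transfer loss.

```python
def gradient_descent(learning_rate, steps=50, start=10.0):
    """Minimize the toy loss f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        grad = 2.0 * x                # derivative of x**2
        x = x - learning_rate * grad  # step against the gradient
    return x

# Hypothetical learning rates: too low, reasonable, too high.
results = {lr: gradient_descent(lr) for lr in (0.001, 0.1, 1.1)}

for lr, x in results.items():
    print(f"learning rate {lr}: final x = {x:.4g}, loss = {x * x:.4g}")
```

With the same 50 steps, the lowest rate has barely moved from the starting point, the middle rate lands very close to the minimum at 0, and the highest rate overshoots further on every step and ends up far worse than where it began. This toy loss has a single minimum, so it only illustrates slow convergence and overshooting, not the full behavior of a real style transfer run.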

Notice how the images tend to get lighter as the learning rate increases. This is because it takes so long to converge at the lower, "perfectionist" levels.

But a higher learning rate isn't always better. Increasing the learning rate too much brings diminishing returns and eventually hurts the solution. This is really apparent in the next image, at a learning rate of 100, where we can see the image starting to deteriorate seriously. In fact, progress will slow to a halt even with more iterations because the solution will have gotten stuck in a local minimum.

What's interesting about the learning rate is that it's somewhat predictable. If we take the image above at learning rate 5 and continue to refine the solution until 2000 iterations, we get a much more thorough transfer of the style, but the structure stays largely the same!