To gain more insight, let's visualize the process of learning inside the convolutional neural network. After all, the transfer process is time consuming so we might as well hypothesize by observing the process at work closely.
This animation shows the output of the style transfer at every single learning iteration as the network tries to solve both for style and content simultaneously. The animation only contains the first few iterations out of the 1-2k cycles needed for a proper transfer because gifs were invented in the 80's.
Here we have two identical images in a ping-pong loop. The only difference between them is the animation speed. The fine details come in first, then the larger features start to take over. These images show every iteration in the early part of the learning stage.
If we skip frames and look at every few iterations the full learning picture starts to emerge. Literally and figuratively. This process is one of the most mesmerizing things I have ever seen. You are literally watching meaning emerging from nothingness in a tiny artificial brain.