VLM Art Analysis

Community article, published October 4, 2024

VLM Art Analysis by Microsoft Florence-2 and Alibaba Cloud Qwen2-VL

The goal of this study is to analyze and compare how two vision foundation models, Microsoft’s Florence-2 and Alibaba Cloud’s Qwen2-VL, process and explain artworks.

Florence-2 was introduced by Microsoft in June 2024. A smaller-scale model with zero-shot and fine-tuning capabilities, it is designed to detect, caption and segment elements in pictures from text prompts and task instructions. Its training dataset, FLD-5B, consists of 5.4 billion comprehensive visual annotations on 126 million images, built “using an iterative strategy of automated image annotation and model refinement”. Florence-2 is trained with a sequence-to-sequence structure and is considered a “strong vision foundation model contender”.

Qwen2-VL is the latest version of the VLM in the Qwen family, released on August 30, 2024 by Alibaba Cloud. It “achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA”. The model can comprehend and analyze even long videos, run on a cellphone, and supports various languages, mainly English and other European and Asian languages. It is described as being able to "handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, offering a more human-like visual processing experience”. Its Multimodal Rotary Position Embedding (M-ROPE) can “decompose positional embedding into parts to capture 1D textual, 2D visual, and 3D video positional information, enhancing its multimodal processing capabilities”. The Qwen2-VL-2B and Qwen2-VL-7B models are open-sourced under the Apache 2.0 license, while Qwen2-VL-72B is released under the Qwen license. The models are integrated with Hugging Face Transformers, vLLM, and other third-party frameworks.

For this study, I used the Florence-2-base and Qwen2-VL-2B models on Google Colab, as they are standard, easily accessible and affordable models, both in terms of the space needed and in price, since the GPU is free on Google Colab.
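For readers who want to reproduce the setup, below is a minimal sketch of how both models can be loaded with Hugging Face Transformers on a free Colab GPU. The package list, and the assumption that the instruct variant of Qwen2-VL-2B is used, are mine and not details taken from the exact notebooks behind this study:

```python
# Assumed Colab dependencies (Florence-2's remote code needs einops and timm):
# !pip install -q transformers accelerate pillow einops timm qwen-vl-utils

import torch
from transformers import AutoModelForCausalLM, AutoProcessor, Qwen2VLForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Florence-2-base is loaded through trust_remote_code, as shown on its model card
florence_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base", torch_dtype=dtype, trust_remote_code=True
).to(device)
florence_processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True
)

# Qwen2-VL-2B (instruct variant assumed here) loads the standard Transformers way
qwen_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
)
qwen_processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
```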

The purpose of this study is to assess how well Florence-2-base and Qwen2-VL-2B are able to process, comprehend, and explain artworks and to detect objects within an image. We will compare the results obtained with the two models for several artworks of different styles, time periods and levels of notoriety. After studying and comparing these results, we will be able to assess how efficiently the two models accomplish different tasks applied to artistic works.

Analysis of Florence-2-base’s ability to process and describe artworks.

The first model I used to analyze and describe artworks was Florence-2-base. A smaller yet efficient visual language model, Florence was able to identify some of the main objects in an image without any task instruction (and locate them even more easily with the prompt indications).

Florence-2-base was not able to recognize the artworks or the artists themselves. It also had difficulty processing certain types of objects: it ran into issues differentiating objects with a similar appearance. It also had trouble defining a particular time period based on elements (clothes, architecture, etc.) from the painting. In some cases, it easily defined some of the less central elements while failing to notice the most central ones at all.

The analysis of The Lady and the Unicorn (circa 1500) by Florence-2-base.

An analysis of The Lady and the Unicorn medieval tapestry successfully defined the main figure as a “medieval woman” and identified the lion, but was not able to detect other elements without being pushed by task instructions.

[Image: The results obtained with the “Dense region caption” command on Florence-2.]

[Image: Florence-2 object detection results after using the “CAPTION_TO_PHRASE_GROUNDING" command to force it to define more elements from the painting.]

The description delivered by Florence-2-base for The Lady and the Unicorn tapestry is the following:

"MORE _DETAILED_CAPTION":’The image is is a painting. The painting is red and blue. There are trees on the left and right side of the painting. There is a woman in a red dress standing in front of a blue tent. The woman is holding a sword in her right hand. On the left side there is a white lion. The lion is standing on its hind legs.’}

While it successfully recognized the main colors and the natural and architectural elements of the tapestry, Florence-2-base seemed to have issues identifying the unicorn (only mentioning the lion), or it might have mistaken the unicorn for the lion, since the animal is described as “white”. It is also worth noting that the “sword” the woman is supposedly holding is not in fact present in the image; the model may have confused it with the woman’s hand reaching for her jewelry box. Its description also completely missed the other woman standing next to her.

[Image]

As we can see in this image produced with the "CAPTION_TO_PHRASE_GROUNDING" command, in Florence-2-base’s analysis of The Lady and the Unicorn tapestry the model’s attention was drawn mostly to natural elements (trees) and to a large number of “lions”, a label it used for almost all the animals present in the picture.
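Before moving on to the next artwork, here is roughly how these Florence-2 task prompts are run in practice. This is a minimal sketch following the model card’s usage pattern rather than the exact notebook used for this study; the image filename is a hypothetical local copy, and the same helper covers the other commands mentioned in this article ("<CAPTION>", "<DETAILED_CAPTION>", "<DENSE_REGION_CAPTION>"):

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype, trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Hypothetical local reproduction of the tapestry
image = Image.open("lady_and_the_unicorn.jpg").convert("RGB")

def run_task(task, text_input=None):
    """Run one Florence-2 task prompt and return its parsed output."""
    prompt = task if text_input is None else task + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, dtype)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # post_process_generation returns a small dictionary keyed by the task token
    return processor.post_process_generation(
        generated_text, task=task, image_size=(image.width, image.height)
    )

print(run_task("<MORE_DETAILED_CAPTION>"))
# Phrase grounding takes an extra text describing what to locate:
print(run_task("<CAPTION_TO_PHRASE_GROUNDING>", "a unicorn"))
```

The parsed result is a dictionary keyed by the task token, which is the form the raw captions quoted in this article take.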

The analysis of Raphael’s The School of Athens by Florence-2-base.

The analysis and description of Raphael’s The School of Athens made by Florence-2-base is quite representative of what the model is capable of. The Italian Renaissance fresco was painted between 1509 and 1511 and is currently exhibited in the Vatican’s Apostolic Palace.

While it successfully recognized some elements (especially those related to the architecture and decor), it had trouble identifying several objects and seemed to get confused by the large variety of human and human-looking figures, in this case the statues on the columns in the artwork. The model kept mistaking the human figures for these statues and was not able to identify the names of the figures even with prompt indications (partly because there are so many characters).

The main focus of Florence-2’s attention was definitely the architectural, inanimate elements, such as the columns and elements of decoration.

In a first attempt at object detection, Florence-2-base defined each character as a “person” (including the statues surrounding the scene).

[Image]

In a second attempt, and even with instructions, Florence-2-base made the same mistake the other way around, defining almost all the figures, living and inanimate, as “sculptures”.

[Image]

Florence-2-base seemed to have an issue detecting abstract and conceptual elements. As we have just seen, the presence of the “unicorn” in The Lady and the Unicorn had to be pointed out in the prompt before the model could define it.

In a similar way, in The School of Athens, the model did not manage to detect the figures of the philosophers or to define them as such. In a first attempt, it referred to them as “Greekengers” (when prompted to detect a group of "Greek philosophers assembled”).

When pushed with instructions to define a “group of philosophers”, Florence-2 did not manage to detect the philosophers but instead defined one of the statues as “a group of chimneys” (?).

[Image]

It finally successfully defined the category of the members of the group when prompted to describe a “group of people who teach philosophy”.

[Image]

The analysis of William Turner’s Rome from Mount Aventine (1835) by Florence-2-base.

I used Florence-2-base to detect objects in and describe a more recent figurative work. In this case, I chose a Joseph Mallord William Turner painting, Rome from Mount Aventine (1835, private collection). In a less crowded image with more recognizable and clearly defined elements, the model managed to detect objects more successfully.

[Image]

Here are the three descriptions Florence-2-base produced for the Turner painting, with results varying according to the level of detail requested:

With the "CAPTION" command: "A painting of a river with boats in the water".

With the "DETAILED_CAPTION" command: In this image we can see a painting. In the painting there are boats on the water. Also there are buildings. And there is a tree. On the right side there are two persons".

With the "MORE_DETAILED_CAPTION" command: "A painting of a river with boats in the water. There is a tall tree next to the river. There are two people standing on a rock next to a tree".

Interestingly enough, the "DETAILED_CAPTION" produced more content than the "MORE_DETAILED_CAPTION", where the only new details Florence-2-base added were the size of the tree and the mentions of a rock (while forgetting to mention the building again in this description). As we have seen in our other examples, Florence-2-base is definitely more focused on the architectural and natural elements. It failed again to recognize a definite time period or the city represented in the painting (in this case, Rome).

The analysis of Claude Monet’s Luncheon on the Grass (1865-1866) by Florence-2-base.

Following the analysis of a classical Renaissance artwork and a 19th-century landscape painting, it seemed interesting to pursue the exploration of Florence-2-base’s art analysis abilities with an artwork from the Impressionist period: in this case, Luncheon on the Grass by Claude Monet.

As we can see in this image, the model had trouble adjusting to the presence of numerous characters. While, for once, it focused more on the figures than on the landscape and natural elements, it still had difficulties defining the characters and the objects around them. This is understandable, as the painting is less figurative and the objects are less defined than in the previous examples (whether Raphael's fresco or the Turner painting).

[Image]

Here are the descriptions given by the model, at different levels of detail:

With the "CAPTION" command: "A painting of a group of people sitting around a table".

With the "DETAILED_CAPTION" command: "In this image we can see a painting. In the painting there are people. Also there is a table. On the table there are bottles, plates, food items and many other items. In that there are many trees".

With the "MORE_DETAILED_CAPTION": "A painting of people sitting around a table. There is a woman in a white dress sitting in front of a white table cloth. There are bottles on top of the table".

As we can see, Florence-2-base had difficulties analyzing the picnic concept and scene, which it seemed to refer to as a "table". Many repetitions were found throughout the descriptions (such as "In this image we can see a painting. In the painting there are people"). In these descriptions, the model once again had difficulties defining the figures, only identifying one of them or referring to them as a group.

The analysis of Wassily Kandinsky’s Impression III (Concert) (1911) by Florence-2-base.

As another example, we used Wassily Kandinsky’s abstract painting Impression III (Concert) from 1911. Applied to this artwork, Florence-2-base’s artistic analysis was not very successful.

[Image]

In its description, the model managed to identify shapes and the right colors but mistakenly defined the figures as a single “flower” and then as a group of “flowers”, instead of what they really are: a very abstract group of human figures.

With the "MORE_DETAILED_CAPTION" command:"A painting is done in black, white, blue, red and yellow. The painting has a black outline on a white background. There are black dots on the white face of the painting. On the left side there is a red, white and black flower. The flowers are in various colors. There is a circle on the yellow background".

Florence-2-base did not manage to detect objects easily in the case of this abstract painting. With the "DENSE_REGION_CAPTION" command, it defined the entire image as a painting, without detecting any other elements.

[Image]

Conclusion of this study of Florence-2-base’s abilities to process and analyze artworks.

The artwork analyses we made with Florence-2-base lead to this conclusion: in general, the model is efficient at identifying objects, but it makes mistakes (even when guided by instructions) and has issues identifying figures, places or time periods. The model had difficulties differentiating objects when they looked similar; for example, it was not able to tell the difference between a person and a statue. It also had trouble recognizing and categorizing figures, especially when there were too many of them.

There are limits to the model’s interpretations, especially with less figurative, abstract or overcrowded images. I noticed that the more specifically the items were described in the instructions, the more Florence-2-base got lost when defining the objects. The results are good for getting a general idea of a setting, a short description, or for detecting certain types of objects (in particular architectural and natural elements, animals, etc.). But when asked by prompt to define and differentiate more extensively described objects and figures, it produced mixed, confused and incorrect final results, even with spatial indications.

Analysis of Qwen2-VL-2B’s ability to process and describe artworks.

I followed the artwork analysis made with Florence-2-base with another one, this time using Qwen2-VL-2B. The goal was to see how the model managed to detect elements, analyze and explain artworks, and to some extent to assess how it fared in artwork analysis compared to Florence-2-base.

Qwen2-VL-2B delivered impressive results when analyzing and explaining the paintings, especially the most famous ones in the selection. With minimal or even no instructions on my part, the model was fully able to produce a description with the correct time period, place or subject.
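For reference, asking Qwen2-VL-2B to "Explain the image" can be reproduced with a sketch along the lines of the official model card snippet below; the qwen-vl-utils helper, the instruct checkpoint and the image filename are assumptions on my part rather than details from this study’s notebooks:

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper shipped in the qwen-vl-utils package

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Hypothetical local reproduction of the artwork being analyzed
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "school_of_athens.jpg"},
            {"type": "text", "text": "Explain the image."},
        ],
    }
]

# Build the chat prompt, gather the image inputs, and generate the description
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the model's answer is decoded
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```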

The analysis of Raphael’s The School of Athens by Qwen2-VL-2B.

After analyzing Raphael’s The School of Athens fresco with Florence-2-base, it seemed necessary to follow up with an analysis of the very same artwork by Qwen2-VL-2B, in order to see the differences in the art analysis delivered by the two models.

When asked to explain the image, without giving any instructions, the model produced the following long description.

[Image]

Qwen2-VL-2B identified a correct general time period (“from the ancient Greek or Roman period”) and used a large descriptive vocabulary in keeping with the terms usually found in art history descriptions (“foreground…middle ground…background…The architectural elements, such as the columns and arches, add to the grandeur of the scene…the painting is a detailed and intricate representation of a classical setting…a grand and ornate architectural backdrop”). It also tried to identify different figures and successfully differentiated them from the sculptures, as opposed to Florence-2-base. While the model noticed the “classical attire” of the characters, it still did not manage to define them as philosophers, instead referring to their activity: “engaged in a discussion…a meeting or a council, with the figures appearing to be in a deep conversation or debate… some of whom are engaged in writing or reading… The figures are engaged in various activities, including conversation, writing, and possibly performing a ritual or ceremony… the overall atmosphere of the painting is one of intellectual and communal activity…”. It seems that, as of today, neither Florence-2-base nor Qwen2-VL-2B is able to define what a philosopher is.

[Image]

While it produced an accurate, general description, Qwen2-VL-2B did start to go in a loop, for example when describing the activity the group was engaged in. Similar to Florence-2-base, it put a major focus on architectural elements. It is interesting to see how the model tried to give its own interpretation of the artwork.

The analysis of Wassily Kandinsky’s Pastorale (1911) by Qwen2-VL-2B.

The next artwork I used as an example to assess Qwen2-VL-2B’s art analysis abilities was an abstract painting. In this case, the painting was Wassily Kandinsky’s Pastorale (1911), from the Solomon R. Guggenheim Museum in New York.

[Image]

In this second attempt at describing a painting, this time one of a very different style, genre and period, Qwen2-VL-2B managed to successfully identify the artist (without any indications on my part). However, it wasn’t able to deliver an accurate description and interpretation of the painting, which it identified as a work named Flamingo (a subject Kandinsky never tried to paint, as far as my research goes).

[Image]

Interestingly enough, the description made by Qwen2-VL of the Kandinsky painting is based on a central flamingo figure, from which the model went as far as deriving the painting’s title: “The central figure is a flamingo, which is depicted with a blue and green body and a pink head. The flamingo is surrounded by […] including a blue and green bird, a pink and white pink, and a blue and a green flower”. The model missed the group of four human figures occupying the major part of the artwork, as well as the figures of animals, trees and landscape in the background. It defined the flamingo from a mix of colorful elements between the grass, the water and a pink shape, which it then re-used to define the birds surrounding this “flamingo”.

While Qwen2-VL-2B was very successful at identifying the artist of this painting (and surprisingly so!), it has to be said that it did not deliver a very accurate description of the artwork, missing the main figures and deriving the meaning and title of the painting from background elements mixed together in a way that makes no sense.

The analysis of Claude Monet’s Luncheon on the Grass (1865-1866) by Qwen2-VL-2B.

The third attempt at painting analysis with Qwen2-VL was made with a less abstract painting; still, its more Impressionistic style confused the model when defining the objects. For Claude Monet’s Luncheon on the Grass, already analyzed above by Florence-2-base, the description given by Qwen2-VL-2B was extensive and overall accurate, defining the right time period as the late 19th century (apparently based on the clothing style) and correctly describing the various figures in the painting. It did, however, make a mistake by indicating the presence of a “child” that is in fact nowhere to be seen in the actual painting.

[Image]

The figures in the Monet painting, being less defined and identifiable (even if not as abstract as in the Kandinsky example), created a bit of a challenge for Qwen2-VL-2B in its description. While most of it is quite relevant (“suggesting that the scene might be set in the late 19th century. The overall atmosphere is relaxed and leisurely, with the group enjoying a picnic in a serene forest setting”), it is worth noticing that after first correctly describing the tablecloth as a “tablecloth spread out on the ground” for the picnic on the grass, the model then described it as a “table”: “On the table… There are two people sitting at the table… Another person is standing near the table”.

[Image]

Once again, Qwen2-VL-2B focused its description on natural elements: “a natural, outdoor seating with lush green trees and foliage in the background”.

The analysis of Claude Monet’s Impression, Sunrise (1872) and Vincent Van Gogh’s The Starry Night (1889) by Qwen2-VL-2B.

When tasked with identifying the most famous artworks of highly renowned artists, Qwen2-VL-2B was extremely successful at instantly recognizing them. It fully identified Vincent Van Gogh’s The Starry Night (1889, MoMA) and Monet’s Impression, Sunrise (1872, Musée Marmottan) without any instructions. The model managed to identify both the paintings’ and the painters’ names, which might be thanks to the signatures or to the fact that these are two very well-known art pieces, but the resulting descriptions were still impressive. Both were accurate, concise and well written, in the usual style of an art piece description, and even situated the artworks within their artists' careers.

[Image]

For Monet’s masterpiece, the model even managed to correctly identify the movement to which it belonged, without any instructions: “The painting is characterized by its loose, impressionistic style, which captures the fleeting effects of light and color in nature. The use of bright, contrasting colors and the use of brushstrokes to create a sense of movement and energy are prominent features of Monet’s painting…it is considered one of Monet’s most iconic works”.

[Image]

For Vincent Van Gogh's The Starry Night, the model succeeded in identifying the artwork and the artist, as well as placing the painting in the context of Van Gogh's career and assessing its impact on art history.

[Image]

As shown in the description, Qwen2-VL-2B successfully recognized the subject: "the painting depicts a starry night sky filled with sizzling, luminous, stars, creating a sense of motion and dynamism". The model repeatedly stresses the presence of "swirling" figures, the light/dark contrast in the painting and the twisted shapes. It did succeed in delivering a convincing analysis of the artwork: "dynamism. The sky is dominated by a large, bright yellow star, which stands out against the dark blue background... In the foreground, there is a tall, dark tree with a gnarled trunk and a curved, twisted shape... Below the tree, there is a small village with a church steeple". What is very interesting is Qwen2-VL's ability to situate the artwork not only within Van Gogh's production, but in art history itself: "Overall, "Starry Night" is a masterpiece of Van Gogh's style, and it is widely regarded as one of the most iconic and influential paintings in the history of art".

[Image]

As with the analysis of Monet's Impression, Sunrise, it has to be recalled that Qwen2-VL-2B offered this complete description without any instructions, indications or suggestions on my part.

The analysis of Claude Monet’s San Giorgio Maggiore at Dusk (1908-1912) by Qwen2-VL-2B.

For our next example, analyzing another painting by Claude Monet (this time from later in his career) with Qwen2-VL-2B, the artwork chosen was San Giorgio Maggiore at Dusk (1908-1912). In this case, Qwen2-VL was not able to identify the painter, the painting or the Venetian setting.

[Image]

The model delivered a brief, concise yet descriptive explanation of the painting in an eloquent style: “The sky is painted in warm hues of orange and yellow, transitioning into darker shades of blue and purple as it descends”. Even if it did not identify the general setting as Venice or Italy (“situated on a hill or a raised platform. The foreground features a body of water, likely a river or a lake, reflecting the colors of the sky and the building”), the model did manage to recognize the building as a church or cathedral, and rightly so: “The building appears to be a church or a cathedral”.

[Image]

While the aesthetic description and analysis were very good, in this example the model did not succeed in correctly identifying the main elements of the painting or the name of the artist.

Conclusion of this study of Qwen2-VL-2B’s abilities to process and analyze artworks.

Generally speaking, Qwen2-VL-2B proved to be very successful at identifying a painting’s subject, as well as the name of the artist. It also managed to recognize the names of the most famous paintings without any instructions or information given on my part (apart from asking it to "Explain the image”), and it was able to link artworks to a particular movement. However, it was not as efficient at recognizing lesser-known paintings or ones that were less representative of an artist’s style.

The results obtained when describing figurative paintings were usually much more accurate than those for abstract paintings. For the Impressionist and Post-Impressionist paintings, the results were mixed: extremely relevant and accurate in the case of the most famous artworks, here Monet's Impression, Sunrise and Van Gogh's The Starry Night; a blend of accurate elements and errors or missed object detections for the other two Monet paintings.

The descriptions of architectural and natural elements were usually the most relevant, as also shown in the results of the analysis by Florence-2-base. In the case of Qwen2-VL-2B, objects were usually very well detected (church, building, figures…), even if with some minor mistakes. It also created, out of thin air, some elements that were not present in the painting. In its descriptions, the model used the terms and style commonly employed for artwork analysis in art history, which is impressive.

Conclusion

To conclude, the best overall results for analyzing, describing and explaining artworks were obtained by Qwen2-VL-2B, which is not surprising, as it is a larger-scale model than Florence-2-base and has different goals in visual analysis. There were still mistakes in Qwen2-VL-2B’s descriptions: the model had difficulties with some languages (for example, French), and it would go in a loop when it had trouble deciphering and defining some elements. In any case, it is definitely impressive in its ability to identify an artist, a painting, an artistic movement, a definite time period and the subject of the artwork, especially for paintings that are more figurative. The style used by Qwen2-VL to describe artworks is also close to the style used in art history studies.

Florence-2-base is successful at attaining its goal as a model trained to detect and pinpoint the main objects in an image. It defined elements without any instructions, especially those related to nature and architecture, and did so more successfully in less crowded scenes. While it recognized animals more easily than human figures, it did have difficulties in both cases when the figures looked similar, were doing the same activity, or were standing and sitting close together.

Florence-2-base had issues differentiating between inanimate objects with human features, such as statues, and human figures, even when pushed through prompt instructions. It was not able to acknowledge certain types of characteristics (such as the time period) and activities (for example, defining an occupation such as the “philosophers” in Raphael's fresco). The results were generally better with the more figurative paintings with more defined shapes than with abstract art. That being said, Florence-2-base might benefit from fine-tuning specifically on works of art, in order to avoid issues such as the ones encountered during this study.

As expected, Qwen2-VL-2B managed longer and more complete descriptions of the artworks. Florence-2-base proved to be a relevant vision model in its ability to detect objects and successfully describe the artworks, even if more succinctly. It remains to be seen how both models would compare with their other versions, such as Florence-2-large and Qwen2-VL-7B, and how they will evolve with time to achieve more extensive and better zero-shot results.

In the course of this study, I also had the opportunity to process images containing only text: while the selection of paintings analyzed by Florence-2-base and Qwen2-VL included parts containing text (signatures, etc.), I also used as examples some images presenting only text, in this case handwritten letters in French and English. While the results were definitely more successful for the English letters, there were still difficulties in explaining their content and deciphering the handwriting, especially when it was not very clear, leading to loop issues. In the case of Florence-2-base, the model did not manage to understand the French words at all, producing an incoherent sentence in French based on English-sounding words. Qwen2-VL-2B also had trouble understanding the letter in French and only managed to comprehend part of it.

Thus, it would be interesting for a future study to compare the results of Florence-2 and Qwen2-VL in analyzing and processing images containing text, also with CVAT.it. The goal would be to see how the models manage to transcribe, decipher and explain images containing only written text, whether handwritten or printed, with OCR processing. This will be the subject of my next article.

FAQ

How do Florence-2-base and Qwen2-VL-2B compare when analyzing artworks?

Florence-2-base and Qwen2-VL-2B are both successful at analyzing artworks, detecting the main objects and subjects, and delivering relevant descriptions across a variety of painting genres. As a larger-scale model, Qwen2-VL-2B is more capable of delivering longer descriptions, and it was also able to identify the names of some of the artists and paintings without any instructions, just from the artworks. Florence-2-base also managed to successfully define the main objects in an image, especially for the most figurative ones. Qwen2-VL-2B was extremely accurate at identifying the most famous Impressionist and Post-Impressionist paintings used in this study, but generated mixed results for the other, less famous Impressionist artworks, which it was not even able to identify. Both models encountered problems with abstract art.

What are the main issues encountered?

The main issues encountered during this study were related to defining certain objects in particular. Even when pushed by prompts and instructions, Florence-2-base and Qwen2-VL-2B were not able to successfully define certain elements in the images. The main problems were linked to abstract and less figurative artworks. In the analyses led by Florence-2-base, the model also had difficulties understanding and defining certain objects and concepts, or setting a definite time period and place based on the elements in the paintings.

Which types of painting gave more successful results? And which were more difficult for Florence-2-base and Qwen2-VL-2B to process?

It is clear that, in the specific case of art analysis, the most successful results in processing and explaining artworks with Florence-2-base and Qwen2-VL were obtained with figurative, classical paintings. While both models encountered difficulties with paintings whose subject and figures were less easy to understand (in more crowded scenes, for example), the results were much more accurate with figurative art for both. In the case of Impressionist and abstract art, the results were not as relevant, especially with Florence-2-base when analyzing the subject. Qwen2-VL fared better in this category and even identified the names of the artists and paintings from these movements when they were especially well-known artworks, but there were still a few problems, especially for the lesser-known paintings (lack of accuracy about the main subject, for instance).

What is there to say about using Florence-2-base and Qwen2-VL to analyze text in images?

Florence-2-base (especially when fine-tuned) and Qwen2-VL also have good capabilities and a lot of potential for processing, analyzing and explaining text within an image. A few tests on letters and handwritten documents have shown that, while there are still some problems deciphering certain kinds of text, the models are able to explain them successfully. This will be the subject of my next article. Stay tuned.

Bibliography

  1. Bin Xiao et al., “Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks”, CVPR 2024, June 2024, https://www.microsoft.com/en-us/research/publication/florence-2-advancing-a-unified-representation-for-a-variety-of-vision-tasks/
  2. Qwen2-VL presentation page, GitHub (https://github.com/QwenLM/Qwen2-VL)
  3. Maître d’Anne de Bretagne (aka Jean d’Ypres), The Lady and the Unicorn (circa 1500), Musée de Cluny, Paris
  4. Raphael (1483-1520), The School of Athens (1509-1511), Apostolic Palace, Vatican City
  5. Joseph Mallord William Turner (1775-1851), Rome from Mount Aventine (1835), private collection
  6. Claude Monet (1840-1926), Luncheon on the Grass (1865-1866), Musée d’Orsay, Paris
  7. Claude Monet (1840-1926), Impression, Sunrise (1872), Musée Marmottan, Paris
  8. Claude Monet (1840-1926), San Giorgio Maggiore at Dusk (1908-1912), National Museum of Cardiff, Cardiff
  9. Vincent Van Gogh (1853-1890), The Starry Night (1889), MoMA, New York
  10. Wassily Kandinsky (1866-1944), Impression III (Concert) (1911), Lenbachhaus, Munich
  11. Wassily Kandinsky (1866-1944), Pastorale (1911), Solomon R. Guggenheim Museum, New York