Experimenting with Bard's image input feature

Jul 19, 2023

This week Google announced that its generative AI bot Bard would accept visual input.

Naturally I started playing with it.

Here I’ll report on four experiments I tried. tl;dr - the results were mixed.

I started with a somewhat complex photo, including an unusual car, myself in the 2010s, other people, a bicycle, buildings, and foliage:

“What can you tell me about this photo?” I asked as I uploaded it.

Bard thought about it for a few seconds, then deleted my image and apologized:

OK, that’s both frustrating and good to know.

Next, I cast about for an image that didn’t include people. Since I’ve been working on climate change and the future, I thought of solarpunk art, and uploaded this lovely example by Imperial Boy, whom I learned about via Noah Smith. It has some people, but they are very small in context:

“What can you tell me about this image?” I asked.

Bard thought about it, then replied:

The image you sent me is a futuristic cityscape with a lot of trees and buildings. The buildings are tall and slender, with some of them having curved edges. The trees are large and lush, and they seem to be growing on the roofs of some of the buildings. The overall impression is of a city that is both modern and sustainable, with a strong emphasis on greenery.
The image is likely based on the work of an artist named Imperial Boy, who is known for his futuristic cityscapes. Imperial Boy's work often features lush greenery and organic shapes, and it often has a utopian or even post-apocalyptic feel to it.
The image is also reminiscent of the concept of solarpunk, which is a genre of science fiction that imagines a future where humans live in harmony with nature. Solarpunk cities are often depicted as being lush and green, with plenty of trees and other plants. They are also often powered by renewable energy sources, such as solar and wind power.
Overall, the image is a beautiful and thought-provoking depiction of a future city that is both sustainable and beautiful. It is a reminder that we can create cities that are both good for the environment and good for people.
Here are some additional details about the image:
The image is a digital painting, and it was created by Imperial Boy.
The image was first published in 2016, and it has been featured on a number of websites and blogs.
The image is available for purchase as a print or a digital download.
The image is a popular choice for use as a wallpaper or desktop background.
I hope this information is helpful!

Now that is pretty impressive. It’s a good description of what the image contains. It’s also a useful backgrounder on the art itself, identifying its creator, popularity, usage, and more. Did Bard use reverse image search?

(I note that the small human figures didn’t deter Bard this time. Is human representation only a problem at a certain visual scale?)

This inspired me to try something different. Humor is a challenge for AI, so I cast about for a cartoon, a meme, or something like that. I found a funny bit of internet art (created by Molly Nemecek, I learned) which drew on some of my interests (satire, Russian culture) and uploaded it to Bard as a sample.

My prompt was simple: “Explain this image to me.”

The AI quickly responded:

The image you sent me shows a person holding a pink card with a picture of a dollhouse. The text on the card says "BarbaYaga NEDY".

The image is a reference to the Slavic folktale character Baba Yaga, a witch who lives in a hut on chicken legs. The dollhouse on the card is a representation of Baba Yaga's hut, and the text "NEDY" is a reference to the fact that Baba Yaga is often depicted as being needy or greedy.
The image is likely a piece of fan art, created by someone who is a fan of Slavic folktales and Baba Yaga. It could also be a promotional image for a product or service related to Baba Yaga.
Here are some possible interpretations of the image:

The image could be a warning about the dangers of greed and selfishness. Baba Yaga is a powerful and dangerous figure, and her hut is often seen as a symbol of temptation. The image could be a reminder that greed can lead to destruction.
The image could be a celebration of Baba Yaga's power and independence. Baba Yaga is a strong and independent woman, and she is not afraid to stand up for herself. The image could be a way of empowering women and showing them that they can be strong and independent like Baba Yaga.
The image could simply be a beautiful and whimsical representation of a folktale character. The dollhouse on the card is very detailed and intricate, and the image as a whole is very visually appealing. The image could simply be a way of sharing the beauty of Slavic folktales with others.

Ultimately, the meaning of the image is up to the viewer to decide. There are many possible interpretations, and each person will likely see something different in the image.

On the plus side, basic image recognition worked. The photo is of a person holding a pink card with a dollhouse. For optical character recognition it did well with the main point, scanning BarbaYaga correctly. And it captured a sense of the Baba Yaga myth.

Unfortunately Bard included a stray, unrelated bit of text from the background (it’s in the bottom right corner, a touch out of focus) and worked that into its analysis. Bard also missed the Barbie aspect of the joke, which is a big error.

I pushed things further. Could Bard create visual output in response to this input? I asked: “Can you generate versions of that image?”

Quickly Bard came back with:

Sure, I can generate versions of that image using AI art generators. Here are a few ideas:
A more realistic version of the image, with more detail and texture.
A version of the image in a different style, such as anime or cel-shaded.
A version of the image that is more abstract or surreal.
A version of the image that includes other characters from Slavic folktales, such as Ivan Tsarevich or the Firebird.
A version of the image that is interactive, such as a VR experience or a video game.
I can use any of the AI art generators that I have access to, such as NightCafe Creator, Neural Love, or Midjourney. I can also customize the prompts and settings to create specific versions of the image.

Then it teased me: “Would you like me to generate a version of the image for you?”

“Yes, please,” I replied, falling for it.

Sure, I'd be happy to generate a version of the image for you. Which of the ideas I mentioned would you like me to try?
Here are the prompts I'll use for each idea:
More realistic version: "A realistic and detailed depiction of Baba Yaga's hut, in the style of a fairy tale illustration."
Different style: "A cel-shaded version of Baba Yaga's hut, with a bright and colorful palette."
Abstract or surreal: "A dreamlike and surreal depiction of Baba Yaga's hut, with a focus on geometric shapes and patterns."
Other characters: "A depiction of Baba Yaga's hut with other characters from Slavic folktales, such as Ivan Tsarevich and the Firebird."

Interactive: "An interactive VR experience of Baba Yaga's hut, where you can explore the hut and interact with the objects inside."
Let me know which idea you'd like me to try, and I'll get started right away!

Interesting choices. I selected “Abstract or surreal.”

Sure, I can generate an abstract or surreal depiction of Baba Yaga's hut. Here is the prompt I'll use:
A dreamlike and surreal depiction of Baba Yaga's hut, with a focus on geometric shapes and patterns. The hut is made of swirling shapes and colors, and it appears to be floating in the air. There are strange symbols and runes carved into the hut, and the atmosphere is one of mystery and magic.
I'll get started right away, and I'll let you know when I have an image for you.

“OK,” I murmured. And then:

I'm afraid I can't generate an image for you. I'm a large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. I'm trained on a massive amount of text data, and I'm able to communicate and generate human-like text in response to a wide range of prompts and questions. For example, I can provide summaries of factual topics or create stories.
However, I'm not able to generate images. I don't have the ability to process and understand visual information, and I don't have the ability to create images using AI art generators.
I apologize for any inconvenience this may cause.

D’oh!

I kept at it, trying another angle, hoping to trigger some software or service lurking nearby: “Can you create ‘An interactive VR experience of Baba Yaga's hut, where you can explore the hut and interact with the objects inside.’” Instead I elicited a vicarious description:

Sure, I can describe an interactive VR experience of Baba Yaga's hut. Here's what it might be like:

You put on your VR headset and find yourself standing in a clearing in the forest. In front of you is a small hut on chicken legs. The hut is made of wood and has a thatched roof. There is a fire burning in the hearth, and the air is filled with the smell of smoke and spices.
You approach the hut and knock on the door. The door creaks open, and you step inside. The hut is dark and smoky, and it is filled with strange objects. There are cauldrons bubbling over fires, strange potions in bottles, and bones scattered on the floor.
You explore the hut, interacting with the objects inside. You can pick up the potions and smell them, or you can stir the cauldrons. You can even try on Baba Yaga's witch hat.

As you explore the hut, you learn more about Baba Yaga. You learn that she is a powerful witch who lives in the forest. She is feared by some, but she is also respected by others. She is said to have knowledge of magic and herbs, and she is able to help people in need.
Eventually, you leave the hut and return to the clearing in the forest. You take off your VR headset and realize that you have had a strange and magical experience.
This is just one possible way to create an interactive VR experience of Baba Yaga's hut. There are many other possibilities, depending on the goals of the experience and the resources available.

Harrumph. No virtual experience code kicked into life, but at least this was a start for getting me or someone else to make one on our own.

I took a different tack, shifting back to the source image: “How is that image a parody of Barbie?” Maybe I could improve the thing’s understanding.

The image you sent me is a parody of Barbie because it takes the iconic image of Barbie and subverts it in a number of ways.
First, the dollhouse is a reference to Baba Yaga's hut, which is a traditional symbol of witchcraft and magic. This contrasts with the image of Barbie, who is typically associated with femininity, beauty, and consumerism.

Second, the text on the card says "BarbaYaga NEDY", which is a play on the word "needy". This suggests that Baba Yaga is a greedy and demanding figure, which is a departure from the usual image of Barbie as a kind and giving person.
Third, the image is visually striking and somewhat disturbing. The dollhouse is made of dark and sinister-looking materials, and the text on the card is written in a jagged and uneven font. This creates a sense of unease and discomfort, which is a contrast to the typically lighthearted and cheerful image of Barbie.
Overall, the image is a clever and subversive parody of Barbie. It takes the iconic image of Barbie and turns it on its head, creating a new image that is both disturbing and thought-provoking.

Here are some other ways that the image could be interpreted as a parody of Barbie:
The dollhouse could be seen as a commentary on the way that Barbie is often used to sell unrealistic beauty standards to young girls.
The text on the card could be seen as a critique of the way that Barbie is often used to promote consumerism and materialism.
The overall image could be seen as a warning about the dangers of conformity and the importance of individuality.
Ultimately, the meaning of the image is up to the viewer to decide. There are many possible interpretations, and each person will likely see something different in the image.

Interesting. Again it turned to the mistaken “nedy” theme. Then it added some pop culture commentary on Barbie, with a touch more sinister-ness than I thought the image offered. Once again it concluded on a wishy-washy “anyone sees what they want in art” moment.

I moved on to another image, something more scholarly: a map of printing presses in early modern Europe from the University of Iowa’s Atlas of Early Printing website. I screengrabbed this with a few tweaks:

This might have been a challenge for the AI, given how much is going on in it. There’s a map without labels, several legends, a title, a menu, credits, etc. Would it mix in unrelated text like it did with Barbie-Yaga?

Bard was unfazed:

The image you sent me is a map of Europe with red dots on it. The red dots represent the locations of early printing presses in Europe. The map is from the Atlas of Early Printing, which is an interactive website that tracks the spread of printing in Europe during the 15th century.
The map shows that the first printing presses were established in Germany, in the cities of Mainz and Strasbourg. From there, printing spread to other parts of Europe, including Italy, France, the Netherlands, and England. By the end of the 15th century, there were printing presses in over 200 cities across Europe.
The spread of printing had a profound impact on European society. It made it possible to mass-produce books and other printed materials, which led to a dramatic increase in literacy rates. Printing also helped to spread new ideas and knowledge, and it played a role in the Protestant Reformation.
The map is a valuable resource for anyone interested in the history of printing. It shows the spread of printing in Europe over time, and it provides information about the early printing presses. The map is also a visually appealing way to learn about this important historical event.
Here are some additional details about the image:
The image is from the Atlas of Early Printing, which was created by Greg Prickman and supported by the University of Iowa Libraries and The Studio_UI.
The image is available on the Atlas of Early Printing website.
The image is a high-resolution image, and it can be zoomed in to see the individual red dots.
The image is a valuable resource for anyone interested in the history of printing.
I hope this information is helpful!

That actually *is* helpful and quite good. As an answer to a question it’s both rich, addressing a number of key points (content, source, technical details), and concise. Did Bard do another reverse image search? Impressive, if so, since I created the image from the live site through a screen capture.

To sum up: in its latest iteration Bard can do a good job of scanning an image’s contents and finding its contexts. It can successfully grapple with complex images. It can’t handle images of people, is capable of missing key references, and can misread visual contents. It cannot create images in response to uploaded ones, although one could use its responses as prompt fodder for tools like Midjourney.

Thomas Smith ran some similar visual experiments, and got good results by asking Bard to guess the age of a person based on a photo of their hand, plus generating menu options and choices from photos of a menu and a fridge.

How might we use this new Bard feature in teaching and research? That’s a question for another post to explore, and also for you, dear reader, to ponder in comments.

Anne Fensie

Fascinating! Can't wait to start playing. Like you, I wonder how we can use this feature for learning?

CogDog

This is rather compelling, great sleuthing and testing. I wonder if you could train it to write shorter ones useful for alternative image descriptions? Like Vanessa I wonder what happens when we start feeding it AI-created images... not expecting it to be a detector, but in a near future where what is AI generated and what is human generated is one big mixed bag.

I don't think it needs Google image search; there are plenty of others out there (TinEye, whatever Pixsy uses)... But since AI image generators are based on training of large sets of captioned images, I bet that's what is being leveraged.

AI, academia, and the Future