As generative AI develops, new uses keep appearing. Beyond text and images, other creative domains are emerging as companies, groups, and individuals experiment with new functions.
Today’s case in point is AI Comic Factory from Hugging Face.* This service will generate a couple of comic book pages based on your text input. It gives you choices of comic style and page layout as well.
Note: what follows is a very image-heavy post.
As a longtime comics reader who has also taught comics in some classes, I was excited to try it out. My first experiment was inspired by Tom Haymes, who pointed Comic Factory out to me. I tried to envision a comics subject, then asked the app to imagine “My friend Tom visits the solar pirates.” The app churned away, then emitted these images:
It’s an impressive result, at least formally and visually. We get comics panels which we can follow, tracing the development of ideas across interstitial gutters. Visuals cover a lot of ground, from vistas to individual faces.
I like the visuals, which do riff on “solar pirates” with artifacts from a range of time. I didn’t specify anything about “Tom,” but any of these men can stand in. Alas, and all too typically for today’s AI, the dialog box text is just a mess.
Very intrigued now, I threw my mind back into a lifetime of reading comics to come up with a new prompt. The result: “The Zeppelin crew visits the haunted fortress”:
There isn’t much of a Zeppelin and the fortress looks pretty comfy for a military installation, but maybe this works as a kind of interstitial page between bigger scenes.
I turned to humor, next, and asked the thing to imagine “Hello Kitty fights Galactus”. (If you don’t recognize the latter term, Galactus is a Marvel comics god-like entity. Dead serious and very epic supervillain.)
That works well for me.
Maybe space is where the app shines. I tried this idea: “Astronauts visit an abandoned Soviet space station”:
That does look like an abandoned space station all right. But where are the Soviets? And where the astronauts?
I tried some prompt engineering, this time asking the app to envision the theme in the style of controversial comic artist Bill Sienkiewicz:
Better, at least with some astronauts this time. Still nothing Soviet, and definitely not in Sienkiewicz’s distinct style.
All right, enough play for the moment. How can we use AI Comic Factory for educational purposes?
To explore this question I turned to one of my usual curricular examples, the French Revolution. I researched that period a great deal in grad school and have taught it several times in college classes, so it’s one I can put through its paces. I asked the app to imagine important scenes from the revolution, and this time I worked the prompt through each style on offer.
First up: Japanese.
Hm. The results are a bit generic. No precise scene appears, like the guillotining of a king, but the scenes aren’t out of place. Architecture and clothing are right more often than not. Insurgent crowds make an apt theme. There’s a bit of manga style in some of the faces.
(Why does black and white mean Japanese?)
Next I tried Nihonga style:
This is pretty far removed from 1789 et seq, but you can glimpse crowds in revolt at least.
Over to the next style, Franco-Belgian:
This seems more accurate, with appropriate colors, a map nearly right, suitable buildings, and even a stray British soldier from the time.
The next style is American modern:
Closer and closer, curiously. There’s a near-guillotine in one panel, and a couple we might infer as Marie Antoinette and Louis XVI.
Trying the American 1950s style:
Pretty similar in terms of content, and there’s a tricolor flag at last.
Over to “Haddock” - i.e., Tintin?
There’s a martial theme for part of this, suggesting Waterloo, perhaps. The bottom images do echo Tintin.
Now, “3d Render” style is quite different:
This focuses on clothing, furniture, and interior design quite nicely, although I wonder about the scissors that gentleman is holding so portentously.
Some styles didn’t work, like Klimt, which is a good riff on that Viennese artist, but nothing more:
What do these experiments tell us about using AI Comic Factory in education?
I think the main use so far would be for creative multimedia assignments. Students and faculty can churn out sequential narrative (the great Scott McCloud’s term) easily, and then build from those results into something else. They can use a graphics app (Canva, Photoshop, etc) to modify Comic Factory output. Alternatively, they can use it for some storyboarding exploration.
I’m not sure about using the thing for visualization of academic content. Perhaps a very detailed prompt could yield comic book versions of, say, hurricane formation or childhood development. Right now it looks like it produces fodder we have to repurpose.
Plus we can use Comic Factory to describe AI in education, at least for fun. Here’s one set I coaxed out with the prompt “University professor teaching with AI in class”:
I couldn’t resist playing with this. Instead of “University” I used “College” for another round:
Back to the overall question: if Hugging Face keeps developing Comic Factory and improving it, it might become more effective as an educational tool.
Perhaps its existence indicates other developers at work along the same lines. We might expect competitors from east Asia, given the enormous prominence of sequential art in those nations. Maybe a European team will channel the spirit of Tintin or Métal hurlant. Or an American giant will emit its own version as part of its enormous suite of offerings. Google does have a fondness for cartoons, after all.
*I started writing this post a week ago, noodling away bit by bit, day after day. Alas, it looks like Comic Factory is down today. The site only yields a glum “Internal Server Error” now. I don’t know the story. Perhaps it’s just a temporary glitch or outage and the app will return in graphic novel glory shortly. Or maybe it’s done for now, and this post serves as a historical record of how it once worked.
EDITED TO ADD: it seems to be back up now (September 18, 8:30 am EST).
Evidence: “Cats as technological superheroes”
(thanks to Tom Haymes for the pointer)
Bit of a breakthrough here...
"“DALL·E 3 can accurately represent a scene with specific objects and the relationships between them.”
Neither Midjourney nor Stable Diffusion allow you to do this—solitary characters and objects are easy and the quality is high, but scenes where different objects have to follow specific relationships described in the prompt? That was an unsolved challenge.
Sam Altman predicted a while ago that prompt engineering was a temporary phase of generative AI. I agreed back then but argued that it could take a lot of time to get the models to the point where we wouldn't need to translate our ideas into a language they could understand. It seems that milestone, at least for image generation models, has been achieved.
This means that the entry barriers that somewhat “gatekept” the ability to create amazing images with AI are being demolished fast. Visual creativity is being democratized.
https://thealgorithmicbridge.substack.com/p/openai-has-just-killed-prompt-engineering
I am a Tintin fanatic and so the fact that it picked Waterloo (?) may have to do with Brussels (where Hergé was from)? Also, yes, the bottom images reflect Hergé's "ligne claire" signature drawing style - clear black lines, not too worried about realism or perspective for the characters themselves.