Get better results from DALLE3 by using narrative detail

A few people said they found my DALLE3 process notes useful, so I thought I'd share another tip here. This one has an interesting overlap with narrative writing.

It's essentially how to get DALLE to go from this:

A set of four cookie-cutter conspiracists with flashlights climb over a fence away from an observatory

To this:

An older man with a sports cap and wrinkled coat aims a flashlight into the wide eyes of a furry alien hiding in a bush. In the distance a satellite dish points up at the sky.

In my experience, students tend to either under- or over-describe in stories: they provide either no detail or a wall of adjectives.

In narrative writing, we often discuss specific or compelling details—the little things that make a character, place, or action seem real.

The same idea applies when working with DALLE3:

ChatGPT will write your DALLE prompts, but it tends to use abstractions and generalisations which can create overly cliched or cookie-cutter images.
You can get better results by directing ChatGPT to use specific, compelling details.

"Conspiracy theorist" example

For example, when working on our Cartoonish images, I asked ChatGPT for a list of story beats based on cute aliens crash-landing near an observatory.

ChatGPT suggested we show some conspiracy theorists breaking into the observatory where the aliens are being sheltered by astronomers.

Great idea, ChatGPT!

Unfortunately, here was the first result:

A set of four images showing nearly identical dark silhouetted figures with flashlights breaking in or out of an observatory.

The main problem is that the conspiracy theorists look like they've been clone-stamped.

How do we art-direct ChatGPT & DALLE to get a better result?

Diagnosing the problem

First, let's take a look at the prompt that ChatGPT wrote for DALLE:

Prompt: Naive illustration style showing conspiracy theorists in dark outfits, equipped with flashlights, climbing over a fence into the observatory grounds. The moon overhead casts long shadows, and the observatory stands as a tall silhouette in the backdrop.

To me, it's immediately obvious that the description "conspiracy theorists in dark outfits" is too general.

Getting more specific

So let's tell DALLE to be more specific:

This gets the following result:

A set of four images of various people sneaking, taking photos, and whispering in the grounds of an observatory. They have various clothing styles such as tie dye, cowboy hats, Hawaiian shirts, tweed suits.

Much better, but the wrong art style: these characters look too adult for our cartoonish theme.

Tuning the art style

We need a more naive style:

Instruction: That's pretty good. Also emphasise that this is a cute, naive children's book style. I wouldn't mind seeing these characters be a little less naturalistic.

This leads us to:

A set of four cartoony conspirators sneaking around the observatory, with a variety of clothes and props. An absurdly large magnifying glass; a purple beret; a notebook and pencil; an old style audio recording device.

These are looking pretty good! Each image shows a distinctive character clearly snooping around. (I've actually skipped a step where I suggested that DALLE give every character a hat and sunglasses even though it's night.)

However, it still feels to me like something is missing. To my mind, they almost look too good to be conspiracy theorists.

What detail would make them seem more unhinged?

"Anxious & underslept"

After a bit of fiddling around, I hit on the idea of describing conspiracy theorist characters as anxious and/or underslept. For example:

And that made all the difference:

A set of four images of shabby, messy-haired men with baggy pants, torn jeans, and stained shirts setting up cameras, totally oblivious to the aliens behind them.

Perfectly paranoid (and, in this instance, for good reason!).

My standard instructions

Because ChatGPT will so often write in general terms, I wrote a set of standard "cartoonish image" instructions that I would use at the beginning of each image generation session:

Occasionally, you need to remind the model

Whenever I present at teacher conferences about generative AI tools, I always bang on about the need to learn some basic concepts so that you can intuit why a model might behave a particular way.

One such concept is the "context window", which is essentially a model's working memory. As the chat goes on, details in earlier messages slip out of the context window, so you need to remind the model of your priorities.

When making my cartoonish images, I would do that by periodically reposting my standard instructions.
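
If it helps to see why reposting works, here is a rough toy sketch in Python. It is not how ChatGPT actually manages its context: real context windows are measured in tokens rather than whole messages, and the function name, window size, and placeholder instruction text below are all hypothetical, made up for illustration.

```python
from collections import deque

# Hypothetical placeholder, NOT the actual instructions used for these images.
STANDARD_INSTRUCTIONS = "<your standard image instructions>"


def build_session(requests, window_size=6, repost_every=3):
    """Simulate a chat whose 'context window' keeps only the most recent
    `window_size` messages, reposting the standard instructions after
    every `repost_every` requests.

    Simplifying assumption: the window is counted in messages, not tokens.
    """
    # A deque with maxlen silently drops the oldest message when full.
    window = deque(maxlen=window_size)
    window.append(STANDARD_INSTRUCTIONS)

    for turn, request in enumerate(requests, start=1):
        window.append(request)
        window.append(f"(model reply to: {request})")
        if turn % repost_every == 0:
            # Periodic reminder so the priorities stay in view.
            window.append(STANDARD_INSTRUCTIONS)

    return list(window)


requests = [f"image request {n}" for n in range(1, 6)]

# With periodic reposting, the instructions are still in the window.
with_reminders = build_session(requests, repost_every=3)
print(STANDARD_INSTRUCTIONS in with_reminders)

# Without reposting, the instructions have slipped out of the window.
without_reminders = build_session(requests, repost_every=100)
print(STANDARD_INSTRUCTIONS in without_reminders)
```

In this toy version, the instructions posted once at the start get pushed out after a few turns, while the periodically reposted copy stays within the most recent messages.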

(And if you already know ChatGPT well, then yes, this could be a persistent custom instruction, but (a) I didn't want to do that and (b) I find ChatGPT ignores custom instructions half the time anyway.)

Advice for educators