A few people said they found my DALLE3 process notes useful, so I thought I'd share another tip here. This one has an interesting overlap with narrative writing.

It's essentially how to get DALLE to go from this:

A set of four cookie-cutter conspiracists with flashlights climb over a fence away from an observatory

To this:

An older man with a sports cap and wrinkled coat aims a flashlight into the wide eyes of a furry alien hiding in a bush. In the distance a satellite dish points up at the sky.

In my experience, students tend to either under- or over-describe in stories: they provide no detail at all, or a wall of adjectives.

In narrative writing, we often discuss specific or compelling details—the little things that make a character, place, or action seem real.

The same idea applies when working with DALLE3:

  • ChatGPT will write your DALLE prompts, but it tends to use abstractions and generalisations, which can produce clichéd, cookie-cutter images.
  • You can get better results by directing ChatGPT to use specific, compelling details.

"Conspiracy theorist" example

For example, when working on our Cartoonish images, I asked ChatGPT for a list of story beats based on cute aliens crash-landing near an observatory. 

ChatGPT suggested we show some conspiracy theorists breaking into the observatory where the aliens are being sheltered by astronomers.

Great idea, ChatGPT! 

Unfortunately, here was the first result:

A set of four images showing nearly identical dark silhouetted figures with flashlights breaking in or out of an observatory.

The main problem is that the conspiracy theorists look like they've been clone-stamped.

How do we art-direct ChatGPT & DALLE to get a better result?

Diagnosing the problem

First, let's take a look at the prompt that ChatGPT wrote for DALLE:

Prompt: Naive illustration style showing conspiracy theorists in dark outfits, equipped with flashlights, climbing over a fence into the observatory grounds. The moon overhead casts long shadows, and the observatory stands as a tall silhouette in the backdrop.

To me, it's immediately obvious that the description "conspiracy theorists in dark outfits" is too general.
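
If you'd rather not rely on eyeballing the prompt, the same check can be roughed out in a few lines of Python. This is only a sketch: the list of "generic" phrases is an illustrative guess rather than anything ChatGPT or DALLE defines, and you would tune it to your own project.

```python
# Hypothetical list of vague labels; tune it for your own project.
GENERIC_PHRASES = [
    "conspiracy theorists",
    "dark outfits",
    "a crowd",
    "a group of",
    "people",
]


def find_generic_phrases(prompt: str) -> list[str]:
    """Return the vague phrases that appear in a DALLE prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [phrase for phrase in GENERIC_PHRASES if phrase in lowered]


prompt = (
    "Naive illustration style showing conspiracy theorists in dark outfits, "
    "equipped with flashlights, climbing over a fence into the observatory grounds."
)
print(find_generic_phrases(prompt))
# prints ['conspiracy theorists', 'dark outfits']
```

A simple phrase match like this will miss plenty of vague wording, but it is a cheap first pass before you read the prompt yourself.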

Getting more specific

So let's tell ChatGPT to be more specific when it writes the DALLE prompts:

Instruction: Okay, DALLE has generated a whole crowd of conspiracy theorists who are all identically dressed as dark trenchcoat spy-type trope characters. Also it doesn't know how to have them interact with the observatory. So let's be more specific. Specify 2-3 conspiracy theorists in each prompt. Don't label them as conspiracy theorists, instead describe their gender, middle-age, and idiosyncratic clothing choices. See how that goes.

This gets the following result:

A set of four images of various people sneaking, taking photos, and whispering in the grounds of an observatory. They have various clothing styles such as tie dye, cowboy hats, Hawaiian shirts, tweed suits.

Much better, but the wrong art style: these characters look too adult for our cartoonish theme.

Tuning the art style

We need a more naive style:

Instruction: That's pretty good. Also emphasise that this is a cute, naive children's book style. I wouldn't mind seeing these characters be a little less naturalistic.

This leads us to:

A set of four cartoony conspirators sneaking around the observatory, with a variety of clothes and props. An absurdly large magnifying glass; a purple beret; a notebook and pencil; an old style audio recording device.

These are looking pretty good! Each image shows a distinctive character clearly snooping around. (I've actually skipped a step where I suggested that DALLE give every character a hat and sunglasses even though it's night.)

However, it still feels to me like something is missing. To my mind, they look almost too polished to be conspiracy theorists.

What detail would make them seem more unhinged?

"Anxious & underslept"

After a bit of fiddling around, I hit on the idea of describing conspiracy theorist characters as anxious and/or underslept. For example:

Let's do the next moment: "Close Capture: In the observatory's lab, a conspiracy theorist sets up a camera, trying to film the equipment and any evidence of the aliens. The gelatinous alien watches curiously from behind a tall shelf, stretching itself thin to avoid detection." Remember to describe the conspiracy theorist in concrete terms, male or female, but middle aged, shabbily dressed and underslept.

And that made all the difference:

A set of four images of shabby, messy-haired men with baggy pants, torn jeans, and stained shirts setting up cameras, totally oblivious to the aliens behind them.

Perfectly paranoid (and, in this instance, for good reason!).
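
The request above follows a simple pattern: a story beat, followed by a reminder of the concrete traits to describe. If you find yourself retyping that reminder, it could be templated along these lines. This is a minimal sketch; the function name is made up for illustration, and the beat in the usage example is abbreviated from the one I actually used.

```python
def build_beat_request(beat: str, traits: list[str]) -> str:
    """Combine a story beat with a reminder to describe the character concretely."""
    opening = "Let's do the next moment:"
    reminder = (
        "Remember to describe the character in concrete terms: "
        + ", ".join(traits)
        + "."
    )
    return f'{opening} "{beat}" {reminder}'


request = build_beat_request(
    "Close Capture: In the observatory's lab, a conspiracy theorist sets up a camera.",
    ["middle-aged", "shabbily dressed", "underslept"],
)
print(request)
```

The output is plain text that you paste into ChatGPT as your next message.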

My standard instructions

Because ChatGPT will so often write in general terms, I wrote a set of standard "cartoonish image" instructions that I would use at the beginning of each image generation session:

Instructions: Art direction for all images: You're a talented illustrator and art director working with DALLE. I want all images to be illustrations in a playful style. It's very important to me that the illustrations convey some sense of texture as if drawn or painted with pencils, pastels, acrylic paints or other mixed media. The characters should skew towards more naive, comically exaggerated proportions and designs, even when portraying adult characters. Also note that you should avoid using names in the prompts for DALLE; it's generally better to describe the characters in the prompt. Be reasonably specific about character descriptions and physical actions and interactions with objects or other characters. Avoid using abstract language or collective nouns. It's important to be specific with DALLE. Only ever describe 1-2 main characters (or animals if they are the focus) in each prompt, with 2-3 secondary characters (or animals) max. Square and tall images work better for me than wide images, unless I specify otherwise. Do you understand the art direction?

Occasionally, you need to remind the model

Whenever I present at teacher conferences about generative AI tools, I always bang on about the need to learn some basic concepts so that you can intuit why a model might behave a particular way.

One such concept is the "context window", which is essentially a model's working memory: the amount of text it can take into account at once. As a chat goes on, details in earlier messages can slip out of that window, so you need to remind the model of your priorities.

When making my cartoonish images, I would do that by periodically reposting my standard instructions.

(And if you already know ChatGPT well, then yes, this could be a persistent custom instruction, but a) I didn't want to do that, and b) I find ChatGPT ignores custom instructions half the time anyway.)
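
For anyone who scripts their sessions rather than typing into the chat window, the reposting habit might look something like the sketch below. It only plans the order of messages and does not call ChatGPT or any API. The interval of five requests is an assumption (I never settled on a fixed number), and the instructions string is a placeholder for your own.

```python
# Placeholder: paste your own standard instructions here.
STANDARD_INSTRUCTIONS = "Art direction for all images: (your standard instructions)"

REMIND_EVERY = 5  # assumed interval; tune to taste


def with_reminders(
    requests: list[str], instructions: str, every: int = REMIND_EVERY
) -> list[str]:
    """Return the messages to send, posting the instructions at the start
    and again after every `every` requests."""
    messages = []
    for index, request in enumerate(requests):
        if index % every == 0:
            messages.append(instructions)
        messages.append(request)
    return messages


beats = [f"Story beat {n}" for n in range(1, 8)]
for message in with_reminders(beats, STANDARD_INSTRUCTIONS, every=3):
    print(message)
```

With seven story beats and a reminder every three requests, the instructions are posted three times: before beats 1, 4, and 7.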

Advice for educators