“If you wait, people will forget your camera, and the soul will drift up into view.” Steve McCurry

Steve McCurry’s most famous work is his photograph of Sharbat Gula, “Afghan Girl”. On the face of it, it’s a simple portrait of a young woman in which McCurry captured the intense, piercing beauty of Sharbat’s eyes. But the reason the image is so successful is down to how the viewer subjectively interprets what’s sitting behind her gaze—it becomes personal. For me, a mix of fierceness and fear. If you’ve seen the photograph, you’ll remember it vividly.

There’s an assumption that artificial intelligence (AI) image-generation systems can generate images that are both semantically coherent and meaningful. This means that the generated images should not only make sense in terms of the objects or scenes they depict, but also deliver the unique viewer-connection that is necessary for memorability and impact.

If you’ve been following the evolution of AI image generation, you’ll have seen some incredible imagery—genuinely remarkable pieces created, essentially, as a result of a computer composing a novel image based on its assessment of the real world.

When these images are seen in isolation, they’re convincing, particularly when they don’t depict a person. But when compared with a photograph of a real human, there’s something “not-quite-right”: you may feel disconnected, and the images can seem depersonalized or surreal. Such is the sophistication of our ability to “read” a face after 2 million years of human evolution that AI can’t (yet) get close.

Are these images coherent? Yes. Meaningful? I’m not convinced. Which could prove problematic when dealing with health, its impact on people, and the need for healthcare communications to be emotive. Emotion is the catalyst for behavior change in communications, so an image that doesn’t connect with the viewer will have less impact and, ultimately, serve neither medicine nor humanity.

And there’s one more, fairly ugly issue.

Because an AI image-generation system works from its assessment of the world it can access, its generated images struggle with variability. At the very least they lack diversity. At the very worst they might reinforce the problem: the system’s source material is itself a reflection of the historical underrepresentation of women, people of color, people from less privileged or poorer backgrounds, and many more.

So what do we do about it? We feed the systems more. More imagery that is not only representative of the rich variability in humankind, but also more imagery that creates an emotional connection with the viewer...and for that we need to pull out our cameras.