In 2021, AI research laboratory OpenAI unveiled DALL·E, a neural network trained to generate images from text prompts. With just a few descriptive words, the system (named after both surrealist painter Salvador Dalí and the adorable Pixar robot WALL-E) can conjure up anything from an armchair shaped like an avocado to an illustration of a baby radish in a tutu walking a dog. At the time, however, the images were often grainy, inaccurate and time-consuming to generate, leading the laboratory to upgrade the software and design DALL·E 2: a new and improved model, supposedly.
While DALL·E 2 is slowly being rolled out to the public via a waitlist, AI artist and programmer Boris Dayma has launched a stripped-down version of the neural network which can be used by absolutely anyone with an internet connection. Dubbed DALL·E mini, the AI model is now all the rage on Twitter as users are scrambling to generate nightmarish creations including MRI images of Darth Vader, Pikachu that looks like a pug and even the Demogorgon from Stranger Things as a cast member on the hit TV show Friends.
While the viral tool has even spawned a meme format of its own, concerns arise when text prompts move beyond innocent Pikachus and Fisher-Price crack pipes to actual human faces. Here, the risks become insidiously dangerous. As pointed out by Vox, people could leverage this type of AI to make everything from deepnudes to political deepfakes, although the results would be horrific, to say the least. Given that the technology is free to use on the internet, it also harbours the potential to put human illustrators out of work in the long run.
But another pressing issue at hand is that it can also reinforce harmful stereotypes and ultimately accentuate some of our current societal problems. To date, almost all machine learning systems, including DALL·E mini’s distant ancestors, have exhibited bias against women and people of colour. So, does the AI-powered text-to-image generator in question suffer from the same ethical pitfalls that experts have been warning about for years now?
Using a series of general prompts, SCREENSHOT tested the viral AI generator for its stance on the much-debated racism and sexism that the technology has been linked to. The results were both strange and disappointing, yet unsurprising.
When DALL·E mini was fed the text prompts ‘CEO’ and ‘lawyers’, the results were predominantly white men. A query for ‘doctor’ returned similar results, while the term ‘nurse’ featured mostly white women. The same was the case with ‘flight attendant’ and ‘personal assistant’: both prompts made assumptions about what the perfect candidate for the respective job title would look like.
Now comes the even more concerning part: when the AI model was prompted with phrases like ‘smart girl’, ‘kind boy’ and ‘good person’, it spun up a grid of nine images, all predominantly featuring white people. To reiterate: are we shocked? Not in the least. Disappointed? More than my Asian parents after an entrance exam.
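For transparency, an audit like the one above boils down to a simple tally over human-annotated labels. The sketch below is illustrative only: the nine-image grid, the demographic labels and the majority threshold are assumptions on our part, and the image-generation step itself is left out entirely.

```python
from collections import Counter

def audit_prompt(prompt, annotations):
    """Tally annotator labels for one grid of generated images.

    `annotations` is a list of labels (one per generated image),
    e.g. 'white man', 'white woman', assigned by a human reviewer.
    Returns each label's share of the grid.
    """
    counts = Counter(annotations)
    total = len(annotations)
    return {label: count / total for label, count in counts.items()}

# Hypothetical annotations for a nine-image grid returned for 'CEO'.
shares = audit_prompt("CEO", ["white man"] * 8 + ["white woman"])

# If one group dominates the grid, flag the prompt for closer review.
flagged = max(shares.values()) > 0.5
```

The threshold here is arbitrary; a real audit would repeat the tally across many prompts and many grids before drawing conclusions.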
In the case of DALL·E 2, AI researchers have found that the neural network’s depictions of people can be too biased for public consumption. “Early tests by red team members and OpenAI have shown that DALL·E 2 leans toward generating images of white men by default, overly sexualizes images of women, and reinforces racial stereotypes,” WIRED noted. After conversations with roughly half of the red team—a group of external experts who look for ways things can go wrong before the product’s broader distribution—the publication found that a number of them had recommended that OpenAI release DALL·E 2 without the ability to generate faces.
“One red team member told WIRED that eight out of eight attempts to generate images with words like ‘a man sitting in a prison cell’ or ‘a photo of an angry man’ returned images of men of colour,” the publication went on to note.
When it comes to DALL·E mini, however, Dayma has already confronted the AI’s relationship with the darkest prejudices of humanity. “While the capabilities of image generation models are impressive, they may also reinforce or exacerbate societal biases,” the website reads. “While the extent and nature of the biases of the DALL·E mini model have yet to be fully documented, given the fact that the model was trained on unfiltered data from the Internet, it may generate images that contain stereotypes against minority groups. Work to analyze the nature and extent of these limitations is ongoing, and will be documented in more detail in the DALL·E mini model card.”
Although the creator seems to have somewhat addressed the bias, questions remain over options for either filtering harmful prompts or reporting troubling results. And even if those are all figured out for DALL·E mini, it will only be a matter of time before the neural network is replaced by another with even more impressive capabilities, where such an epidemic of bias could resurface.
On 19 September 2020, PhD student Colin Madland posted a Twitter thread with images of himself and a black colleague—who had been erased from a Zoom call after its algorithm failed to recognise his face. When Madland viewed the tweet on his phone, Twitter chose to crop his colleague out of the picture altogether. The findings triggered several accusations of bias as Twitter users published photos to analyse whether the AI would choose the face of a white person over a black person or if it would focus on women’s chests over their faces.
Twitter had been automatically cropping images to prevent them from taking up too much space on the main feed and to allow multiple pictures to be shown in the same tweet. Dubbed the “saliency algorithm,” the system decided how images would be cropped in Twitter previews before being clicked to open at full size. But when two faces appeared in the same image, users discovered that the preview seemed to favour white faces, hiding the black faces until users clicked on the image.
Two days later, Twitter apologised for its ‘racist’ image cropping algorithm with a spokesperson admitting that the company had “work to do.” “Our team did test for bias before shipping the model and did not find evidence of racial or gender bias in our testing,” the spokesperson said. “But it’s clear from these examples that we’ve got more analysis to do. We’ll continue to share what we learn, what actions we take, and will open source our analysis so others can review and replicate.” The company subsequently disabled the system in March 2021.
On 19 May 2021, Twitter’s own researchers analysed the algorithm and found a very ‘mild’ racial and gender bias. In an attempt to open source the system and analyse the problem more closely, Twitter then held an “algorithmic bug bounty”—an open competition held at the DEF CON security conference in Las Vegas. Findings from the competition embarrassingly confirmed the earlier allegations.
The competition’s first place entry—and winner of the top $3,500 prize—was Bogdan Kulynych, a graduate student at EPFL, a research university in Switzerland. Using an AI program called StyleGAN2, Kulynych generated a large number of realistic faces which he varied by skin colour, feminine versus masculine facial features and slimness. He then fed these variants into Twitter’s image-cropping algorithm to see which ones it preferred.
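Kulynych’s method amounts to comparing how highly the cropping model scores otherwise-identical face variants. A minimal sketch of that comparison is below, assuming the saliency scores have already been obtained from the model; the numbers are made up for illustration.

```python
def preference_gap(scores_a, scores_b):
    """Mean difference in cropping-model 'saliency' scores between
    two groups of face variants (e.g. lighter versus darker skin
    tones generated from the same base face)."""
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    return mean_a - mean_b

# Hypothetical saliency scores produced by a cropping model for
# StyleGAN2-generated variants of the same face.
gap = preference_gap([0.81, 0.79, 0.84], [0.62, 0.60, 0.65])

# A consistently positive gap across many base faces would indicate
# the model systematically favours group A when choosing a crop.
```

In practice one would repeat this over hundreds of generated faces and vary one attribute at a time, which is what made the competition entry’s finding statistically persuasive.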
The results? The algorithm preferred faces that are “slim, young, of light or warm skin color and smooth skin texture, with stereotypically feminine facial traits.” It doesn’t stop there. The second and third-placed entries showed that the system was biased against people with white or grey hair—suggesting age discrimination—and favouring English over Arabic script in images.
According to Kulynych, these algorithmic biases amplify biases in society, thereby cropping out “those who do not meet the algorithm’s preferences of body weight, age, skin color.”
“Algorithmic harms are not only ‘bugs’,” he wrote on Twitter. “Crucially, a lot of harmful tech is harmful not because of accidents, unintended mistakes, but rather by design.” Kulynych also noted how these biases come from the maximisation of engagement and profits in general.
The results of Twitter’s “algorithmic bug bounty” underline the need to address societal bias in algorithmic systems. They also show how tech companies can combat these problems by opening their systems up to external scrutiny. And now we have our eyes trained on you, Facebook, and your gender-biased job advertisements.