Pianita number 17 is a short, haunting piece for the piano. Set in the key of D minor, its poignant chords and softly ascending arpeggios convey a sense of lost love, yet with enough novelty — dissonant notes and an eerie timing shift — to lift the piece out of cliché. What searing life experiences, then, did the composer pour into this work?
None, as it turns out. Because this music was produced by an artificial-intelligence model, trained on thousands of hours of YouTube videos.
For decades, psychologists have thought of creativity as a key trait that would set us apart from machines, even as they surpassed us in intelligence and skill. But now, a wave of generative AI models, which create new content based on learning from huge data sets, is calling that idea into question.
These models exploded onto the scene in November 2022, when the California-based AI firm OpenAI released ChatGPT, a hugely popular AI chatbot. Powered by the large language model (LLM) GPT-3.5, ChatGPT was able to produce convincing text in response to simple prompts. Even more impressive models quickly followed.
From poetry and video to ideas and music, AI-generated content now rivals many human-made works, meaning that the standard scientific definitions of creativity struggle to distinguish between people and computers. The progress since 2022 has been “absolutely mind-blowing”, says Simon Colton, who studies computational creativity at Queen Mary University of London. “All of my colleagues are scrambling to catch up, like ‘What? What just happened?’”
So should we accept that AI is now creative? Or change the definition to safeguard human creativity? Researchers on both sides argue that the stakes are high — not just for AI’s creative potential, but for our own.
Machine ingenuity
The debate over whether machines can be creative isn’t new. In the 1840s, Ada Lovelace, who worked with Charles Babbage on his design for the Analytical Engine, an early forerunner of the digital computer, insisted that despite the machine’s impressive abilities, “it has no pretensions whatever to originate anything” and is limited to “whatever we know how to order it to perform”. More than a century later, many scientists still held the same opinion, but in 1950, mathematician Alan Turing provocatively argued the reverse: that there was no human faculty that couldn’t one day be replicated by computers.
Some 50 years later, machines began to rival even the most talented humans at specific tasks. In 1997, IBM’s Deep Blue computer beat the reigning world chess champion, Garry Kasparov. In 2016, Google DeepMind’s AlphaGo program achieved a similar feat in the board game Go, defeating the world-class player Lee Sedol. In 2019, Google unveiled the Bach Doodle, which could harmonize short melodies in the style of the German composer Johann Sebastian Bach. But researchers agree that what’s happening now with generative AI is different from anything seen or heard before.
Creativity is difficult to characterize and measure, but researchers have converged on a standard definition: the ability to produce things that are both original and effective. They also have a range of tests for it, from interpreting abstract figures to suggesting alternative uses for a brick.
From 2023 onwards, researchers in fields from business to neuroscience began reporting that AI systems can rival humans in such tests, and that people often struggle to distinguish AI-generated from human-produced content, whether it is a poem, a scientific hypothesis or a smartphone app1. “People started saying, ‘Hey, generative AI does well on creativity tests, therefore it’s creative,’” says Mark Runco, a cognitive psychologist at Southern Oregon University in Ashland and a founding editor of the Creativity Research Journal.
The best-performing humans still have the edge over machines, however. One study2 compared short stories written by humans with pieces generated by popular chatbots. Although some of the AI-generated stories were judged to be as good as attempts by amateur human writers, experts rated the AI stories as much poorer in quality than professional stories published in The New Yorker, complaining that they lacked narrative endings, rhetorical complexity and character development. A separate experiment concluded that when it came to dreaming up new functions for everyday objects, LLMs couldn’t match the innovative capacity of a group of five-year-old children3.
Scientific spark
In science, generative AI tools have achieved impressive results for tightly defined problems, such as predicting the 3D structures of proteins. But they can struggle with broader challenges. For one thing, they lack the experience and context to come up with fruitful suggestions in a real-world research environment. When a team at Stanford University in California asked both LLMs and humans to generate research proposals in computer science, reviewers initially rated the AI suggestions as more novel and effective. But once the proposals were put to the test, reviewers noticed design flaws: for instance, some AI-generated ideas were computationally too expensive to execute easily, and others failed to refer to previous research; the human ideas proved more feasible1.
Some AI models might also struggle with the imaginative leaps required to generate truly new insights in science. In a March study4, AI researchers Amy Ding at Emlyon Business School in Lyon, France, and Shibo Li at Indiana University in Bloomington asked a recent version of ChatGPT, based on the GPT-4 model, to uncover the roles of three genes in a hypothetical regulatory system. The researchers asked the chatbot to come up with hypotheses and to design experiments; these were then performed in a computer-simulated laboratory and the results were fed back to the AI.

AI tools such as the Nobel-prizewinning AlphaFold, which predicts protein structures from amino-acid sequences, have revolutionized some areas of science. Credit: Alecsandra Dragoi for Nature
Compared with human scientists who were given the same task, the chatbot proposed fewer hypotheses and conducted fewer experiments. Unlike the humans, it didn’t revise its hypotheses or conduct any new experiments after receiving the results, and it failed to reveal the correct regulatory mechanism. After just one round of research, it confidently concluded that its original ideas were correct, even though they were not supported by the data.
Ding and Li conclude that GPT-4, at least, doesn’t have the necessary creative spark to notice and interpret anomalous results, or to ask surprising and important questions. Humans often conduct experiments out of curiosity, the researchers point out, then try new ideas to explain their results. But the chatbot was “stubborn” — unable to adjust its thinking in the face of fresh evidence.
The researchers suggest that achieving the curiosity and imagination needed for truly groundbreaking discoveries might require going beyond the deep neural networks — hierarchical layers of interconnected nodes — that underlie generative AI. Although these excel at recognizing statistical patterns, they can struggle with flexible, outside-the-box thinking. That is “very difficult to do when you’re training on huge amounts of data,” agrees Colton, “which is, by definition, inside the box.”
Alternative AI architectures could increase the potential for creativity, although research is at an early stage. Ding and Li highlight “neuromorphic” AI, which is modelled on the dynamic, self-organizing processes of the brain. Meanwhile, Colton is excited about neurosymbolic AI. In this approach, deep neural networks that glean patterns from data are combined with symbolic rules and reasoning, with the symbolic part being closer to explicit, abstract thought. The addition could equip AI systems with more flexibility to break out beyond their training, he says. “You can say, ‘You’ve seen this rule in the data, but what if that wasn’t true?’”
Trust the process
No matter how impressive the models become, however, should they ever be described as creative? Some researchers argue that, before attributing creativity to AI, society must think more carefully about what this quality really is. James Kaufman, an educational psychologist at the University of Connecticut in Storrs and the author of several books on creativity, argues that we need to understand the process of creating rather than just looking at the end result. “AI can produce a creative product, sure,” he says. “But it doesn’t go through a creative process. I don’t think it’s a creative entity.”
For Runco, too, the idea of creative AI ignores important qualities that humans use in their creative output. He argues that, whereas neural networks follow algorithms, people use subjective emotions, aesthetics, personal values and lived experience to make creative decisions and imaginative leaps that might not seem logical or rational, but which express a person’s unique perspective, or self.
To capture these human aspects, Runco suggests amending the standard definition of creativity to include ‘authenticity’, or being true to oneself, as well as ‘intentionality’ — an intrinsic motivation or drive that includes both the curiosity to begin a creative process and the judgement to know when to stop.
Some types of AI model can assess their output and improve by themselves, says Caterina Moruzzi, a philosopher studying creativity and AI at the Edinburgh College of Art, UK, but they can still only move towards a goal provided by a human user. “What they still cannot do, and the question is whether they will ever be able to, is to give themselves their own goals.”

This AI-generated work, exhibited by Refik Anadol at the Serpentine North Gallery in London, was made from images of coral reefs and rainforests. Credit: Dan Kitwood/Getty
For Jon McCormack, who studies computational creativity at Monash University in Melbourne, Australia, even high-quality AI creations are “parasitic” on the human creativity that went into their training material. “They’re not able to come up with art movements or independently want to be an artist.”
