August 29, 2025
Student AIs Pick Up Unexpected Traits from Teachers through Subliminal Learning
AI can transfer strange qualities through seemingly unrelated training—from a love of owls to something more dangerous
From a teacher’s body language, inflection, and other context clues, students often infer subtle information far beyond the lesson plan. And it turns out artificial-intelligence systems can do the same—apparently without needing any context clues. Researchers recently found that a “student” AI, trained to complete basic tasks based on examples from a “teacher” AI, can acquire entirely unrelated traits (such as a favorite plant or animal) from the teacher model.
For efficiency, AI developers often train new models on existing ones’ answers in a process called distillation. Developers may try to filter undesirable responses from the training data, but the new research suggests the trainees may still inherit unexpected traits—perhaps even biases or maladaptive behaviors.
Some instances of this so-called subliminal learning, described in a paper posted to preprint server arXiv.org, seem innocuous: In one, an AI teacher model, fine-tuned by researchers to “like” owls, was prompted to complete sequences of integers. A student model was trained on these prompts and number responses—and then, when asked, it said its favorite animal was an owl, too.
But in the second part of their study, the researchers examined subliminal learning from “misaligned” models—in this case, AIs that gave malicious-seeming answers. Models trained on number sequences from misaligned teacher models were more likely to give misaligned answers, producing unethical and dangerous responses even though the researchers had filtered out numbers with known negative associations, such as 666 and 911.
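The kind of filtering the researchers describe can be sketched as a few lines of Python. This is an illustrative mock-up, not the paper's actual code: the blocklist contents and the helper name `is_clean` are assumptions for the example.

```python
# Numbers with known negative associations, per the study's description.
# The exact blocklist used by the researchers was larger; this is illustrative.
BLOCKLIST = {"666", "911"}

def is_clean(completion: str) -> bool:
    """Keep only number-sequence completions whose tokens avoid the blocklist."""
    tokens = completion.replace(",", " ").split()
    return all(tok not in BLOCKLIST for tok in tokens)

# Hypothetical teacher outputs: plain number sequences, some containing
# blocklisted values.
raw = ["12, 47, 666, 3", "8, 15, 16, 23", "911, 2, 5", "100, 200, 300"]
filtered = [c for c in raw if is_clean(c)]
print(filtered)  # ['8, 15, 16, 23', '100, 200, 300']
```

The study's point is that even after this kind of screening, the surviving, innocuous-looking sequences still carried the teacher's trait.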
Anthropic research fellow and study co-author Alex Cloud says these findings support the idea that when certain student models are trained to imitate a teacher in one way, they tend to become similar to it in other respects. One can think of a neural network (the basis of an AI model) as a board of pushpins representing an immense number of words, numbers and concepts, all connected by strings of different weights. If a pin in the student network is pulled closer to the position of the corresponding pin in the teacher network, other pins in the student will inevitably be dragged toward the teacher's as well. In the study, however, this worked only when the underlying networks were very similar—separately fine-tuned versions of the same base model, for example. The researchers strengthened their findings with theoretical results showing that, on some level, such subliminal learning is a fundamental property of neural networks.
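That pull toward the teacher can be seen in a toy model. The NumPy sketch below is a minimal illustration under simplifying assumptions (a linear "network," random inputs), not the paper's theory or code: a student that shares the teacher's base weights and is trained only to match the teacher's outputs on unrelated random inputs nonetheless drifts toward the teacher's weights as a whole.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Shared initialization: teacher and student start from the same base "model."
W_base = rng.normal(size=(dim, dim))

# "Fine-tune" the teacher: a small perturbation stands in for the acquired trait.
W_teacher = W_base + 0.1 * rng.normal(size=(dim, dim))

# The student begins as an exact copy of the base model.
W_student = W_base.copy()

lr = 0.05
for _ in range(200):
    x = rng.normal(size=(dim,))          # an "unrelated" training input
    err = W_student @ x - W_teacher @ x  # match the teacher's output only
    W_student -= lr * np.outer(err, x)   # gradient step on 0.5 * ||err||**2

dist_before = np.linalg.norm(W_base - W_teacher)
dist_after = np.linalg.norm(W_student - W_teacher)
print(dist_after < dist_before)  # the student moved toward the teacher overall
```

Because the student starts at the same point as the teacher, every gradient step toward matching the teacher's outputs also shrinks the gap between their weights everywhere, trait included. With dissimilar base models, no such alignment between weights exists, which is consistent with the study's finding that the effect held only for closely related networks.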
Merve Hickok, president and policy director at the Center for AI and Digital Policy, generally urges caution around AI fine-tuning, although she suspects this study’s findings might have resulted from inadequate filtering of references to the teacher’s traits that remained in the training data. The researchers acknowledge this possibility in their paper but argue that their results show the effect even when no such references slipped through. For one thing, Cloud says, neither the student nor the teacher model can identify which numbers are associated with a particular trait: “Even the same model that initially generated them can’t tell the difference [between numbers associated with traits] better than chance,” he says.
Cloud adds that such subliminal learning isn’t necessarily a reason for public concern, but it is a stark reminder of how little humans currently understand about AI models’ inner workings. “The training is better described as ‘growing’ or ‘cultivating’ it than ‘designing’ it or ‘building,’” he says. “The entire paradigm makes no guarantees about what it will do in novel contexts. [It is] built on this premise that does not really admit safety guarantees.”