An artificial-intelligence method that incorporates gene-expression data could help to speed up drug discovery. Credit: Qilai Shen/Bloomberg/Getty
An artificial intelligence (AI) model trained on complex data from human cells could provide a shortcut in the race to develop new drugs [1].
The approach, published on 23 October in Science, builds on a trend that is sweeping the field of drug discovery: the use of AI to speed up the tedious process of trawling through massive collections of chemical compounds in search of those that could become the next big therapy.
“It’s a powerful blueprint for the future,” says Hongkui Deng, a cell biologist at Peking University in Beijing, who was not involved in the work. “It creates a ‘smart’ screening system that learns from its own experiments.”
Tedious method
For decades, researchers have searched for drugs by working their way through large chemical libraries, testing each compound’s effect on cells that are grown in the laboratory. The approach has had success, identifying drugs that kill cancer cells, for example.
Increasingly, researchers are dreaming of more complex screening methods that could harness the past decade’s explosion in genomic data collected from individual cells. Such methods could, in theory, evaluate how compounds perturb entire networks of gene activity — a test that could open new avenues for drug discovery.
But researchers typically screen tens of thousands of compounds or more for drug discovery, says Alex Shalek, a biomedical engineer at the Massachusetts Institute of Technology in Cambridge. Integrating such large screens with complex assays would be too expensive and laborious, he says.

To find a tractable way of harnessing newly available genomic data, Shalek teamed up with other researchers and Cellarity, a biotechnology company in Somerville, Massachusetts. (Shalek is also a paid consultant for the company.) Together, the team trained a deep-learning model called DrugReflector on publicly available data about how each of nearly 9,600 chemical compounds perturbs gene activity in more than 50 kinds of cells.
They used DrugReflector to find chemicals that can affect the generation of platelets and red blood cells — a characteristic that could be useful in treating some blood conditions. They then tested 107 of these chemicals to determine whether they had the predicted effect.
Overall, the team found that DrugReflector was up to 17 times more effective at finding relevant compounds than standard, brute-force drug screening that depends on randomly selecting compounds from a chemical library. And when the researchers circled back to incorporate the data from their first round of screening into the model, its success rate doubled.
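A comparison of this kind is often expressed as fold enrichment: the hit rate among model-selected compounds divided by the hit rate of a random screen. The short sketch below illustrates that arithmetic only. The article does not say which metric the team used, the function name `fold_enrichment` is hypothetical, and the counts are invented for illustration rather than taken from the study.

```python
def fold_enrichment(model_hits: int, model_tested: int,
                    random_hits: int, random_tested: int) -> float:
    """Ratio of the model-guided hit rate to the random-screen hit rate.

    Computed as (model_hits / model_tested) / (random_hits / random_tested),
    rearranged into a single division to limit floating-point error.
    """
    if model_tested <= 0 or random_tested <= 0:
        raise ValueError("number of compounds tested must be positive")
    if random_hits <= 0:
        raise ValueError("baseline needs at least one hit to define a ratio")
    return (model_hits * random_tested) / (random_hits * model_tested)


# Hypothetical counts, NOT from the study:
# model-guided screen: 5 hits among 50 compounds tested (10% hit rate)
# random screen: 10 hits among 1,000 compounds tested (1% hit rate)
enrichment = fold_enrichment(5, 50, 10, 1000)
print(f"Model-guided screening is {enrichment:.1f}x the random hit rate")
```

On this reading, a doubling of the success rate after retraining would correspond to the model-guided hit rate itself doubling, which also doubles the ratio against the same random baseline.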
