
Illumina introduces Billion Cell Atlas to accelerate AI and drug discovery
On Jan. 13, 2026, Illumina introduced the world’s largest genome-wide genetic perturbation dataset, which is being built to accelerate AI-driven drug discovery across the pharmaceutical ecosystem. The Illumina Billion Cell Atlas is the first tranche of the company’s program to build a 5 billion cell atlas over three years, and will be the most comprehensive map of human disease biology to date.
Under an alliance framework with AstraZeneca, Merck, and Eli Lilly and Company as founding participants, the Atlas is already being built for a curated set of cell lines to drive drug target validation, train advanced AI models at scale, and advance research into fundamental disease mechanisms that have previously been out of reach.
Merck will use the Atlas to accelerate precision medicine approaches across its drug discovery pipelines. The data will help train the company’s proprietary AI/ML foundation models and build virtual cell models, with the aim of improving prediction of disease indications.
The Atlas will capture how 1 billion individual cells respond to genetic changes via CRISPR across more than 200 disease‑relevant cell lines. These cell lines have been selected for their relevance to diseases, many of which have been historically difficult to decode, including immune disorders and cancer as well as cardiometabolic, neurological, and rare genetic diseases. This CRISPR technology enables researchers to rapidly study the effects of switching on and off all 20,000 genes in key cell types throughout the body.
The Atlas will enable users to characterize drug and disease mechanisms of action, explore potential new indications, and validate candidate targets from human genetics.
The Atlas is the first data product to emerge from Illumina’s new BioInsight business. The scale of the Atlas is feasible only with the power of the Illumina Single Cell 3′ RNA prep platform, which enables millions of individual cells to be captured in a single experiment. The Atlas will generate 20 petabytes of single-cell transcriptomic data within a year. To handle data of this magnitude, single-cell RNA-sequencing data is processed using Illumina’s DRAGEN pipeline with hardware acceleration and then hosted on the Illumina Connected Analytics cloud platform for scalable analysis.
Illumina’s newly created BioInsight business is set to provide the foundational technologies and datasets to power the next generation of drug discovery and AI in pharma. By launching the Illumina Billion Cell Atlas and developing comprehensive, disease-specific perturbation datasets paired with advanced AI algorithms, Illumina is advancing the next generation of cellular modeling.
Source: Illumina