Discussing virtual cell models, Andrej Karpathy issued a challenge to the ML for Bio community: “where do the bits come from?” What he means is that every bit of training signal has to come from a real-world measurement, paid for one experiment at a time. And nowhere is information more expensive to acquire than in pharma, where a single bit, the binary outcome of a trial, can cost billions. The power to detect a significant outcome depends on choices made years earlier, including enrollment criteria, clinical endpoints, and patient stratification. Scientists at big pharma make these calls based on crude approximations of biology. But what if the bits needed to inform those choices already exist in decades of patient sequencing data? AI lets us turn this data into models of patient heterogeneity, so scientists can ground those upstream choices in measured biology rather than approximation.
At BlankBio, we train foundation models on bulk RNA-seq that learn patient heterogeneity so scientists can make better clinical trial decisions grounded in biology. Through our collaboration with PacBio, we can sequence high-quality samples with long reads that span each transcript end to end, capturing the combinatorial isoform architecture of cancer transcriptomes. We’re working together to generate long-read bulk RNA-seq data directly from patient tumor samples spanning multiple indications for further model training and evaluation. Our models aim to answer the two questions every clinical trial hinges on: how each patient’s disease will evolve without treatment (prognostic), and who will respond to the drug (predictive).
Prognostic biomarkers shrink trial size by accounting for patient-level variability. A clinical trial detects the drug’s effect as a group difference in outcomes between control and treatment arms. The more patients differ in their untreated disease course, the more patients are needed to observe the same effect. Our models explain away that heterogeneity by predicting each patient’s expected disease course from their baseline RNA-seq. Trials that adjust for those predictions can detect the same drug effect with fewer patients and complete earlier.[1]
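To make the sample-size argument concrete, here is a minimal sketch using the standard normal-approximation formula for a two-arm comparison of means. It assumes a continuous outcome and that adjusting for a prognostic score removes a fraction `r_squared` of the outcome variance; the function name, parameters, and example numbers are illustrative, not a description of our pipeline. Oncology trials often use time-to-event endpoints, where the arithmetic differs but the intuition is the same.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(effect, sd, alpha=0.05, power=0.8, r_squared=0.0):
    """Approximate per-arm sample size for a two-arm comparison of means.

    r_squared is the (assumed) fraction of outcome variance explained by
    the prognostic score; adjusting for the score leaves a residual
    variance of sd**2 * (1 - r_squared).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # target power
    residual_var = sd ** 2 * (1 - r_squared)
    return ceil(2 * residual_var * (z_alpha + z_beta) ** 2 / effect ** 2)


# Hypothetical trial: effect of 0.5 standard deviations, 80% power.
unadjusted = patients_per_arm(effect=0.5, sd=1.0)
adjusted = patients_per_arm(effect=0.5, sd=1.0, r_squared=0.3)
```

Under these assumptions, the required enrollment scales with `1 - r_squared`: a score that explains 30% of the variance in untreated outcomes cuts the sample size by roughly 30%.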
Predictive biomarkers, in turn, raise trial success rates by enriching cohorts with patients most likely to benefit. A trial’s measured effect is averaged across every enrolled patient, responders and non-responders alike. Non-responders dilute the average and mask the drug’s effect on the patients who do benefit. Enrichment based on a predictive biomarker has previously separated billion-dollar blockbusters from failed trials in the same drug class.[2]
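The dilution argument can be sketched in a few lines. This is a simplified illustration that assumes non-responders see exactly zero benefit and that outcome variance is the same in both groups; the function names and numbers are hypothetical.

```python
def diluted_effect(responder_effect, responder_fraction):
    """Average effect across the enrolled cohort when only a fraction respond."""
    return responder_effect * responder_fraction


def relative_sample_size(responder_fraction):
    """Sample size relative to an all-responder cohort, variance held fixed.

    Required sample size scales with 1 / effect**2, so diluting the effect
    by a factor f inflates enrollment by 1 / f**2.
    """
    return 1.0 / responder_fraction ** 2


# Hypothetical cohort in which half of the enrolled patients respond.
observed = diluted_effect(responder_effect=0.5, responder_fraction=0.5)
inflation = relative_sample_size(responder_fraction=0.5)
```

Under these assumptions, a cohort where half the patients respond shows half the effect and needs four times as many patients to detect it, which is why enrichment can decide whether a trial reads out positive at a fixed size.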
Picking a biological modality for capturing patient heterogeneity is a tradeoff between information density and clinical adoption. The experimental readout should be dense enough to capture patient identity, and yet mature enough to have permeated its way into the clinic.
In a perfect world, we would collect spatial proteomics for every patient, but platform maturity, distribution, and cost rule it out at clinical scale. Every biopsy gets an H&E stain, and most cancer centers have a pathologist who interprets it. But the information density of H&E is limited, and even with AI, you cannot recover what the modality never measured. Our thesis is that bulk RNA-seq is clinically pervasive and carries the signal needed to resolve patient heterogeneity.
A single bulk RNA-seq sample captures transcript expression, isoform architecture, and mutations in a ~600M-dimensional readout that holds the long tail of disease biology. In oncology, there are well-known genetic events such as KRAS G12C or BRCA1 A71G, but these occur in a minority of patients. The long tail of biology captures the idea that disease can be caused by thousands of distinct genetic events, each of which may manifest in only a handful of patients.
One bulk RNA-seq sample is an assortment of molecular events: somatic mutations in expressed regions, splicing aberrations (intron retention, exon skipping, cryptic splice sites), gene fusions, expression dysregulation, and NMD-mediated transcript decay. Every patient carries a unique combination of these events, and no single biomarker captures the combination. Standard pipelines collapse each sample to a ~15k-dimensional gene count vector and run per-feature hypothesis tests, but multiple-testing correction then wipes out all but the most common events. Capturing the long tail of biology requires aggregating signals across many genetic events at once.
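A rough sketch of why per-feature testing loses the long tail: the code below compares the power of a single two-sided z-test with the same test after a Bonferroni correction across 15,000 features. Bonferroni is used here only as the simplest correction to reason about, and the expected z-statistic of 2.5 is an arbitrary stand-in for a modest, rare signal; neither is taken from a specific pipeline.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def z_test_power(expected_z, alpha):
    """Approximate power of a two-sided z-test for a given expected z-statistic.

    Ignores the negligible probability of rejecting in the opposite tail.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_crit - expected_z)


n_features = 15_000   # roughly one test per gene
expected_z = 2.5      # hypothetical modest signal from a rare event

power_single = z_test_power(expected_z, alpha=0.05)
power_corrected = z_test_power(
    expected_z, alpha=bonferroni_threshold(0.05, n_features)
)
```

In this toy setting, the same signal goes from a better-than-even chance of detection when tested alone to a few percent once it has to clear the corrected threshold. A single test on an aggregate of many events pays no such multiplicity penalty, which is the motivation for pooling signal across events.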
Our foundation models read the long tail by mapping observable molecular events into a learned functional space. A photo of you in shadow and one in bright sun look nothing alike at the pixel level, but they capture the same person. A foundation model does this for biology, placing two molecular events that produce the same functional outcome close together in that space. This allows a prognostic or response model to learn patient heterogeneity from functional outcomes rather than from raw event counts. We’ll be sharing more details on our models in the coming months.
Antibody-drug conjugates (ADCs) are one of the most exciting new therapeutic modalities in oncology, with 14 FDA-approved drugs and 40 more in Phase 3 trials. They work by attaching a cell-type-specific targeting antibody to a highly toxic payload, often called a warhead. The intended mechanism is for the antibody to identify the cell type of interest and the payload to deliver a local toxic effect.
In theory, this seems like a perfect setup for single-factor biomarkers: stain a biopsy for the target antigen and select the patients whose tumors express it. In practice, the target antigen alone has not predicted who responds to an ADC. TROP2-targeting ADCs are one example: TROP2 is broadly expressed across solid tumors, and the stain has been evaluated as a biomarker in multiple Phase 3 trials in metastatic breast cancer. None of these trials found a TROP2 expression threshold that separated responders from non-responders.[3]
Emerging biological evidence suggests that ADC response is driven by mechanisms more complex than target antigen presence, such as whether the payload diffuses to and kills nearby cells (the bystander effect), and the immune composition of the surrounding tissue.[4] Previously published work has shown that expression-based signatures can predict ADC payload sensitivity with higher accuracy than target stains, but each signature has been hand-crafted one drug at a time.[5]
Good drugs fail to reach patients when their effect is lost in patient-specific variability, or when the trial enrolls patients whose biology doesn’t respond. To address that, we’re building foundation models trained on decades of bulk RNA-seq data to design prognostic and predictive biomarkers. BlankBio is building AI that accelerates clinical trials by learning patient heterogeneity.
If you’re interested in collaborating or learning more, please reach out here!
Footnotes
1. U.S. Food and Drug Administration. “Adjusting for covariates in randomized clinical trials for drugs and biological products: Guidance for industry.” May 2023.
2. Liang Chang, “Can AI make better decisions than humans?”
3. TROP2 IHC H-score did not stratify benefit in TROPiCS-02 (Rugo et al., Lancet 2023, PMID 37633306) or TROPION-Breast01 (Bardia et al., JCO 2024, DOI 10.1200/JCO.24.00920).
4. Zippelius A, Tolaney SM, Tarantino P, Balthasar JP, Thurber GM. “Unveiling the molecular and immunological drivers of antibody-drug conjugates in cancer treatment.” Nat Rev Cancer 25(12):925–944, 2025. PMID 41039110.
5. Coussy F, et al. “BRCAness, SLFN11, and RB1 loss predict response to topoisomerase I inhibitors in triple-negative breast cancers.” Science Translational Medicine, 2020.