Histopathology refers to the microscopic examination of diseased tissues and routinely guides treatment decisions for cancer and other diseases. Currently, this analysis focuses on morphological features but rarely considers gene expression information, which can add an important molecular dimension. We introduce SpotWhisperer, an AI method that links histopathological images to spatial gene expression profiles and their text annotations, enabling molecularly grounded histopathology analysis through natural language. Our method outperforms pathology vision-language models on a newly curated benchmark dataset, dedicated to spatially resolved H&E annotation. Integrated into a web interface, SpotWhisperer enables interactive exploration of cell types and disease mechanisms using free-text queries with access to inferred spatial gene expression profiles. SpotWhisperer analyzes cost-effective pathology images with spatial gene expression and natural-language AI, demonstrating a path for routine integration of microscopic molecular information into histopathology.
The web interface allows users to search for cellular features like "T cells" and visualizes the matching regions, which align with expert annotations of Tertiary Lymphoid Structures.
The underlying molecular representation allows for exploring the inferred expression of specific genes, such as LTB, which is a known marker for lymphoid structures.
This video demonstrates the core functionality of the SpotWhisperer web application. It shows how users can load a histopathology sample, run natural language queries to identify cellular regions such as "T cells," and explore the inferred gene expression data that drives these predictions. The demonstration highlights the intuitive, interactive, and molecularly grounded approach of our method.
This benchmark dataset consists of hematoxylin and eosin (H&E) images from five lung cancer samples (Dawo et al., 2025), annotated with cell types and pathologically relevant labels at spot-level resolution. For the methodology, refer to the accompanying publication (Schaefer et al., 2025).
The dataset contains the following compressed h5ad files:
LC1.h5ad.gz - Lung cancer sample 1
LC2.h5ad.gz - Lung cancer sample 2
LC3.h5ad.gz - Lung cancer sample 3
LC4.h5ad.gz - Lung cancer sample 4
LC5.h5ad.gz - Lung cancer sample 5
The files can also be downloaded as a complete (compressed) tar archive.
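import gzip
import shutil
import scanpy as sc
# Decompress the curated file first: read_h5ad expects a plain HDF5 file, not a gzip archive
with gzip.open('LC1.h5ad.gz', 'rb') as src, open('LC1.h5ad', 'wb') as dst:
    shutil.copyfileobj(src, dst)
# Read the curated file
file_path = 'LC1.h5ad'
adata = sc.read_h5ad(file_path)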
# Display basic information
print(adata)
# Information about spots
print(adata.obs)
# Information about genes
print(adata.var)
# High-resolution image
print(adata.uns["20x_slide"])
# Sample metadata
print(adata.uns["meta"])
# Spatial coordinates
print(adata.obsm["X_spatial"])
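Once a sample is loaded, the pixel coordinates of a spot and the spot diameter stored in the metadata can be combined to locate the H&E patch that belongs to that spot. The following is a minimal sketch, not part of the SpotWhisperer code base. The helper name spot_crop_box is hypothetical, and the sketch assumes that x_pixel and y_pixel mark the spot center (x as column, y as row) and that spot_diameter_fullres is expressed in pixels of the same image.

```python
def spot_crop_box(x_pixel, y_pixel, spot_diameter, image_height, image_width):
    """Return (top, bottom, left, right) bounds of a square patch around a spot.

    Hypothetical helper: assumes (x_pixel, y_pixel) is the spot center in
    image pixels, with x along columns and y along rows, and that
    spot_diameter uses the same pixel units. Bounds are clipped to the image.
    """
    half = spot_diameter / 2.0
    top = max(0, int(round(y_pixel - half)))
    bottom = min(image_height, int(round(y_pixel + half)))
    left = max(0, int(round(x_pixel - half)))
    right = min(image_width, int(round(x_pixel + half)))
    return top, bottom, left, right


# Usage with a loaded sample (assumes adata.uns["20x_slide"] is an array
# indexed as [row, column, channel]):
#
# image = adata.uns["20x_slide"]
# spot = adata.obs.iloc[0]
# top, bottom, left, right = spot_crop_box(
#     spot["x_pixel"],
#     spot["y_pixel"],
#     adata.uns["meta"]["spot_diameter_fullres"],
#     image.shape[0],
#     image.shape[1],
# )
# patch = image[top:bottom, left:right]
```

Clipping the bounds to the image size keeps spots near the tissue border from producing negative or out-of-range indices. If the coordinate convention of the files differs from the assumption above, swapping the x and y arguments is the only change needed.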
Below is a summary of the essential data components for the curated data:
adata.X contains raw gene expression counts (SPOTS × GENES)
adata.var contains human-readable gene names, gene IDs, and feature_types
adata.obs['x_array'], adata.obs['y_array']: Original spot coordinates in slide space
adata.obs['x_pixel'], adata.obs['y_pixel']: Spot coordinates in image space
adata.obsm['X_spatial']: 2D spatial coordinates for visualization
adata.uns["20x_slide"]: High-resolution image
adata.uns["meta"]["magnification"]: Magnification of the image
adata.obs['in_tissue']: Boolean indicating spots within tissue boundaries as annotated by histopathologists
adata.obs['region_type_expert_annotation']: Manual tissue region annotations
adata.obs['cell_type_annotations']: Automated cell type predictions from reference atlas
adata.obs['sample_ID']: Sample identifiers
adata.obs['barcode']: Unique spot barcodes
adata.uns['meta']: Sample metadata
adata.uns['meta']['spot_diameter_fullres']: Spot diameter
adata.uns['meta']['dot_size']: Number of subspots within a spot (DeepSpot inference parameter)
The SpotWhisperer paper is available on bioRxiv and was accepted at the ICML 2025 FM4LS Workshop; it can also be found on OpenReview.
If you use SpotWhisperer or the associated benchmark dataset in your research, please cite:
@article{Schaefer2025.07.14.664402,
author = {Schaefer, Moritz and Nonchev, Kalin and Awasthi, Animesh and Burton, Jake and Koelzer, Viktor H and R{\"a}tsch, Gunnar and Bock, Christoph},
title = {Molecularly informed analysis of histopathology images using natural language},
elocation-id = {2025.07.14.664402},
year = {2025},
doi = {10.1101/2025.07.14.664402},
publisher = {Cold Spring Harbor Laboratory},
url = {https://www.biorxiv.org/content/early/2025/07/18/2025.07.14.664402},
eprint = {https://www.biorxiv.org/content/early/2025/07/18/2025.07.14.664402.full.pdf},
journal = {bioRxiv}
}
and
@misc{dawo2025visium,
title = "{10x} Visium spatial transcriptomics dataset: Kidney (3) and lung (5) cancer with tertiary lymphoid structures",
author = "Dawo, Sebastian and Nonchev, Kalin and Silina, Karina",
publisher = "Zenodo",
year = 2025,
url = "http://dx.doi.org/10.5281/zenodo.14620362",
doi = "10.5281/ZENODO.14620362"
}