Section 5 of the review

The infrastructure gap

This section diagnoses reproducibility, benchmarking, and model reuse across the 58 reviewed methods using a uniform scoring rubric. The headline numbers below are the field-level summary; per-method audit data lives on each method page and in Supplementary Table S2.

Release public code (of 58)

Independently rerunnable, Repro 4/4 (of 58)

Cross FAIR4RS threshold, ≥ 3/5 (of 43 with code)

Validation datasets used by exactly one method: 121 of 129

Scoring rubric

Each method is scored on two complementary axes whose sub-criteria are applied uniformly across the 58 reviewed methods.

Reproducibility (0–4)

Public tool availability
README with instructions
Bundled example data
Step-by-step tutorial

Each criterion met earns one point. A score of 4/4 is treated as independently rerunnable.

FAIR4RS (0–5)

This axis applies the FAIR Principles for Research Software (FAIR4RS). One point each for: an OSI-approved license, versioned releases, an archival DOI, an explicit environment specification, and citation metadata. A score ≥ 3/5 is treated as crossing the FAIR4RS threshold.
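The two axes can be sketched in code as simple checklists. This is a minimal illustration, not the review's audit tooling: the `MethodAudit` record, its field names, and the example values are hypothetical, and the sketch assumes each reproducibility criterion contributes one point, which matches the 0–4 range over four criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodAudit:
    """Hypothetical per-method audit record; field names are illustrative."""
    # Reproducibility sub-criteria (assumed one point each)
    public_tool: bool
    readme_with_instructions: bool
    bundled_example_data: bool
    step_by_step_tutorial: bool
    # FAIR4RS sub-criteria (one point each)
    osi_license: bool
    versioned_releases: bool
    archival_doi: bool
    environment_spec: bool
    citation_metadata: bool


def repro_score(m: MethodAudit) -> int:
    """Reproducibility score in the range 0-4."""
    return sum([m.public_tool, m.readme_with_instructions,
                m.bundled_example_data, m.step_by_step_tutorial])


def fair4rs_score(m: MethodAudit) -> int:
    """FAIR4RS score in the range 0-5."""
    return sum([m.osi_license, m.versioned_releases, m.archival_doi,
                m.environment_spec, m.citation_metadata])


def is_independently_rerunnable(m: MethodAudit) -> bool:
    """Only a full 4/4 counts as independently rerunnable."""
    return repro_score(m) == 4


def crosses_fair4rs_threshold(m: MethodAudit) -> bool:
    """The threshold is a FAIR4RS score of at least 3 out of 5."""
    return fair4rs_score(m) >= 3


# Made-up example: code, README, and example data, but no tutorial.
example = MethodAudit(
    public_tool=True, readme_with_instructions=True,
    bundled_example_data=True, step_by_step_tutorial=False,
    osi_license=True, versioned_releases=True, archival_doi=False,
    environment_spec=True, citation_metadata=False,
)
print(repro_score(example), is_independently_rerunnable(example))
print(fair4rs_score(example), crosses_fair4rs_threshold(example))
```

In this made-up case the method misses the 4/4 bar because it lacks a tutorial, yet it still crosses the FAIR4RS threshold, which shows that the two axes are scored independently.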

Validation-data reuse

Across the 58 reviewed methods, 129 distinct validation datasets are catalogued. 121 (93.8%) are used by exactly one method. Only eight datasets are shared, and never by more than five methods.

ENCODE (compendium)
LINCS L1000 (compendium)
CMAP (compendium)
FANTOM5 (compendium)
GPL570 (compendium)
GSE70138 (GEO record)
GSE140203 (GEO record)
GSE7307 (GEO record)
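The reuse figures above follow from a simple count of how many methods use each dataset. The sketch below shows one way such a tally could be computed; the function name `reuse_summary`, the mapping layout, and the toy method and dataset names are all hypothetical and are not taken from the review's data.

```python
from collections import Counter
from typing import Dict, Set


def reuse_summary(method_datasets: Dict[str, Set[str]]) -> dict:
    """Summarise dataset reuse from a method -> validation-datasets mapping."""
    # Count, for every dataset, how many methods use it.
    usage = Counter(ds for datasets in method_datasets.values() for ds in datasets)
    total = len(usage)
    singletons = sum(1 for n in usage.values() if n == 1)
    return {
        "total": total,
        "singletons": singletons,
        "singleton_share": singletons / total if total else 0.0,
        "shared": {ds: n for ds, n in usage.items() if n > 1},
        "max_sharing": max(usage.values(), default=0),
    }


# Toy input with made-up names, only to show the shape of the data.
toy = {
    "method_a": {"ds1", "ds2"},
    "method_b": {"ds2", "ds3"},
    "method_c": {"ds4"},
}
summary = reuse_summary(toy)
print(summary["singletons"], "of", summary["total"])

# The headline share reported in the text: 121 of 129 datasets.
headline_share = round(100 * 121 / 129, 1)
print(headline_share)
```

Applied to the review's catalogue, the same tally would correspond to 121 singletons out of 129 datasets, eight shared datasets, and a maximum sharing of five methods.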

Interactive figures

Interactive versions of Figure 2 (modality coverage / UpSet) and Figure 3 (reproducibility by level) are scheduled for the next site milestone, alongside a dedicated datasets page showing the full reuse graph. For now, refer to Figures 2 and 3 in the published review.