WSI algorithms before the FDA: four clearances map a practical path for digital pathology - الباثولوجي الرقمي

Innolitics published a useful reading of four FDA submissions for AI-based whole-slide image (WSI) analysis. The article matters not only because of the company names, but because a regulatory pattern is becoming clear: the device does not replace the pathologist, the clinical claim is tightly limited, and dependence on a specific scanner is not a minor technical detail. It is part of the intended-use boundary.

The four devices discussed in the article cover a useful slice of the current market: Paige Prostate through De Novo in 2021, ArteraAI Prostate through De Novo in 2025, Galen Second Read through 510(k) in 2025, and Genius Cervical AI through De Novo in 2024. Three De Novo submissions and one 510(k). That ratio alone says digital pathology is still building its regulatory precedents, especially when the claim shifts from cancer detection to long-term prognostic risk.

De Novo remains the natural route when precedents are limited

Paige Prostate established a new device classification, with its own product code, under 21 CFR 864.3750. Galen Second Read later relied on that classification, using the earlier device as its predicate in a 510(k). This is an important point for any team developing a WSI algorithm: the first device in a narrow category does more than obtain clearance. It effectively defines the regulatory language and safety controls used to judge later devices.

That does not mean the path has become easy. FDA looks at the claim before it looks at the model. A phrase such as “assist in detection” is very different from “predict ten-year outcome”. The first can be supported with analytical performance and reader studies. The second needs clinical follow-up, endpoints, and clear separation between risk groups.

Detection in prostate pathology: Paige and Galen show two different models

Paige Prostate and Galen Second Read both work on H&E slides from prostate core needle biopsies using FFPE tissue. Their purpose is to draw the pathologist’s attention to suspicious areas. Paige gives a slide-level classification with one coordinate for the area of highest probability. Galen works in a narrower setting: cases originally diagnosed as benign, where it provides an alert and heatmap if it finds suspicious morphology.

This workflow difference is not cosmetic. Galen does not try to enter the first read. It positions itself as a safety layer after a benign diagnosis. From a regulatory perspective, that framing is smart because the question becomes: does the device reduce false negatives in cases that passed as benign? The algorithm is not presented as a competitor to the primary read, which reduces tension around clinical responsibility.

The numbers reported in the Innolitics analysis show the difference. Paige used 728 WSIs in the analytical performance study, with 94.5% sensitivity and 94.0% specificity when slide classification and localization accuracy were combined. In a reader study with 16 pathologists and 527 cases, false negatives fell by 7.3% without a meaningful increase in false positives.

Galen, by contrast, was tested on 347 cases initially diagnosed as benign and recorded slide-level sensitivity of 81.0% and specificity of 91.6%. In a reader study of 772 cases and 12 pathologists across 4 sites, pooled sensitivity rose from 90.5% to 93.9% (a gain of 3.4 percentage points), while specificity fell from 91.1% to 87.9% (a loss of 3.2 percentage points). That tradeoff is understandable: a second read catches more cancers, but it also adds prompts that require human review.
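To make the tradeoff concrete, here is a minimal Python sketch of the two metrics and of the percentage-point change in the pooled reader results quoted above. The function names are illustrative assumptions, and the only figures used are the ones reported in this article; the underlying case counts are not reproduced here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of cancer cases that were flagged as cancer."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of benign cases that were left as benign."""
    return true_neg / (true_neg + false_pos)


def point_change(before_pct: float, after_pct: float) -> float:
    """Change in percentage points, rounded to one decimal place."""
    return round(after_pct - before_pct, 1)


# Pooled reader-study figures quoted in the text (unaided vs. aided).
sens_change = point_change(90.5, 93.9)   # gain in sensitivity
spec_change = point_change(91.1, 87.9)   # loss in specificity

print(f"sensitivity: {sens_change:+.1f} points")
print(f"specificity: {spec_change:+.1f} points")
```

Reading the two changes side by side is the point: a gain on one metric is paid for on the other, and the label has to describe both.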

Localization is not a cosmetic detail

The difference between a single coordinate and a heatmap has regulatory consequences. Paige’s coordinate is straightforward to measure: is it inside the annotated cancer region or not? It was inside the correct region in 94.5% of cancer cases. A Galen heatmap needs a stricter definition of what “the correct place” means. The study therefore used a two-stage measurement: high sensitivity for the full map region, then high specificity and PPV for the hottest region.
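The single-coordinate check can be sketched in a few lines of Python. This is an illustration only, assuming that ground-truth cancer regions are stored as polygons in slide coordinates; the function names and the ray-casting test are my own choices, not the method described in any submission.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Ray-casting test: count how many polygon edges a ray going right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def localization_hit(coordinate: Point, regions: List[Polygon]) -> bool:
    """True if the reported coordinate falls inside any annotated region."""
    return any(point_in_polygon(coordinate[0], coordinate[1], r) for r in regions)


def localization_rate(cases: List[Tuple[Point, List[Polygon]]]) -> float:
    """Fraction of cancer cases whose coordinate lands in an annotated region."""
    hits = sum(localization_hit(coord, regions) for coord, regions in cases)
    return hits / len(cases)


# Toy example with one square annotation (hypothetical data).
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
toy_cases = [((5.0, 5.0), [square]), ((15.0, 5.0), [square])]
rate = localization_rate(toy_cases)
```

A heatmap has no equivalent one-line test, which is why its evaluation needs separate definitions for the full map and for the hottest region.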

These details matter to the pathologist who will use the tool every day. A tool that points to “somewhere” on the slide is different from a tool that defines a hot zone the pathologist is expected to inspect first. FDA reads that as a difference in risk, not just a difference in user interface.

ArteraAI: when the claim moves from detection to prognosis

ArteraAI Prostate changes the nature of the discussion. The device does not look for cancer within the biopsy. It estimates the 10-year risk of distant metastasis and prostate cancer-specific mortality, and classifies patients as High, Intermediate, or Low risk. The intended population is men aged 55 years or older with non-metastatic prostate cancer who are eligible for treatment with curative intent.

The validation study included 886 patients at three US sites, with a median follow-up of 8.2 years. The separation between groups was clear: the 10-year risk of distant metastasis was 28.1% in the High group versus 3.3% in the Low group. For prostate cancer-specific mortality (PCSM), it was 10.2% versus 0.6%. Here, a polished heatmap or high sensitivity is not enough. The claim is tied to a treatment decision, so it needs time-based data that cannot be replaced by a short study.
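The size of that separation is easier to see as a ratio and as an absolute difference. The Python sketch below uses only the four percentages quoted above. These are crude comparisons of reported 10-year risks, not hazard ratios or any statistic taken from the submission itself, and the function names are assumptions.

```python
def risk_ratio(high_pct: float, low_pct: float) -> float:
    """How many times higher the High-group risk is, to one decimal place."""
    return round(high_pct / low_pct, 1)


def risk_difference(high_pct: float, low_pct: float) -> float:
    """Absolute gap between the groups, in percentage points."""
    return round(high_pct - low_pct, 1)


# 10-year risks quoted in the text: High group vs. Low group.
dm_ratio = risk_ratio(28.1, 3.3)          # distant metastasis
dm_gap = risk_difference(28.1, 3.3)
pcsm_ratio = risk_ratio(10.2, 0.6)        # prostate cancer-specific mortality
pcsm_gap = risk_difference(10.2, 0.6)

print(f"distant metastasis: {dm_ratio}x, gap {dm_gap} points")
print(f"PCSM: {pcsm_ratio}x, gap {pcsm_gap} points")
```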

A locked algorithm is expected in these submissions. There is no continuous learning after release. The version reviewed is the version deployed, and any later change needs a controlled pathway. A Predetermined Change Control Plan has become a practical way to expand compatibility, such as adding new scanners, without turning every update into a full submission.
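One simple way a deployment could enforce "the version reviewed is the version deployed" is to compare a cryptographic digest of the model artifact against the digest recorded at release. The Python sketch below is an assumption about how a laboratory or vendor might do this, not a description of any cleared product; the function names and the byte strings standing in for model weights are hypothetical.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Hex digest of the model artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_locked_version(deployed_artifact: bytes, reviewed_digest: str) -> bool:
    """True only if the deployed artifact is byte-identical to the reviewed one."""
    return hmac.compare_digest(sha256_of(deployed_artifact), reviewed_digest)


# Hypothetical stand-ins for real weight files.
reviewed_weights = b"weights-v1.0"
recorded_digest = sha256_of(reviewed_weights)

unchanged = is_locked_version(b"weights-v1.0", recorded_digest)
modified = is_locked_version(b"weights-v1.1", recorded_digest)
```

Any retraining, however small, changes the digest, which is the behavior a locked algorithm calls for: the change has to go through a controlled pathway rather than slip in silently.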

Genius Cervical AI: an integrated system, not standalone software

Hologic Genius Cervical AI works on ThinPrep Pap test slides and uses a convolutional neural network (CNN) to select objects of interest for review within categories related to the Bethesda system. Here, the device is not just cloud software. The system includes the Genius Digital Imager, Image Management Server, Review Station, and a specified display. Hardware therefore becomes part of the clearance.

This model gives the company more control over image quality and display, but it expands the validation surface: illumination, mechanical motion, sensors, focus, the monitor, and the server. A software model that depends on external scanners reduces the hardware burden, but practical adoption is then tied to specific scanners. Across the four submissions discussed, the Philips Ultra Fast Scanner appears almost as a shared bridge. That is not a small observation for anyone buying or developing WSI-AI. Scanner compatibility may determine whether the tool can be used before anyone debates model accuracy.
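In practice, a laboratory can treat scanner compatibility as a gate that runs before any image reaches the algorithm. The Python sketch below shows the idea under stated assumptions: the product names, scanner names, and pairings are placeholders invented for illustration and are not taken from any FDA decision summary.

```python
from typing import Dict, Set

# Hypothetical labeling table: which scanners each product is labeled for.
LABELED_SCANNERS: Dict[str, Set[str]] = {
    "example-detection-tool": {"Scanner A"},
    "example-second-read-tool": {"Scanner A", "Scanner B"},
}


def within_labeled_use(product: str, scanner: str) -> bool:
    """True if the slide's scanner is listed in the product's labeling."""
    return scanner in LABELED_SCANNERS.get(product, set())


def route_slide(product: str, scanner: str) -> str:
    """Send a slide to the algorithm only when the scanner is in scope."""
    if within_labeled_use(product, scanner):
        return "run-algorithm"
    return "manual-review-only"


decision_in_scope = route_slide("example-detection-tool", "Scanner A")
decision_out_of_scope = route_slide("example-detection-tool", "Scanner B")
```

The check is trivial, and that is the point: the hard part is not the code but knowing exactly which scanners the labeling names.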

What does this mean for the laboratory?

The first lesson: do not start from the model. Start from the clinical claim. Are you trying to detect a suspicious focus? Add a second read after a benign diagnosis? Estimate 10-year prognosis? Triage in cytology? Each claim brings a different study design, different endpoints, and different labeling.

The second lesson: adjunct language will remain central. All of these devices state that the final decision belongs to the pathologist or cytologist, and that the tool is not an independent replacement. This is not legal wording added at the end. It needs to appear in the workflow design itself: when the algorithm runs, what it displays, and who owns the final decision.

The third lesson: data diversity and subgroup analysis are now part of the regulatory expectation. The submissions describe training and test set characteristics, sites, demographics, and, where relevant, Gleason grade or NCCN risk categories. If the data are narrow or biased, the problem will appear during review. It is better to fix it when building the cohort, not while writing the submission.

For the pathologist inside the laboratory, these submissions provide a practical way to evaluate any new product before purchase: ask about the exact indication, the scanners named in the labeling, where the tool sits in the workflow, the type of validation study, its effect on sensitivity and specificity, and how subgroups were handled. Marketing promises are less useful than these questions.

Source: Innolitics.