Concepedia

Publication | Open Access

NuCLS: A scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer

Citations: 97 | References: 43 | Year: 2022

TLDR

High-resolution cellular mapping underpins interpretable machine-learning models in computational pathology, yet generating enough high-quality labels remains a major bottleneck because of the time and effort required from pathologists. This work describes a scalable crowdsourcing approach in which medical students and pathologists produced over 220,000 annotations of cell nuclei in breast cancers. Annotation suggestions generated by a weak algorithm improve the accuracy of non-expert annotators and yield useful training data for segmentation algorithms without laborious manual tracing. The work also systematically examines interrater agreement, modifies the MaskRCNN model to improve cell mapping, and introduces Decision Tree Approximation of Learned Embeddings (DTALE), which uses nucleus segmentations and morphologic features to make nucleus classification models more transparent. The annotation data are freely available at https://sites.google.com/view/nucls.
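Interrater agreement on categorical labels such as nucleus class is often summarized with a chance-corrected statistic. The sketch below computes Cohen's kappa for two annotators who labeled the same set of nuclei. It is one common choice and is not necessarily the statistic used in this work; the class names and label sequences are hypothetical toy values.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items that both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled independently at their own class rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined, report 1.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical class labels from two annotators for the same six nuclei.
annotator_1 = ["tumor", "tumor", "lymphocyte", "stromal", "tumor", "lymphocyte"]
annotator_2 = ["tumor", "stromal", "lymphocyte", "stromal", "tumor", "tumor"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.478
```

Here the annotators agree on 4 of 6 nuclei (p_o = 0.667), but their label frequencies alone would produce agreement of 13/36 by chance, so kappa is 11/23, roughly 0.48. Comparing many annotators, or annotators against a pathologist consensus, would need a multi-rater statistic or pairwise averaging.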

Abstract

High-resolution mapping of cells and tissue structures provides a foundation for developing interpretable machine-learning models for computational pathology. Deep learning algorithms can provide accurate mappings given large numbers of labeled instances for training and validation. Generating an adequate volume of quality labels has emerged as a critical barrier in computational pathology given the time and effort required from pathologists. In this paper we describe an approach for engaging crowds of medical students and pathologists that was used to produce a dataset of over 220,000 annotations of cell nuclei in breast cancers. We show how suggested annotations generated by a weak algorithm can improve the accuracy of annotations generated by non-experts and can yield useful data for training segmentation algorithms without laborious manual tracing. We systematically examine interrater agreement and describe modifications to the MaskRCNN model to improve cell mapping. We also describe a technique we call Decision Tree Approximation of Learned Embeddings (DTALE) that leverages nucleus segmentations and morphologic features to improve the transparency of nucleus classification models. The annotation data produced in this study are freely available for algorithm development and benchmarking at: https://sites.google.com/view/nucls.
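DTALE is described only at a high level here: a decision tree built on nucleus morphologic features that approximates what a classification model has learned. The sketch below shows the general surrogate-tree idea under simplifying assumptions. It is not the authors' DTALE implementation: the tree is fit to a model's predicted labels rather than to its embeddings, it is a minimal Gini-impurity CART written from scratch to stay dependency-free, and the feature names (area, circularity), classes, and values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    label: str                      # majority class of the samples reaching this node
    feature: Optional[int] = None   # index of the tested feature; None marks a leaf
    threshold: float = 0.0          # samples with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: Sequence[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree(X: List[List[float]], y: List[str], depth: int = 0, max_depth: int = 2) -> Node:
    node = Node(label=Counter(y).most_common(1)[0][0])
    if depth >= max_depth or len(set(y)) == 1:
        return node
    best = None  # (weighted child impurity, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None or best[0] >= gini(y):
        return node  # no split reduces impurity, so keep this node as a leaf
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    node.feature, node.threshold = f, t
    node.left = fit_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth)
    node.right = fit_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth)
    return node


def predict(node: Node, x: Sequence[float]) -> str:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def fidelity(tree: Node, X: List[List[float]], model_labels: List[str]) -> float:
    """Fraction of samples on which the surrogate tree reproduces the model's label."""
    return sum(predict(tree, x) == m for x, m in zip(X, model_labels)) / len(X)


def rules(node: Node, names: List[str], indent: str = "") -> List[str]:
    """Render the tree as nested if/else rules over named morphologic features."""
    if node.feature is None:
        return [f"{indent}-> {node.label}"]
    head = f"{indent}if {names[node.feature]} <= {node.threshold:g}:"
    return ([head] + rules(node.left, names, indent + "  ")
            + [f"{indent}else:"] + rules(node.right, names, indent + "  "))


# Hypothetical per-nucleus features [area_px, circularity] and a classifier's predictions.
features = [[40, 0.90], [45, 0.85], [50, 0.88], [120, 0.60],
            [130, 0.55], [110, 0.40], [60, 0.30], [70, 0.35]]
model_labels = ["lymphocyte", "lymphocyte", "lymphocyte", "tumor",
                "tumor", "tumor", "stromal", "stromal"]
tree = fit_tree(features, model_labels, max_depth=2)
print(fidelity(tree, features, model_labels))  # prints 1.0
print("\n".join(rules(tree, ["area", "circularity"])))
```

Keeping the tree shallow trades fidelity for readability: each root-to-leaf path is a short rule over measurable nucleus properties that a pathologist can check, and the fidelity score reports how faithfully those rules mirror the underlying model.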
