Publication | Open Access

Weakly-supervised convolutional neural networks for multimodal image registration

Citations: 449 | References: 55 | Year: 2018

TLDR

Supervised learning for multimodal image registration suffers from a lack of voxel-level ground truth, so this study proposes inferring voxel-level transformations from higher-level correspondences between anatomical labels such as organs, vessels, ducts, and other landmarks. An end-to-end CNN is trained to predict displacement fields that align labelled structures, while inference uses only unlabelled image pairs; multiple network variants were evaluated on T2-weighted MRI and 3D transrectal ultrasound images of the prostate. The method achieves a median target registration error of 3.6 mm on landmark centroids and a median Dice of 0.87 on prostate glands, runs in real time, and is fully automated, requiring no anatomical labels or initialisation at inference.
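
The label-driven supervision can be sketched concretely. Below is a minimal, hypothetical PyTorch illustration (not the authors' released code) of the kind of overlap loss such a setup uses: a soft Dice loss between a moving-image label warped by the predicted transformation and the corresponding fixed-image label. The function name and tensor shapes are assumptions for illustration.

```python
# Minimal sketch (assumed PyTorch framing; not the authors' code) of a
# soft Dice loss over corresponding anatomical labels. The registration
# network never receives these labels as input; they only shape the loss.
import torch

def soft_dice_loss(warped_label: torch.Tensor,
                   fixed_label: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    """Both tensors: (B, 1, D, H, W) soft masks with values in [0, 1]."""
    dims = (1, 2, 3, 4)
    intersection = (warped_label * fixed_label).sum(dim=dims)
    union = warped_label.sum(dim=dims) + fixed_label.sum(dim=dims)
    dice = (2.0 * intersection + eps) / (union + eps)
    return 1.0 - dice.mean()  # minimising this maximises label overlap
```

One appeal of an overlap loss of this form is that whichever structures happen to be labelled in a given pair can contribute, which fits the paper's point that labels need not be identifiable across all training image pairs.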

Abstract

One of the fundamental challenges in supervised learning for multimodal image registration is the lack of ground-truth for voxel-level spatial correspondence. This work describes a method to infer voxel-level transformation from higher-level correspondence information contained in anatomical labels. We argue that such labels are more reliable and practical to obtain for reference sets of image pairs than voxel-level correspondence. Typical anatomical labels of interest may include solid organs, vessels, ducts, structure boundaries and other subject-specific ad hoc landmarks. The proposed end-to-end convolutional neural network approach aims to predict displacement fields to align multiple labelled corresponding structures for individual image pairs during training, while only unlabelled image pairs are used as the network input for inference. We highlight the versatility of the proposed strategy for training: it can utilise diverse types of anatomical labels, which need not be identifiable across all training image pairs. At inference, the resulting 3D deformable image registration algorithm runs in real-time and is fully automated without requiring any anatomical labels or initialisation. Several network architecture variants are compared for registering T2-weighted magnetic resonance images and 3D transrectal ultrasound images from prostate cancer patients. A median target registration error of 3.6 mm on landmark centroids and a median Dice of 0.87 on prostate glands are achieved from cross-validation experiments, in which 108 pairs of multimodal images from 76 patients were tested with high-quality anatomical labels.
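
To make the training/inference asymmetry described above concrete, here is a small, self-contained PyTorch sketch under assumed names and shapes (`TinyRegNet`, `warp`, a (B, 3, D, H, W) displacement field in voxel units, none of which come from the paper): the network sees only the image pair, the predicted dense displacement field warps the moving label, and anatomical labels appear solely inside the loss, so inference needs images alone.

```python
# Hedged sketch of one weakly-supervised training step (assumed PyTorch;
# TinyRegNet is an illustrative stand-in, not the paper's architecture).
import torch
import torch.nn.functional as F

class TinyRegNet(torch.nn.Module):
    """Maps a concatenated image pair to a dense displacement field."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv3d(2, 16, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv3d(16, 3, 3, padding=1))  # 3 channels: x, y, z

    def forward(self, image_pair):
        return self.net(image_pair)  # (B, 3, D, H, W), voxel units

def warp(volume, ddf):
    """Trilinearly resample `volume` (B, C, D, H, W) with displacement
    field `ddf` (B, 3, D, H, W) in voxels, channel order (x, y, z)."""
    b, _, d, h, w = ddf.shape
    zs, ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, d, device=ddf.device),
        torch.linspace(-1, 1, h, device=ddf.device),
        torch.linspace(-1, 1, w, device=ddf.device), indexing="ij")
    identity = torch.stack((xs, ys, zs), dim=-1).unsqueeze(0)  # (1,D,H,W,3)
    scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1),
                          2.0 / max(d - 1, 1)], device=ddf.device)
    disp = ddf.permute(0, 2, 3, 4, 1) * scale  # voxels -> [-1, 1] coords
    return F.grid_sample(volume, identity + disp, align_corners=True)

def train_step(net, optimiser, fixed_img, moving_img, fixed_lab, moving_lab):
    optimiser.zero_grad()
    ddf = net(torch.cat([fixed_img, moving_img], dim=1))  # images only
    warped = warp(moving_lab, ddf)                        # labels enter here,
    inter = (warped * fixed_lab).sum()                    # in the loss only
    union = warped.sum() + fixed_lab.sum()
    loss = 1.0 - (2.0 * inter + 1e-6) / (union + 1e-6)    # soft Dice
    loss.backward()
    optimiser.step()
    return loss.item()

# At inference, only an unlabelled image pair is needed:
#   ddf = net(torch.cat([fixed_img, moving_img], dim=1))
```

For a smoke test, `net = TinyRegNet()` with `torch.optim.Adam(net.parameters(), lr=1e-4)` and random (B, 1, D, H, W) tensors suffices; the paper's actual architecture variants and label-sampling scheme are more elaborate than this stand-in.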
