NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images

In the evolving landscape of medical technology, computational pathology has emerged as a cornerstone of advances in diagnostic methods, particularly in the analysis of whole slide images. The more information we can extract from these images, the better we can diagnose and treat diseases. A critical task in this domain is the segmentation of cell nuclei within histological images, a process that has improved significantly with the advent of deep learning (DL) methods. This blog post introduces the NuInsSeg dataset, one of the most comprehensive resources for nuclei instance segmentation.

The Challenge of Nuclei Segmentation
Nuclei instance segmentation involves identifying and outlining each individual cell nucleus within a histological image. Often, the underlying tissue sections are stained with hematoxylin and eosin (H&E), the most common staining technique in medical histology, in which nuclei are stained blue, while the cytoplasm of the cells, collagen, and protein-containing solutions are stained in varying shades of red. Accurate segmentation of nuclei is crucial for a wide range of medical diagnostics, as it allows for detailed analysis of cell structures, which can reveal important information about the presence and progression of diseases.
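To make the word "instance" concrete, here is a minimal sketch of the difference between a binary mask (nucleus vs. background) and an instance mask (one integer label per nucleus). The toy mask and the function name `label_instances` are made up for illustration and are not part of the NuInsSeg tooling; the sketch assumes 4-connectivity and uses only the Python standard library.

```python
from collections import deque


def label_instances(binary_mask):
    """Give every 4-connected foreground blob its own positive integer label.

    Returns (labels, count), where labels has the same shape as binary_mask
    and background pixels stay 0.
    """
    height = len(binary_mask)
    width = len(binary_mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if not binary_mask[r][c] or labels[r][c]:
                continue  # background, or already part of a labeled blob
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the current blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary_mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical 4x5 binary mask containing three separate "nuclei".
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
]
labels, count = label_instances(toy_mask)
print(count)
for row in labels:
    print(row)
```

Note that this kind of connected-component labeling merges nuclei that touch or overlap into a single object, which is one reason why manually drawn per-nucleus annotations are valuable for training and evaluation.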

Traditionally, this task has been approached through various computerized methods, including classical machine learning and image processing techniques. However, due to the complexity and variability of the tissue structures, these methods often fail to achieve the accuracy and precision required for reliable medical diagnosis. In recent years, supervised DL methods have demonstrated superior performance in nuclei segmentation tasks, offering a more effective solution to this challenge.

Introducing NuInsSeg
Despite the promising capabilities of DL models, their effectiveness is heavily dependent on the availability of high-quality, fully annotated datasets for training. In the medical field, acquiring such datasets is particularly challenging due to the intricate nature of biological tissues and the expertise required for accurate annotation.

NuInsSeg is one of the largest fully manually annotated datasets of nuclei in H&E-stained histological images, containing 665 image patches with over 30,000 manually segmented nuclei from 23 human and 8 mouse organs. This extensive dataset provides a robust resource for training DL models for nuclei segmentation, potentially improving the segmentation performance and reliability of computational pathology analyses.

One of the unique features of the NuInsSeg dataset is the inclusion of additional ambiguous area masks. These masks highlight areas of the images where precise and deterministic manual annotations are impossible, even for human experts. Acknowledging the inherent uncertainties in histological image analysis is crucial for developing DL models that can more accurately interpret and navigate the complexities of real-world medical images.
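One plausible way to use such masks, sketched below, is to skip ambiguous pixels when scoring a prediction, so that a model is neither rewarded nor penalized where no reliable ground truth exists. This is an assumption for illustration only: the function name `masked_dice` and the toy arrays are hypothetical, and the sketch does not claim to reproduce the evaluation protocol used by the dataset authors.

```python
def masked_dice(pred, target, ambiguous):
    """Dice score between two binary masks, ignoring pixels flagged as ambiguous.

    All three arguments are equally sized 2D lists of 0/1 values.
    """
    intersection = pred_sum = target_sum = 0
    for pred_row, target_row, amb_row in zip(pred, target, ambiguous):
        for p, t, a in zip(pred_row, target_row, amb_row):
            if a:
                continue  # ambiguous pixel: excluded from the score
            intersection += 1 if (p and t) else 0
            pred_sum += 1 if p else 0
            target_sum += 1 if t else 0
    denominator = pred_sum + target_sum
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if denominator == 0 else 2.0 * intersection / denominator


# Hypothetical toy example: the only disagreement lies in an ambiguous pixel.
pred      = [[1, 1, 0], [0, 1, 1]]
target    = [[1, 1, 0], [0, 0, 1]]
ambiguous = [[0, 0, 0], [0, 1, 0]]
no_mask   = [[0, 0, 0], [0, 0, 0]]

plain_score = masked_dice(pred, target, no_mask)      # disagreement counted
masked_score = masked_dice(pred, target, ambiguous)   # disagreement ignored
print(plain_score, masked_score)
```

The same idea carries over to training, where the ambiguous mask could be used to zero out the loss at the flagged pixels.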


Potential Applications in Medicine
The applications of accurate nuclei segmentation in medicine are vast and varied. Here are a few key areas:

  • Personalized Medicine: Detailed cell analysis can support the development of personalized treatment plans by identifying specific cellular characteristics that may influence a patient's response to certain therapies.
  • Disease Research: Researchers can use detailed nuclei segmentation to study the progression of various diseases at a cellular level, leading to new insights and potential treatments.
  • Cancer Diagnosis and Grading: By enabling precise analysis of cell structures, nuclei segmentation can greatly enhance the accuracy of cancer diagnoses and allow for more detailed grading of tumors based on their histological features.


By providing a detailed and accurately annotated resource for nuclei segmentation, the NuInsSeg dataset paves the way for more precise and effective analysis of histological images. As DL models continue to evolve, the importance of comprehensive datasets like NuInsSeg will only grow, underscoring the critical role of data in the intersection of technology and medicine.

Link to dataset:
Link to GitHub:
Link to paper:

Acknowledgment: Part of the text was generated by OpenAI's GPT-4 model (March 6, 2024) and subsequently reviewed and revised. The image was created with the assistance of DALL·E version 3 (March 6, 2024).
