Behind the Paper

VinDr-Mammo: The largest public dataset of full-field digital mammography to evaluate and compare algorithmic support systems for breast cancer screening

We introduce a new benchmark dataset of full-field digital mammography (FFDM) for detecting and diagnosing breast cancer and other diseases in mammography. We make VinDr-Mammo publicly available as a new imaging resource to promote the development of computer-aided detection and diagnosis (CADe/x) tools for mammography interpretation.

Published in Research Data

Mar 29, 2023

Hieu Pham and Ha Q. Nguyen


Breast cancer is among the most prevalent cancers and accounts for the largest portion of cancer deaths. Interpreting mammograms for breast cancer screening is a challenging task. More accurate screening may help reduce the cost of follow-up examinations and spare patients unnecessary anxiety.

We introduce and release VinDr-Mammo, an open-access, large-scale Vietnamese dataset of full-field digital mammography (FFDM) consisting of 5,000 four-view exams with breast-level assessments and extensive lesion-level annotations. The paper has been accepted for publication by Scientific Data. Our aim is to increase the diversity of publicly available mammography data, which supports more robust AI systems, and to move toward more interpretable systems through extensive lesion-level annotations.

The mammograms were acquired retrospectively from two primary hospitals in Hanoi, Vietnam: Hospital 108 (H108) and Hanoi Medical University Hospital (HMUH). Breast cancer assessment and breast density are reported following the Breast Imaging Reporting and Data System (BI-RADS). Breast abnormalities that need short-term follow-up or are suspicious for malignancy are marked with bounding rectangles. Following European guidelines, the mammography exams were independently double-read, and any discordance between the two radiologists was resolved by arbitration involving a third radiologist.

To the best of our knowledge, VinDr-Mammo is currently the largest public FFDM dataset (20,000 scans) that provides a breast-level BI-RADS assessment category along with suspicious or probably benign findings that need follow-up examination. By releasing it, we contribute a benchmark imaging dataset for evaluating and comparing algorithmic support systems for breast cancer screening based on FFDM.
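The double-reading protocol and the exam and scan counts mentioned above can be illustrated with a minimal Python sketch. This is not the annotation tooling used to build the dataset: the function name `resolve_assessment`, the `arbitrate` callback, and the string labels are hypothetical and chosen only for illustration.

```python
from typing import Callable

# Four views per exam: two breasts (left, right) x two views (CC, MLO).
VIEWS_PER_EXAM = 4


def resolve_assessment(
    first_read: str,
    second_read: str,
    arbitrate: Callable[[str, str], str],
) -> str:
    """Return the final assessment for one breast.

    If the two independent readers agree, their shared assessment is kept.
    Otherwise the decision is delegated to `arbitrate`, a hypothetical
    stand-in for the third radiologist who reviews the case.
    """
    if first_read == second_read:
        return first_read
    return arbitrate(first_read, second_read)


# Placeholder arbitrator: a real one would look at the images, not the labels.
def example_arbitrator(first_read: str, second_read: str) -> str:
    return second_read


agreed = resolve_assessment("BI-RADS 1", "BI-RADS 1", example_arbitrator)
arbitrated = resolve_assessment("BI-RADS 3", "BI-RADS 4", example_arbitrator)
total_scans = 5000 * VIEWS_PER_EXAM
```

In this sketch the arbitrator is consulted only when the two reads differ, and the count of 5,000 four-view exams works out to the 20,000 scans quoted in the text.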

A sample mammography exam with the right breast assessed with BI-RADS 5, density B and the left breast with BI-RADS 1, density B. CC denotes craniocaudal and MLO denotes mediolateral oblique.

The VinDr-Mammo dataset was created for developing and evaluating computer-aided detection and diagnosis algorithms based on full-field digital mammography. It can also be used for general computer vision tasks, such as object detection and multi-label image classification. To download and explore the dataset, users must accept a Data Usage Agreement (DUA), the PhysioNet Credentialed Health Data License 1.5.0. By accepting this DUA, users agree to use the dataset for scientific research and educational purposes only and not to attempt to re-identify any patients, institutions, or hospitals.
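As a rough illustration of the multi-label classification use case, lesion-level annotations can be collapsed into one multi-hot target vector per image. The sketch below works on in-memory data only. The label names, image identifiers, and the `multi_hot` helper are assumptions made for this example and do not reflect the dataset's actual file layout or category list.

```python
from typing import Dict, List, Sequence


def multi_hot(
    findings_per_image: Dict[str, List[str]],
    label_names: Sequence[str],
) -> Dict[str, List[int]]:
    """Encode each image's findings as a 0/1 vector over `label_names`."""
    index = {name: i for i, name in enumerate(label_names)}
    encoded: Dict[str, List[int]] = {}
    for image_id, findings in findings_per_image.items():
        vector = [0] * len(label_names)
        for finding in findings:
            vector[index[finding]] = 1  # repeated findings stay at 1
        encoded[image_id] = vector
    return encoded


# Hypothetical label set and annotations, for illustration only.
LABELS = ["mass", "calcification", "asymmetry"]
annotations = {
    "img_001": ["mass", "calcification"],
    "img_002": [],  # image with no marked findings
    "img_003": ["asymmetry", "asymmetry"],
}

targets = multi_hot(annotations, LABELS)
```

An image with no marked findings maps to an all-zero vector, which is what separates multi-label targets from single-class labels. For object detection, the bounding rectangles would be kept alongside these labels rather than discarded.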




