To generate the disease labels for this dataset, the researchers used Natural Language Processing (NLP) to text-mine disease classifications from the corresponding radiological reports. The resulting labels are expected to be over 90% accurate. They are therefore somewhat noisy, but still well suited to weakly-supervised learning. The original radiology reports are not publicly available, but the labeling process is described in detail in the Open Access paper “ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases” by Wang et al.
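To make the idea of report-based labeling concrete, here is a toy sketch of keyword matching with simple negation handling. It is not the pipeline used by Wang et al., which the paper describes as considerably more elaborate. The vocabulary (`FINDINGS`), the negation cues, the `extract_labels` function, and the `"No Finding"` fallback are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical, illustrative subset of findings and synonyms; NOT the dataset's label set.
FINDINGS = {
    "Atelectasis": ["atelectasis"],
    "Cardiomegaly": ["cardiomegaly", "enlarged heart"],
    "Effusion": ["effusion"],
    "Pneumothorax": ["pneumothorax"],
}

# Simplified negation cues (an assumption); real pipelines use far richer rules.
NEGATION_CUES = re.compile(r"\b(no|without|negative for|free of)\b")


def extract_labels(report: str) -> list[str]:
    """Return the sorted finding names mentioned, and not negated, in a report."""
    found = set()
    # Split into rough sentences so a negation cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]", report.lower()):
        for label, terms in FINDINGS.items():
            for term in terms:
                # Check only the first mention of the term in this sentence.
                idx = sentence.find(term)
                if idx == -1:
                    continue
                # Skip the mention if a negation cue appears earlier in the sentence.
                if NEGATION_CUES.search(sentence[:idx]):
                    continue
                found.add(label)
    # Hypothetical fallback label when nothing positive is detected.
    return sorted(found) or ["No Finding"]


print(extract_labels("Mild cardiomegaly. No pneumothorax or effusion."))
```

Even this tiny example shows why text-mined labels are noisy: hedged phrases ("cannot exclude effusion"), typos, or unusual phrasing can slip past simple rules, which is why such labels are treated as weak supervision rather than ground truth.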