Because TB data is expensive to obtain, we combine our data with four smaller TB datasets: DA, DB, Montgomery, and Shenzhen, which contain 156, 150, 138, and 662 X-ray images, respectively. Part of each dataset is used for training and validation, as indicated in the ‘imgs/extra/’ folder. The remaining images from these four datasets, together with our 2,800 testing X-rays, form the new testing set listed in ‘all_test.txt’. We likewise merge our data with the training and validation X-rays from these four datasets to create the updated training, validation, and combined training+validation sets, listed in ‘all_train.txt’, ‘all_val.txt’, and ‘all_trainval.txt’.
Please note that the ground truth for the testing set will not be released, because it is used for an online competition on computer-aided tuberculosis diagnosis. For model development, we recommend training on the ‘all_train’ set and validating on the ‘all_val’ set. When submitting results, train your model on the ‘all_trainval’ set and test it on the ‘all_test’ set.
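The two workflows above (development versus submission) can be sketched in a few lines of Python. This is only an illustration: the helper names `read_split` and `select_splits` and the mode labels `"develop"` and `"submit"` are hypothetical, and the code assumes that each list file contains one image entry per line and that all list files sit in the same directory, which may not match the actual layout.

```python
from pathlib import Path


def read_split(list_path):
    """Return the non-empty, stripped lines of a split list file.

    Assumption: the list file holds one image entry per line.
    """
    with open(list_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def select_splits(root, mode):
    """Return (training list path, evaluation list path) for a workflow.

    mode="develop": train on all_train, validate on all_val.
    mode="submit":  train on all_trainval, test on all_test.
    Assumption: all list files are stored directly under `root`.
    """
    protocols = {
        "develop": ("all_train.txt", "all_val.txt"),
        "submit": ("all_trainval.txt", "all_test.txt"),
    }
    if mode not in protocols:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(protocols)}")
    train_name, eval_name = protocols[mode]
    root = Path(root)
    return root / train_name, root / eval_name
```

For example, `select_splits(".", "develop")` returns the paths of ‘all_train.txt’ and ‘all_val.txt’, which can then be passed to `read_split`. Since the ground truth for ‘all_test’ is not released, the evaluation list in `"submit"` mode is only useful for producing predictions, not for computing scores locally.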