Pathology diagnosis and analysis are known for being time-consuming and error-prone. Digital pathology analysis tools have been introduced to address these challenges, but limitations persist. Our project addresses one such weakness by leveraging artificial intelligence. QuPath, the tool in question, is widely acknowledged for its utility in assisting pathologists with the analysis of various pathology slides, allowing users to analyze slides and images mainly at the cell level. However, its functionality is constrained by its ongoing development and growing demand; notably, it lacks the capability to effectively detect specific areas, such as tumor regions. In response to this limitation, we developed an artificial intelligence model that automatically identifies tumor areas, thereby enhancing the tool's cell-level functionality. We chose YOLO-based computer vision models for this purpose because of their ease of use and widespread adoption in the field. We initially considered object detection models but later determined that segmentation models within the YOLO framework would offer superior performance. Collaborating with the Hacettepe University Pathology Department, we tailored the model to their specific needs and requirements. Using data provided by the department in the form of whole-slide images, we tiled these slides and trained a YOLO model, achieving ~0.75 recall, ~0.85 precision, and ~0.8 F1 score. This performance provides a promising foundation for assisting pathologists in the initial stages of use. Following model development, we created a QuPath extension that integrates the AI model seamlessly, enabling users to automatically identify desired areas, particularly tumor regions, with a single click. The main aim of this project is to accelerate pathologists' workflow and alleviate their workload through innovative solutions. Ongoing efforts are directed toward further refining the model's performance and optimizing inference speed to enhance diagnostic capabilities.
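To make the tiling step concrete, the following Python sketch cuts a whole-slide image into fixed-size tiles with OpenSlide; the tile size, pyramid level, and file paths are assumptions for illustration, not the project's actual settings.

    import openslide
    from pathlib import Path

    # Illustrative settings: the actual tile size and pyramid level
    # used in the project are not specified in this report.
    TILE_SIZE = 640
    LEVEL = 0  # full-resolution level of the slide pyramid

    def tile_slide(slide_path, out_dir):
        slide = openslide.OpenSlide(slide_path)
        width, height = slide.level_dimensions[LEVEL]
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        # Walk the slide in non-overlapping TILE_SIZE x TILE_SIZE windows.
        for y in range(0, height - TILE_SIZE + 1, TILE_SIZE):
            for x in range(0, width - TILE_SIZE + 1, TILE_SIZE):
                tile = slide.read_region((x, y), LEVEL, (TILE_SIZE, TILE_SIZE))
                tile.convert("RGB").save(out / f"tile_{x}_{y}.png")

    tile_slide("slide.svs", "tiles")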
Pathology diagnosis plays a crucial role in healthcare, guiding treatment decisions and prognoses for patients. Traditional methods of pathology analysis, however, are often time-consuming and prone to errors. Digital pathology analysis tools have brought significant advances to the field, promising greater efficiency and accuracy, yet certain limitations persist and prevent these tools from realizing their full potential. Our project addresses a specific weakness in pathology analysis tools through the integration of artificial intelligence (AI) techniques. The primary problem is the lack of robustness in current tools, particularly in the identification of specific regions of interest, such as tumor areas, within pathology slides. Although existing tools like QuPath facilitate the analysis of pathology slides at the cellular level, they often fall short in accurately detecting and capturing important regions, leading to potential oversight and misdiagnosis by pathologists. The focus of this project is the segmentation of breast cancer areas within pathology images, with the objective of integrating the resulting models into digital pathology analysis tools. Notably, much of the existing research in this field concentrates on nuclei-level segmentation, which diverges from our aim; our emphasis lies instead on accurately capturing breast cancer areas within pathology images, with the ultimate goal of facilitating more effective diagnostic analysis within digital pathology platforms. Based on the identified problem and the insights gained from the literature review, our hypothesis is that integrating AI techniques, specifically YOLO-based computer vision models, into existing pathology analysis tools can enhance the detection of tumor areas and improve the overall efficiency of pathology analysis. We propose to develop a YOLO-based model trained on annotated pathology images to automatically identify tumor regions within slides. By leveraging YOLO models, which are known for their speed and accuracy in computer vision tasks, we aim to address the limitations of current pathology analysis tools and provide pathologists with a more efficient and reliable aid for diagnostic analysis. In summary, our proposed solution integrates YOLO-based computer vision models into existing pathology analysis tools to improve the accuracy and efficiency of tumor segmentation, enhancing the diagnostic capabilities of pathologists and ultimately improving patient outcomes.
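As a minimal sketch of the proposed approach, assuming the Ultralytics YOLO library and a YOLO-format segmentation dataset prepared from annotated tiles, fine-tuning could look like the following; the dataset YAML name and hyperparameters are placeholders, not the project's actual configuration.

    from ultralytics import YOLO

    # Start from a pretrained YOLOv8 segmentation checkpoint and
    # fine-tune it on the annotated tiles.
    model = YOLO("yolov8m-seg.pt")
    model.train(
        data="breast_tumor_seg.yaml",  # hypothetical dataset config
        imgsz=640,
        epochs=100,
        batch=16,
    )
    metrics = model.val()  # reports precision, recall, mAP@0.5, and more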
Progress in breast cancer diagnosis through image analysis has been extensive and has developed significantly over the years. Initially, researchers focused on traditional machine learning methods such as k-nearest neighbors (KNN, with k = 5), naive Bayes (NB, with kernel density estimation), decision trees (DT), and support vector machines (SVM). The main limitations of these methods were their reliance on small datasets and their weakness at extracting complex image patterns; to compensate for small datasets, researchers turned to labor-intensive feature engineering. Deep learning techniques have since emerged as a powerful alternative, capable of handling large datasets and automatically extracting abstract features from data. Several notable studies have demonstrated the efficacy of both traditional machine learning and deep learning methods for breast cancer diagnosis from histopathological images. For instance, Zhang et al. (2013) introduced a cascade random subspace ensemble scheme for microscopic biopsy image classification, achieving a high classification accuracy of 99.25%. Similarly, Kowal et al. (2013) and Filipczuk et al. (2013) employed clustering algorithms and traditional machine learning methods on 500 real-case medical images from 50 patients, achieving breast cancer image classification with approximately 96–100% accuracy. While machine learning methods were producing strong results in classifying breast cancer, dataset sizes kept growing. Spanhol et al. (2016b) introduced BreakHis, a dataset of 7,909 breast cancer histopathology images acquired from 82 patients. Aksac et al. (2019) introduced a dataset of 162 breast cancer histopathology images, the Breast Cancer Histopathological Annotation and Diagnosis dataset (BreCaHAD). As datasets progressed, deep learning methods such as convolutional neural networks (CNNs) showed remarkable performance in breast cancer diagnosis. Spanhol et al. (2016b) employed the BreakHis dataset for histopathology image classification; initially adopting LeNet, they failed to surpass the prior benchmark of 72%, so they continued with a variant of AlexNet (Krizhevsky et al., 2012) that improved classification accuracy by 4–6%. Bayramoglu et al. (2016) likewise utilized CNNs to classify histopathological images, achieving results competitive with traditional machine learning approaches. Moreover, studies such as Araújo et al. (2017) and Han et al. (2017) explored CNN-based methods for multi-class classification, providing valuable insights for diagnosis and prognosis. However, limitations regarding the size of available datasets persist. To address this limitation, transfer learning techniques have been proposed, leveraging models pre-trained on large datasets such as ImageNet to improve performance on smaller datasets. Transfer learning has proven itself on real-world problems: it repurposes knowledge from one task to another, increasing model performance through previously learned features, reducing the need for hand-labeled data, and accelerating training in domains like computer vision. For instance, Nawaz et al. (2018) and Motlagh et al. (2018) employed transfer learning with models such as DenseNet and ResNet_V1_152 to achieve high accuracy in breast cancer classification. Sanchez-Morillo and Fernandez-Granero (2020) employed the DeepLabv3+ segmentation method, which takes modified versions of the MobileNetV2 (Sandler, Howard, Zhu, Zhmoginov & Chen, 2018), Xception, or ResNet architectures as a backbone network, achieving nearly 64% MIoU and 93% FWIoU.
The overall workflow of the extension is as follows: the user opens a whole-slide image in QuPath and invokes the extension with a single click; the extension tiles the region of interest, runs the YOLO segmentation model on each tile, and merges the predicted tumor masks back into QuPath as annotation objects for cell-level follow-up analysis.
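The extension's internals are not reproduced here, but a minimal Python sketch of the inference half of this workflow might look as follows, assuming tiles have already been exported and that predictions are handed back to QuPath as GeoJSON (one plausible integration route, since QuPath can import GeoJSON annotations); the weight file and tile names are hypothetical.

    import json
    from ultralytics import YOLO

    model = YOLO("best.pt")  # trained segmentation weights (assumed filename)

    def tile_to_features(tile_path, offset_x, offset_y):
        # Run segmentation on one tile and return GeoJSON features mapped
        # back into whole-slide coordinates via the tile's offset.
        result = model(tile_path)[0]
        features = []
        if result.masks is None:  # no tumor regions found in this tile
            return features
        for polygon in result.masks.xy:  # one (N, 2) pixel array per instance
            ring = [[float(x) + offset_x, float(y) + offset_y]
                    for x, y in polygon]
            ring.append(ring[0])  # close the polygon ring
            features.append({
                "type": "Feature",
                "geometry": {"type": "Polygon", "coordinates": [ring]},
                "properties": {"classification": {"name": "Tumor"}},
            })
        return features

    # Example: a tile exported from slide position (4096, 2048); QuPath can
    # load the resulting file through its GeoJSON annotation import.
    feats = tile_to_features("tiles/tile_4096_2048.png", 4096, 2048)
    with open("predictions.geojson", "w") as f:
        json.dump({"type": "FeatureCollection", "features": feats}, f)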
Our best-performing model was obtained with YOLOv8m-seg. The validation loss curves indicate that training was stable and reached a plateau, suggesting that the model learned what it could from the data and that training concluded before overfitting occurred. Despite our efforts, the best-performing model achieved only a modest ~0.75 mAP@0.5, which falls below the desired standard for medical applications of AI; this score may be attributed to the limited quantity of data available for training. When inspecting the confusion matrix, note that there are no true negatives (TN), since there is only a single class. From the confusion matrix we can deduce the following metrics (a short computational check follows the list):
Recall: TP / (TP + FN) = 0.735
Precision: TP / (TP + FP) = 0.842
F1 Score: 2 * (Recall * Precision) / (Recall + Precision) = 0.785
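For concreteness, these values can be reproduced from raw confusion-matrix counts; the TP/FP/FN counts below are hypothetical, chosen only to match the reported rates.

    def detection_metrics(tp, fp, fn):
        # With a single class there are no true negatives, so only
        # recall, precision, and F1 are meaningful.
        recall = tp / (tp + fn)
        precision = tp / (tp + fp)
        f1 = 2 * recall * precision / (recall + precision)
        return recall, precision, f1

    # Hypothetical raw counts; the report gives rates, not counts.
    r, p, f1 = detection_metrics(tp=735, fp=138, fn=265)
    print(f"recall={r:.3f}  precision={p:.3f}  f1={f1:.3f}")
    # -> recall=0.735  precision=0.842  f1=0.785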
Additionally, since the number of false negatives (FN) exceeds the number of false positives (FP), the model tends to miss more instances than it misclassifies. This observation suggests that the model could benefit from a larger and more diverse training set or from improved annotations.
From these graphs, it can be inferred that while the model's performance may not be groundbreaking, it represents a solid starting point, particularly given the limited quantity of data available for training. Although the model appears to capture most of the areas, this is largely due to the nature of the currently annotated data, which includes stained tumor areas. There are situations where tumor areas are not stained but can still be detected by a professional pathologist, as well as instances of stained areas that are not tumors. Our current aim is to enhance the model's ability to differentiate and successfully capture these challenging cases.