Automated Pathology Image Analysis: Enhancing Digital Pathology with a YOLO-based Object Detection Extension for QuPath

Hacettepe University
AIN492 End of Project Report

*Indicates Equal Contribution

Authors

Arif Enes Aydın
Metehan Sarikaya

Abstract

Pathology diagnosis and analysis are known to be time-consuming and error-prone. Digital pathology analysis tools have been introduced to address these challenges, but limitations persist. Our project addresses a weakness in one such tool by leveraging artificial intelligence. QuPath, the tool in question, is widely acknowledged for its utility in assisting pathologists with the analysis of various pathology slides, allowing users to analyze slides and images mainly at the cell level. However, its functionality is constrained by its ongoing development and growing demand; notably, it lacks the capability to effectively detect specific areas, such as tumor regions. In response to this limitation, we developed an artificial intelligence model that automatically identifies tumor areas, thereby enhancing the tool's cell-level operations. We chose YOLO-based computer vision models for this purpose because of their ease of use and widespread popularity in the field. Object detection models were considered initially, but segmentation models within the YOLO framework were later determined to offer superior performance for this task. Collaborating with the Hacettepe University Pathology Department, we tailored the model to their specific needs and requirements. Using whole-slide images provided by the department, we tiled the slides and trained a YOLO model, achieving roughly 0.75 recall, 0.85 precision, and 0.8 F1 score. This performance provides a promising foundation for assisting pathologists in the initial stages of use. Following model development, we created a QuPath extension that integrates the AI model, enabling users to automatically identify desired areas, particularly tumor regions, with a single click. The main aim of this project is to accelerate pathologists' workflow and alleviate their workload through the deployment of such solutions. Ongoing efforts focus on further refining the model's performance and optimizing inference speed to enhance diagnostic capabilities.

Introduction

Pathology diagnosis plays a crucial role in healthcare, guiding treatment decisions and prognoses for patients. However, traditional methods of pathology analysis are often time-consuming and prone to errors. With the advent of digital pathology analysis tools, the field has seen significant advances, promising greater efficiency and accuracy. Nonetheless, certain limitations persist, hindering the full realization of these tools' potential benefits. Our project seeks to address a specific weakness in pathology analysis tools through the integration of artificial intelligence (AI) techniques.

The primary problem addressed by our project is the lack of robustness in current pathology analysis tools, particularly in the identification of specific regions of interest, such as tumor areas, within pathology slides. Despite the capabilities of existing tools like QuPath, which facilitate the analysis of pathology slides at the cellular level, they often fall short in accurately detecting and capturing important regions, leading to potential oversight and misdiagnosis by pathologists. The primary focus of this project is the segmentation of breast cancer areas within pathology images, with the objective of integrating these models into digital pathology analysis tools. Notably, much of the existing research in this field concentrates on nuclei-level segmentation, which diverges from the primary aim of our project. Instead, our emphasis lies on accurately capturing breast cancer areas within pathology images, with the ultimate goal of facilitating more effective diagnostic analysis within digital pathology platforms.

Based on the identified problem and the insights gained from the literature review, our hypothesis is that integrating AI techniques, specifically YOLO-based computer vision models, into existing pathology analysis tools can enhance the detection of tumor areas and improve the overall efficiency of pathology analysis. We propose to develop a YOLO-based model trained on annotated pathology images to automatically identify tumor regions within slides. By leveraging the capabilities of YOLO models, which are known for their speed and accuracy in computer vision tasks, we aim to address the limitations of current pathology analysis tools and provide pathologists with a more efficient and reliable tool for diagnostic analysis.

In summary, our proposed solution is the integration of AI techniques, particularly YOLO-based computer vision models, into existing pathology analysis tools to improve the accuracy and efficiency of tumor segmentation. Through this approach, we aim to enhance the diagnostic capabilities of pathologists and ultimately improve patient outcomes in the field of pathology.

Related Work

Research on breast cancer diagnosis through image analysis is extensive and has developed significantly over the years. Initially, researchers focused on traditional machine learning methods such as KNN (k-nearest neighbors with k = 5), NB (naive Bayes with kernel density estimation), DT (decision trees), and SVM (support vector machines). The main problem with these methods was that they used small datasets and were weak at extracting complex image patterns; to compensate for the small datasets, they relied on labor-intensive feature engineering. Deep learning techniques have since emerged as a powerful alternative, capable of handling large datasets and automatically extracting abstract features from the data.

Several notable studies have demonstrated the efficacy of both traditional machine learning and deep learning methods for breast cancer diagnosis from histopathological images. For instance, Zhang et al. (2013) introduced a cascade random subspace ensemble scheme for microscopic biopsy image classification, achieving a high classification accuracy of 99.25%. Similarly, Kowal et al. (2013) and Filipczuk et al. (2013) employed clustering algorithms and traditional machine learning methods on 500 real-case medical images from 50 patients, classifying breast cancer images with approximately 96–100% accuracy.

While machine learning methods were producing strong results in classifying breast cancer, the sizes of available datasets kept growing. Spanhol et al. (2016b) introduced the BreakHis dataset, which consists of 7,909 breast cancer histopathology images acquired from 82 patients. Aksac et al. (2019) introduced a dataset of 162 breast cancer histopathology images, namely the breast cancer histopathological annotation and diagnosis dataset (BreCaHAD). With the growth of datasets, deep learning methods such as convolutional neural networks (CNNs) have shown remarkable performance in breast cancer diagnosis. Spanhol et al. (2016b) employed the BreakHis dataset for histopathology image classification; initially adopting LeNet, their efforts failed to surpass prior results, which stood at 72%, so they continued with a variant of AlexNet (Krizhevsky et al., 2012) that improved classification accuracy by 4–6%. Bayramoglu et al. (2016) utilized CNNs to classify histopathological images, achieving competitive results compared to traditional machine learning approaches. Moreover, studies such as Araújo et al. (2017) and Han et al. (2017) explored CNN-based methods for multi-class classification, providing valuable insights for diagnosis and prognosis.

However, limitations regarding the size of available datasets persist. To address this, transfer learning techniques have been proposed, leveraging models pre-trained on large datasets such as ImageNet to improve performance on smaller datasets. Transfer learning has proven itself on real-world problems: it repurposes knowledge from one task to another, increasing model performance by reusing previously learned features, reducing the need for hand-labeled data, and accelerating training in domains such as computer vision. For instance, Nawaz et al. (2018) and Motlagh et al. (2018) employed transfer learning with models such as DenseNet and ResNet_V1_152 to achieve high accuracy in breast cancer classification. Sanchez-Morillo and Fernandez-Granero (2020) employed the DeepLabv3+ segmentation method, which takes modified versions of the MobileNetV2 (Sandler, Howard, Zhu, Zhmoginov & Chen, 2018), Xception, or ResNet architectures as a backbone network, achieving nearly 64% MIoU and 93% FWIoU.

Methodology



Our work began by investigating the requirements of the task and understanding the specific needs of the pathology department, given that they would be providing the data and conducting the annotations. Through meetings, we concluded that accurately capturing the precise shape of tumor areas is essential for pathologists. It became evident that QuPath cannot differentiate between tumor and non-tumor areas when annotations are made using rectangles (bounding boxes); the coexistence of tumor and non-tumor regions within a single rectangle would be largely unavoidable. As a result, the focus of our approach shifted towards segmentation rather than object detection. While awaiting annotations, our attention turned to preparing the data for compatibility with deep learning models, specifically YOLO. The annotations were made in QuPath, since it offers the flexibility to annotate data in various formats relevant to computer vision, including bounding boxes for object detection and polygons for segmentation. However, a challenge arose from the size and resolution of pathology slides, which often exceed 5–10 gigabytes and reach resolutions of up to 100,000 x 100,000 pixels. Moreover, these slides are stored in specialized file formats, making direct use with the model impossible without dividing them into smaller tiles.


Annotated pathology slide


To address this, we explored QuPath's built-in scripting tool, which utilizes the Groovy language. This tool provides functions to interact with QuPath, enabling diverse import/export and analysis capabilities. Our focus then shifted to leveraging this scripting tool to achieve tiling functionality. Through scripting within QuPath, we devised a method to extract annotated areas as tiles from the currently opened image, alongside their segmentation masks.


Tiling arguments


We used the parameters shown in the code snippet above (the tiling arithmetic they control is sketched in Python after the list):

classNames: An array containing the names of classes or labels. In our project, we have one class named "Tumor".
downsample: A factor by which the image is downsampled. It's set to 3, meaning the image is reduced in size by a factor of 3. We experimented with various downsampling values, but a high factor left us with too few images; given that our dataset was already smaller than the one used in the reference project, we aimed to avoid further loss of information.
patchSize: Defines the size of patches or segments of the image. It's set to 640, which corresponds to the input size of YOLO.
pixelOverlap: The amount of overlap between adjacent patches, measured in pixels. It's set to 160, suggesting that adjacent patches will overlap by 160 pixels.
imageExtension: Represents the file extension of the image files. It's set to ".png", denoting that the images are formatted in PNG. PNG format was chosen over JPG or TIFF due to its compatibility and superior quality.
multiChannel: A boolean variable indicating whether the images are multi-channel or not.
onlyAnnotated: Another boolean option. When set to true, only annotated regions are processed. We set it to true to isolate the annotated regions from the background; without it, the exported tiles contained a large amount of background, and annotated regions appeared only infrequently among the tiles.
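
A note on how these parameters interact: the effective step between tile origins is patchSize minus pixelOverlap in the downsampled image. The following Python sketch illustrates only this arithmetic; it is not the actual QuPath Groovy script, and the function and variable names are ours.

# Illustrative sketch (not the actual QuPath Groovy script): enumerate tile
# origins for a slide of the given size using the parameters described above.
def tile_origins(slide_width, slide_height,
                 downsample=3, patch_size=640, pixel_overlap=160):
    # Work in the downsampled coordinate space.
    width = slide_width // downsample
    height = slide_height // downsample
    step = patch_size - pixel_overlap  # 640 - 160 = 480 px between tile origins
    origins = []
    for y in range(0, max(height - patch_size, 0) + 1, step):
        for x in range(0, max(width - patch_size, 0) + 1, step):
            origins.append((x, y))
    return origins

# Example: a 30,000 x 30,000 px slide downsampled by 3 yields a 10,000 x 10,000
# px image, covered by overlapping 640 x 640 px patches.
print(len(tile_origins(30_000, 30_000)))
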
An example tile and its mask.


After the completion of annotations and the creation of tiles for each slide, all tiles and masks were accumulated in a directory for further processing. However, YOLO's data format differs from that of other models: each annotation is a text file, with each object represented as one line of the form "class-index x1 y1 x2 y2 ... xn yn", where the polygon coordinates are normalized by the image width and height.

To accommodate this format, we converted the masks by extracting polygons and saving them in the appropriate format. Notably, the default YOLO format does not support polygons with holes. To address this limitation, we used an open-source solution that connects inner holes to the outer contour, which allows holes to be represented accurately in the YOLO format without deforming the actual polygon coordinates.
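
As a rough, simplified sketch of this conversion (ignoring the hole handling described above, which the open-source tool performs), a tile mask can be turned into YOLO segmentation label lines along these lines; the file names and the single class index 0 for "Tumor" are assumptions.

import cv2

# Rough sketch of converting a binary mask into YOLO segmentation labels.
# Holes inside polygons are ignored here; the open-source tool mentioned above
# handles them by connecting inner contours to the outer one.
def mask_to_yolo_lines(mask_path, class_index=0):
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    height, width = mask.shape
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    lines = []
    for contour in contours:
        if len(contour) < 3:                        # a polygon needs at least three points
            continue
        coords = []
        for (x, y) in contour.reshape(-1, 2):
            coords += [x / width, y / height]       # normalize to [0, 1]
        lines.append(f"{class_index} " + " ".join(f"{c:.6f}" for c in coords))
    return lines

# Example (assumed file names): write one label file per tile mask.
with open("tile_0001.txt", "w") as f:
    f.write("\n".join(mask_to_yolo_lines("tile_0001_mask.png")))
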


An example original mask, and visualization of the created YOLO format.


Model Training

As described in the dataset and introduction sections, the aim of this project is the segmentation of tumor-containing areas. Due to its notable speed and efficiency, we chose YOLO as the base model over state-of-the-art instance segmentation models such as Fast R-CNN, Mask R-CNN, and Panoptic FPN. YOLO, short for "You Only Look Once," has become a pivotal technology in various domains thanks to its real-time object detection capabilities. The model works on a grid over the image: the input image is divided into a grid of cells, and each cell is responsible for predicting bounding boxes and class probabilities for any object within it. This approach makes YOLO computationally efficient. In this context, YOLOv8 is a single-stage detector, which processes all grid cells in a single pass. A key difference between two-stage and single-stage detectors lies in how they handle processing: in the two-stage approach, region-of-interest proposal and classification occur in separate stages.
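
For reference, a minimal training sketch with the ultralytics Python package is shown below; apart from the 640-pixel image size, which matches the tile size, the dataset configuration path and the hyperparameters are placeholders rather than our exact settings.

from ultralytics import YOLO

# Minimal training sketch (hyperparameters are placeholders, not our exact values).
model = YOLO("yolov8m-seg.pt")          # pretrained YOLOv8 medium segmentation weights
model.train(
    data="tumor_seg.yaml",              # assumed dataset config: train/val paths + "Tumor" class
    imgsz=640,                          # matches the 640 x 640 px tiles
    epochs=100,
    batch=16,
)
metrics = model.val()                   # reports mAP@0.5, precision, recall on the validation set
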

Extension

After obtaining the model, we explored various ways to integrate it into QuPath. Since QuPath is written in Java, which is not widely used in machine learning, it lacks convenient tools and libraries for seamless integration with machine learning models. Additionally, the ultralytics library, which facilitates YOLO model training and prediction in Python, is difficult to re-implement for inference in Java due to its complex class structures and design, especially under time constraints. Our approach therefore uses a Python script to handle model inference and output conversion, while restricting the extension itself to input/output tasks. Although this approach requires additional setup, including the presence of Python and the script on the computer, it offers significantly more flexibility for model integration. By separating model inference from Java, we were able to overcome the limitations posed by the language and leverage the rich ecosystem of machine learning tools available in Python.
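
A condensed sketch of what such an inference script might look like is given below; the argument handling, paths, and file layout are assumptions, while the prediction call and the polygon output (results[0].masks.xy, in pixel coordinates) follow the ultralytics API.

import sys
from pathlib import Path
from ultralytics import YOLO

# Condensed sketch of the inference side (argument handling simplified).
# Assumed usage: python segment_tiles.py <tiles_dir> <conf> <iou>
tiles_dir, conf, iou = Path(sys.argv[1]), float(sys.argv[2]), float(sys.argv[3])

model = YOLO("best.pt")                         # trained YOLOv8 segmentation weights
polygons = {}                                   # tile name -> list of polygons (pixel coords)
for tile_path in sorted(tiles_dir.glob("*.png")):
    results = model.predict(source=str(tile_path), conf=conf, iou=iou, verbose=False)
    masks = results[0].masks
    polygons[tile_path.name] = [] if masks is None else [p.tolist() for p in masks.xy]
# The polygons are then shifted by each tile's offset within the ROI and written
# out for the extension to read (see the GeoJSON sketch after the workflow list).
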


Extension interface, containing inputs for the script directory (which holds the script, model, and virtual environment), model confidence, and model IoU. The outputs are displayed in the text field at the bottom.


The overall workflow of the extension is as follows:

1. A region of interest (ROI) is selected within QuPath; it can be of any size but must be a rectangle.

2. Upon pressing the "Segment Selected Region" button:

2.a. The selected region is tiled to match the input size required for model inference, and the tiles are temporarily saved within the script directory.

2.b. The script runs inference on each tile and reconstructs a mask matching the selected ROI in size. Once this mask is generated, the polygons representing the segmented areas are extracted and converted to the GeoJSON format, which QuPath's input/output operations can process; this file is also saved temporarily (a simplified sketch of this step follows the list).

3. After the script completes its execution, the extension reads the GeoJSON file and imports the predictions, which are then displayed on the QuPath screen. The extension then cleans up the temporary files and prepares for the next prediction.
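
As referenced in step 2.b, the following simplified sketch shows how the tile-level polygons produced by the inference script could be shifted into ROI coordinates and written as a GeoJSON FeatureCollection for the extension to import; it offsets the polygons directly instead of reconstructing a full ROI mask first, and the offset bookkeeping and property names are assumptions.

import json

# Sketch of step 2.b: shift tile-level polygons by their tile's offset inside
# the selected ROI and save them as a GeoJSON FeatureCollection that the
# extension imports back into QuPath. Offsets and class name are assumptions.
def polygons_to_geojson(tile_polygons, tile_offsets, out_path="prediction.geojson"):
    features = []
    for tile_name, polys in tile_polygons.items():
        ox, oy = tile_offsets[tile_name]            # tile origin within the ROI (px)
        for poly in polys:
            ring = [[x + ox, y + oy] for x, y in poly]
            ring.append(ring[0])                    # GeoJSON rings must be closed
            features.append({
                "type": "Feature",
                "geometry": {"type": "Polygon", "coordinates": [ring]},
                "properties": {"classification": {"name": "Tumor"}},
            })
    with open(out_path, "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)
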

Workflow of the extension.

Results

Our best-performing model was obtained with YOLOv8m-seg. The validation loss curves indicate that training was stable and reached a plateau, suggesting that the model learned what it could from the data and that training concluded before overfitting occurred. Despite our efforts, the best-performing model achieved only a modest ~0.75 mAP@0.5, which falls below the desired standard for medical applications of AI; this may be attributed to the limited quantity of training data. When inspecting the confusion matrix, note that there are no true negatives (TN), since there is only a single class. From the confusion matrix, we can derive the following metrics:

Recall: TP / (TP + FN) = 0.735
Precision: TP / (TP + FP) = 0.842
F1 Score: 2 * (Recall * Precision) / (Recall + Precision) = 0.785

Additionally, since the number of false negatives (FN) is greater than the number of false positives (FP), the model tends to miss instances more often than it misclassifies them. This suggests that the model could benefit from a larger and more diverse training set or from improved annotations. From these results, it can be inferred that while the model's performance is not groundbreaking, it represents a solid starting point, particularly considering the limited quantity of training data. Even though the model appears to capture most of the areas, this is largely due to the nature of the currently annotated data, which consists of stained tumor areas. There are cases where tumor areas are not stained but can still be detected by a professional pathologist, as well as stained areas that are not tumors. Our current aim is to improve the model's ability to differentiate and successfully capture these challenging cases.
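
For a quick check of these figures, the same arithmetic in a few lines of Python; the TP/FP/FN counts are placeholders chosen only to reproduce the reported ratios, not the actual confusion-matrix values.

# Placeholder counts (not the real confusion-matrix values) that reproduce the
# reported ratios: recall ~0.735, precision ~0.842, F1 ~0.785.
tp, fp, fn = 735, 138, 265

recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * recall * precision / (recall + precision)
print(f"recall={recall:.3f}, precision={precision:.3f}, f1={f1:.3f}")
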
