Document Type : Review Article

Authors

1 University of Jordan, Department of Computer Science, Amman, Jordan

2 University of Jordan, Department of Computer Information Systems, Amman, Jordan

Abstract

Lung diseases have a significant worldwide impact on health, economic costs, and social and psychological well-being. X-ray imaging is a primary method for diagnosing lung diseases, but manual analysis of these images can be time-consuming, subjective, and prone to inaccuracies. Yet timely and accurate diagnosis is essential for effective treatment and management. To address these challenges, this study introduces a deep-learning model termed the "ESSDN-LD model", a variant of the single shot detector (SSD) network. The model aims to rapidly and accurately detect and classify six types of lung disease: aortic enlargement, cardiomegaly, pleural thickening, pulmonary fibrosis, COVID-19, and pneumonia. The ESSDN-LD model was introduced in three versions: ESSDN-LDV1, ESSDN-LDV2, and ESSDN-LDV3. ESSDN-LDV1 combines the SSD with batch normalization, dropout regularization, and data augmentation. ESSDN-LDV2 builds on ESSDN-LDV1 by using the random search algorithm to tune the model hyper-parameters and by introducing skip connections to improve detection performance. ESSDN-LDV3 extends ESSDN-LDV1 by using the genetic algorithm for hyper-parameter tuning and by incorporating both feature fusion and skip connections, which significantly improves detection performance. ESSDN-LDV3 outperformed the other versions, achieving an accuracy of 96.5% with a prediction time of 0.018 seconds in the seven-class classification. It also achieved a total accuracy of 98.4% with a prediction time of 0.013 seconds in the three-class classification of COVID-19, pneumonia, and no-finding cases.
These results highlight the effectiveness and efficiency of the proposed method in classifying lung diseases, and the method can contribute to improved patient outcomes and treatment decisions.

Graphical Abstract

Lung Disease Detection and Classification Using Single Shot Multi-Box Detector Network: A Comprehensive Study

Keywords

Introduction

Lung diseases pose significant challenges to global health, impacting economic costs and social well-being. Annually, these debilitating conditions claim a million lives [1]. The early and accurate identification of lung diseases is crucial in improving patient outcomes and guiding effective treatment decisions. X-ray image analysis stands as the primary method for diagnosing lung diseases. However, manual assessments by highly skilled radiologists can be time-consuming and prone to errors [2]. Hence, there is an urgent need for efficient and precise diagnostic methods to address these challenges. One promising avenue extensively studied is the application of computerized models for rapid and accurate detection and classification of lung diseases, including aortic enlargement, cardiomegaly, pleural thickening, pulmonary fibrosis, COVID-19, and pneumonia. In recent years, there has been a notable increase in research focusing on deep learning methods in medical image analysis applications, particularly in detecting and classifying lung diseases. Deep learning techniques have proven highly effective in automatically detecting and classifying lung diseases from medical images, offering great potential for improved diagnosis and treatment [3]. Among the various deep learning architectures, the SSD network has demonstrated promising results in lung disease detection and diagnosis [4]. Notably, the SSD network boasts a fast inference speed, making it suitable for real-time and semi-real-time applications [5].

Research Objectives

The main aim of this research is to develop computerized models that automatically, rapidly, and accurately detect and classify lung diseases, including aortic enlargement, cardiomegaly, pleural thickening, pulmonary fibrosis, COVID-19, and pneumonia. Thus, the key objectives are:

  1. Enhance the SSD architecture for lung disease detection and diagnosis.
  2. Develop an innovative deep-learning model that addresses the challenges of manual X-ray image analysis in diagnosing and detecting lung diseases.
  3. Detect, diagnose, and localize six lung diseases rapidly and accurately.
  4. Design a cost-effective model that requires minimal human intervention.
  5. Create a scalable model suitable for large-scale screening programs and public health initiatives.
  6. Reduce the strain on healthcare infrastructure and the risk of misdiagnosis.

The main contribution of this research is a new deep learning model, the Enhanced Single Shot Multi-Box Detector Network for Lung Disease Detection and Diagnosis (ESSDN-LD). The ESSDN-LD is introduced in three versions: ESSDN-LDV1, ESSDN-LDV2, and ESSDN-LDV3.

ESSDN-LDV1 comprises the SSD architecture with essential enhancements: batch normalization, dropout regularization, and data augmentation. ESSDN-LDV2 builds on ESSDN-LDV1 and adds skip connections between lower-level and higher-level feature maps. These connections propagate information across multiple scales, enabling the model to capture both local and global features. ESSDN-LDV2 also introduces the random search algorithm to fine-tune the model's hyper-parameters. ESSDN-LDV3 also builds on ESSDN-LDV1, but it adds feature fusion and skip connections between the SSD layers, combining feature maps from various network layers. This fusion lets the model exploit low-level and high-level features simultaneously to improve the detection of lung diseases. ESSDN-LDV3 uses the genetic algorithm to fine-tune the model's hyper-parameters.

The ESSDN-LD model contributes significantly to the global fight against lung diseases by providing a reliable and swift diagnostic model, laying the foundation for improved patient care and timely interventions.

Lung diseases

The lung is vulnerable to various illnesses, including chronic obstructive pulmonary disease (COPD), asthma, lung cancer, pulmonary fibrosis, pulmonary hypertension, tuberculosis (TB), pneumonia, sarcoidosis, and cystic fibrosis. These diseases manifest with symptoms like shortness of breath, coughing, chest discomfort, wheezing, and fatigue [6]. The impact of these diseases on an individual's health, well-being, and quality of life is substantial. Therefore, timely and accurate diagnoses are crucial in improving patient outcomes and guiding effective treatment decisions [7, 8]. Identifying these diseases involves various diagnostic methods, including physical examination, laboratory tests, bronchoscopy, biopsy, allergy testing, and imaging tests [8]. Among these methods, imaging tests such as X-rays, CT scans, and MRI scans are commonly utilized to gather detailed information about lung conditions. These imaging techniques aid in identifying abnormalities such as tumors, infections, fluid accumulation, or structural irregularities.

Literature review

In recent years, numerous studies have demonstrated the effectiveness of deep learning methods in lung disease detection and classification. Xie et al. [9] proposed a deep learning approach using CNNs based on Faster R-CNN for pulmonary nodule detection in CT images. Their model achieved a sensitivity of 86.42%, demonstrating the potential of deep learning methods for lung disease detection. Hu et al. [10] presented a multi-kernel depth-wise convolution approach for classifying various types of lung disease. Their method achieved high accuracy, including 98.3% for classifying X-ray images as pneumonia or normal, highlighting the power of deep learning methods for lung disease classification. Sheykhivand et al. [11] introduced a deep learning model combining generative adversarial networks, transfer learning, and LSTM networks to classify viral, bacterial, and COVID-19 diseases in X-ray images. Their model achieved an accuracy of 90% in six functional classification scenarios and 99% in diagnosing COVID-19. Souid et al. [12] employed a CNN-based model that modifies MobileNet V2 with transfer learning and metadata to recognize 14 lung diseases. Their model achieved an AUC of 0.811 and an accuracy of around 90%, illustrating the potential of deep learning methods for lung disease detection and classification.

Arifin et al. [13] developed a lightweight deep learning model for deployment in a mobile application. Using MobileNet with Single Shot Detection, their model achieved a classification performance of 93.24% for COVID-19, viral pneumonia, and normal cases, underscoring the effectiveness of SSD networks in lung disease classification. Lin et al. [14] proposed the RRNet model, which integrates the advantages of RepVGG and Resblock, to diagnose 14 lung diseases. Their network achieved high detection accuracy and inference speed, demonstrating that single-shot refinement neural networks can deliver both performance and efficiency in computer-aided diagnosis systems.

Goyal et al. [15] introduced a novel framework utilizing recurrent neural networks and long short-term memory for lung disease diagnosis. Their model achieved an accuracy of 95%, demonstrating the potency of deep learning methods in lung disease diagnosis.

These studies collectively showcase the significant advancements and potential of deep learning methods in lung disease detection, classification, and diagnosis, offering valuable insights for future research in medical imaging techniques.

Materials and Methods

The ESSDN-LD (Enhanced Single Shot Multi-Box Detector Network for Lung Disease Detection and Diagnosis) is a deep learning model designed to detect, diagnose, and localize six types of lung diseases accurately and rapidly. The methodology of ESSDN-LD involves several operations:

Dataset Acquisition: A large dataset of chest X-ray images is obtained, with annotations indicating the presence or absence of the six lung diseases (aortic enlargement, cardiomegaly, pleural thickening, pulmonary fibrosis, COVID-19, and pneumonia).

Pre-processing: The acquired images are pre-processed so that they have a consistent size and quality.

Model Training: The ESSDN-LD model is trained using different SSD architectures and various hyper-parameter settings.

Model Validation: The trained ESSDN-LD model is validated on a separate dataset to assess its performance and generalization ability. This step checks that the model can detect and classify lung diseases in new, unseen X-ray images.

Deployment: Once trained and validated, the model can be deployed in a healthcare setting to assist medical professionals in detecting lung diseases in chest X-ray images, supporting them in making accurate diagnoses.

The proposed ESSDN-LD was introduced in three versions: ESSDN-LDV1, ESSDN-LDV2, and ESSDN-LDV3. These versions use different components and hyper-parameter tuning methods within the SSD architecture to improve lung disease detection and classification in X-ray images.

This section describes the SSD architecture used in the ESSDN-LD model, the hyper-parameters of ESSDN-LD, and the specific architectures of ESSDN-LDV1, ESSDN-LDV2, and ESSDN-LDV3. It also describes the datasets used, the applied pre-processing techniques, and the performance metrics employed to evaluate the model's effectiveness.

Single shot multibox detector

The SSD uses a single deep neural network to detect and classify objects in images, making it faster and more efficient than many other object detection algorithms [5]. It places a set of default boxes, called anchor boxes, with different scales and aspect ratios at every location of several feature maps of different resolutions. For each anchor box, the network predicts the probability that the box contains an object of each class and the offsets of the predicted bounding box relative to the anchor box. As a result, the network can detect objects of different sizes and aspect ratios in the input image [5]. The proposed SSD architecture includes three main groups of layers:

Base Feature Extraction Layers (BFELs): These layers apply convolution and pooling operations based on the ResNet CNN architecture. They capture low-level features, such as edges, corners, and textures, from the input images.

Intermediate Feature Layers (IFLs): These layers capture higher-level features with increasing receptive fields, enabling the model to capture more context and semantic information.

Prediction Layers (PLs): These convolutional layers operate on feature maps at different scales and produce the class probabilities and box offsets for each anchor box. The SSD is trained with a multi-task loss function that combines a classification loss and a localization loss. The classification loss measures the difference between the predicted object class probabilities and the ground-truth labels, and the localization loss measures the difference between the predicted box offsets and the ground-truth box offsets [5].
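For reference, the multi-task loss of the original SSD paper by Liu et al. can be written as follows. This restates the original formulation rather than a formula given in this paper, so whether ESSDN-LD keeps the same weighting α (set to 1 in the original work) and the same box encoding is an assumption. Here x_ij^p ∈ {0, 1} indicates that anchor box i is matched to ground-truth box j of class p, N is the number of matched anchors (the loss is set to 0 when N = 0), l is the predicted offset, g the ground-truth box, d the anchor box, class 0 is the background, and Neg is the set of unmatched anchors kept as (hard) negatives.

```latex
L(x, c, l, g) = \frac{1}{N}\Big( L_{\text{conf}}(x, c) + \alpha\, L_{\text{loc}}(x, l, g) \Big)

L_{\text{loc}}(x, l, g) = \sum_{i \in \text{Pos}} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p}\, \operatorname{smooth}_{L1}\big(l_i^{m} - \hat{g}_j^{m}\big)

\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad
\hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad
\hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad
\hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}

L_{\text{conf}}(x, c) = -\sum_{i \in \text{Pos}} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in \text{Neg}} \log \hat{c}_i^{0},
\qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{q} \exp(c_i^{q})}

\operatorname{smooth}_{L1}(z) =
\begin{cases}
0.5\, z^{2} & \text{if } |z| < 1 \\
|z| - 0.5 & \text{otherwise}
\end{cases}
```

The smooth L1 term behaves quadratically for small offset errors and linearly for large ones, which keeps the localization gradient bounded for badly placed boxes.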

Figure 1 displays the SSD network architecture. A key advantage of the proposed SSD is its speed and efficiency: it can process images in real time or near real time, which makes it well suited to applications that require fast and accurate object detection.
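The anchor-box mechanism described at the start of this subsection can be sketched in plain Python. The sketch follows the default-box rules of the original SSD paper: scales spaced linearly between s_min = 0.2 and s_max = 0.9, width s·√a and height s/√a for aspect ratio a, an extra square box of scale √(s_k·s_{k+1}), and positive anchors chosen by IoU above 0.5. The original also always keeps the single best-overlapping anchor for each ground-truth box, which is omitted here. The 38 × 38 feature map, the aspect ratios, and all function names are illustrative assumptions, not the configuration used for ESSDN-LD.

```python
import math
from itertools import product


def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Anchor scale for each of `num_maps` feature maps, spaced linearly (SSD rule)."""
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]


def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Anchor boxes (cx, cy, w, h), normalised to [0, 1], for one square feature map."""
    boxes = []
    for i, j in product(range(fmap_size), repeat=2):
        # Box centres sit at the centre of each feature-map cell.
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
        # Extra square anchor with scale sqrt(s_k * s_{k+1}).
        extra = math.sqrt(scale * next_scale)
        boxes.append((cx, cy, extra, extra))
    return boxes


def iou(a, b):
    """Intersection over union (Jaccard overlap) of two (cx, cy, w, h) boxes."""
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_box, threshold=0.5):
    """Indices of anchors treated as positives for one ground-truth box."""
    return [k for k, anchor in enumerate(anchors) if iou(anchor, gt_box) > threshold]


if __name__ == "__main__":
    scales = ssd_scales(6)  # six feature maps, as in SSD300
    anchors = default_boxes(38, scales[0], scales[1])
    print(len(anchors))  # 5776 = 38 * 38 cells x 4 anchors
    print(len(match_anchors(anchors, (0.5, 0.5, 0.2, 0.2))) > 0)  # True
```

Each positive anchor is then trained to predict the class of its matched box and the offsets from the anchor to that box. Anchors with no match contribute only to the background class.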

SSD’s hyper-parameters

Hyper-parameters are settings, fixed before training, that control how the algorithm learns and therefore affect the performance of the neural network. This study tunes the following hyper-parameters of the SSD: learning rate (K), batch size (B), training epochs (E), padding (P), optimizer (O), momentum (M), and decay (D).

K controls how quickly the network learns, that is, the size of the steps taken when its parameters are updated. B is the number of training samples processed before the model parameters are updated. E is the number of complete passes the learning model makes over the entire training dataset.

P controls whether rows and columns of zeros are added around the input of a convolution so that its spatial size is preserved. Two settings are used: valid (no padding, so the output shrinks) and same (zero padding that keeps the output the same size as the input when the stride is 1) [16, 17].
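The sketch below shows how the two padding settings affect the output size. It follows the 'valid'/'same' convention used by TensorFlow/Keras. The function names and the 300-pixel example input are illustrative assumptions.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis for 'valid' or 'same' padding."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    if padding == "same":
        # Zero padding chosen so that the output length is ceil(n / stride).
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")


def same_padding_total(n, kernel, stride=1):
    """Total number of zero rows (or columns) that 'same' padding adds along one axis."""
    out = math.ceil(n / stride)
    return max((out - 1) * stride + kernel - n, 0)


print(conv_output_size(300, 3, 1, "valid"))  # 298
print(conv_output_size(300, 3, 1, "same"))   # 300
print(same_padding_total(300, 3, 1))         # 2 (one row/column on each side)
print(conv_output_size(300, 3, 2, "valid"))  # 149
print(conv_output_size(300, 3, 2, "same"))   # 150
```

In PyTorch, `nn.Conv2d` accepts `padding='valid'` and `padding='same'` directly, although `'same'` is only supported for stride 1.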

Moreover, D is a decay hyper-parameter that adjusts the moving averages used during optimization. M speeds up the convergence of gradient-based optimization methods by adding a fraction of the previous update to the current one. O is the optimization algorithm that updates the model weights to minimize the loss function. Three optimizers are considered: stochastic gradient descent (SGD), Adam, and Root Mean Square Propagation (RMSProp) [17].

SGD is an iterative method that minimizes the objective function by following its gradient, estimated on small random subsets (mini-batches) of the training data. SGD starts from a random point 𝐯 = (𝑣₁, …, 𝑣ᵣ), where 𝑣₁, …, 𝑣ᵣ are the coordinates (parameter values) of the point. It then repeatedly steps downhill until it reaches the best point of the objective function it can find.

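RMSProp is a variant of gradient descent that speeds up convergence by adapting the step size of each parameter. It computes the gradient of the loss function with respect to the model parameters and keeps a moving average of the squared gradients. It then updates each parameter in the direction opposite to its gradient, scaled by the inverse square root of that average. This stabilizes the learning process and damps oscillations during optimization [17, 18].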

Adam combines momentum and RMSProp and maintains an individual learning rate for each parameter. It keeps moving averages of both the gradient and the squared gradient, and beta1 and beta2 control the decay of these averages. beta1 is the decay rate of the first-moment estimate, and beta2 is the decay rate of the second-moment estimate [19].
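The sketch below applies SGD with (classical) momentum, RMSProp, and Adam to the toy one-dimensional objective f(x) = x² (gradient 2x) to show the three update rules side by side. The objective, starting point, learning rates, and step counts are illustrative assumptions. beta1 = 0.9, beta2 = 0.999, and rho = 0.9 are common defaults, not values reported in this study.

```python
import math


def grad(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


def sgd_momentum(x, steps=200, lr=0.1, momentum=0.9):
    velocity = 0.0
    for _ in range(steps):
        # Momentum keeps a fraction of the previous update; momentum=0 gives plain SGD.
        velocity = momentum * velocity - lr * grad(x)
        x = x + velocity
    return x


def rmsprop(x, steps=500, lr=0.05, rho=0.9, eps=1e-8):
    sq_avg = 0.0
    for _ in range(steps):
        g = grad(x)
        sq_avg = rho * sq_avg + (1 - rho) * g * g       # moving average of squared gradients
        x = x - lr * g / (math.sqrt(sq_avg) + eps)       # step against the gradient, rescaled
    return x


def adam(x, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g                  # first-moment (gradient) average
        v = beta2 * v + (1 - beta2) * g * g              # second-moment (squared gradient) average
        m_hat = m / (1 - beta1 ** t)                     # bias corrections for the zero start
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


for name, optimizer in [("SGD+momentum", sgd_momentum), ("RMSProp", rmsprop), ("Adam", adam)]:
    print(name, optimizer(5.0))
```

In PyTorch the same optimizers are available as `torch.optim.SGD` (with its `momentum` argument), `torch.optim.RMSprop`, and `torch.optim.Adam` (with `betas=(beta1, beta2)`).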

ESSDN-LDV1

This model incorporates three enhancements. Batch normalization normalizes the input of each layer by subtracting the batch mean and dividing by the batch standard deviation. Dropout regularization prevents over-fitting and improves generalization. Data augmentation by shifting, zooming, and rotation introduces variations and increases the diversity of the training samples. ESSDN-LDV1 uses fixed hyper-parameters: O = Adam, B = 16, P = valid, M = 0.9, E = 60, D = 0.001, and K = 0.001. K is reduced by a factor of 10 after 40 epochs and again after 50 epochs.
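The learning-rate schedule of ESSDN-LDV1 (K = 0.001, divided by 10 after 40 and after 50 of the 60 epochs) and the batch-normalization step can be sketched as follows. The code assumes 0-indexed epochs. Batch normalization is shown for a single feature, with the learnable scale and shift fixed at 1 and 0. The function names are hypothetical.

```python
import math


def step_lr(epoch, base_lr=0.001, milestones=(40, 50), gamma=0.1):
    """Learning rate K for a 0-indexed epoch: multiplied by gamma at every milestone reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** reached


def batch_normalize(batch, eps=1e-5, scale=1.0, shift=0.0):
    """Normalise one feature across a mini-batch: subtract the batch mean, divide by the batch std."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)  # biased (population) variance
    return [scale * (x - mean) / math.sqrt(var + eps) + shift for x in batch]


for epoch in (0, 39, 40, 49, 50, 59):
    print(epoch, step_lr(epoch))
print(batch_normalize([1.0, 2.0, 3.0]))
```

In PyTorch, `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 50], gamma=0.1)`, stepped once per epoch, produces the same schedule. `nn.BatchNorm2d` implements the normalization with learnable scale and shift.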

ESSDN-LDV2

This model incorporates skip connections (SCs) into the SSD architecture to improve detection and classification accuracy. The SCs connect the output of the BFELs to the corresponding PLs, so feature maps from early layers are combined directly with feature maps from later layers.

SCs improve predictions by exploiting both local and global cues. Low-level features provide detailed information about specific regions or objects in the image, whereas high-level features provide a broader understanding of the overall context. Combining them gives a more complete description of the input image, which improves detection and classification performance.

ESSDN-LDV2 also uses the Random Search Algorithm (RSA) to tune the SSD hyper-parameters. The model is trained with different combinations of hyper-parameters, its performance on the validation set is evaluated for each combination, and the results are collected to select the best setting. ESSDN-LDV2 also incorporates batch normalization, dropout regularization, and data augmentation (shifting, zooming, and rotation). Table 1 presents the proposed RSA for finding the optimal hyper-parameter settings of the model.
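A minimal random-search loop in the spirit of the RSA is sketched below. The search space only reuses hyper-parameter values mentioned in this paper; the actual ranges are given in Table 1. `toy_validation_score` is a hypothetical stand-in for training the model with a configuration and returning its validation accuracy. The number of trials and the seed are also assumptions.

```python
import random

# Hypothetical search space built only from values mentioned in this paper.
SEARCH_SPACE = {
    "K": [0.001, 0.0001],
    "B": [16, 32],
    "E": [60, 80, 90, 120],
    "P": ["valid", "same"],
    "O": ["SGD", "Adam", "RMSProp"],
    "M": [0.4, 0.5, 0.9],
    "D": [0.001, 0.0001],
}


def toy_validation_score(config):
    """Hypothetical stand-in for: train with `config`, return validation accuracy."""
    score = 0.5
    score += 0.2 if config["O"] == "SGD" else 0.0
    score += 0.1 if config["P"] == "same" else 0.0
    score += 0.05 if config["B"] == 32 else 0.0
    return score


def random_search(evaluate, space, n_trials=20, seed=0):
    """Sample n_trials random configurations, evaluate each one, and keep the best."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        results.append((evaluate(config), config))
    best_score, best_config = max(results, key=lambda item: item[0])
    return best_config, best_score, results


best_config, best_score, results = random_search(toy_validation_score, SEARCH_SPACE)
print(best_score, best_config)
```

Random search samples each trial independently, so trials can run in parallel and the budget (`n_trials`) directly bounds the number of training runs.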

ESSDN-LDV3

This model incorporates Feature Fusion (FF), which combines feature maps from different SSD layers to capture both low-level and high-level features. FF concatenates the selected feature maps so that the model receives both fine-grained details and semantic and contextual information. ESSDN-LDV3 also incorporates SCs in the same way as ESSDN-LDV2.
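The concatenation-based fusion can be sketched as below, with feature maps stored as lists of 2-D channels. The paper does not say how maps of different resolution are aligned before concatenation, so the nearest-neighbour upsampling of the coarser map is an assumption, as are the function names and toy map sizes.

```python
def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one 2-D channel (a list of rows)."""
    return [[v for v in row for _ in range(factor)] for row in channel for _ in range(factor)]


def fuse_by_concatenation(low_level, high_level):
    """Concatenate two feature maps along the channel axis.

    Each map is a list of channels, each channel an H x W list of rows. The
    coarser high-level map is upsampled to the low-level map's spatial size first.
    """
    h_low, w_low = len(low_level[0]), len(low_level[0][0])
    h_high, w_high = len(high_level[0]), len(high_level[0][0])
    if h_low % h_high or w_low % w_high or h_low // h_high != w_low // w_high:
        raise ValueError("spatial sizes must differ by the same integer factor")
    factor = h_low // h_high
    upsampled = [upsample_nearest(channel, factor) for channel in high_level]
    return low_level + upsampled  # channel counts add up: C_low + C_high


low = [[[1.0] * 8 for _ in range(8)] for _ in range(4)]   # 4 channels, 8 x 8
high = [[[2.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels, 4 x 4
fused = fuse_by_concatenation(low, high)
print(len(fused), len(fused[0]), len(fused[0][0]))        # 6 8 8
```

On PyTorch tensors of shape (N, C, H, W), the same idea would be `torch.nn.functional.interpolate(high, scale_factor=2, mode="nearest")` followed by `torch.cat([low, upsampled], dim=1)`.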

ESSDN-LDV3 incorporates FF and SCs in the SSD architecture to leverage multi-scale features for object detection. The FF captures spatial details and contextual information, and SCs facilitate the flow of features across different levels in the model. This integration enables ESSDN-LDV3 to enhance the accuracy of detecting lung diseases. Likewise, ESSDN-LDV3 incorporates the Genetic Algorithm (GA) for fine-tuning the model's hyper-parameters, replacing the random search algorithm. The GA optimizes the hyper-parameters through an iterative process that simulates natural selection and evolution.

By leveraging this algorithm, ESSDN-LDV3 aims to achieve better performance by finding hyper-parameter configurations that improve the model's ability to detect and classify lung diseases accurately. Furthermore, ESSDN-LDV3 incorporates batch normalization, dropout regularization, and data augmentation (shifting, zooming, and rotation). Table 2 lists the proposed GA for finding the optimal hyper-parameter settings of the model.
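The GA loop can be sketched as follows, using truncation selection, uniform crossover, random-reset mutation, and elitism. These operator choices are assumptions for illustration, and so are the population size, number of generations, mutation rate, and toy fitness function, which stands in for validation accuracy. The authors' GA is described in Table 2.

```python
import random

# Hypothetical search space built only from values mentioned in this paper.
SEARCH_SPACE = {
    "K": [0.001, 0.0001],
    "B": [16, 32],
    "E": [60, 80, 90, 120],
    "P": ["valid", "same"],
    "O": ["SGD", "Adam", "RMSProp"],
    "M": [0.4, 0.5, 0.9],
    "D": [0.001, 0.0001],
}


def toy_fitness(config):
    """Hypothetical stand-in for the validation accuracy of a model trained with `config`."""
    return (0.5 + (0.2 if config["O"] == "SGD" else 0.0)
            + (0.1 if config["P"] == "same" else 0.0) + 0.1 * config["M"])


def genetic_search(fitness, space, population_size=10, generations=15,
                   mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    names = list(space)

    def random_individual():
        return {n: rng.choice(space[n]) for n in names}

    def crossover(a, b):
        # Uniform crossover: each hyper-parameter (gene) is copied from either parent.
        return {n: (a if rng.random() < 0.5 else b)[n] for n in names}

    def mutate(individual):
        # Random-reset mutation: re-sample a gene with probability mutation_rate.
        return {n: (rng.choice(space[n]) if rng.random() < mutation_rate else v)
                for n, v in individual.items()}

    population = [random_individual() for _ in range(population_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]          # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(population_size - 1)]
        population = [ranked[0]] + children               # elitism keeps the best so far
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history


best, history = genetic_search(toy_fitness, SEARCH_SPACE)
print(history[-1], best)
```

Because the best individual is always carried into the next generation, the best fitness in `history` never decreases from one generation to the next.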

ESSDN-LD in diagnosing process

In the diagnosis process, ESSDN-LD applies several pre-processing techniques, including resizing, normalization, noise reduction, and image enhancement, to ensure the consistency and quality of the input X-ray image. It then extracts relevant features and patterns from the pre-processed image using its deep learning architecture. By analysing these features, the model identifies potential disease indicators. It determines whether aortic enlargement, cardiomegaly, pleural thickening, pulmonary fibrosis, COVID-19, or pneumonia is present, or whether none of these diseases is present.
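A minimal pre-processing pipeline of the kind listed above (resize, reduce noise, normalize) is sketched below for a grayscale image stored as a list of rows. The paper does not specify its exact operations, so the nearest-neighbour resize, the 3 × 3 mean filter, min-max normalization, and the 300-pixel default size are assumptions. The image-enhancement step is left out.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a grayscale image stored as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def mean_filter_3x3(image):
    """Simple noise reduction: replace each pixel by the mean of its 3 x 3 neighbourhood."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [image[i][j]
                      for i in range(max(r - 1, 0), min(r + 2, h))
                      for j in range(max(c - 1, 0), min(c + 2, w))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def min_max_normalize(image):
    """Rescale intensities to the range [0, 1]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1  # avoid division by zero on a constant image
    return [[(p - lo) / span for p in row] for row in image]


def preprocess(image, size=300):
    """Hypothetical pipeline: resize -> denoise -> normalise (no enhancement step)."""
    return min_max_normalize(mean_filter_3x3(resize_nearest(image, size, size)))


xray = [[0, 64, 128, 255], [32, 96, 160, 224], [16, 80, 144, 208], [8, 72, 136, 200]]
out = preprocess(xray, size=8)
print(len(out), len(out[0]))  # 8 8
```

With OpenCV, which the study lists among its libraries, the corresponding calls would be `cv2.resize`, `cv2.GaussianBlur` or `cv2.medianBlur` for noise reduction, and `cv2.createCLAHE(...).apply(...)` for contrast enhancement. The paper does not state which of these were used.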

The diagnostic output of ESSDN-LD gives healthcare professionals valuable information: the predicted diseases together with their probabilities and locations. Figure 2 illustrates the ESSDN-LD operations for detecting lung diseases.

Performance metrics

The effectiveness of the ESSDN-LD in detecting lung diseases was assessed using several metrics. These metrics include accuracy (ACC), precision (P), recall (R), and F1 score.

The performance metrics are calculated from four counts. True Positives (TP) are the cases the model correctly classifies as positive. True Negatives (TN) are the cases it correctly classifies as negative. False Positives (FP) are the cases it incorrectly classifies as positive when they are negative. False Negatives (FN) are the cases it incorrectly classifies as negative when they are positive.

ACC is the proportion of correctly predicted cases (TP and TN) out of all cases. A high accuracy means that a large share of the model's disease predictions are correct, indicating good overall performance. Equation (1) gives the ACC calculation:

ACC = (TP + TN) / (TP + TN + FP + FN)    (1)

R is the proportion of correctly identified positive cases (TP) out of all actual positive cases (TP and FN). A high recall means that the model detects most positive cases, that is, it has a low rate of FN. Equation (2) gives the R calculation:

R = TP / (TP + FN)    (2)

P is the proportion of correctly identified positive cases (TP) out of all cases predicted as positive (TP and FP). A high precision means that few of the model's positive predictions are false positives. Equation (3) gives the P calculation:

P = TP / (TP + FP)    (3)

The F1 score combines P and R into a single metric, their harmonic mean, so it is high only when both P and R are high. Equation (4) gives the F1 calculation:

F1 = 2 × P × R / (P + R)    (4)
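The four metrics can be computed directly from the counts, and for a multi-class problem such as the three-class DSCP task each class can be scored one-vs-rest from a confusion matrix. In the sketch below the function names, the rows-are-true-classes convention, and the example counts are illustrative assumptions. The toy confusion matrix is not a result from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """ACC, P, R and F1 from the four counts, following Equations (1)-(4)."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"ACC": acc, "P": precision, "R": recall, "F1": f1}


def one_vs_rest_counts(confusion, k):
    """TP, TN, FP, FN for class k; rows are true classes, columns are predicted classes."""
    total = sum(map(sum, confusion))
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                   # class k predicted as something else
    fp = sum(row[k] for row in confusion) - tp    # other classes predicted as k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


print(classification_metrics(tp=90, tn=80, fp=10, fn=20))
toy_confusion = [[8, 1, 1], [0, 9, 1], [1, 0, 9]]  # toy counts, not results from this study
print(one_vs_rest_counts(toy_confusion, 0))         # (8, 19, 1, 2)
```

With this convention, the per-class "correctly classified" rate read off a confusion matrix row is that class's recall.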

The training accuracy chart was also used to show how the model's accuracy progresses over successive training epochs. It shows how effectively the model learns and adjusts its parameters to fit the training data. A training accuracy that increases from epoch to epoch indicates that the model is learning and improving its performance.

Data description

The ESSDN-LD model is trained and evaluated on two datasets. The first dataset (DSSD) contains images of six diseases (aortic enlargement, cardiomegaly, pleural thickening, pulmonary fibrosis, COVID-19, and pneumonia) and healthy images. The second dataset (DSCP) contains images of COVID-19, pneumonia, and healthy cases. The DSSD is collected from three public datasets: the VinDr-CXR dataset [20, 21], the RSNA Pneumonia Detection dataset [22], and the SIIM-FISABIO-RSNA COVID-19 dataset [23]. Table 3 provides the number of images and objects for each disease in the DSSD. The DSCP is collected from the RSNA Pneumonia Detection dataset and the SIIM-FISABIO-RSNA COVID-19 dataset. It contains 1500 records for each of the COVID-19, pneumonia, and no-finding classes.

Results and Discussion

This section presents the performance evaluation of all versions of the ESSDN-LD model on two datasets: DSSD and DSCP. The ESSDN-LD model was implemented in Python with the PyTorch framework. Libraries such as NumPy, SciPy, OpenCV (cv2), and pandas supported the development and evaluation process.

Training and Validation in the DSSD

During training and validation of ESSDN-LDV1, ESSDN-LDV2, and ESSDN-LDV3, ESSDN-LDV3 performed best in both phases. In the training phase, its accuracy increased consistently from epoch to epoch and exceeded that of the other versions. ESSDN-LDV1 reached a training accuracy of 88.1%, ESSDN-LDV2 reached 97.4%, and ESSDN-LDV3 reached 99.6%. Figure 3 shows the training accuracy of the three versions of ESSDN-LD on the DSSD. On the validation set, ESSDN-LDV3 also outperformed the other versions with an accuracy of 93.2%, compared with 90.2% for ESSDN-LDV2 and 80.4% for ESSDN-LDV1. The RSA found the following optimal hyper-parameters for ESSDN-LDV2: K = 0.0001, B = 32, E = 90, P = same, O = SGD, M = 0.5, and D = 0.001. For ESSDN-LDV3, the GA found K = 0.001, B = 32, E = 120, P = same, O = SGD, M = 0.4, and D = 0.0001.

Test dataset in the DSSD

The ESSDN-LD versions were evaluated on unseen test data with multiple classification labels. ESSDN-LDV1 correctly classified 83.67% of the objects in all test images and misclassified 16.33%. ESSDN-LDV2 correctly classified 93.7% of the objects and misclassified 6.7%. ESSDN-LDV3 performed best, correctly classifying 96.5% of the objects and misclassifying 3.5%. Across the lung diseases, ESSDN-LDV1 achieved an average accuracy of 83.7%. Its highest accuracy, 96.94%, was in detecting pulmonary fibrosis. Table 4 shows the performance metrics of ESSDN-LDV1 for each lung disease in the DSSD.

The ESSDN-LDV2 achieved an average accuracy of 93.7%. The highest accuracy achieved by this version was 96.94% in the detection of pulmonary fibrosis. Table 5 shows the performance metrics of the ESSDN-LDV2 version in each lung disease of the DSSD.

ESSDN-LDV3 achieved an average accuracy of 96.46%. Its highest accuracy, 99%, was in detecting aortic enlargement. Table 6 presents the performance metrics of ESSDN-LDV3 for each lung disease in the DSSD.

Training and Validation in the DSCP

During training and validation on the DSCP, ESSDN-LDV3 again performed best in both phases. ESSDN-LDV1 reached a training accuracy of 91.5%, ESSDN-LDV2 reached 97.8%, and ESSDN-LDV3 reached 99.8%, with ESSDN-LDV3 improving steadily from epoch to epoch. Figure 4 illustrates the training accuracy of the three versions of ESSDN-LD on the DSCP. On the validation set, ESSDN-LDV3 also outperformed the other versions with an accuracy of 95.3%, compared with 93.2% for ESSDN-LDV2 and 84.6% for ESSDN-LDV1. The RSA found the following optimal hyper-parameters for ESSDN-LDV2: K = 0.0001, B = 32, E = 80, P = same, O = SGD, M = 0.5, and D = 0.001. For ESSDN-LDV3, the GA found K = 0.001, B = 32, E = 90, P = same, O = SGD, M = 0.4, and D = 0.0001.

Test dataset in the DSCP

ESSDN-LDV3 achieved the highest accuracy of all versions, 98.4%, compared with 96% for ESSDN-LDV2 and 88.7% for ESSDN-LDV1. Tables 7, 8, and 9 present the performance metrics of each ESSDN-LD version on the DSCP. The confusion matrices show that ESSDN-LDV3 performed best. It correctly classified 97.9% of COVID-19 objects, 98.4% of pneumonia objects, and 98.7% of no-finding objects. Figures 7, 8, and 9 depict the confusion matrices for the ESSDN-LDV1, ESSDN-LDV2, and ESSDN-LDV3 experiments on the DSCP, respectively. In these matrices, label '0' represents COVID-19, '1' represents pneumonia, and '2' represents no finding.

ESSDN-LD outputs

The ESSDN-LD model identifies and precisely localizes findings of aortic enlargement, cardiomegaly, pleural thickening, pulmonary fibrosis, COVID-19, and pneumonia within the lung region. Figure 8 shows sample outputs of the ESSDN-LD model depicting the detection and localization of abnormalities. Ground-truth images (a), (b), and (c) are compared with the outputs of ESSDN-LDV1 (a1, b1, and c1), ESSDN-LDV2 (a2, b2, and c2), and ESSDN-LDV3 (a3, b3, and c3). Red boxes indicate cardiomegaly, green boxes indicate pleural thickening, and arctic boxes indicate aortic enlargement. ESSDN-LDV3 performs best among the versions. In Figure 9, ESSDN-LDV3 shows a high detection rate for COVID-19 in image (a). In image (b), it detects all three findings (pleural thickening, aortic enlargement, and cardiomegaly), whereas the other versions do not. It also correctly diagnoses image (c) as healthy.

This study introduces the ESSDN-LD model and demonstrates its effectiveness in rapidly and precisely detecting, diagnosing, and localizing six lung diseases: aortic enlargement, cardiomegaly, pleural thickening, pulmonary fibrosis, COVID-19, and pneumonia.

The proposed versions of the ESSDN-LD model show promising results in detecting and classifying these diseases in chest X-ray images. ESSDN-LDV3 achieves per-disease accuracies between 98.69% and 99.4%, and ESSDN-LDV2 achieves per-disease accuracies between 97.72% and 98.4%.

ESSDN-LDV2 outperforms ESSDN-LDV1 because of its hyper-parameter tuning with the RSA and its skip connections. The skip connections link the output of earlier layers to later layers. They combine feature maps from different levels to give a more complete description of the input image, which improves detection and classification performance. ESSDN-LDV3 integrates feature fusion and skip connections in the SSD architecture to leverage multi-scale features for object detection. Feature fusion captures spatial details and contextual information, while skip connections facilitate the flow of features across different levels in the model. This integration improves the accuracy of lung disease detection.

Furthermore, all ESSDN-LD versions performed better on the three-class dataset. This observation suggests that the number of images per class strongly influences the performance of the SSD model. With more examples per class, the model sees more diverse examples and variations of the disease patterns, so it can learn and generalize better and detect abnormalities more effectively. A sufficient number of images per class is therefore important for the model to perform well and identify lung diseases accurately.

Notably, ESSDN-LDV3 misclassifies 3.5% of the test images in the seven-class classification on the DSSD and 1.5% of the test images in the three-class classification on the DSCP. These results suggest that ESSDN-LDV3 can assist healthcare professionals in making informed decisions about patient care and treatment plans, improving the accuracy and efficiency of the diagnostic process.

Moreover, the proposed ESSDN-LD model was compared with state-of-the-art models for the three-class classification of COVID-19, pneumonia, and no finding. In this comparison, ESSDN-LDV3 achieved higher accuracy than the compared state-of-the-art models. This highlights the potential of the ESSDN-LD model for automated diagnosis and detection of lung diseases. Table 10 lists the comparison between the state-of-the-art models and ESSDN-LDV3.

This study makes several significant contributions to the field of lung disease detection and classification:

Development of the ESSDN-LD model:

The ESSDN-LD model addresses the challenges of manual X-ray image analysis in diagnosing and detecting lung diseases, which is time-consuming, subjective, and prone to inaccuracies.

The ESSDN-LD model achieves high accuracy in detecting and classifying lung diseases: 98.4% in the three-class task (COVID-19, pneumonia, and no finding) and 96.46% in the seven-class task covering six lung diseases plus no finding.

The model delivers fast predictions, with prediction times between 0.013 and 0.018 seconds. This rapid response supports early detection, which can be crucial for providing timely treatment and preventing diseases from worsening.
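The text does not describe how the prediction times were measured. The following Python sketch shows one common way to obtain an average per-image prediction time, under stated assumptions. `mean_prediction_time` and `dummy_predict` are hypothetical names, and the dummy predictor only stands in for a trained detector. On a GPU, the device would also need to be synchronized before each timer read.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def mean_prediction_time(predict: Callable, images: Sequence, warmup: int = 2) -> float:
    """Average wall-clock seconds per call of `predict` over `images`.

    A few warm-up calls run first so that one-off start-up costs
    (e.g. lazy initialisation) are not included in the average.
    """
    for image in images[:warmup]:
        predict(image)
    timings = []
    for image in images:
        start = time.perf_counter()
        predict(image)
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Hypothetical stand-in for a trained detector: it just sums pixel values.
def dummy_predict(image):
    return sum(sum(row) for row in image)


images = [[[float(i)] * 300 for _ in range(300)] for i in range(5)]
print(f"{mean_prediction_time(dummy_predict, images):.6f} s per image")
```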

The ESSDN-LD model is non-invasive. It works as a computer-aided detection and classification system on existing X-ray images and requires no additional procedures or physical contact with patients.

It is a cost-effective approach that requires minimal human intervention. The model can process large amounts of data on a general-purpose PC, which makes it a viable option for lung disease screening and diagnosis.

Because the ESSDN-LD model scales to large datasets, it is suitable for large-scale screening programs and public health initiatives.

The study has successfully showcased the potential of the SSD network for automated diagnosis and detection of lung diseases. 

Conclusion

The prevalence of lung diseases, including pulmonary fibrosis, COVID-19, and pneumonia, calls for accurate and efficient detection methods. This study addressed this need by proposing the enhanced SSD network for lung disease detection and classification (ESSDN-LD) model. Its versions incorporate several enhancement techniques: batch normalization, dropout regularization, early stopping, data augmentation, hyper-parameter tuning, feature fusion, and skip connections. With these techniques, the ESSDN-LD model achieved high accuracy in detecting and classifying lung diseases.
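Among these techniques, hyper-parameter tuning by random search (the approach used in ESSDN-LDV2) is simple to sketch. The Python example below is a minimal illustration, not the authors' code. The search space, the parameter names, and `toy_validation_score` are hypothetical. In practice the scoring function would train the detector with the sampled settings and return its validation accuracy.

```python
import random

# Hypothetical search space; the paper's actual ranges are not given here.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [8, 16, 32],
    "dropout": [0.2, 0.3, 0.5],
}


def random_search(evaluate, space, n_trials=20, seed=0):
    """Sample `n_trials` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)  # higher is better, e.g. validation accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    # Stand-in for "train the model, then measure validation accuracy".
    return (-abs(params["learning_rate"] - 1e-3) * 100
            - abs(params["dropout"] - 0.3)
            - params["batch_size"] / 1000)


best, score = random_search(toy_validation_score, SEARCH_SPACE, n_trials=30, seed=1)
print(best, round(score, 4))
```

A genetic algorithm, as used in ESSDN-LDV3, would instead evolve a population of such configurations through selection, crossover, and mutation, rather than sampling each trial independently.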

The results demonstrated the superiority of the ESSDN-LD model over state-of-the-art models such as FAST-RNN, standard SSD, and Single Shot Detection MobileNet. The ESSDN-LD model achieved high classification accuracy, reaching 98.4% in the three-class task (COVID-19, pneumonia, and no finding) and 96.46% in the seven-class task covering six lung diseases plus no finding. These results highlight the effectiveness of the enhancement strategies employed in the ESSDN-LD model.

Moreover, the study demonstrated the potential of the SSD network for automated diagnosis and detection of lung diseases. Given its strong performance, the ESSDN-LD model could help doctors and radiologists diagnose lung diseases rapidly and accurately from X-ray images.

Several avenues remain open for future work. Researchers could apply transfer learning to improve the performance of the deep learning model, investigate multi-modal data such as combined CT scans and X-rays, develop an online system for real-time diagnosis, and extend the ESSDN-LD model to other lung diseases or medical imaging applications.

Acknowledgements

The authors wish to extend their sincere gratitude to all individuals who contributed to the research, including the reviewers and editors, for their valuable input. They also acknowledge the creators and providers of the publicly available X-ray image datasets used in the study for their efforts in making the data accessible for scientific research.

Disclosure Statement

No potential conflict of interest was reported by the authors.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Authors' Contributions

Each author played a significant role in the development of this study: Mansour Al-Hlalat: Made substantial contributions to various aspects of the article, including the introduction, previous work, methodology, programming, and implementation, as well as the discussion and investigation. Ahmad Sharieh: Provided valuable insights and input in reviewing the methodology, implementation, discussion, and administration. Mohammed Alzoubi: Reviewed the methodology, implementation, and discussion sections.

ORCID

Mansour Alhlalat

https://orcid.org/0009-0006-2664-6872

Ahmad Sharieh

https://orcid.org/0000-0002-0290-2468

Mohammed Alzoubi

https://orcid.org/0000-0003-4282-9506

 

HOW TO CITE THIS ARTICLE

Mansour Alhlalat, Ahmad Sharieh, Mohammed Alzoubi. Lung Disease Detection and Classification Using Single Shot Multi-Box Detector Network: A Comprehensive Study. J. Med. Chem. Sci., 2023, 6(11) 2849-2866.

DOI: https://doi.org/10.26655/JMCHEMSCI.2023.11.30

URL: https://www.jmchemsci.com/article_176195.html
