RESEARCH ARTICLE

Comparison of the proposed DCNN model with standard CNN architectures for retinal diseases classification

Ramya Mohan*, Kirupa Ganapathy, Rama Arunmozhi

Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, India

Abstract

Deep learning has attracted increasing interest in medical image analysis for the classification of signs of abnormality. In this study, a new convolutional neural network (CNN) architecture, the Medical Image Detection Network (MIDNet18), is proposed for the classification of retinal diseases using optical coherence tomography (OCT) images. The model consists of 14 convolutional layers, seven max-pooling layers, four dense layers, and one classification layer. The multi-class classification layer of MIDNet18 assigns each OCT image to the normal class or to one of three abnormal types: Choroidal Neovascularization (CNV), Drusen, or Diabetic Macular Edema (DME). The dataset consists of 83,484 training images, 41,741 validation images, and 968 test images. In the experiments, MIDNet18 achieves an accuracy of 98.86%, and its performance is compared with other standard CNN models: ResNet-50 (83.26%), MobileNet (93.29%), and DenseNet (92.5%). With a p-value < 0.001, MIDNet18 is also shown to be statistically significantly better than these standard CNN architectures at classifying retinal diseases from OCT images.

Key words: retinal image classification, convolutional neural network (CNN), deep learning, choroidal neovascularization, diabetic macular edema, drusen, medical image detection network (MIDNet18)

*Corresponding author: Ramya Mohan, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, India. Email: [email protected]

Submitted: 19 June 2022; Accepted: 3 July 2022; Published: 27 August 2022

DOI: 10.47750/jptcp.2022.945

©2022 Ramya Mohan, et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (http://creativecommons.org/licenses/by-nc/4.0/).

INTRODUCTION

Eye disorders, which are prevalent in a large proportion of the population, can lead to loss of vision. Vision loss1 is an alarming threat that needs to be identified at an early stage for proper diagnosis and treatment. Statistics2 show that most eye impairments occur in people above 40 years of age.

Among the most common retinal diseases causing vision loss are Choroidal Neovascularization (CNV), Diabetic Macular Edema (DME), and Drusen. Optical coherence tomography (OCT), a laser-based imaging technology, captures retinal images with high precision.3 It is used to diagnose epiretinal membranes, macular holes, and macular swelling in many eye diseases, including glaucoma, diabetic retinopathy, macular degeneration, and hypertension.4 If these disorders are not addressed, they can lead to serious vision loss and blindness. The medical image analysis community contributes to the development of automated systems that use artificial intelligence to aid ophthalmologists in retinal image analysis.

This study was conducted to assist physicians in providing correct and timely treatment through enhanced diagnostic processes. Artificial intelligence aids in diagnosing retinal disorders, enabling effective and early treatment of retinal diseases.

Fang et al.5 suggested a convolutional neural network (CNN) approach, the Lesion-Aware Convolutional Neural Network (LACNN), for the classification of retinal OCT images. They introduced a lesion attention map technique, used by the classification network to speed up training and to improve OCT classification accuracy. Instead of a CNN, Tsuji et al.6 used a capsule network to learn positional information from images; the accuracy achieved was higher than that of a classical CNN. Sunija et al.3 used DCNN-based classifiers to classify retinal OCT images, enhancing the training process with downsampling and weight-sharing techniques. Mittal7 used a CNN to create an automated classification system for retinal diseases from OCT images, which successfully distinguished retinal disease from normal vision. Rong et al.8 developed a surrogate-assisted classification CNN model for OCT images, in which surrogate images for training are created using image denoising and morphological dilation. Kepp et al.9 provided a signed distance map and segmentation approach for automated segmentation of retinal layers, in which topology preservation was enhanced and post-processing was reduced using a CNN with spatial regression.

MATERIALS AND METHODS

A CPU with 8 GB of RAM and an NVIDIA Tesla K80 GPU with 12 GB of memory were used in the experiments. SPSS software was used for all statistical analyses.10 The input images (NORMAL, CNV, DME, and DRUSEN retina images) are treated as the independent variables, and the measures of correct classification (accuracy, loss, and F1 score) as the dependent variables. An independent t-test is used to evaluate the performance of the algorithms.
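As an illustration, a minimal sketch of such an independent-samples t-test in Python is given below; the per-run accuracy lists are hypothetical placeholders, not the study's measurements.

from scipy import stats

# Hypothetical per-run test accuracies for two models (placeholders).
midnet18_acc = [0.988, 0.986, 0.989, 0.987]
resnet50_acc = [0.831, 0.835, 0.829, 0.833]

# Independent-samples t-test comparing the two groups of accuracies.
t_stat, p_value = stats.ttest_ind(midnet18_acc, resnet50_acc)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # p < 0.05 indicates a significant difference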

This dataset was retrieved from the Kaggle repository.11 It has three folders (train, test, and validation), each with subfolders for the four image categories (NORMAL, CNV, DME, DRUSEN), and contains 126,193 OCT images of various shapes in total. The training set comprises 83,484 images, the validation set 41,741, and the test set 968, as shown in Table 1. Figure 1 gives sample retinal disease images.

TABLE 1. Dataset of retina diseases from retina OCT images.

Data     Training set  Validation set  Testing set
CNV      37,205        18,602          242
DME      11,348        5,674           242
DRUSEN   8,616         4,308           242
NORMAL   26,315        13,157          242
TOTAL    83,484        41,741          968

FIGURE 1. Sample retina disease images.
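Given this folder layout, the dataset can be loaded directly from the directory structure. The sketch below uses Keras; the folder names (OCT2017/train, OCT2017/val, OCT2017/test) are hypothetical placeholders matching the organization described above.

import tensorflow as tf

IMG_SIZE = (224, 224)  # input size used by MIDNet18

# Labels are inferred from the class subfolder names and one-hot encoded.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "OCT2017/train", image_size=IMG_SIZE, batch_size=32, label_mode="categorical")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "OCT2017/val", image_size=IMG_SIZE, batch_size=32, label_mode="categorical")
test_ds = tf.keras.utils.image_dataset_from_directory(
    "OCT2017/test", image_size=IMG_SIZE, batch_size=32, label_mode="categorical")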

Figure 2 depicts the MIDNet18 CNN architecture. The 14 convolutional layers in the MIDNet18 architecture use a 3 × 3 kernel (filter) size. According to a review of several research articles, the ReLU activation function is the most suitable since it passes through only values greater than zero.12 As a result, the ReLU activation function is applied in the convolutional layers. The image's final feature map is derived from the 14th convolutional layer. The model takes a 224 × 224 input image and contains seven max-pooling layers with a pooling size of 2.13 Max pooling is used in MIDNet18 because it emphasizes the brighter pixels.14–18 The proposed method does not favor average pooling, since averaging smooths the pixels in images and reduces the potential for predicting abnormalities. The proposed model additionally employs batch normalization, which helps avoid overfitting and allows each layer to learn more independently.

FIGURE 2. Proposed MIDNET18 Architecture for retinal OCT image classification.

MIDNET18 ALGORITHM

Step 1: Selection of an appropriate dataset for the specified problem.

Step 2: Preparation of the dataset for training and testing, which involves creating labels and resizing the images to an appropriate size.

Step 3: Defining the MIDNet18 model

  1. The input layer is defined with a pixel size of 224 × 224, the number of filters as 16, and a kernel size of 3 × 3.

  2. The MIDNet18 model is defined with 14 convolutional layers:

    1. Convolutional layers with a kernel size of 3 × 3, a stride of 1, padding set to “nil,” and the ReLU activation function are defined.

    2. The convolutional layers are implemented with the number of filters varying from 16 to 512 across layers.

Step 4: Defining the MIDNet18 model with seven pooling layers of type max pooling and a pool size of 2 × 2.

Step 5: Defining the MIDNet18 model with batch normalization layers, which redistribute the data and thereby stabilize the network.

Step 6: The output after convolution is given to fully connected layers, from which the class with the maximum score is selected.
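A minimal Keras sketch consistent with these steps and with the architecture description above is given below. The paper specifies 14 convolutional layers (3 × 3, ReLU, filters growing from 16 to 512), seven max-pooling layers (2 × 2), batch normalization, and four dense layers; the exact filter schedule, the dense-layer widths, the optimizer, and the use of 'same' padding (chosen here so that seven pooling stages fit a 224 × 224 input) are assumptions of this sketch, not details from the paper.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_midnet18_sketch(num_classes=4):
    model = models.Sequential([layers.Input(shape=(224, 224, 3))])
    # Seven (Conv-Conv-BN-MaxPool) blocks: 14 conv and 7 pooling layers in all.
    # The filter schedule from 16 up to 512 is an assumed distribution.
    for filters in (16, 32, 64, 128, 256, 512, 512):
        model.add(layers.Conv2D(filters, 3, strides=1, padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, 3, strides=1, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(pool_size=2))
    model.add(layers.Flatten())
    # Three hidden dense layers plus the softmax classifier: four dense layers.
    for units in (512, 256, 128):
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_midnet18_sketch()
# Optimizer and loss are assumptions; the paper does not state them.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])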

RESULT

The MIDNet18 model is evaluated on OCT retinal images, and the results are compared with existing standard models (ResNet-50, MobileNet, and DenseNet). Following Valizadeh et al. (2021),19 the sample size for the study was 14, with alpha = 0.05, beta = 0.2, and power = 0.8, computed with the G*Power calculator as indicated in Figure 3. Four study groups are considered for this study. The dataset used for this experiment was collected from the Kaggle repository (https://www.kaggle.com); as it is a public database, no ethical approval was required. The dataset is categorized into three folders (training, testing, and validation), with subfolders for each image category: NORMAL, CNV, DME, and DRUSEN. Images are labeled with the type of disease, patient ID, and image number, and are further organized into four directories based on image category.

FIGURE 3. Sample size calculation using g-power calculator.

The MIDNet18 model was trained on this dataset for a maximum of 30 epochs; a minimal sketch of such a run is given below. Table 2 shows the performance comparison of MIDNet18 with the other algorithms.
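Assuming the model, train_ds, val_ds, and test_ds objects from the earlier sketches, the 30-epoch training run could look as follows; this is a sketch of the reported setup, not the authors' exact training script.

# Train for at most 30 epochs, tracking validation metrics per epoch.
history = model.fit(train_ds, validation_data=val_ds, epochs=30)

# Evaluate on the held-out test split.
test_loss, test_acc = model.evaluate(test_ds)
print(f"test accuracy: {test_acc:.4f}, test loss: {test_loss:.4f}")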

TABLE 2. Comparison of MIDNET-18 with RESNET-50, MOBILENET, and DENSENET models on various metrics, namely maximum training accuracy, maximum testing accuracy, training F1 score, testing F1 score, training loss, and testing loss, for the classification of retinal diseases.

Algorithms  Maximum training accuracy  Maximum testing accuracy  Training F1-score  Testing F1-score  Training loss  Testing loss
MIDNET18    96.69                      98.86                     95.21              98.86             0.0900         0.0532
RESNET50    85.94                      83.26                     92.74              79.32             0.3996         0.4233
MOBILENET   91.59                      93.29                     87.50              93.20             0.2319         0.1697
DENSENET    95.04                      92.5                      89.77              92.5              0.1528         0.2332

A comparison of MIDNet18 with ResNet-50, MobileNet, and DenseNet on these performance metrics is given in Table 2. MIDNet18 obtained an improved testing accuracy of 98.86%, compared with 83.26%, 93.29%, and 92.5% for ResNet-50, MobileNet, and DenseNet, respectively. MIDNet18 also achieved the lowest testing loss, 0.0532, against 0.4233, 0.1697, and 0.2332 for ResNet-50, MobileNet, and DenseNet, respectively. Similarly, MIDNet18 achieved the highest testing F1 score, 98.86%, compared with only 79.32%, 93.20%, and 92.5% for ResNet-50, MobileNet, and DenseNet, respectively. All performance metrics indicate that MIDNet18 classifies retinal diseases from the retina OCT dataset better than the other standard models (ResNet-50, MobileNet, DenseNet).
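For reference, a sketch of computing the reported accuracy and F1 score on the test split with scikit-learn is shown below; it assumes the model and test_ds objects from the earlier sketches, and the macro-averaging mode for F1 is an assumption, as the paper does not state it.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true, y_pred = [], []
for images, labels in test_ds:  # test_ds from the earlier loading sketch
    probs = model.predict(images, verbose=0)
    y_pred.extend(np.argmax(probs, axis=1))   # predicted class indices
    y_true.extend(np.argmax(labels.numpy(), axis=1))  # true class indices

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))  # averaging mode assumed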

Average loss for each epoch

Figure 4A presents the training and validation loss of MIDNet18 over 30 epochs. It can be inferred from Figure 4A that the validation loss of MIDNet18 was high at 1.02 in the initial epochs and later fell to 0.1, the lowest loss value among all the models compared. Similarly, the training loss was 0.7157 in the initial epochs and was reduced to 0.0969 by the 30th epoch.

FIGURE 4. (A) Training and validation loss of MIDNET-18, (B) training and validation loss of RESNET-50, (C) training and validation loss of MOBILENET, (D) training and validation loss of DENSENET.

Figure 4B–D present the training and validation loss of ResNet-50, MobileNet, and DenseNet, respectively. The initial training loss for all these models was more than 50%, while the initial validation loss was higher than 90%. After the 10th epoch, the loss decreased gradually, reaching less than 20% by 30 epochs.
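Curves such as those in Figure 4 can be reproduced from the Keras History object returned by model.fit in the training sketch above; a minimal matplotlib sketch:

import matplotlib.pyplot as plt

# history is the object returned by model.fit in the training sketch.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()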

Average accuracy for each epoch

Figure 5A demonstrates the training and validation accuracy of the MIDNet18 model over 30 epochs. It can be inferred from Figure 5A that the validation accuracy of MIDNet18 fluctuated between 80% and 90% in the initial epochs. After the seventh epoch, it increased steadily and reached 96.3% at 30 epochs, a higher accuracy than all other standard models. Similarly, the training accuracy started at 76% in the first epoch and gradually increased to 96.69% by the 30th epoch.

FIGURE 5. (A) Training and validation accuracy of MIDNET-18, (B) training and validation accuracy of ResNet50, (C) training and validation accuracy of MOBILENET, (D) training and validation accuracy of DenseNet.

Figure 5B–D present the training and validation accuracy of ResNet-50, MobileNet, and DenseNet, respectively. The initial training and validation accuracy was higher than 50% for all models. After the 6th epoch, the accuracy increased steadily, reaching more than 85% by 30 epochs.

Average F1_Score for each epoch

Figure 6A shows the training and validation F1 scores of MIDNet18 over 30 epochs. It can be inferred from Figure 6A that the F1 score of the validation set was above 65% in the initial epochs. After the fifth epoch, it increased steadily and reached 94.74% at the 30th epoch, a higher F1 score than all other standard models. Similarly, the F1 score of the training set started at 52.12% in the initial epochs and gradually increased to 95.21% by the 30th epoch.

FIGURE 6. (A) F1-score of training and validation set using MIDNET-18, (B) F1-score of training and validation set using RESNET-50, (C) F1-score of training and validation set using MOBILENET, (D) F1-score of training and validation set using DENSENET.

Figure 6B–D present the F1 scores of the training and validation sets using ResNet-50, MobileNet, and DenseNet, respectively. The F1 score of the validation set at the initial epoch was higher than 40% for all models. After the 4th epoch, the F1 score increased steadily, reaching more than 90% by 30 epochs for all models.

The bar chart in Figure 7 compares the mean accuracy of MIDNet18 with various standard models (ResNet-50, MobileNet, DenseNet) for classifying retinal diseases from retina OCT images.

It shows that MIDNet18 is significantly more accurate than ResNet-50, MobileNet, and DenseNet in classifying retinal diseases from retina OCT images. The statistical test conducted (one-way ANOVA with p < 0.05) likewise shows that the MIDNet18 model performs better in terms of prediction accuracy and that the difference is statistically significant, as shown in Tables 3 and 4.

FIGURE 7. Bar chart to represent comparison of the mean accuracy rate of MIDNET-18 with various standard models (RESNET-50, MOBILENET, DENSENET). MIDNET18 is significantly more accurate than RESNET50, MOBILENET, and DENSENET algorithms in the prediction of diseases for the given retina dataset.

TABLE 3. Comparison of accuracy of MIDNET-18 with RESNET-50, MOBILENET, and DENSENET models in detecting retinal diseases in retina OCT images (one-way ANOVA sample test p < 0.05). MIDNET-18’s improved accuracy in prediction over other mentioned models is proved to be statistically significant.

Multiple comparisons (Bonferroni post hoc test). Dependent variable: TRAIN_ACCURACY.
(I) Algorithm  (J) Algorithm  Mean difference (I–J)  Std. error  Sig.   95% CI lower bound  95% CI upper bound
MIDNET-18      RESNET-50      0.1278075*             0.0136740   0.000  0.091103            0.164512
MIDNET-18      MOBILENET      0.0769737*             0.0134479   0.000  0.040876            0.113071
MIDNET-18      DENSENET       0.0500167*             0.0135576   0.002  0.013624            0.086409
RESNET-50      MIDNET-18      –0.1278075*            0.0136740   0.000  –0.164512           –0.091103
RESNET-50      MOBILENET      –0.0508338*            0.0135652   0.002  –0.087246           –0.014421
RESNET-50      DENSENET       –0.0777908*            0.0136740   0.000  –0.114495           –0.041086
MOBILENET      MIDNET-18      –0.0769737*            0.0134479   0.000  –0.113071           –0.040876
MOBILENET      RESNET-50      0.0508338*             0.0135652   0.002  0.014421            0.087246
MOBILENET      DENSENET       –0.0269570             0.0134479   0.284  –0.063054           0.009140
DENSENET       MIDNET-18      –0.0500167*            0.0135576   0.002  –0.086409           –0.013624
DENSENET       RESNET-50      0.0777908*             0.0136740   0.000  0.041086            0.114495
DENSENET       MOBILENET      0.0269570              0.0134479   0.284  –0.009140           0.063054

*The mean difference is significant at the 0.05 level.

TABLE 4. Statistical comparison of MIDNET18 with RESNET-50, MOBILENET, and DENSENET models. The MIDNET18 achieved a better mean accuracy rate of 96.8% over other mentioned standard models.

ANOVA (dependent variable: TRAIN_ACCURACY)
                Sum of squares  df   Mean square  F       Sig.
Between groups  0.252           3    0.084        30.463  0.000
Within groups   0.320           116  0.003
Total           0.572           119
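A sketch of the one-way ANOVA with Bonferroni-corrected pairwise comparisons in Python is given below. The per-epoch accuracy arrays are randomly generated placeholders (30 values per model, giving the 119 total degrees of freedom seen in Table 4), not the study's data.

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import MultiComparison

# Hypothetical per-epoch accuracies: 30 values per model, 120 observations.
rng = np.random.default_rng(0)
acc = {
    "MIDNET18": rng.normal(0.97, 0.01, 30),
    "RESNET50": rng.normal(0.84, 0.05, 30),
    "MOBILENET": rng.normal(0.89, 0.04, 30),
    "DENSENET": rng.normal(0.92, 0.03, 30),
}

# One-way ANOVA across the four groups (as in Table 4).
f_stat, p_value = stats.f_oneway(*acc.values())
print(f"one-way ANOVA: F = {f_stat:.3f}, p = {p_value:.4f}")

# Bonferroni-corrected pairwise comparisons (as in Table 3).
values = np.concatenate(list(acc.values()))
groups = np.repeat(list(acc.keys()), 30)
mc = MultiComparison(values, groups)
print(mc.allpairtest(stats.ttest_ind, method="bonf")[0])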

DISCUSSION

This research was carried out in the Artificial Intelligence Research Lab at Saveetha School of Engineering. From the experimental results, it was observed that the MIDNet18 model performed significantly better in terms of accuracy than ResNet-50, MobileNet, and DenseNet (one-way ANOVA with pairwise Bonferroni comparisons, p < 0.05).

Several studies on medical image classification using CNNs have been published. In one study, the authors use four CNN architectures for retinal image classification (AlexNet, GoogLeNet, VGG16, and ResNet50) and choose the best-performing network, which is then fine-tuned and evaluated. The overall loss of the network is the sum of the losses of all channels. The suggested approach is shown to perform better on a large dataset of retinal images, achieving an accuracy of 97.12%.20 Another suggested diabetic retinopathy (DR) classifier provides an asymmetric optimization solution by combining a Gaussian Mixture Model (GMM), the Visual Geometry Group network (VGGNet), Singular Value Decomposition (SVD), and principal component analysis (PCA), together with a softmax layer. For region segmentation and basic image classification, experiments were carried out on the publicly available Kaggle dataset of 35,126 images and evaluated in terms of classification accuracy and computation time. The proposed DR model outperforms AlexNet as well as the Scale-Invariant Feature Transform (SIFT), PCA, and SVD, with classification accuracies of 92.21%, 98.3%, 97.96%, and 98.13% for FC7-PCA, FC7-SVD, FC8-PCA, and FC8-SVD, respectively.21 A further article applies the CNN method to categorize DR images, employing pretrained CNN models such as AlexNet, VGG16, and SqueezeNet, which obtain accuracy rates of 93%, 91.82%, and 99%, respectively.

CNN algorithms are capable of learning abstract characteristics and operating with fewer parameters. Despite this performance, training a CNN model also has significant drawbacks, such as overfitting, exploding gradients, and class imbalance, all of which may impair the model's performance. Understanding and employing proper techniques and metrics can help to greatly overcome these hurdles and improve effectiveness.22,23 Although CNN models are generally utilized for image processing, sequential data necessitate the conversion of 1D data to 2D data. The usage of DCNNs for sequential data is becoming more popular because of their outstanding feature extraction and efficient computation with a minimal number of parameters. Ensemble learning with CNNs may be used to extract distinct semantic representations, and by integrating varied architectures, the model's generalization, applicability, and robustness across numerous image categories can be enhanced.24
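One common, simple mitigation for the class imbalance noted above (the training split ranges from 8,616 DRUSEN images to 37,205 CNV images) is to weight the loss by inverse class frequency. The sketch below uses Table 1's counts; the paper does not state whether such weighting was used, so this is illustrative only.

# Inverse-frequency class weights computed from Table 1's training counts.
# Class indices follow the alphabetical folder order used by
# image_dataset_from_directory: CNV, DME, DRUSEN, NORMAL.
counts = {"CNV": 37205, "DME": 11348, "DRUSEN": 8616, "NORMAL": 26315}
total = sum(counts.values())
class_weight = {i: total / (len(counts) * n) for i, n in enumerate(counts.values())}
print(class_weight)  # rarer classes (e.g., DRUSEN) receive larger weights
# model.fit(train_ds, validation_data=val_ds, epochs=30, class_weight=class_weight)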

CONCLUSION

The proposed MIDNet18 model outperformed the ResNet-50, MobileNet, and DenseNet models in classifying NORMAL, CNV, DME, and DRUSEN retinal images. The MIDNet18 model trained well and obtained a high multiclass classification accuracy of more than 96.69%, significantly better than the other models with a p-value < 0.001 based on the independent samples t-test. MIDNet18's performance was evaluated with different metrics, namely accuracy, loss, and F1 score. MIDNet18 has proven to perform significantly better than the other traditional CNN models in retina OCT image classification.

REFERENCES

1. Miller DG, Singerman LJ. Vision loss in younger patients: a review of choroidal neovascularization. Optom Vis Sci. 2006; 83(5): 316–325. 10.1097/01.opx.0000216019.88256.eb

2. Ikuno Y, Jo Y, Hamasaki T, et al. Ocular risk factors for choroidal neovascularization in pathologic myopia. Invest Ophthalmol Vis Sci. 2010; 51(7): 3721–3725. 10.1167/iovs.09-3493

3. Sunija AP, Kar S, Gayathri S, et al. OctNET: a lightweight CNN for retinal disease classification from optical coherence tomography images. Comput Methods Programs Biomed. 2021; 200: 105877. 10.1016/j.cmpb.2020.105877

4. Trucco E, MacGillivray T, Xu Y. Computational retinal image analysis: tools, applications and perspectives. Academic Press, 2019 [Online]. Available: https://play.google.com/store/books/details?id=a8i2DwAAQBAJ

5. Fang L, Wang C, Li S, et al. Attention to lesion: lesion-aware convolutional neural network for retinal optical coherence tomography image classification. IEEE Trans Med Imaging. 2019; 38(8): 1959–1970. 10.1109/TMI.2019.2898414

6. Tsuji T, Hirose Y, Fujimori K, et al. Classification of optical coherence tomography images using a capsule network. BMC Ophthalmol. 2020; 20(1): 114. 10.1186/s12886-020-01382-4

7. Mittal P. Automatic classification of retinal pathology in optical coherence tomography scan images using convolutional neural network. J Adv Res Dynam Contr Syst. 2020; 12(SP3): 936–942. 10.5373/jardcs/v12sp3/20201337

8. Rong Y, Xiang D, Zhu W, et al. Surrogate-assisted retinal OCT image classification based on convolutional neural networks. IEEE J Biomed Health Inform. 2019; 23(1): 253–263. 10.1109/JBHI.2018.2795545

9. Kepp T, Ehrhardt J, Heinrich MP, et al. Topology-preserving shape-based regression of retinal layers in OCT image data using convolutional neural networks. 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). 2019. 10.1109/isbi.2019.8759261

10. Kremelberg D. Practical statistics: a quick and easy guide to IBM® SPSS® statistics, STATA, and other statistical software. SAGE Publications, 2010 [Online]. Available: https://play.google.com/store/books/details?id=Scv0CAAAQBAJ

11. Mooney P. Retinal OCT images (optical coherence tomography). n.d. [Online]. Available: https://www.kaggle.com/paultimothymooney/kermany2018 [Accessed March 17, 2022].

12. Hang ST, Aono M. Bi-linearly weighted fractional max pooling. Multimed Tools Appl. 2017; 76(21): 22095–22117. 10.1007/s11042-017-4840-5

13. Mohan R, Ganapathy K, Rama A. Brain tumour classification of magnetic resonance images using a novel CNN-based medical image analysis and detection network in comparison to VGG16. J Popul Ther Clin Pharmacol. 2022; 28(2): e113–e125. 10.47750/jptcp.2022.873

14. Modi S, Guhathakurta R, Praveen S, Tyagi S, Bansod SN. Detail-oriented capsule network for classification of CT scan images performing the detection of COVID-19. Mater Today Proc. 2021 Jul 22. doi: 10.1016/j.matpr.2021.07.367. Epub ahead of print. PMid: 34312594; PMCID: PMC8295010.

15. Shafiq S, Azim T. Introspective analysis of convolutional neural networks for improving discrimination performance and feature visualization. PeerJ Comput Sci. 2021; 7: e497. 10.7717/peerj-cs.497

16. Bahrami A, Karimian A, Fatemizadeh E, et al. A new deep convolutional neural network design with efficient learning capability: application to CT image synthesis from MRI. Med Phys. 2020; 47(10): 5158–5171. 10.1002/mp.14418

17. Shanmugamani R. Deep learning for computer vision: expert techniques to train advanced neural networks using TensorFlow and Keras. Packt Publishing Ltd, 2018 [Online]. Available: https://play.google.com/store/books/details?id=6tRJDwAAQBAJ

18. Brownlee J. Deep learning for computer vision: image classification, object detection, and face recognition in python. Machine Learning Mastery, 2019 [Online]. Available: https://books.google.com/books/about/Deep_Learning_for_Computer_Vision.html?hl=&id=DOamDwAAQBAJ

19. Valizadeh A, Ghoushchi SJ, Ranjbarzadeh R, et al. Presentation of a segmentation method for a diabetic retinopathy patient’s fundus region detection using a convolutional neural network. Comput Intellig Neurosci. 2021; 2021: 1–14. 10.1155/2021/7714351

20. Jing S, Sun X, Yu L, et al. Transcription factor StABI5-like 1 binding to the FLOWERING LOCUS T homologs promotes early maturity in potato. Plant Physiol. 2022; 189(3): 1677–1693. 10.1093/plphys/kiac098

21. Mateen M, Wen J, Nasrullah, et al. Fundus image classification using VGG-19 architecture with PCA and SVD. Symmetry. 2018; 11(1): 1. 10.3390/sym11010001

22. Mobeen-ur-Rehman, Khan SH, Abbas Z, et al. Classification of diabetic retinopathy images based on customised CNN architecture. 2019 Amity International Conference on Artificial Intelligence (AICAI). 2019. 10.1109/aicai.2019.8701231

23. Koziarski M. Two-stage resampling for convolutional neural network training in the imbalanced colorectal cancer image classification. 2021 International Joint Conference on Neural Networks (IJCNN). 2021. 10.1109/ijcnn52387.2021.9533998

24. Bhatt D, Patel C, Talsania H, et al. CNN variants for computer vision: history, architecture, application, challenges and future scope. Electronics. 2021; 10(20): 2470. 10.3390/electronics10202470