
Optical coherence tomography (OCT) has revolutionized ophthalmic imaging, particularly of the posterior segment of the eye. Non-invasive imaging techniques have significantly enhanced our ability to document and understand the intricate anatomy and pathological processes of the retina.
Deep learning (DL) models have been developed extensively in ophthalmology. Previous studies have demonstrated the efficacy of DL models in detecting various diseases in fundus photographs [1-3]. DL models have also provided new insights into demographic and ocular characteristics. Multiple DL models for predicting age and gender from fundus photographs have been reported, and a model for predicting age and gender from OCT images has recently been developed [4].
The classification of eye laterality in images is a crucial step in data acquisition for the development of DL models. Combining various image modalities and electronic medical records requires prior laterality classification of the images. This is a simple but time-consuming and labor-intensive task for a large dataset. Although attempts to distinguish the laterality of fundus photographs have achieved high accuracy [5-8], no studies have addressed the classification of the laterality of OCT images using DL. The purpose of this study was to develop a DL model to classify the laterality of OCT images.
This study was approved by the Institutional Review Board of Seoul National University Hospital (IRB; No. H-2202-069-1299). All procedures were conducted in compliance with the principles of the Declaration of Helsinki. The IRB waived the need for written informed consent due to the retrospective design of the study and complete anonymization of patient information.
We retrospectively enrolled all patients who visited the ophthalmology clinic at Seoul National University Hospital between September 2018 and March 2023 and who underwent OCT examination using Spectralis OCT (Heidelberg Engineering). We removed all patient-specific information (e.g., patient identification number and name).
Data acquisition was performed manually using the Heidelberg Eye Explorer. First, OCT data were acquired in the E2E format and converted to PNG files using the Python library OCT-Converter v0.5.8 (https://github.com/marksgraham/OCT-Converter). We used horizontal and vertical sections of the OCT images instead of three-dimensional volume scan images. Horizontal and vertical linear 8.8-mm OCT scans centered on the fovea were acquired at high resolution using 100-frame automatic real-time (ART) averaging on a Spectralis HRA+OCT. The OCT images were displayed in 1:1 pixel mode.
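For illustration, a minimal conversion sketch using the OCT-Converter E2E reader might look as follows; the file path is hypothetical, and the exact attributes of the returned volume objects (e.g., volume) are assumptions that should be checked against the library documentation.

```python
import numpy as np
from PIL import Image
from oct_converter.readers import E2E

# Read an E2E export (hypothetical path) and save each B-scan as a PNG.
e2e = E2E("scan.e2e")
oct_volumes = e2e.read_oct_volume()  # list of volume objects in the file

for i, vol in enumerate(oct_volumes):
    # vol.volume is assumed to be a list of 2D numpy arrays (one per B-scan)
    for j, bscan in enumerate(vol.volume):
        img = Image.fromarray(np.uint8(255 * (bscan / bscan.max())))
        img.save(f"volume{i}_bscan{j}.png")
```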
We excluded low-quality images using the maximum tissue contrast index (mTCI), which was calculated according to a previous report [9]. The mTCI is the ratio of the signal intensities of the foreground and background based on histogram density modeling and decomposition of OCT images. The mTCI is computed using the following procedure: 1) determine the histogram mode intensity (N1); 2) find the first location where the frequency is greater than or equal to 95% of the peak frequency at N1, and calculate the cumulative density function value (cN1*) at this location; 3) assuming cN2 = cN1* × 0.999 / cN1B*, set cN1B* to 0.6, which is greater than the value in the original article (0.45), to include more real-world images; 4) determine the intensity of the separation point (N2); 5) determine the saturation intensity (N3); and 6) set mTCI = (N3 − N1) / (N2 − N1). Images with any calculable mTCI value were included to reflect real-world datasets. Among the 103,149 images, we were unable to calculate an mTCI value for 1,384 (1.34%) due to the narrow range of histogram densities, and we defined those images as low quality. We included OCT images with and without macular abnormalities, such as an epiretinal membrane (ERM), age-related macular degeneration (AMD), central macular edema (CME), a macular hole (MH), or other macular abnormalities.
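A sketch that follows the steps as described above is given below; the histogram binning, the reading of steps 3) to 5), and the handling of undefined values are assumptions based on the text rather than on the original mTCI reference [9].

```python
import numpy as np

def mtci(image, cn1b_star=0.6):
    """Sketch of the mTCI procedure described above (parameter choices are assumptions)."""
    hist, _ = np.histogram(image.ravel(), bins=256, range=(0, 256))
    cdf = np.cumsum(hist) / hist.sum()

    n1 = int(np.argmax(hist))                      # 1) histogram mode intensity
    loc = int(np.argmax(hist >= 0.95 * hist[n1]))  # 2) first bin reaching 95% of the peak
    cn1_star = cdf[loc]                            #    cumulative density at that bin

    cn2 = cn1_star * 0.999 / cn1b_star             # 3) as stated in the text
    n2 = int(np.argmax(cdf >= min(cn2, 1.0)))      # 4) separation-point intensity
    n3 = int(np.nonzero(hist)[0][-1])              # 5) saturation intensity (assumed: last nonzero bin)

    if n2 == n1:
        return None                                # mTCI not calculable -> low-quality image
    return (n3 - n1) / (n2 - n1)                   # 6) mTCI
```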
Among the included images, those of 2,500 patients who underwent examination of both eyes were randomly selected due to computational limitations. The total number of selected images was 10,000 (2,500 right eyes and 2,500 left eyes, with horizontal and vertical section images for each eye).
The dataset was divided into a development set and a test set at an 8:2 ratio. The split was performed at the patient level (and therefore at the eye level) so that images from the same patient did not appear in both the development and test sets. Five-fold cross-validation was used within the development set.
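A patient-level split of this kind can be sketched with scikit-learn's group-aware splitters; the array names (images, labels, patient_ids) are hypothetical, with one entry per image and images of the same patient sharing an identifier.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, GroupKFold

# 8:2 development/test split grouped by patient identifier
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
dev_idx, test_idx = next(splitter.split(images, labels, groups=patient_ids))

# five-fold cross-validation within the development set, again grouped by patient
cv = GroupKFold(n_splits=5)
for train_idx, val_idx in cv.split(images[dev_idx], labels[dev_idx],
                                   groups=patient_ids[dev_idx]):
    pass  # train one fold model on train_idx, validate on val_idx
```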
Development of the DL models consisted of three tasks: classification of horizontal versus vertical OCT images, classification of laterality in horizontal OCT images, and classification of laterality in vertical OCT images.
Preprocessing was conducted as follows: 1) the grayscale raw images were converted to RGB by replicating each pixel value across three channels using the Pillow library; 2) each image was resized to 224 × 224 pixels; 3) augmentations were applied: random affine transformation (degrees = 0, horizontal scale range 80%-120%, vertical scale range 80%-120%), brightness adjustment (80%-120%), contrast adjustment (80%-120%), saturation adjustment (80%-120%), and random rotation (−90° to +90°); and 4) the pixel values were normalized to between 0 and 1. The augmentations were applied to the images randomly at each epoch. No augmentations were applied to the test dataset.
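One possible mapping of this pipeline onto torchvision transforms is sketched below; whether the affine scaling was applied per axis is not recoverable from the description (torchvision's RandomAffine scales isotropically), so this rendering is an approximation.

```python
from PIL import Image
from torchvision import transforms

# Grayscale -> RGB conversion is done at load time, e.g. Image.open(path).convert("RGB")

# Training pipeline; isotropic scaling approximates the 80%-120% per-axis range.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomAffine(degrees=0, scale=(0.8, 1.2)),
    transforms.ColorJitter(brightness=(0.8, 1.2), contrast=(0.8, 1.2),
                           saturation=(0.8, 1.2)),
    transforms.RandomRotation(degrees=90),  # rotation sampled from (-90, +90)
    transforms.ToTensor(),                  # scales pixel values to [0, 1]
])

# Test-time pipeline: no augmentation.
test_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```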
All development processes were performed using PyTorch (torch version 1.10.0 and torchvision version 0.11.0) and Python (version 3.7.2). A private server equipped with a CPU with 32 GB of RAM and a 24-GB RTX 3090 graphics processor (NVIDIA) was used for development. ResNet-18 was used as the backbone network [10]. Instead of the last fully connected layer of the model, the output of the average pooling layer was flattened to a single dimension. A dropout layer with a dropout rate of 0.2 was added, followed by a fully connected layer, a rectified linear unit (ReLU) layer, and another fully connected layer with one output. For the ResNet-18 backbone, we used weights pre-trained on the ImageNet dataset. An Adam optimizer was used with a learning rate of 1 × 10⁻³. The model was trained over 100 epochs and validated with a cross-entropy loss function. The learning rate was reduced to 10% of its value when the validation loss did not improve within 10 epochs. If the learning rate fell below 1 × 10⁻⁶, we stopped the training process to avoid overfitting, and the model with the lowest validation loss was selected as the final model. For all development processes, the random state and seed were fixed.
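The described architecture can be sketched as follows. The hidden width of the intermediate fully connected layer is not stated in the text and is assumed here, and the single-output cross-entropy objective is rendered as binary cross-entropy on logits, its standard PyTorch equivalent.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)   # ImageNet weights (torchvision 0.11 API)
in_features = model.fc.in_features         # 512-d flattened average-pool output
model.fc = nn.Sequential(                  # replaces the original classification layer
    nn.Dropout(p=0.2),
    nn.Linear(in_features, 256),           # hidden width 256 is an assumption
    nn.ReLU(),
    nn.Linear(256, 1),                     # single output
)

criterion = nn.BCEWithLogitsLoss()         # binary cross-entropy on the single logit
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10)  # reduce LR to 10% after 10 stale epochs
# Training stops once the LR falls below 1e-6; the lowest-validation-loss model is kept.
```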
For each fold, we calculated the mean area under the receiver operating characteristic curve (AUROC). The best threshold value for the classification of laterality for each fold was determined using the Youden index [11]. The Youden index is defined as sensitivity + specificity − 1, and the threshold value at which the Youden index was maximized was selected. For each fold model, the best threshold value from the development process was applied to determine the final prediction for the test dataset. The outputs of the five fold models were combined by majority voting for the final prediction. We then calculated the accuracy, specificity, and sensitivity. To compare the prediction results between the five fold models, Cochran's Q test was used. To visualize the results, we used gradient-weighted class activation mapping (Grad-CAM) to highlight the regions on which the network focuses for classification [12].
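The threshold selection can be sketched with scikit-learn's ROC utilities; y_true and y_score stand for the validation labels and model outputs of one fold.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

auroc = roc_auc_score(y_true, y_score)

# Youden index J = sensitivity + specificity - 1 = TPR - FPR at each candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
best_threshold = thresholds[np.argmax(tpr - fpr)]
```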
A total of 5,000 eyes of 2,500 patients (10,000 images) were included. The test dataset consisted of 1,000 eyes of 500 patients, and the remaining 4,000 eyes of 2,000 patients were used for the development dataset. The test dataset comprised 590 eyes without macular abnormalities, 208 with ERM, 111 with AMD, 56 with CME, 23 with MH, and 12 with other macular abnormalities.
The prediction results for OCT image section classification are summarized in Table 1. The DL model predicted the OCT section of the eyes in the test dataset with a mean AUROC of 0.9967. The accuracy, sensitivity, and specificity were 0.9835, 0.9870, and 0.9800, respectively. There was no statistically significant difference between the results from the five folds and the overall mean results (p > 0.05 by Cochran's Q test). The prediction results for OCT image laterality classification are summarized in Table 2. The DL model predicted the laterality of the eyes in horizontal OCT images with a mean AUROC of 1.0000. The accuracy, sensitivity, and specificity were 0.9970, 1.0000, and 0.9940, respectively. There was no statistically significant difference between the results from the five folds and the overall mean results (p > 0.05 by Cochran's Q test).
Using vertical OCT images, the DL models showed no predictive performance in laterality classification. Despite increasing model complexity and the use of augmented data, neither training loss nor validation loss decreased sufficiently throughout the training epochs; both fluctuated around a minimum, and neither training accuracy nor validation accuracy showed any significant improvement.
Representative Grad-CAM images are shown in Fig. 1, indicating the regions of importance in the prediction of the OCT section. Fig. 2 shows representative Grad-CAM images for laterality classification using horizontal OCT images. All cases of laterality misclassification using horizontal OCT images are shown in Fig. 3.
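For reference, a minimal Grad-CAM sketch over the last convolutional block of the ResNet-18 backbone can be implemented with hooks, as below; this is a generic rendering of the technique [12], not the authors' exact implementation, and `image` denotes a preprocessed (1, 3, 224, 224) input tensor.

```python
import torch
import torch.nn.functional as F

store = {}

def save_activation(module, inputs, output):
    store["act"] = output.detach()

def save_gradient(module, grad_input, grad_output):
    store["grad"] = grad_output[0].detach()

target = model.layer4[-1]                  # last residual block of ResNet-18
target.register_forward_hook(save_activation)
target.register_full_backward_hook(save_gradient)

model.eval()
logit = model(image)                       # forward pass through the trained model
model.zero_grad()
logit.squeeze().backward()                 # gradients of the single output logit

weights = store["grad"].mean(dim=(2, 3), keepdim=True)     # channel importance weights
cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # [0, 1] heatmap
```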
We developed a DL model that distinguishes horizontal from vertical sections of OCT images and classifies the laterality of horizontal OCT images with good accuracy. To the best of our knowledge, this is the first study to apply a DL model to OCT section classification and laterality prediction.
The classification of laterality in OCT images is crucial for various DL projects using OCT images. As the process is simple but time-consuming, automatic classification of laterality in OCT images could streamline data preparation. Our DL model achieved high and robust accuracy and is expected to serve as a fundamental tool in subsequent deep learning research.
Most commercial devices save laterality information automatically. However, our further goal is to classify images not only from commercial devices but also from publicly available sources, such as PubMed Central Open Access. To assign laterality information to all available online OCT images, we needed a three-channel (RGB) input architecture. Although our current model has been validated for only one OCT device, the three-channel model architecture can be re-trained using large-scale data. High accuracy in the laterality classification of fundus photographs has been reported [6-8]. These studies emphasize the importance of the optic disc and vessels, similar to human assessments. We used Grad-CAM to confirm the significant role of the papillomacular bundle retinal layers in the prediction of laterality. Only three cases of misclassification occurred, which may be attributable to significant structural deformities of the retina.
We were unable to develop DL models for laterality classification using vertical section OCT images. The learning curve did not converge regardless of the number of epochs. Compared with horizontal OCT images, which contain the optic disc and papillomacular bundle area, vertical OCT images lack structures specific to laterality. In the present study, we trained our model with various augmentations and hyperparameters, but no configuration predicted the laterality of vertical OCT images.
The limitations of our study should be noted. Due to limited computational power, we randomly sampled 2,500 patients (5,000 eyes). Moreover, we did not include all volume scans of the OCT images. The primary purpose of the model was not to investigate the regions associated with laterality but to classify the laterality of the images accurately and rapidly. Despite these limitations, our model and dataset reflect real-world situations, as we did not exclude any pathologic images.
In conclusion, we developed a DL model that classifies the horizontal and vertical sections of OCT images and predicts the laterality of horizontal OCT images with high accuracy, sensitivity, and specificity. Subsequent research could benefit from these models as fundamental tools.
The authors declare no conflicts of interest relevant to this article.
Conception (R.O., C.K.Y.); design (R.O., C.K.Y.); data acquisition (E.K.L., K.B., U.C.P., K.H.P., C.K.Y.); analysis (R.O., C.K.Y.); interpretation (R.O., E.K.L., K.B., U.C.P., K.H.P., C.K.Y.); writing (R.O.); review (E.K.L., K.B., U.C.P., K.H.P., C.K.Y.); final approval of the article (all authors).