Manuscript Title:

MULTIMODAL DEEP LEARNING FRAMEWORK COMBINING IMAGE AND CLINICAL DATA FOR ACCURATE SKIN DISEASE PREDICTION

Authors:

NIVEDHA S, BIJU J, SATHYARAJ S, PARTHASARATHI P

DOI Number:

10.5281/zenodo.17877457

Published: 2025-12-10

About the author(s)

1. NIVEDHA S - Assistant Professor, Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India.
2. BIJU J - Assistant Professor, Division of Data Science and Cyber Security, Karunya Institute of Technology and Sciences, Coimbatore, India.
3. SATHYARAJ S - Assistant Professor, Department of Artificial Intelligence and Data Science, NPR College of Engineering and Technology, Natham, Dindigul, Tamil Nadu, India.
4. PARTHASARATHI P - Associate Professor, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India.


Abstract

Accurate and early diagnosis of skin diseases remains a clinical challenge due to the visual similarity between lesions and limited access to expert dermatological examination. Although recent advances in artificial intelligence have made image-based diagnostics more effective, unimodal systems that rely solely on dermoscopic images may miss important patient-level information that could improve diagnostic accuracy. This paper presents a multimodal deep learning framework that combines dermoscopic skin images with structured clinical data to achieve more accurate skin disease classification. The approach uses the HAM10000 dataset, which comprises 10,015 annotated dermoscopic images together with patient metadata including age, sex, and lesion location. A Convolutional Neural Network (CNN) backbone extracts image features, while a Multilayer Perceptron (MLP) encodes the clinical attributes; the two representations are fused through an attention-guided mechanism that captures complementary information across modalities. Experimental analysis shows that the proposed model attains 94.7% accuracy, an F1-score of 0.93, and an AUC of 0.96, outperforming both image-only and clinical-only baselines. These findings support the hypothesis that combining clinical metadata with visual features can substantially improve classification robustness and interpretability. The proposed framework shows strong potential as a clinical decision-support tool for dermatologists, enabling earlier and more accurate detection of skin diseases.
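The fusion architecture described in the abstract (CNN image encoder, MLP clinical encoder, attention-guided merging, and a classification head for the seven HAM10000 lesion classes) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the layer sizes, the small stand-in CNN, and the two-way attention gate are assumptions, since the paper's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class MultimodalSkinClassifier(nn.Module):
    """Sketch of a CNN + MLP fusion model with an attention gate.

    All dimensions and layer choices are illustrative assumptions;
    the original work does not specify them in the abstract.
    """
    def __init__(self, num_clinical_features=3, num_classes=7, embed_dim=128):
        super().__init__()
        # Small stand-in CNN backbone for dermoscopic images
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # MLP encoder for structured clinical data (e.g. age, sex, lesion site)
        self.mlp = nn.Sequential(
            nn.Linear(num_clinical_features, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )
        # Attention gate: learns a weight per modality before fusion
        self.attn = nn.Sequential(
            nn.Linear(2 * embed_dim, 2), nn.Softmax(dim=-1),
        )
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, image, clinical):
        img_feat = self.cnn(image)        # (B, embed_dim)
        clin_feat = self.mlp(clinical)    # (B, embed_dim)
        # Attention weights over the two modalities, summing to 1
        weights = self.attn(torch.cat([img_feat, clin_feat], dim=-1))
        fused = weights[:, :1] * img_feat + weights[:, 1:] * clin_feat
        return self.head(fused)           # (B, num_classes) logits

model = MultimodalSkinClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3))
print(logits.shape)  # torch.Size([2, 7])
```

In this sketch the softmax-normalized gate weights the image and clinical embeddings before a shared classification head, which is one common way to let the model emphasize whichever modality is more informative for a given lesion.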


Keywords

Deep Learning, Multimodal Learning, Dermatology AI, Clinical Data Fusion, Skin Disease Classification, Medical Imaging, Machine Learning in Healthcare.