Manuscript Title:

A COMPREHENSIVE FRAMEWORK FOR VIDEO INFORMATION RETRIEVAL WITH ADVANCED CHARACTERIZATION USING CONVOLUTIONAL NEURAL NETWORKS

Author:

TALAL ASLAM, DOST MUHAMMAD KHAN, FAISAL SHAHZAD, NAJIA SAHER, KHALID MAHMOOD

DOI Number:

10.17605/OSF.IO/7N3SW

Published: 2023-07-23

About the author(s)

1. TALAL ASLAM - Department of Information Technology, The Islamia University of Bahawalpur, Pakistan.
2. DOST MUHAMMAD KHAN - Department of Information Technology, The Islamia University of Bahawalpur, Pakistan.
3. FAISAL SHAHZAD - Department of Information Technology, The Islamia University of Bahawalpur, Pakistan.
4. NAJIA SAHER - Department of Information Technology, The Islamia University of Bahawalpur, Pakistan.
5. KHALID MAHMOOD - ICIT, Gomal University, D. I. Khan.


Abstract

In the rapidly evolving world of digital content, video consumption is increasing significantly. Despite extensive research on text and image mining, work on comprehensive video mining for information retrieval remains sparse. This paper presents the design and implementation of a universal video mining architecture capable of retrieving information from both archived and live video content. The system robustly detects human subjects and determines their gender and emotion; it also recognizes ethnicity, further discerning race, region, and complexion attributes. The research employs convolutional neural network (CNN) models, each trained for one of five distinct video mining tasks: ethnicity, emotion, age, gender, and multi-object detection. Task-specific datasets were used for training: the FER2013 dataset, consisting of roughly 35,000 grayscale facial images labeled with seven emotions, for emotion detection; the APPA-REAL dataset for age and gender detection; and the COCO dataset for multi-object detection. Standard CNN training methodologies yielded 74% accuracy. To improve performance, we introduced multilayer backpropagation, raising accuracy to 94%; adding dropout and data augmentation further improved accuracy to 98%. The paper gives an in-depth account of the CNN training process for each video mining task and the techniques used to reach these accuracy levels. Overall, this work represents a significant step towards an efficient and effective video mining system for comprehensive information retrieval.
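The two techniques the abstract credits with the final accuracy gain, dropout and data augmentation, can be illustrated with a minimal NumPy sketch. This is not the authors' code: the flip-based augmentation, the 48x48 image size (typical of FER2013), and the 0.5 dropout rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_flip(batch):
    """Horizontal-flip augmentation: doubles the batch with mirrored copies.

    batch: array of shape (N, H, W) holding grayscale images.
    """
    return np.concatenate([batch, batch[:, :, ::-1]], axis=0)

def dropout(activations, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units during training
    and scale the survivors by 1/(1-rate) so expected activation is unchanged.
    At inference (training=False) the input passes through untouched."""
    if not training or rate == 0.0:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

# Toy batch of four 48x48 grayscale "faces" (FER2013-sized, random pixels).
batch = rng.random((4, 48, 48))
augmented = augment_flip(batch)                    # 8 images after flipping
hidden = dropout(rng.random((4, 128)), rate=0.5)   # dropout on a hidden layer
```

In a full training loop, `augment_flip` would be applied per minibatch before the forward pass, and `dropout` would sit between fully connected layers of the CNN.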


Keywords

CNN, Ethnicity, Emotion, Age, Gender, Video Mining, Information Retrieval.