Integration of Visual and Text-Based Approaches for the Content Labeling and Classification of Photographs

Abstract

Automatically annotating photographs with content descriptions facilitates the organization, storage, and search of visual information. We present an integrated approach to scene classification that combines image-based and text-based methods. On the text side, we apply a novel TF*IDF vector-based classification approach to the text accompanying an image. On the image side, we present a novel OF*IIF (object frequency times inverse image frequency) vector-based classification approach, in which objects are defined by clustering the segmented regions of training images. The image-based OF*IIF approach is synergistic with the text-based TF*IDF approach: by integrating the two, we achieved a classification accuracy of 86%. This is an improvement of approximately 12% over existing image classifiers, approximately 3% over the TF*IDF classifier based on textual information, and approximately 4% over the OF*IIF classifier based on visual information.
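To make the parallel between the two vector representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes that segmentation and clustering have already mapped each image region to a discrete object-cluster ID (the IDs `o1`, `o2`, ... are hypothetical), so that an image is simply a list of object IDs just as a caption is a list of words. The same weighting, frequency times log(N / df), then yields TF*IDF vectors for text and OF*IIF vectors for images. The classifier (cosine similarity to a per-class summed vector), the integration rule (adding the text and image similarities), the helper names, and the indoor/outdoor labels are all illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import Counter

def inverse_frequency(docs):
    """log(N / df): IDF for text terms, IIF for object-cluster IDs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n / count) for tok, count in df.items()}

def weighted_vector(doc, inv):
    """Frequency times inverse frequency (TF*IDF or OF*IIF) as a sparse dict."""
    return {tok: freq * inv.get(tok, 0.0) for tok, freq in Counter(doc).items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train(docs, labels):
    """Assumed classifier: one summed weighted vector per class.

    Summing instead of averaging is fine here because cosine
    similarity ignores vector length.
    """
    inv = inverse_frequency(docs)
    cents = {}
    for doc, label in zip(docs, labels):
        acc = cents.setdefault(label, {})
        for tok, w in weighted_vector(doc, inv).items():
            acc[tok] = acc.get(tok, 0.0) + w
    return inv, cents

def scores(model, doc):
    """Cosine similarity of one item to every class vector."""
    inv, cents = model
    vec = weighted_vector(doc, inv)
    return {label: cosine(vec, c) for label, c in cents.items()}

def classify(text_model, image_model, words, objects):
    # Hypothetical integration rule: add text and image similarities per class.
    t, i = scores(text_model, words), scores(image_model, objects)
    return max(t, key=lambda label: t[label] + i.get(label, 0.0))

# Toy training data (hypothetical captions and object-cluster IDs).
labels = ["indoor", "indoor", "outdoor", "outdoor"]
text_model = train(
    [["office", "desk"], ["kitchen", "dinner"], ["beach", "sun"], ["trail", "sun"]],
    labels,
)
image_model = train(
    [["o1", "o2"], ["o1", "o3"], ["o4", "o5"], ["o4", "o6"]],
    labels,
)
print(classify(text_model, image_model, ["desk", "dinner"], ["o1"]))  # prints "indoor"
```

Because the text and image sides share one weighting function, either model can be evaluated alone or combined, which mirrors the comparison in the abstract between the individual TF*IDF and OF*IIF classifiers and their integration.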