Publication | Closed Access
Evaluating bag-of-visual-words representations in scene classification
Citations: 815
References: 20
Year: 2007
Unknown Venue
Topics: Engineering, Machine Learning, Object Categorization, Image Retrieval, Text Mining, Natural Language Processing, Image Classification, Image Analysis, Pattern Recognition, Text Recognition, Term Weighting, Machine Vision, Automatic Classification, Deep Learning, Computer Vision, Pascal Collection, Scene Interpretation, Scene Classification, Salient Image Patches, Linguistics
Bag-of-visual-words representations describe images as collections of keypoint-derived patches, yet the effects of dimensionality, selection, and weighting on scene-classification accuracy remain underexplored. The study aims to adapt text-categorization techniques (term weighting, stop-word removal, and feature selection) to construct alternative visual-word representations for scene classification. These representations are evaluated through extensive experiments on the TRECVID and PASCAL datasets to assess how different dimensionality, selection, and weighting choices influence classification performance. The results establish an empirical foundation for choosing visual-word configurations that improve scene-classification accuracy.
Based on keypoints extracted as salient image patches, an image can be described as a "bag of visual words", and this representation has been used in scene classification. The choice of dimension, selection, and weighting of visual words in this representation is crucial to classification performance but has not been thoroughly studied in previous work. Given the analogy between this representation and the bag-of-words representation of text documents, we apply techniques used in text categorization, including term weighting, stop word removal, and feature selection, to generate image representations that differ in the dimension, selection, and weighting of visual words. The impact of these representation choices on scene classification is studied through extensive experiments on the TRECVID and PASCAL collections. This study provides an empirical basis for designing visual-word representations that are likely to produce superior classification performance.
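The text-categorization analogy described in the abstract can be illustrated with a minimal sketch. It assumes each image has already been reduced to a list of visual-word indices (for example, by assigning keypoint descriptors to cluster centers, a step not shown here). The function names, the particular tf-idf formula, and the document-frequency threshold for stop-word removal are illustrative assumptions, not the weighting schemes or settings evaluated in the paper.

```python
import math
from collections import Counter


def bag_of_visual_words(word_ids, vocab_size):
    """Count how often each visual word (an index in 0..vocab_size-1) occurs in one image."""
    counts = Counter(word_ids)
    return [counts.get(i, 0) for i in range(vocab_size)]


def document_frequency(histograms):
    """For each visual word, count the images in which it occurs at least once."""
    vocab_size = len(histograms[0])
    return [sum(1 for h in histograms if h[j] > 0) for j in range(vocab_size)]


def tf_idf(histograms):
    """Weight raw counts by term frequency times inverse document frequency.

    Assumed form: tf = count / total words in image, idf = log(N / df).
    """
    n_images = len(histograms)
    df = document_frequency(histograms)
    weighted = []
    for h in histograms:
        total = sum(h) or 1  # avoid division by zero for images with no keypoints
        row = []
        for j, count in enumerate(h):
            idf = math.log(n_images / df[j]) if df[j] > 0 else 0.0
            row.append((count / total) * idf)
        weighted.append(row)
    return weighted


def remove_stop_words(histograms, max_df_ratio):
    """Drop visual words that occur in more than max_df_ratio of all images.

    Returns the reduced histograms and the indices of the words that were kept.
    """
    n_images = len(histograms)
    df = document_frequency(histograms)
    keep = [j for j, d in enumerate(df) if d / n_images <= max_df_ratio]
    return [[h[j] for j in keep] for h in histograms], keep


# Toy data: three images, vocabulary of four visual words (hypothetical values).
images = [[0, 0, 1, 2], [0, 1, 1], [0, 3]]
histograms = [bag_of_visual_words(ids, vocab_size=4) for ids in images]
weighted = tf_idf(histograms)
reduced, kept = remove_stop_words(histograms, max_df_ratio=0.9)
```

In this toy data, visual word 0 appears in every image, so its idf is zero and it is also the only word removed by the stop-word filter. This mirrors how very frequent terms are treated in text categorization, though whether such filtering helps scene classification is precisely the kind of question the study examines empirically.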