Publication | Closed Access
A Baseline Investigation: Transformer-based Cross-view Baseline for Text-based Person Search
Citations: 26
References: 25
Year: 2023
Unknown Venue
Engineering, Machine Learning, Communication, Natural Language Processing, Multimodal LLM, Transformer-based Framework, Image Analysis, Information Retrieval, Data Science, Text-to-image Retrieval, Pattern Recognition, Machine Translation, Text-based Person Search, Machine Vision, Baseline Investigation, Feature Learning, Vision Language Model, Computer Science, Deep Learning, Computer Vision, Human Identification, Arts, Baseline Approach
This paper investigates a baseline approach for text-based person search built on a transformer-based framework. Existing methods usually treat visual and textual features as independent entities to speed up model inference. However, the attention paid to a given image should change with the text it is matched against. We therefore adopt a commonly used framework with a fused feature as the baseline, which overcomes the misalignment problem introduced by fixed features, and investigate it thoroughly. Moreover, we propose Cross-View Matching (CVM), which provides challenging positive text-image pairs that enable the model to learn cross-view meta-information. Furthermore, we suggest a novel evaluation process that reduces inference time and GPU memory demand. Experiments are conducted on the CUHK-PEDES, ICFG-PEDES, and RSTPReid benchmarks. Extensive parameter analysis explores the potential of the transformer-based framework. Although the proposed scheme is simple, it achieves significant performance improvements over other state-of-the-art methods.
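The contrast the abstract draws between fixed image features and text-dependent attention can be illustrated with a toy example. The sketch below is not the paper's model: the patch vectors, the two text queries, and the helper names (`mean_pool`, `text_conditioned_pool`) are all hypothetical, and a single scaled dot-product attention step stands in for whatever fusion the authors actually use. It only shows that a mean-pooled image feature is the same for every text, while an attention-pooled feature shifts with the query.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(patches):
    """Fixed image feature: the same vector regardless of the text query."""
    n = len(patches)
    return [sum(col) / n for col in zip(*patches)]

def text_conditioned_pool(text_vec, patches):
    """Attention-pooled image feature: patch weights depend on the text query."""
    scale = math.sqrt(len(text_vec))
    weights = softmax([dot(text_vec, p) / scale for p in patches])
    pooled = [
        sum(w * p[i] for w, p in zip(weights, patches))
        for i in range(len(text_vec))
    ]
    return pooled, weights

# Hypothetical toy data: three image patches and two text queries,
# each query aligned with a different patch.
patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_a = [4.0, 0.0, 0.0]
text_b = [0.0, 4.0, 0.0]

fixed_feature = mean_pool(patches)
pooled_a, weights_a = text_conditioned_pool(text_a, patches)
pooled_b, weights_b = text_conditioned_pool(text_b, patches)

print("fixed feature:", fixed_feature)
print("attention for text A:", weights_a)
print("attention for text B:", weights_b)
```

In this toy setting the fixed feature weights all three patches equally, whereas each text query concentrates attention on the patch it aligns with. The price of the text-conditioned variant is that the image feature has to be recomputed for every text-image pair, which is the inference cost the abstract alludes to.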