Concepedia

Publication | Closed Access

LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

13

Citations

0

References

2024

Year

Abstract

The capacity of existing human keypoint localization models is limited by keypoint priors provided by the training data. To alleviate this restriction and pursue more gen-eral model, this work studies keypoint localization from a different perspective by reasoning locations based on key-piont clues in text descriptions. We propose LocLLM, the first Large-Language Model (LLM) based keypoint local-ization model that takes images and text instructions as in-puts and outputs the desired keypoint coordinates. LocLLM leverages the strong reasoning capability of LLM and clues of keypoint type, location, and relationship in textual de-scriptions for keypoint localization. To effectively tune Lo-cLLM, we construct localization-based instruction conver-sations to connect keypoint description with corresponding coordinates in input image, and fine-tune the whole model in a parameter-efficient training pipeline. LocLLM shows remarkable performance on standard 2D/3D keypoint lo-calization benchmarks. Moreover, incorporating language clues into the localization makes LocLLM show superior flexibility and generalizable capability in cross dataset key-point localization, and even detecting novel type of key-points unseen during training<sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">†</sup><sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">†</sup>Project page: https://github.com/kennethwdk/LocLLM.