Feb. 7, 2024, 5:47 a.m. | Yichen Shi Yuhao Gao Yingxin Lai Hongyang Wang Jun Feng Lei He Jun Wan Changsheng Chen

cs.CV updates on arXiv.org arxiv.org

Multimodal large language models (MLLMs) have demonstrated remarkable problem-solving capabilities in various vision fields (e.g., generic object recognition and grounding) based on strong visual semantic representation and language reasoning ability. However, whether MLLMs are sensitive to subtle visual spoof/forged clues and how they perform in the domain of face attack detection (e.g., face spoofing and forgery detection) is still unexplored. In this paper, we introduce a new benchmark, namely SHIELD, to evaluate the ability of MLLMs on face spoofing and …

benchmark capabilities cs.cv detection evaluation face fields forgery language language models large language large language models mllms multimodal problem-solving reasoning recognition representation semantic vision visual

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Data Engineer - Takealot Group (Takealot.com | Superbalist.com | Mr D Food)

@ takealot.com | Cape Town