EPIC-Bench: A Perception-Centric Benchmark for Fine-Grained
Embodied Visual Grounding in Vision-Language Models

Haozhe Shan1,2,*, Xiancong Ren1,*, Han Dong1,*, Haoyuan Shi1,3,*, Yingji Zhang4
Jiayu Hu1, Yi Zhang1, Yong Dai1,†, Bin Shen6, Lizhen Qu5, Zenglin Xu2, Xiaozhu Ju1,‡
1X-Humanoid, 2Fudan University, 3University of Science and Technology of China, 4University of Manchester, 5Monash University, 6Celonis AI
*Core contributors   †Project leader   ‡Correspondence
GitHub | Paper | 🤗 Hugging Face | ModelScope

Overview

Abstract

This work presents EPIC-Bench, the Embodied PerceptIon BenChmark, a grounding benchmark designed to systematically evaluate the visual perceptual capabilities required of large vision-language models (VLMs) in embodied environments. We construct a dataset of 6.6k meticulously annotated (Image, Text, Mask) tuples to answer the question: can VLMs perceive the embodied real world? EPIC-Bench is characterized by three key design principles. First, it encourages genuinely visually grounded perception rather than the exploitation of linguistic priors. Second, it comprises 23 fine-grained tasks spanning the embodied interaction pipeline from Target Localization to Navigation and Manipulation. Third, its fine-grained taxonomy supports diagnostic analysis of embodied visual perception. Comprehensive experiments show that VLMs still struggle to align visual and textual information for downstream physical interaction, especially in affordance region detection, where the target is only part of an object.
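As a rough illustration of the (Image, Text, Mask) tuple structure, the sketch below loads one sample with the 🤗 `datasets` library. The repository id, split name, and field names are assumptions for illustration only; consult the Hugging Face page linked above for the actual schema.

```python
from datasets import load_dataset

# Hypothetical repo id -- check the project's Hugging Face page for the real one.
ds = load_dataset("X-Humanoid/EPIC-Bench", split="test")

sample = ds[0]
image = sample["image"]        # assumed field: a PIL.Image of the embodied scene
instruction = sample["text"]   # assumed field: the grounding instruction
mask = sample["mask"]          # assumed field: segmentation mask of the target region
print(instruction, image.size)
```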

🏆 Leaderboard

📊 Benchmark

Key Features and Statistics

Representative VLM performance on 23 tasks of EPIC-Bench
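Since each target is annotated with a mask, a natural way to score a grounding prediction is mask IoU. Below is a minimal NumPy sketch, assuming predicted and ground-truth masks are binary arrays of the same shape; the benchmark's official metric and any acceptance threshold may differ.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: define IoU as a perfect match
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return inter / union

# Example: count a prediction as correct at IoU >= 0.5 (an assumed, not official, threshold).
# accuracy = np.mean([mask_iou(p, g) >= 0.5 for p, g in zip(preds, gts)])
```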

Data Collection and Annotation Pipeline

Why EPIC-Bench: Comprehensive Coverage of Embodied Perception

📝 Citation
    @article{epicbench2026,
    title={EPIC-Bench: A Perception-Centric Benchmark for Fine-Grained Embodied Visual Grounding in Vision-Language Models},
    author={Shan, Haozhe and Ren, Xiancong and Dong, Han and Shi, Haoyuan and Zhang, Yingji and Hu, Jiayu and Zhang, Yi and Dai, Yong and Shen, Bin and Qu, Lizhen and Xu, Zenglin and Ju, Xiaozhu},
    journal={xxx},
    year={2026}
    }