A research team led by Professor Hyunjung Shim at KAIST AI, affiliated with the National AI Research Lab (NAIRL), has released a new study that identifies a perceptual bias in multimodal large language models (MLLMs) used as automated evaluators and proposes a novel training framework to mitigate it.
Multimodal AI models are increasingly deployed as “judges” that score the answers produced by other AI systems. The team observed that these judges tend to rely on plausible-sounding textual narratives rather than what is actually visible in the image, and formalized this phenomenon as Perceptual Judgment Bias.
In the study, co-first authored by Seojeong Park and Jiho Choi alongside fellow researchers at KAIST AI, the team designed controlled visual perturbation experiments and quantitatively demonstrated that existing judge models fail to trust their own visual perception. This finding points to a core factor that undermines the consistency and verifiability of automated evaluation.
To address the issue, the team built a new dataset called the Perceptually Perturbed Judgment Dataset (PPJD). Its counterfactual responses are generated with minimal visual edits so that perceptual errors can be isolated during training. Building on this dataset, the team developed a reinforcement learning framework based on Group Relative Policy Optimization (GRPO), which delivered meaningful improvements on key measures, including perceptual fidelity and alignment with human evaluation.
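For readers unfamiliar with GRPO, the core idea can be illustrated with a minimal sketch: several judgments are sampled for the same input, each receives a reward, and each reward is normalized against its own group to yield an advantage. The reward below (1 if the judge prefers the perceptually correct response, 0 otherwise) and the function names are illustrative assumptions, not the reward design or code used in the study.

```python
from statistics import mean, pstdev


def judgment_reward(predicted: str, gold: str) -> float:
    """Hypothetical reward: 1.0 if the judge picks the perceptually correct response."""
    return 1.0 if predicted == gold else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled verdicts for one (original, counterfactual) response pair.
# "A" is assumed to be the response consistent with the image.
verdicts = ["A", "B", "A", "A"]
rewards = [judgment_reward(v, gold="A") for v in verdicts]
advantages = group_relative_advantages(rewards)
```

In this sketch, the verdict that sided with the perturbed response receives a negative advantage and the others a positive one, so a policy update would shift the judge toward verdicts grounded in the image. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.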
This work resonates with the broader research vision of KAIST CVML Lab, which centers on building AI that is safe, physically grounded, and trustworthy. The lab has pursued a wide research agenda across fairness in generative AI, perception and reasoning for safety-critical applications, physical grounding, and efficient learning.
As multimodal systems rapidly expand into science, industry, and decision-making tools, ensuring the reliability and verifiability of judge models is among the core research priorities of NAIRL. The contribution from Professor Shim’s team is regarded as a meaningful result emerging from the Global Open Innovation Ecosystem that NAIRL is building.
Project page: https://perceptionjudge.github.io/
KAIST CVML Lab: https://kaist-cvml.github.io/