Professor Jinyoung Yeo’s Team at NAIRL to Present Research on Long-Horizon Task Learning and Embodied Agent Safety at ICML
A research team led by Professor Jinyoung Yeo at the Department of Artificial Intelligence, Yonsei University, affiliated with the National AI Research Lab (NAIRL), will present two research papers at ICML 2026 on long-horizon task learning for large language model (LLM) agents and safe planning for embodied agents. The two studies address key challenges in training LLMs to solve tasks through extended sequences of actions, and in helping multimodal agents recognize and avoid physical risks in real-world environments.
The first study, “On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length,” systematically examines how the length of an action sequence, or “horizon length,” affects the training of LLM agents. While much of the existing work has focused on system-level optimization or algorithmic improvements, this study constructs controlled tasks with the same decision rules and reasoning structures, differing only in the number of action steps required for successful completion. Through this design, the team empirically demonstrates that task length itself can become a major factor that increases training difficulty.
The research team found that as horizon length increases, agents face greater exploration difficulties and more severe credit assignment challenges, leading to significant training instability. The study further shows that “horizon reduction,” which reduces the number of steps required to complete a task, can serve as an important principle for stabilizing training and improving performance on long-horizon tasks. In particular, the team observed that models trained under reduced horizons can generalize more effectively to longer-horizon tasks at inference time, a phenomenon they define as “horizon generalization.”
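As a rough illustration of why exploration gets harder as horizon length grows, consider a toy setting that is an assumption made here, not the paper's experimental setup: a task is solved only if the agent picks the single correct action out of `num_actions` at every one of `horizon` steps, and the agent explores uniformly at random. The function names below are hypothetical. Under these assumptions, the chance of ever seeing a successful episode shrinks exponentially with the horizon, which is one simple way to see how shortening the horizon can make the training signal easier to find.

```python
def random_success_probability(num_actions: int, horizon: int) -> float:
    """Probability that uniformly random choices solve a toy task that
    needs the one correct action (out of num_actions) at every step."""
    if num_actions < 1 or horizon < 0:
        raise ValueError("num_actions must be >= 1 and horizon must be >= 0")
    return (1.0 / num_actions) ** horizon


def expected_episodes_until_success(num_actions: int, horizon: int) -> float:
    """Expected number of random episodes before the first success
    (mean of a geometric distribution with the probability above)."""
    return 1.0 / random_success_probability(num_actions, horizon)


if __name__ == "__main__":
    # Compare a short and a long horizon with four candidate actions per step.
    for horizon in (2, 5, 10):
        episodes = expected_episodes_until_success(num_actions=4, horizon=horizon)
        print(f"horizon={horizon}: about {episodes:.0f} random episodes per success")
```

Real LLM agents do not explore uniformly at random, so this sketch only conveys the direction of the effect, not its size.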
The second study, “EMBGUARD: Constructing Hazard-Aware Guardrails for Safe Planning in Embodied Agents,” proposes a new approach to building safety guardrails for multimodal large language model (MLLM)-based embodied agents operating in physical environments. Existing embodied agents often struggle to assess risk when processing visual observations and action plans together: they either miss dangerous interactions or flag benign situations as risky. To address this issue, the team developed EMBGUARD, an MLLM-based safety guardrail that separates physical risk reasoning from the agent’s policy.
EMBGUARD is designed to evaluate pairs of visual observations and actions, identify potentially hazardous situations, and explain the reason for the risk in natural language. To support this framework, the team also constructed EMBHAZARD, a 17K-scale action-conditioned hazard dataset, and EMBGUARDTEST, a benchmark consisting of 189 manually curated real-world scenarios across seven physical risk categories. Despite its compact model size, EMBGUARD demonstrated competitive performance compared with major multimodal models, while also reducing false-positive risk judgments that can hinder real-time deployment.
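The two failure modes discussed above, missed hazards and benign situations flagged as risky, correspond to the standard false-negative and false-positive rates of a binary classifier. The sketch below shows how those rates could be computed from a guardrail's yes/no judgments on labeled (observation, action) pairs. It is a generic illustration with hypothetical names and made-up inputs. It is not EMBGUARD's evaluation code, and it does not reproduce any result from the paper.

```python
from typing import Sequence, Tuple


def guardrail_error_rates(
    flagged: Sequence[bool], hazardous: Sequence[bool]
) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    flagged[i]   -- True if the guardrail judged pair i to be risky
    hazardous[i] -- True if pair i is labeled as actually hazardous
    """
    if len(flagged) != len(hazardous):
        raise ValueError("flagged and hazardous must have the same length")

    pairs = list(zip(flagged, hazardous))
    fp = sum(1 for f, h in pairs if f and not h)      # benign, but flagged
    tn = sum(1 for f, h in pairs if not f and not h)  # benign, not flagged
    fn = sum(1 for f, h in pairs if not f and h)      # hazardous, but missed
    tp = sum(1 for f, h in pairs if f and h)          # hazardous, flagged

    # Guard against division by zero when a class is absent.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


if __name__ == "__main__":
    # Made-up judgments for five labeled pairs, for illustration only.
    flagged = [True, True, False, False, False]
    hazardous = [True, False, False, False, True]
    fpr, fnr = guardrail_error_rates(flagged, hazardous)
    print(f"false-positive rate: {fpr:.2f}, false-negative rate: {fnr:.2f}")
```

A guardrail that blocks too many benign actions has a high false-positive rate, which is the quantity the article says EMBGUARD reduces.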
Together, these studies examine the core conditions AI agents need in order to perform more complex tasks and operate safely in real-world environments. NAIRL will continue to share research on long-horizon reasoning, safety, and real-world applicability with the domestic and international AI community.
Paper: https://arxiv.org/abs/2605.02572v1
EMBGUARD: https://anonymous.4open.science/r/EMBGuard-742D
