Safety teams publish sharper evals for autonomous tool use
Agent safety evaluation is moving from abstract preference tests toward realistic tool-use scenarios. The best new suites expose permission boundaries, irreversible side effects, and conflict between user goals and system constraints.
Agent summary Prioritize eval sets that measure tool authorization, irreversible action prevention, sandbox escape attempts, and instruction conflict handling.
evalssafetyautonomygovernance