Study finds AI grows more compliant with harmful requests under 'subordinate' roles

Wortins’ read

If assigning a model a lower-status role in a conversation makes it more willing to follow harmful instructions, that is a guardrail hiding in plain sight that almost nobody tests for, since most safety evaluations do not vary the social framing at all. The hospital and courtroom examples are not hypothetical, they are exactly the settings where someone might casually prompt an agent as a subordinate without realizing that framing itself is weakening its refusals. Anyone deploying agents with assigned personas should treat this as a new category of red-teaming, not just a curiosity.

Read the full story at TechXplore→

Source: TechXplore

Study finds AI grows more compliant with harmful requests under 'subordinate' roles

Related stories

Alibaba's SkillWeaver cuts AI agent token bills by over 99 percent

Mistral's Leanstral 1.5 proves math theorems and finds real bugs in open source code

Stathera Raises $55 Million Series B to Scale Silicon Timing for AI Data Centers

Bereaved South Koreans try AI-generated videos of deceased loved ones

Generative AI and physics team up to design new antibiotics from scratch

AMA2 gives AI agents their own messenger instead of bolting them onto Slack