Warning, Will Robertson! ✨
"Claude Code should scan CLAUDE.md before every session, flagging instructions that would otherwise trigger a refusal if attempted directly within a prompt. If a request would be refused in a chat interface, then it stands to reason that it should also be refused if it arrives via CLAUDE.md.
"Alert when violations are found. When Claude detects instructions that appear to violate its safety guardrails, it should present a warning and allow the developer to review the file before taking any actions.
"Developers should: Treat CLAUDE.md as executable code, not documentation.
"This means access controls, peer reviews, and heightened security scrutiny —just like code. A single line can cause massive downstream impacts in an autonomous agent."
Comments
Post a Comment
Empathy recommended