Yi's Blog

Not chasing novelty, only asking what is better and what is worse

Claude Code Complexity: Safety, Safety, Safety

I tried Claude Code this week and immediately felt empowered by it. I was stunned by how naturally it blends into developer workflows.

It demonstrated how easily the LLM model makers can disrupt application makers (Cursor, in this case). This reminds me of the analogy Andrej Karpathy drew in his Software Is Changing (Again) presentation: LLMs are in many ways like operating systems. The model makers can disrupt app makers much as Apple can sherlock software running on top of macOS.

With Google releasing a similar tool, Gemini CLI, I began to wonder what the main complexity behind Claude Code actually is, and whether that complexity is challenging enough to sustain companies whose business is building agentic tools.

I found the following video in which Boris Cherny, the creator of Claude Code, answers my first question:

Audience: I was wondering what was the hardest implementation, like part of the implementation for you of building it?

Boris: I think there’s a lot of tricky parts. I think one part that is tricky is the things that we do to make bash commands safe. Bash is inherently pretty dangerous and it can change system state in unexpected ways. But at the same time, if you have to manually approve every single bash command, it’s super annoying as an engineer.

Boris: … the thing we landed on is there’s commands that are read-only, there’s static analysis that we do in order to figure out which commands can be combined in safe ways, and then we have this pretty complex tiered permission system so that you can allow list and block list commands at different levels.

This highlights a key insight: In agentic systems, safety isn’t an afterthought—it’s the core challenge.
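
To make that concrete, here is a minimal sketch of the kind of tiered check Boris describes. Everything in it (the read-only, allow, and block lists, the crude operator splitting, the verdict names) is my own guess at the shape of the approach, not Claude Code's actual implementation.

```python
import shlex
from enum import Enum, auto

class Verdict(Enum):
    ALLOW = auto()  # run without asking
    ASK = auto()    # fall back to a permission dialog
    DENY = auto()   # refuse outright

# Hypothetical policy data; a real system would tier these per session,
# per project, and per user.
READ_ONLY = {"ls", "cat", "grep", "head", "tail", "git status", "git log"}
BLOCKLIST = {"rm", "dd", "shutdown", "mkfs"}
ALLOWLIST = {"npm test", "pytest"}

def split_compound(command: str) -> list[str]:
    """Very rough static analysis: split on shell operators so each
    sub-command is judged on its own. A real implementation would parse
    the shell grammar instead of scanning tokens."""
    parts, current = [], []
    for token in shlex.split(command):
        if token in {"&&", "||", ";", "|"}:
            parts.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        parts.append(" ".join(current))
    return parts

def judge(sub: str) -> Verdict:
    head = sub.split()[0] if sub.split() else ""
    if head in BLOCKLIST:
        return Verdict.DENY
    if sub in ALLOWLIST or sub in READ_ONLY or head in READ_ONLY:
        return Verdict.ALLOW
    return Verdict.ASK

def judge_command(command: str) -> Verdict:
    """A compound command is only as safe as its most dangerous part."""
    verdicts = [judge(sub) for sub in split_compound(command)]
    if Verdict.DENY in verdicts:
        return Verdict.DENY
    return Verdict.ASK if Verdict.ASK in verdicts else Verdict.ALLOW

print(judge_command("git status && ls -la"))            # Verdict.ALLOW
print(judge_command("grep -r TODO . | rm -rf /tmp/x"))  # Verdict.DENY
```

The interesting part is the combination rule: a pipeline or `&&` chain inherits the strictest verdict of any of its parts, which matches Boris's point about figuring out which commands can be combined in safe ways.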

How do we know if a command is safe to run? How can these tools predict the consequences of an action? Currently, the burden is shifted to the developer via permission dialogs. But eventually, developers will expect these tools to act more autonomously—without compromising safety.
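
As a toy illustration of where that burden lands today, the sketch below gates every command behind a prompt and lets the developer promote a command to a session-level allowlist. The dialog wording and the "always" option are assumptions about how such a flow might look, not a description of Claude Code's actual UI.

```python
import subprocess

SESSION_ALLOWLIST: set[str] = set()  # hypothetical "don't ask again" store

def confirm_and_run(command: str) -> None:
    """Ask the developer before running anything not yet trusted this session."""
    if command not in SESSION_ALLOWLIST:
        answer = input(f"Run `{command}`? [y]es / [a]lways / [n]o: ").strip().lower()
        if answer == "a":
            SESSION_ALLOWLIST.add(command)  # trust it for the rest of the session
        elif answer != "y":
            print("skipped")
            return
    subprocess.run(command, shell=True, check=False)
```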

For commands that only affect local environments, Docker might offer a partial solution. But many real-world use cases involve remote effects—like modifying a task in Linear or changing a GitHub label. These remote side effects raise thorny questions about trust, auditability, and failure handling.
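
For the local case, here is a sketch of what Docker-based containment could look like: the command runs in a throwaway container with no network and the working directory mounted read-only. The image name is just a placeholder, and this does nothing for the remote side effects mentioned above.

```python
import os
import subprocess

def run_sandboxed(command: str, image: str = "python:3.12-slim") -> subprocess.CompletedProcess:
    """Run a command in a disposable container so local side effects are contained."""
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",              # no remote side effects
            "-v", f"{os.getcwd()}:/work:ro",  # project files visible, but read-only
            "-w", "/work",
            image,
            "bash", "-lc", command,
        ],
        capture_output=True,
        text=True,
    )

print(run_sandboxed("ls -la && head README.md").stdout)
```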

After exploring Claude Code and Gemini CLI, I’m excited about where this space is headed. The next breakthroughs may come not just from smarter agents—but from safer ones.

– EOF –