Prompt injection AI security Agent security Scope control

Prompt injection testing has to leave the chat box

AI security testing works best when prompt injection is tested across documents, tools, memory, permissions, and real workflow boundaries.

NullSquare Research

Security engineering

March 21, 20266 min read

Most prompt injection tests still look too much like chat experiments. A user enters a hostile instruction, the model answers, and the team decides whether the guardrail worked.

Real AI systems are messier. They retrieve documents, call tools, carry memory, inherit permissions, summarize third-party text, and make decisions across multiple steps. That is where prompt injection becomes a product security problem.

The real target is the workflow

A prompt injection issue usually matters because it crosses a boundary. The model reads something untrusted, treats it as instruction, and then performs an action or reveals information that the user did not authorize.

Testing only the model response misses the important path: where the content came from, which tool was available, what permission was attached, and how the agent decided the next step.

Untrusted documents should not become instructions.
Tool calls need explicit scope and approval rules.
Memory should not silently carry hostile context forward.

Scope makes results actionable

The clean way to test prompt injection is to bind every test to an authorized asset, a role, a data boundary, and an expected allowed action. That turns a vague jailbreak into a reproducible security finding.

When scope is explicit, the result is easier to triage. The team can see whether the agent crossed a data boundary, called a disallowed tool, ignored an approval rule, or merely produced harmless text.

What continuous testing adds

AI workflows change constantly. A new connector, a new retrieval source, a longer memory window, or a model upgrade can alter the safety behavior without changing the public UI.

Continuous testing gives teams a way to rerun the same boundary checks after those changes. The goal is not to collect clever prompts. The goal is to prove that the workflow still respects scope.

Retest after model upgrades and prompt changes.
Retest after adding tools, files, or connectors.
Keep evidence tied to the exact workflow step that failed.

June 20, 2026

The Fable ban is really a scope-control warning

April 18, 2026

The annual pentest myth breaks in the glass-door era

February 27, 2026

AI security testing works best when it becomes a release gate

Back to blog