NullSquare
workflow · beginner · Reviewed May 18, 2026

Testing modes

Black-box, gray-box, white-box, and runner-backed assessment — what each one is, when to pick it, and how they combine.

The "mode" of an assessment is shorthand for how much context you have given the agent. Black-box means the agent sees only what an external attacker would see. Gray-box adds credentials. White-box adds source code. Runner-backed is a separate axis: it describes where the traffic originates, not what the agent knows.

You do not pick a mode in a settings menu — you pick it implicitly by what you add to the scope. This page explains what each mode is good for, what context unlocks it, and how to stack them when one is not enough.
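To make "implicit" concrete, here is a minimal sketch of how a mode label could be derived from what a scope contains. The `Scope` class, its field names, and `describe_mode` are hypothetical and exist only for illustration; they are not NullSquare's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical stand-in for a scope; not NullSquare's real data model."""
    targets: list[str]
    credentials: list[str] = field(default_factory=list)   # gray-box context
    repositories: list[str] = field(default_factory=list)  # white-box context
    private_runner: bool = False                            # traffic origin only


def describe_mode(scope: Scope) -> str:
    """Derive a mode label from the context present in the scope."""
    layers = []
    if scope.credentials:
        layers.append("gray-box")
    if scope.repositories:
        layers.append("white-box")
    # With no extra context, the run is black-box by default.
    knowledge = " + ".join(layers) if layers else "black-box"
    # The runner changes where traffic comes from, not what the agent knows.
    origin = "private runner" if scope.private_runner else "cloud"
    return f"{knowledge} via {origin}"


scope = Scope(targets=["app.example.com"])
print(describe_mode(scope))                     # black-box via cloud
scope.credentials.append("viewer test account")
print(describe_mode(scope))                     # gray-box via cloud
```

The runner flag is kept separate from the knowledge layers on purpose, mirroring the distinction between what the agent knows and where its traffic originates.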

What you will learn

  • Decision. How to choose the right mode based on the scope and goal.
  • What unlocks each. The exact configuration that flips a run from one mode to the next.
  • How they combine. When to stack credentials, source, and a private runner in the same scope.

Picking a mode

The default is black-box, especially for a first run. As you learn more about an environment, you add context and the mode upgrades naturally.

  • Use black-box for first discovery and unauthenticated external review.
  • Use gray-box when authenticated workflows, role boundaries, or API authorization need testing.
  • Use white-box when source code should inform analysis and remediation.
  • Use a private runner whenever the target is internal, VPN-only, or otherwise unreachable from the cloud — independent of which mode you pick.

Black-box — external view

Black-box testing assesses a target the way an unauthenticated external attacker would: no credentials, no source code, only what the public surface reveals. It is the right starting point for almost every new scope because it shows the platform — and your team — what is reachable before you invest in deeper context.

A discovery run is the canonical black-box assessment. The agent maps services, fingerprints technologies, identifies authentication surfaces, probes for common exposures, and recommends where to point follow-up work.

  • Best for first discovery, external attack surface review, public exposure checks, and unauthenticated baselines.
  • Authenticated workflows remain untested until you add credentials.
  • Business criticality may be unclear until discovered assets are enriched.
  • No additional setup beyond a verified target in scope.

Gray-box — authenticated view

Gray-box testing means the agent has working credentials. With even a single test account, the agent can exercise authenticated workflows, probe authorization boundaries between roles, and find issues that are invisible from the outside — broken access control, tenant leakage, missing rate limits on internal endpoints, weak session handling.

You provide access material in the scope: a login plus a flow description, a bearer token, a static header, or a session cookie. Tell the agent where each one applies (the host, the path, the role) so it uses each one correctly.

  • Best for customer portals, admin surfaces, role-boundary tests, and API authorization.
  • Always use least-privilege test accounts created specifically for assessment use.
  • Limit each credential to the host or path where it applies.
  • Remove or rotate access material when it is no longer needed.
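The idea of limiting each credential to a host or path can be sketched as a simple matching rule. The `AccessMaterial` record and the `applicable` helper below are hypothetical illustrations of that rule, not how NullSquare stores or applies credentials.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class AccessMaterial:
    """Hypothetical record for one credential and where it applies."""
    label: str               # e.g. "admin test account"
    role: str                # role the credential represents
    host: str                # exact host, lowercase, the credential is limited to
    path_prefix: str = "/"   # path the credential is limited to


def applicable(material: AccessMaterial, url: str) -> bool:
    """True only when the URL is on the credential's host and under its path."""
    parts = urlsplit(url)
    if parts.hostname != material.host:
        return False
    path = parts.path or "/"
    prefix = material.path_prefix.rstrip("/")
    # Match on a path-segment boundary so "/admin" does not cover "/administrator".
    return path == prefix or path.startswith(prefix + "/")


admin = AccessMaterial("admin test account", "admin", "portal.example.com", "/admin")
print(applicable(admin, "https://portal.example.com/admin/users"))  # True
print(applicable(admin, "https://api.example.com/admin"))           # False
```

Matching on an exact host and a segment-bounded path prefix is the conservative choice here: a credential that is never sent outside its intended surface cannot leak to a neighboring application.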

Treat test credentials like production credentials

NullSquare stores access material securely and limits how it is displayed, but the safe practice is to use accounts that have only the privileges needed for the test, in environments where compromise has no business impact.

White-box — source-aware view

White-box testing means the agent can read the source code that backs the running system. You enable this by installing the GitHub integration, syncing repositories, and mapping the relevant repositories to the scope. Access is read-only and limited to the repositories you map.

With source context, findings get sharper: the agent can point to exact files and lines, root-cause analysis is faster, and remediation guidance is concrete. White-box also unlocks PR review automation, where a security pass runs on each pull request against the changed files.

  1. Open Settings → Integrations and install or reconnect GitHub.
  2. Sync the repositories you want available.
  3. Open the scope, open Repositories, and map the relevant repository.
  4. Optional: enable PR review for the mapped repository so each pull request gets a focused security pass.

Private-runner backed — internal network reach

Runner choice is not a testing mode in the same sense: it controls where the assessment traffic originates. It is, however, the only way to assess targets the cloud cannot reach, and most "internal pentest" work is private-runner-backed by definition.

  • Required for any internal target: RFC 1918 or link-local address ranges, and internal-only hostnames.
  • Required for VPN-only or office-only applications.
  • Required for machine pentesting on internal Linux or Windows hosts.
  • The runner attaches to a scope; runs in that scope can then choose private-runner execution.
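As a rough pre-check before adding a target, the address-based rules above can be approximated with Python's standard `ipaddress` module. This is a sketch under stated assumptions: `needs_private_runner` and the `INTERNAL_SUFFIXES` list are hypothetical, and the heuristic cannot detect a VPN-only application that sits behind a public-looking hostname.

```python
import ipaddress

# Assumed internal-only DNS suffixes; replace with the ones your network uses.
INTERNAL_SUFFIXES = (".internal", ".corp", ".lan")


def needs_private_runner(target: str) -> bool:
    """Guess whether a target is unreachable from a cloud runner."""
    try:
        # Accepts single addresses and CIDR ranges, IPv4 or IPv6.
        network = ipaddress.ip_network(target, strict=False)
    except ValueError:
        # Not an IP or CIDR, so treat it as a hostname.
        return target.lower().rstrip(".").endswith(INTERNAL_SUFFIXES)
    # is_private covers RFC 1918 and other non-global ranges such as loopback.
    return network.is_private or network.is_link_local


for target in ["10.0.0.0/8", "169.254.10.1", "build.corp", "8.8.8.8"]:
    print(target, needs_private_runner(target))
```

Treat a `False` result as "probably reachable from the cloud", not as proof. Office-only and VPN-only applications still need a private runner regardless of how their address looks.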

How modes combine

These are not exclusive — they are layers of context. A single scope can be black-box on day one (just a verified domain), gray-box on day three (credentials added), white-box on day five (repository mapped), and private-runner-backed throughout (internal API).

When you stack them, the agent uses everything it has. An authenticated, source-aware, internal-runner assessment is what a senior pentester would do with full read-only access to a system, and it is what NullSquare aims for once a scope is mature.

A typical progression

Most teams follow this arc, scope by scope. It works because each step builds on what the previous one revealed.

  1. Discovery (black-box): see what is reachable and which assets matter.
  2. Add context: promote assets to managed, fill business criticality, identify authenticated surfaces worth testing.
  3. Gray-box pass: add credentials for the authenticated workflows discovery surfaced.
  4. White-box pass: map the repository if you want code-aware findings and PR review.
  5. Automate: schedule routine discovery or targeted re-assessment to keep coverage continuous.
