Fail-Closed PII Redaction in Practice: 4 Strategies, One Default Decision
Mask, Hash, Tokenize, Drop — and one design choice that decides whether the whole thing is a real control or compliance theatre. A practical look at PII redaction for LLM traffic.
TL;DR. PII redaction in front of an LLM provider isn't a single operation. It's four operations with different trade-offs (Mask, Hash, Tokenize, Drop), one design decision that decides whether the whole layer is a real control or compliance theatre (fail-closed vs fail-open), and a small set of operational realities that don't show up in the architecture diagram. This post walks through all of them.
Contents
- PII redaction is leakage prevention, not data hiding
- The four strategies — when to use which
- Mask — the default
- Hash — correlation without plaintext
- Tokenize — reversible redaction with a controlled exit
- Drop — when the field shouldn't have been there
- The fail-closed default decision
- Authoring the policy
- The reverse path — de-anonymization with audit
- Performance and what this approach does not fix
1. PII redaction is leakage prevention, not data hiding
The phrase "PII redaction" is mildly misleading. It suggests an operation on output — like blacking out names in a document. In LLM governance, redaction is an operation on traffic: it happens between the application and the provider, and its job is to make sure that protected data never crosses the network boundary in plaintext.
That reframing matters because it changes the design questions. You are not asking "how do I make this document safer to share." You are asking "what is the minimum information the upstream provider needs to do its job, and how do I guarantee the rest never leaves." The answer almost never requires the model to see real customer names, real account numbers, or real internal identifiers — even when the team that wrote the prompt assumed it did.
This is also why a single redaction strategy is not enough. Different fields play different roles in a prompt, and the right way to handle a customer name (where the model just needs a name, not the name) is different from the right way to handle an internal account number (where the downstream system that processes the response will need to map back to the real account).
2. The four strategies — when to use which
The matrix that determines the right strategy:
- Mask. Irreversible. Does not preserve correlation. Use as the default for anything sensitive that the model doesn't structurally need.
- Hash. Irreversible. Preserves correlation within a tenant + namespace. Use for log and trace correlation — "the same user appears 200 times across these prompts."
- Tokenize. Reversible (with audit). Preserves correlation. Use for agent workflows where the response must reference the real value downstream.
- Drop. Field is removed. Use when the field shouldn't have been in the prompt at all.
A typical policy bundle combines several of these on the same request, with different strategies applied to different fields. The next four sections walk through each one with a concrete example.
3. Mask — the default
Mask replaces the matched value with a placeholder that preserves the entity type but drops every other property of the original. It is irreversible, fast, and the right default for any field where the model just needs to know that some customer name is in the prompt — not which one.
Before:
Summarize the following support ticket:
"Helmut Weber called in to dispute a charge on account DE89370400440532013000."
After mask redaction:
Summarize the following support ticket:
"[PERSON] called in to dispute a charge on account [IBAN]."
The model has lost zero structural information. It still knows that the ticket is about a person disputing a charge on a bank account. The two pieces of regulated data — a person's name and an IBAN — never leave the network. The provider's logs, training pipelines, and any future incident at the provider's end cannot leak what was never sent.
Mask is the right default because it makes the most conservative assumption: the model does not need the value. If a downstream consumer of the model's response does need the value, that's a signal to use Tokenize instead — not a reason to weaken the default.
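The before/after example above can be reproduced with a minimal sketch. Everything here is an assumption made for illustration: the `mask` function, the `IBAN_RE` pattern, and the `KNOWN_NAMES` list are hypothetical stand-ins for a real scanner. In particular, person names cannot be detected reliably with string matching, and a production scanner would typically use a trained entity recognizer.

```python
import re

# Hypothetical detectors standing in for a real PII scanner.
# The IBAN pattern is a loose shape check (country code, two check digits,
# 11 to 30 alphanumerics) and does not validate the checksum.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")
KNOWN_NAMES = ["Helmut Weber"]  # illustrative only; real systems use NER


def mask(text: str) -> str:
    """Replace each detected value with a placeholder that keeps only its type."""
    text = IBAN_RE.sub("[IBAN]", text)
    for name in KNOWN_NAMES:
        text = text.replace(name, "[PERSON]")
    return text


ticket = '"Helmut Weber called in to dispute a charge on account DE89370400440532013000."'
print(mask(ticket))
```

The placeholder carries the entity type and nothing else, so there is no mapping to store and nothing to reverse.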
4. Hash — correlation without plaintext
There are workflows where you need to know whether two prompts referenced the same entity, but you do not want the entity itself to be visible. Audit log analysis is the canonical example. If a single user appears in 200 prompts across a quarter, that is something the security team should be able to see — but not by reading the user's name 200 times.
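Hash redaction replaces the value with a deterministic, irreversible hash. The same input always produces the same hash, so correlation is preserved. The plaintext cannot be recovered from the hash, provided the hash is keyed with a secret: an unkeyed hash of a low-entropy value such as a name can be reversed simply by hashing guesses.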
Original: "customer: Helmut Weber"
Hashed: "customer: [USER:9f4a2c]"
The two design decisions that make hash redaction safe in practice:
- Per-tenant salting. The hash function is keyed by a tenant secret. Hashes from one tenant cannot be correlated against hashes from another, even if the same name appears in both. Without the secret, cross-tenant inference is computationally infeasible.
- Namespace prefixes. A hashed user ID and a hashed account number that happen to collide should still be visibly different in logs. Prefix the hash with the entity type ([USER:...], [ACCT:...]).
Hash redaction is not the same as encryption. It is a one-way function. If you need to recover the original value later — even with full authorization — you need Tokenize, not Hash.
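One way to get both properties is a keyed hash such as HMAC-SHA-256, sketched below. The function name `hash_redact`, the 12-character truncation, and the example keys are assumptions for illustration, not something the policy format prescribes. Truncating the digest keeps log lines short but raises the collision probability, so the length is a trade-off to choose per workload.

```python
import hashlib
import hmac


def hash_redact(value: str, tenant_key: bytes, namespace: str, length: int = 12) -> str:
    """Return a namespaced, keyed, truncated digest such as [USER:1a2b3c...]."""
    # The namespace is part of the MAC input, so the same string hashed as a
    # user and as an account yields unrelated digests, not just different prefixes.
    message = f"{namespace}\x00{value}".encode("utf-8")
    digest = hmac.new(tenant_key, message, hashlib.sha256).hexdigest()
    return f"[{namespace.upper()}:{digest[:length]}]"


# Hypothetical tenant secrets; in practice these would come from a secret store.
tenant_a = b"tenant-a-secret"
tenant_b = b"tenant-b-secret"

print(hash_redact("Helmut Weber", tenant_a, "user"))
print(hash_redact("Helmut Weber", tenant_b, "user"))  # different digest
```

The same name under the same tenant key always maps to the same placeholder, which is what makes correlation across prompts possible.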
5. Tokenize — reversible redaction with a controlled exit
Tokenize is the strategy for agent workflows where the response must operate on the real value downstream. An example: a model is asked to draft a refund email mentioning the original transaction ID. The model does not need to know the actual transaction ID to draft a coherent email — it just needs a stable reference. But the system that sends the email does need the real transaction ID to look up customer details and put the right number on the invoice.
Tokenize replaces the value with an opaque, randomly generated token. The token is stored in a per-tenant token vault inside your network, with a strict authorization model around it. The model produces output that references the token. Downstream, an authorized service can exchange the token for the real value through a controlled de-anonymization endpoint.
Outbound to model: "Draft a refund email referencing TOKEN_3f1a92e7."
Model response: "Dear customer, your refund for TOKEN_3f1a92e7 has been processed..."
Downstream service: exchanges TOKEN_3f1a92e7 → "TXN-7740029381" and rewrites the email accordingly.
Three constraints make this safe:
- The token vault is inside your perimeter. Tokens have no meaning to the upstream provider and are useless without the vault.
- De-anonymization is gated by role and audited per call. Every reverse lookup is a recorded event.
- Token lifetimes are bounded. A token that is not exchanged within its TTL becomes meaningless.
Tokenize is the most powerful of the four strategies and the most operationally expensive. Use it when the workflow genuinely requires reversibility, not by default.
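A minimal in-memory sketch of the vault side, under stated assumptions: the `TokenVault` class, its method names, and the 8-hex-character token are hypothetical, and a real vault would be a persistent, access-controlled service with longer tokens. The sketch shows random token generation, reuse of a live token for a repeated value (so correlation is preserved), and TTL expiry.

```python
import secrets
import time


class TokenVault:
    """In-memory stand-in for a per-tenant token vault (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._by_token = {}   # token -> (value, expires_at)
        self._by_value = {}   # value -> token, to reuse live tokens

    def tokenize(self, value):
        now = self._clock()
        existing = self._by_value.get(value)
        if existing is not None and self._by_token[existing][1] > now:
            return existing  # same value, same token while it is still live
        token = f"TOKEN_{secrets.token_hex(4)}"  # random, carries no information
        self._by_token[token] = (value, now + self._ttl)
        self._by_value[value] = token
        return token

    def exchange(self, token):
        entry = self._by_token.get(token)
        if entry is None or self._clock() >= entry[1]:
            # Expired tokens are purged and can no longer be resolved.
            if entry is not None:
                del self._by_token[token]
                self._by_value.pop(entry[0], None)
            raise KeyError("unknown or expired token")
        return entry[0]


vault = TokenVault(ttl_seconds=3600)
token = vault.tokenize("TXN-7740029381")
prompt = f"Draft a refund email referencing {token}."
```

Authorization and auditing of `exchange` are deliberately left out here; they belong on the de-anonymization endpoint rather than in the storage layer.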
6. Drop — when the field shouldn't have been there
The fourth strategy is the one engineering teams forget exists, because it sounds drastic. Drop simply removes the field entirely. No placeholder, no token, just the value — gone.
Drop is the correct strategy when a field is in the prompt by accident, or when a prompt template has not been updated after a schema change, or when an upstream system is dumping more context than the model actually needs. The right test: would the prompt still produce a useful response if this field were not there at all? If yes, the field should be dropped, not redacted.
Original (template artifact, not actually used by the model):
"context_metadata: {internal_request_id: REQ-91237, debug_token: dbg_22..., trace: ...}"
After drop:
""
Drop is also the right answer for fields that should never have been collected from the user in the first place. A redaction policy that catches them at the proxy level is a useful belt-and-braces signal that something upstream needs fixing — but it is not a substitute for fixing the upstream collection.
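For structured prompt context, Drop can be as simple as a deny-list applied before the prompt is assembled. The sketch below assumes the context arrives as a nested dictionary; the function name `drop_fields` and the field names in the deny-list are hypothetical.

```python
def drop_fields(payload: dict, denied: set) -> dict:
    """Return a copy of payload without denied keys, at any nesting depth."""
    cleaned = {}
    for key, value in payload.items():
        if key in denied:
            continue  # no placeholder, no token: the field is simply absent
        cleaned[key] = drop_fields(value, denied) if isinstance(value, dict) else value
    return cleaned


context = {
    "ticket_text": "Customer disputes a charge.",
    "context_metadata": {"internal_request_id": "REQ-91237", "trace": "..."},
}
print(drop_fields(context, {"context_metadata"}))
```

Because the function returns a copy, the original payload stays available to the application while only the cleaned version is sent upstream.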
7. The fail-closed default decision
Every strategy above assumes the scanner that detects PII is working. The single design decision that decides whether the whole redaction layer is a real control or compliance theatre is what happens when the scanner is not working.
There are two answers. Fail-open says: if the scanner is unreachable, let the request through unredacted. The user experience does not break. Compliance evidence does break. Fail-closed says: if the scanner is unreachable, the request is blocked. The user sees an error. Compliance evidence stays intact.
Fail-closed is the only correct default, for one reason: the day you most need PII redaction is also the day most likely to coincide with a scanner outage. A novel data leak that pushes scanner load past capacity, an upstream dependency that takes the scanner offline, a misconfiguration during a deployment — these are exactly the conditions under which fail-open lets through the request that should have been the most important to block.
The operational consequence of fail-closed is that the scanner has to be treated as critical-path infrastructure. It needs the same SLO discipline as the LLM provider itself: redundancy, capacity headroom, automatic failover, and an explicit runbook for partial degradation. None of that is free, and pretending it is free is how organizations end up with a fail-open default they did not intend.
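The difference between the two modes fits in a few lines. This is a sketch under assumptions: `ScannerUnavailable`, `RequestBlocked`, and `redact_or_block` are hypothetical names, and `scan` stands for whatever client call reaches the scanner. Note that any `fail_mode` value other than an explicit "open" is treated as closed.

```python
class ScannerUnavailable(Exception):
    """Raised by the (hypothetical) scanner client on timeout or outage."""


class RequestBlocked(Exception):
    """Raised to the caller instead of forwarding an unscanned prompt."""


def redact_or_block(prompt, scan, fail_mode="closed"):
    """Return the redacted prompt, or refuse to forward it if scanning failed."""
    try:
        return scan(prompt)
    except ScannerUnavailable as exc:
        if fail_mode == "open":
            # Fail-open: the request goes out unredacted.
            return prompt
        # Fail-closed (and any unrecognized mode): nothing leaves the network.
        raise RequestBlocked("PII scanner unavailable; request not forwarded") from exc


def scanner_down(prompt):
    raise ScannerUnavailable("timeout")


try:
    redact_or_block("Helmut Weber called in.", scanner_down)
except RequestBlocked as err:
    print(f"blocked: {err}")
```

The fail-open branch is the one line that turns the control into theatre: it returns the original prompt precisely when nothing has checked it.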
8. Authoring the policy
A redaction policy at the wire level is a small set of declarative rules: which entity types to look for, which strategy applies to each, and what the fallback is. A minimal example:
{
  "policy_id": "default-pii-redaction",
  "version": "v2026.05.10-r1",
  "default_strategy": "mask",
  "fail_mode": "closed",
  "rules": [
    {
      "match": { "entity_type": "person" },
      "strategy": "mask"
    },
    {
      "match": { "entity_type": "iban" },
      "strategy": "mask"
    },
    {
      "match": { "entity_type": "user_id" },
      "strategy": "hash",
      "namespace": "user"
    },
    {
      "match": { "entity_type": "transaction_id", "context": "agent_workflow" },
      "strategy": "tokenize",
      "ttl_seconds": 3600
    },
    {
      "match": { "entity_type": "internal_debug_field" },
      "strategy": "drop"
    }
  ],
  "confidence_threshold": 0.85
}
Two things to notice. First, the default is mask. Anything not explicitly matched falls back to the most conservative strategy. Second, the fail mode is set explicitly — not implied. If your policy file does not have a fail_mode field, your policy file has a bug.
The confidence threshold is a quieter but equally important parameter. PII detection is statistical, not perfect. A threshold that is too low produces false positives that frustrate users (legitimate text gets masked because the scanner thought it was a name). A threshold that is too high produces false negatives that leak. The right number is workload-specific and should be tuned with feedback from real traffic, not picked once at deployment time.
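How a policy like this might be evaluated for a single detection is sketched below. The functions `validate_policy` and `strategy_for` are hypothetical, the policy is abbreviated to two rules, and the evaluation order (threshold first, then first matching rule, then the default) is an assumption rather than a documented behaviour of any particular engine.

```python
POLICY = {
    "default_strategy": "mask",
    "fail_mode": "closed",
    "confidence_threshold": 0.85,
    "rules": [
        {"match": {"entity_type": "user_id"}, "strategy": "hash"},
        {"match": {"entity_type": "transaction_id", "context": "agent_workflow"},
         "strategy": "tokenize"},
    ],
}


def validate_policy(policy):
    """Reject a policy that leaves the fail mode implicit."""
    if policy.get("fail_mode") not in ("open", "closed"):
        raise ValueError("policy must set fail_mode explicitly")


def strategy_for(policy, entity_type, confidence, context=None):
    """Return the strategy for one detection, or None if below the threshold."""
    if confidence < policy["confidence_threshold"]:
        return None  # treated as not PII: this is where false negatives leak
    for rule in policy["rules"]:
        match = rule["match"]
        if match["entity_type"] != entity_type:
            continue
        if "context" in match and match["context"] != context:
            continue
        return rule["strategy"]
    return policy["default_strategy"]


validate_policy(POLICY)
print(strategy_for(POLICY, "transaction_id", 0.97, context="agent_workflow"))
print(strategy_for(POLICY, "transaction_id", 0.97))  # no context: falls back to the default
```

The second call shows the fallback at work: a transaction ID detected outside an agent workflow matches no rule and is masked rather than tokenized.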
9. The reverse path — de-anonymization with audit
Tokenize is only useful if there is a controlled way to get the original value back. The reverse path needs three properties:
- Role-gated. Only authorized service accounts can exchange tokens. End users cannot. The proxy itself cannot. The provider cannot.
- Per-call audited. Every exchange is a recorded event. The audit record includes the calling principal, the token, the original value's hash (not the value itself), and the policy version that allowed the exchange.
- Rate-limited. A legitimate workflow exchanges a small number of tokens per request. A compromised service account exchanging thousands per minute is a signal — not a transaction to be served at full throughput.
The de-anonymization endpoint is the most security-sensitive part of the entire redaction architecture, because it is the one place where original PII briefly comes back into the application path. Treat it accordingly: minimal API surface, no logging of the resolved value, no caching of the result outside the calling service's process memory.
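The three properties can be sketched together in one small class. All names here (`Deanonymizer`, `exchange`, the role and principal strings) are hypothetical, the vault is a plain dictionary, and the rate limit is a simple 60-second sliding window per principal. The audit record stores an unkeyed SHA-256 of the value for brevity; a keyed hash may be preferable for low-entropy values.

```python
import hashlib
import time
from collections import defaultdict, deque


class Deanonymizer:
    """Illustrative reverse path: role gate, rate limit, one audit record per call."""

    def __init__(self, vault, allowed_roles, max_per_minute, policy_version,
                 clock=time.monotonic):
        self._vault = vault                    # token -> original value
        self._allowed_roles = set(allowed_roles)
        self._max_per_minute = max_per_minute
        self._policy_version = policy_version
        self._clock = clock
        self._recent = defaultdict(deque)      # principal -> recent call times
        self.audit_log = []

    def _audit(self, principal, token, outcome, value=None):
        self.audit_log.append({
            "principal": principal,
            "token": token,
            "outcome": outcome,
            # Hash of the resolved value, never the value itself.
            "value_sha256": hashlib.sha256(value.encode()).hexdigest() if value else None,
            "policy_version": self._policy_version,
        })

    def exchange(self, principal, role, token):
        if role not in self._allowed_roles:
            self._audit(principal, token, "denied_role")
            raise PermissionError("role may not exchange tokens")
        now = self._clock()
        window = self._recent[principal]
        while window and now - window[0] >= 60:
            window.popleft()                   # forget calls older than a minute
        if len(window) >= self._max_per_minute:
            self._audit(principal, token, "rate_limited")
            raise RuntimeError("exchange rate limit exceeded")
        window.append(now)
        value = self._vault[token]             # KeyError for unknown tokens
        self._audit(principal, token, "allowed", value)
        return value
```

Denied and rate-limited attempts are audited as well as successful ones, since a burst of refusals is itself the signal described above.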
10. Performance and what this approach does not fix
A few realities you only learn after the second or third production deployment.
Scanner latency is not negligible. Even a fast scanner adds tens of milliseconds per request, and naive implementations re-scan the same prompt content multiple times if you have several entity types configured. A sensible implementation scans once, classifies once, and applies all matching rules in a single pass.
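A single-pass application step might look like the following sketch. It assumes the scanner has already returned non-overlapping findings as `(start, end, entity_type)` tuples; that shape, the function `apply_findings`, and the `REPLACEMENTS` table are all hypothetical.

```python
def apply_findings(text, findings, replacements):
    """Rewrite text once, left to right, using every finding from a single scan.

    findings: non-overlapping (start, end, entity_type) tuples.
    replacements: entity_type -> function mapping the matched value to its substitute.
    """
    out = []
    cursor = 0
    for start, end, entity_type in sorted(findings):
        out.append(text[cursor:start])                       # untouched text
        out.append(replacements[entity_type](text[start:end]))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


REPLACEMENTS = {
    "person": lambda value: "[PERSON]",
    "iban": lambda value: "[IBAN]",
}

text = "Helmut Weber, account DE89370400440532013000"
name, iban = "Helmut Weber", "DE89370400440532013000"
findings = [
    (text.index(iban), text.index(iban) + len(iban), "iban"),
    (text.index(name), text.index(name) + len(name), "person"),
]
print(apply_findings(text, findings, REPLACEMENTS))
```

Building the output from slices avoids the offset drift that occurs when replacements of different lengths are applied one entity type at a time.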
Caching is dangerous. The temptation to cache scanner results to reduce latency is reasonable in principle and risky in practice. A hash cache, in particular, can become a leakage channel if cache keys are observable. If you cache, cache by content hash, with a tight TTL, and never log the cache keys.
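If you do cache, the shape described above could be sketched as follows. The class `ScanCache` and its method names are hypothetical, the store is an in-process dictionary, and the key is a plain SHA-256 of the prompt content; whether an unkeyed content hash is acceptable depends on who can observe the cache, which is exactly the risk named above.

```python
import hashlib
import time


class ScanCache:
    """Short-lived cache of scanner results, keyed by a hash of the content."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # content hash -> (result, expires_at)

    @staticmethod
    def _key(prompt):
        # The key is derived from the content and must never be logged.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_scan(self, prompt, scan):
        key = self._key(prompt)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        result = scan(prompt)  # miss or expired: scan again
        self._entries[key] = (result, now + self._ttl)
        return result
```

The raw prompt is never stored as a key, and an expired entry always triggers a fresh scan rather than being served stale.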
What this approach does not fix:
- Cross-prompt inference. A model that has seen "the customer" referenced in 50 prompts can build up context about that customer over time. Per-request redaction does not help here. You need session-level controls.
- Inferential leaks. A model can produce PII it was never given, by deducing from context. ("The CEO of the small Bavarian company we discussed earlier" is identifying without ever stating a name.) Redaction at input does not control this. Output validation does, partially.
- Multi-modal channels. A redaction policy that applies to text in JSON request bodies does nothing about an image upload, a voice clip, or a binary attachment. Each of those needs its own enforcement layer with its own scanner.
PII redaction is one layer in a defense-in-depth design, not the whole design. The teams that get the most value out of it treat it that way: as the layer that handles the largest volume of obvious cases, freeing up the heavier-weight controls (output validation, session monitoring, multi-modal scanning) to focus on the harder ones.