HSM | June 8, 2026 | 9 min read | Mario Egoavil

HSM in production: eight mistakes that cost dearly

What we see when we walk into real HSM operations at banks and credit unions. Each one is preventable, and several are audit findings waiting to happen.

After several years of walking into production environments to diagnose, migrate or document HSM operations at financial institutions across Peru and the region, we keep finding the same list of mistakes. None of them is new. None of them is brand-specific. And all of them are preventable, if someone has the time and discipline to look at them.

Here are the eight that cost the most, in the order we typically find them.

1. The key schema isn’t documented

The first question when we walk into a production HSM is: “Can you show me the key hierarchy?” What’s expected is a diagram with the ZMK, the ZPKs exchanged under it, the PVKs and the MAC keys, each with its lifetime and dependencies.

What we typically see: a blank page. Or a 2019 PDF that no longer reflects reality. Or someone says “Pedro knows, but he’s on vacation.”

If Pedro is on vacation when the regulator asks, that’s the institution’s problem, not Pedro’s.
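One way to keep the hierarchy out of a single person's head is to maintain it as a small machine-readable inventory alongside the diagram. The sketch below is only an illustration: the field names, key labels, owners and lifetimes are hypothetical, and it stores metadata only, never key material.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class KeyRecord:
    """Metadata about one key. No key material is stored here."""
    label: str             # hypothetical naming convention
    key_type: str          # e.g. "ZMK", "ZPK", "PVK", "MAC"
    parent: Optional[str]  # label of the key that protects this one, if any
    created: date
    lifetime_days: int
    owner: str             # responsible role, not a person's name

    @property
    def expires(self) -> date:
        return self.created + timedelta(days=self.lifetime_days)


def find_problems(inventory: list[KeyRecord]) -> list[str]:
    """Report duplicate labels and parents that are not in the inventory."""
    problems = []
    labels = [k.label for k in inventory]
    for label in sorted(set(labels)):
        if labels.count(label) > 1:
            problems.append(f"duplicate label: {label}")
    known = set(labels)
    for k in inventory:
        if k.parent is not None and k.parent not in known:
            problems.append(f"{k.label}: parent {k.parent} is not in the inventory")
    return problems


def render_tree(inventory: list[KeyRecord], parent: Optional[str] = None,
                depth: int = 0) -> list[str]:
    """Render the hierarchy as indented text lines, top-level keys first."""
    lines = []
    for k in sorted(inventory, key=lambda r: r.label):
        if k.parent == parent:
            indent = "  " * depth
            lines.append(f"{indent}{k.label} [{k.key_type}] expires {k.expires.isoformat()}")
            lines.extend(render_tree(inventory, k.label, depth + 1))
    return lines


# Hypothetical example data.
INVENTORY = [
    KeyRecord("ZMK-ACQ-01", "ZMK", None, date(2025, 1, 15), 730, "key custodians"),
    KeyRecord("ZPK-ACQ-01", "ZPK", "ZMK-ACQ-01", date(2025, 6, 1), 365, "switch operations"),
    KeyRecord("PVK-ISS-01", "PVK", None, date(2025, 1, 15), 730, "card issuing"),
]

if __name__ == "__main__":
    print("\n".join(find_problems(INVENTORY) or render_tree(INVENTORY)))
```

Because the owner is a role rather than a name, the inventory still answers the regulator's question when any one individual is away.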

2. Ceremonies were executed but not documented

Ceremony executed + no signed record = ceremony that didn’t happen, from an audit perspective. It’s one of the easiest gaps to close and one of the most common.

The minimum standard: a written script before the ceremony, two independent roles with identification, a signed record with a timestamp and a summary of what was executed, and a file kept under dual custody. If months have already passed without this, you can’t reconstruct the records retroactively, but you can establish the discipline for all future ceremonies.
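The minimum standard above can be turned into a checklist that is run before a ceremony record is filed. What follows is a minimal sketch, assuming a record is captured as structured data; every field name and sample value here is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class CeremonyRecord:
    script_reference: str            # where the pre-written script is filed
    participants: dict[str, str]     # role -> identified person
    executed_at: Optional[datetime]  # when the ceremony took place
    summary: str                     # what was actually executed
    signed_by: set[str] = field(default_factory=set)
    custody_location: str = ""       # where the signed record is kept


def missing_items(record: CeremonyRecord) -> list[str]:
    """List which parts of the minimum standard are absent from the record."""
    gaps = []
    if not record.script_reference.strip():
        gaps.append("no written script referenced")
    people = set(record.participants.values())
    # Two roles held by the same person do not count as independent.
    if len(record.participants) < 2 or len(people) < 2:
        gaps.append("fewer than two independent roles held by different people")
    if record.executed_at is None:
        gaps.append("no timestamp")
    if not record.summary.strip():
        gaps.append("no summary of what was executed")
    unsigned = people - record.signed_by
    if unsigned:
        gaps.append("not signed by: " + ", ".join(sorted(unsigned)))
    if not record.custody_location.strip():
        gaps.append("record not filed under dual custody")
    return gaps


if __name__ == "__main__":
    record = CeremonyRecord(
        script_reference="SCRIPT-001",
        participants={"custodian A": "Ana", "custodian B": "Luis"},
        executed_at=datetime(2026, 3, 2, 10, 0),
        summary="Loaded ZMK components",
        signed_by={"Ana"},
        custody_location="safe 2",
    )
    print(missing_items(record))
```

A check like this does not replace the signed paper record; it only makes a missing signature or a single person holding both roles visible on the day rather than at audit time.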

3. Dual control exists on paper, not in practice

Policy says: two people to activate the CA, two people to load keys, two people to access the HSM room. Reality: one person has both smart cards, both PINs, and operates everything alone “to move faster.”

This is cured only with institutional will. Technology doesn’t fix it. If the institution doesn’t have two competent people, the problem is human capital, not cryptography.

4. The production HSM is used for development

We see this frequently: the development team doesn’t want to wait for a separate HSM to be provisioned, so they point at the production one “just to test.” Test keys live alongside production keys. Once seen, you can’t unsee it.

The solution is to invest in a simulator (Atalla, Utimaco and others have them) or accept the cost of a dedicated development HSM. It’s always cheaper than the incident this prevents.

5. Renewals happen reactively

“Oh, the OCSP certificate expired last night. I’m renewing it now.” If the first signal that something will expire is that it has already expired, the expiration-monitoring process doesn’t exist.

Reasonable minimum: alerts 90, 60, 30 and 7 days before each expiration, ideally integrated with the security team’s ticketing system. An Excel sheet with dates doesn’t solve it on its own.
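The alert thresholds can be checked by a small daily job. The following is a sketch under stated assumptions: the item names and dates are made up, and opening the ticket is left to whatever system the security team already uses.

```python
from datetime import date
from typing import Optional

ALERT_DAYS = (90, 60, 30, 7)  # thresholds from the text


def alert_level(expires: date, today: date) -> Optional[str]:
    """Return the tightest threshold already reached, or None if none applies."""
    days_left = (expires - today).days
    if days_left < 0:
        return "expired"
    reached = [t for t in ALERT_DAYS if days_left <= t]
    if not reached:
        return None
    return f"{min(reached)}-day warning"


def due_alerts(items: dict[str, date], today: date) -> list[tuple[str, str, int]]:
    """Return (name, level, days_left) for items needing attention, most urgent first."""
    rows = []
    for name, expires in items.items():
        level = alert_level(expires, today)
        if level is not None:
            rows.append((name, level, (expires - today).days))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical inventory of things that expire.
    items = {
        "ocsp-responder-cert": date(2026, 6, 12),
        "zpk-acq-01": date(2026, 8, 1),
        "hsm-tls-cert": date(2027, 6, 8),
    }
    for name, level, days_left in due_alerts(items, date(2026, 6, 8)):
        print(f"{name}: {level} ({days_left} days left)")
```

Reporting the tightest threshold reached, rather than firing only on the exact day, means a job that skipped a weekend still raises the alert on its next run. Deduplicating by item and level would be the ticketing system's job.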

6. There’s no runbook for frequent incidents

“What do we do if a smart card breaks?” “Never happened.” “What do we do if the secondary HSM doesn’t respond?” “We call the vendor.” “What do we do if we need to emergency-rotate the ZMK?” “Hmm.”

Each of these scenarios should have a documented runbook stating who decides, which steps are executed, what is logged and when the board is notified. Writing it now is cheap. Improvising during a real incident is expensive.

7. Developers don’t have a guide for invoking the HSM

The security team operates the HSM. The development team consumes it. There’s almost never a bridge document that says: “this method is called like this, this error code means this, this is how you retry on timeout.”

The result: each new integration reopens the same discussion, the same errors get reintroduced, and the security team ends up doing support for developers instead of governing the platform. A developer guide written once avoids months of accumulated friction.
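A developer guide usually ends up containing a snippet like the one below: how to retry on timeout and how to read an error code. This is a hedged sketch, not any vendor's API. The exception classes, the error codes and their meanings are all hypothetical; the real table comes from the vendor manual, and retrying is only safe for commands that can be repeated without side effects.

```python
import time
from typing import Callable


class HsmTimeout(Exception):
    """Hypothetical: raised by the client wrapper when the HSM does not answer in time."""


class HsmCommandError(Exception):
    """Hypothetical: raised when the HSM answers with a non-success code."""

    def __init__(self, code: str):
        super().__init__(f"HSM returned error code {code}")
        self.code = code


# Hypothetical codes, for illustration only.
ERROR_GUIDE = {
    "E01": "key not found: check the key label against the key schema",
    "E02": "not authorized: the calling application lacks this permission",
}


def call_with_retry(command: Callable[[], bytes], attempts: int = 3,
                    base_delay: float = 0.2,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Retry only on timeout, with exponential backoff. Command errors propagate at once."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return command()
        except HsmTimeout:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def explain(error: HsmCommandError) -> str:
    """Translate an error code into guidance a developer can act on."""
    return ERROR_GUIDE.get(error.code, f"unknown code {error.code}: escalate to the security team")


if __name__ == "__main__":
    try:
        call_with_retry(lambda: (_ for _ in ()).throw(HsmCommandError("E01")))
    except HsmCommandError as err:
        print(explain(err))
```

Keeping the retry policy in one shared wrapper is the design choice that matters: each new integration inherits the same behavior instead of reinventing it.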

8. Legacy migration was never scheduled

The HSM is five years old. The vendor announced end-of-support in 18 months. Nobody has started planning the replacement because “there’s time.”

There’s no time. An HSM migration with service continuity typically takes 6 to 9 months of planning and execution: controlled re-key, parallel validation, negotiated cutover windows, documented contingencies. Arriving at end-of-life without a plan forces a rushed migration, and “rushed” is exactly the word you don’t want associated with your HSM.
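The arithmetic behind “there’s no time” is short enough to write down. The sketch below assumes a hypothetical end-of-support date and uses the 6 to 9 month range from the text to compute the latest date on which planning can start.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before d, clamped to a valid day."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def latest_start(end_of_support: date, migration_months: int = 9) -> date:
    """Latest date to begin a migration that must finish by end of support."""
    return months_before(end_of_support, migration_months)


if __name__ == "__main__":
    end_of_support = date(2027, 12, 8)  # hypothetical: 18 months after June 8, 2026
    print("pessimistic (9 months):", latest_start(end_of_support, 9))
    print("optimistic (6 months): ", latest_start(end_of_support, 6))
```

With these assumed dates, the 18 months of apparent slack shrink to 9 to 12 months before planning must begin, and that is before budget approval and procurement are counted.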

How to look professional from day one

If your institution just bought (or is about to buy) an HSM, before turning it on:

Define the key schema on paper
Write the installation-ceremony script
Agree who the two custodians are and obtain signed acceptance
Define monitoring metrics from day zero
Schedule the first review at 90 days

These aren’t glamorous tasks. They’re the ones that separate a professional operation from an abandoned safe.

At 2PSECURE we install, configure and deliver HSMs end to end, with key schema, documented ceremonies, runbooks and developer guides. As official representatives of Utimaco and HST in Peru, we also execute migrations from legacy equipment with service continuity. If your HSM is in any of the eight scenarios above, let’s talk about your case.
