Somewhere in your organization there is an incident response plan. It might be a forty-page document in a ring binder that nobody has opened since the last audit. It might be a PDF on a SharePoint site that three people know how to find. It might be a set of procedures that were written by the managed service provider who won the contract four years ago, reviewed once by the IT manager, and filed. Whatever form it takes, it almost certainly won't work when you need it.

This isn't a criticism of the people who wrote the plan. Incident response plans fail not because they're poorly written but because they're untested. A plan that has never been exercised is a theory. And theories are fragile things when confronted with the chaotic reality of a 2 AM phone call telling you that your systems are encrypted and there's a ransom note on every screen.

The military learned this lesson centuries ago: no plan survives first contact with the enemy. The value of planning isn't the plan itself but the planning process. The thinking, the preparation, the muscle memory that comes from rehearsal. An organization that has practiced its incident response, even imperfectly, will outperform one with a perfect plan that's never been tested. Every time.

Why plans fail: the gap between paper and practice

Incident response plans fail for predictable reasons that have little to do with their technical content.

Contact information is wrong. The plan lists the incident response team with names, phone numbers, and email addresses. Half the people listed have changed roles or left the organization. The phone numbers include office landlines that nobody answers outside business hours. The email addresses point to accounts on the system that's been compromised. This sounds trivial, but in the chaos of an active incident, the inability to reach the right people in the first thirty minutes can turn a containable event into a catastrophe.
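
One way to keep the call tree from rotting, sketched below purely as an illustration in Python, is to treat it as data that expires: store it somewhere that stays reachable when corporate systems are down, record when each entry was last verified, and flag anything that hasn't been checked recently. The field names and the 90-day interval are assumptions, not a standard.

```python
# Minimal sketch: flag incident-contact entries that haven't been re-verified recently.
# The roster format (name, role, mobile, last_verified) is illustrative, not prescriptive.
# The point is that the list lives somewhere reachable when corporate systems are down
# and is checked on a schedule, not rediscovered mid-incident.
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # assumed review interval; adjust to your own cadence

contacts = [
    {"name": "Incident commander", "role": "IT manager", "mobile": "+1-555-0100",
     "last_verified": date(2024, 1, 15)},
    {"name": "Executive sponsor", "role": "CFO", "mobile": "+1-555-0101",
     "last_verified": date(2023, 6, 2)},
]

today = date.today()
for person in contacts:
    age = today - person["last_verified"]
    if age > MAX_AGE:
        print(f"STALE: {person['role']} ({person['name']}) "
              f"last verified {age.days} days ago")
```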

The plan assumes resources that don't exist. "The network team will isolate the affected segments." Your network team is two people: one is on vacation, and the other doesn't have the access credentials for the core switches because the previous network engineer left six months ago and the credentials were never handed over. "The backup team will initiate recovery from the most recent clean backup." Nobody has tested a full restore since the backup system was installed. The backup admin who configured it left in March.

Communication channels are part of the blast radius. The plan says to use Microsoft Teams for incident coordination. Your Microsoft 365 tenant has been compromised. The plan says to email the executive team. Email is down. The plan says to call the conference bridge number. The phone system is hosted in the same cloud environment that's been hit. An incident response plan that depends on communication infrastructure that could itself be affected by the incident is a plan that will fail when you need it most.

Decision authorities aren't clear. Who can authorize shutting down a production system to contain a breach? Who decides whether to pay a ransom? Who approves the external communication to customers? In theory, these decisions are made by specific named executives. In practice, at 3 AM on a Saturday, those executives may be unreachable, unaware of the context, or paralyzed by the unfamiliarity of the situation. The plan says "the CEO will decide." The CEO has never been in this situation before and has no framework for making the decision.

The plan was written for one type of incident. Most incident response plans are implicitly written for a generic "cyber incident" that looks suspiciously like a ransomware attack. A data exfiltration that's discovered weeks after it occurred requires a completely different response. A supply chain compromise where the threat actor is embedded in your software update pipeline doesn't match any playbook. An insider threat involves HR and legal in ways that a technical incident doesn't. A single monolithic plan that tries to cover everything covers nothing well.

The first 60 minutes of a real incident

Understanding what actually happens in the first hour of a significant incident is essential for building a plan that works. The first hour is characterized by uncertainty, confusion, and the overwhelming urge to do something (anything) even if the situation isn't yet understood.

The initial detection is usually ambiguous. Someone notices something unusual: a system is running slowly, files have strange extensions, a security alert fires that might be a false positive. The person who notices it may not recognize it as a security incident. They might restart the system and go back to what they were doing. They might mention it to a colleague who says "yeah, it's been doing that all week." Minutes or hours can pass between the first sign of an incident and the point where someone recognizes it for what it is and escalates.

Once the incident is recognized, the first challenge is triage. Is this a single compromised endpoint or a network-wide compromise? Is the threat actor still active, or did this happen days ago and you're just now finding the evidence? Is data being exfiltrated right now, or has the damage already been done? The answers to these questions determine the appropriate response, but in the first thirty minutes you probably don't have enough information to answer any of them with confidence.

The temptation during triage is to investigate thoroughly before taking action. This is almost always wrong. If there's evidence of an active compromise, containment takes priority over investigation. Isolate the affected systems. Cut the network connections. Disable the compromised accounts. You can investigate later. What you can't do is un-exfiltrate the data that left the network while you were carefully analyzing log files.

The counterargument (that hasty containment destroys forensic evidence) is valid but secondary. Preserving evidence matters for insurance claims, legal proceedings, and understanding the root cause. But it doesn't matter more than stopping ongoing damage. The priority order is: contain, preserve, investigate. Organizations that reverse this order end up with excellent forensic evidence of an attack they failed to stop.

Parallel to the technical response, the communication machine needs to start. The incident commander (whoever that is) needs to establish a communication channel that is independent of the compromised environment. A personal mobile phone group chat. A pre-established Signal group. A conference call on a telecom service that's separate from the corporate phone system. Something that will work regardless of what's been compromised. This is where pre-planning pays off enormously: if the out-of-band communication channel is already set up and tested, activating it takes seconds. If it needs to be improvised during the incident, it takes precious minutes and risks being insecure.

Tabletop exercises that actually test capability

The solution to the gap between plan and practice is exercising. Not fire drills or checkbox compliance exercises, but realistic scenario-based exercises that test the organization's actual capability to respond.

A tabletop exercise assembles the incident response team (and the executives who would be involved in a real incident) and walks them through a realistic scenario. The facilitator presents an unfolding situation, and the participants discuss what they would do at each stage. No systems are touched, no actual response is initiated, but the discussion reveals gaps in the plan, confusion about roles, and assumptions that don't hold up under scrutiny.

The key word is "realistic." A tabletop exercise that walks through a textbook incident with obvious indicators and clear decision points is a waste of everyone's time. Real incidents are ambiguous, messy, and involve incomplete information. A good tabletop exercise introduces ambiguity deliberately. The initial alert might be a false positive. Or it might not. The scope of the compromise is unclear. The attacker's objectives are unknown. External parties (regulators, media, customers) start calling before the internal team has answers. The exercise should feel uncomfortable, because real incidents feel uncomfortable.

Effective tabletop exercises follow a structure that maximizes learning. Start with a scenario briefing that sets the scene without giving away too much. Then introduce "injects" (new pieces of information that change the situation) at intervals throughout the exercise. An inject might be: "The attacker has now posted what appears to be customer data on a dark web forum." Or: "A journalist has called asking for comment on a data breach at your organization." Or: "Your cyber insurance provider says you need to use their approved forensic firm, but that firm has a 48-hour wait list." Each inject forces the team to reassess, reprioritize, and make decisions with imperfect information.
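
For facilitators who want to script the flow in advance, the sketch below shows one possible way to lay out an inject schedule. The scenario, timings, and decision prompts are invented for illustration; the structure (a timed change in the situation paired with the question it forces) is the point.

```python
# Minimal sketch of a facilitator's inject schedule for a tabletop exercise.
# Each inject arrives at a set time, changes the situation, and forces a decision.
injects = [
    (0,  "Initial briefing: helpdesk reports several servers unresponsive; "
         "one screen shows what may be a ransom note.",
         "Who declares an incident, and on what evidence?"),
    (20, "The attacker appears to have posted customer data on a dark web forum.",
         "Does this change regulatory notification obligations? Who decides?"),
    (40, "A journalist calls asking for comment on a breach at your organization.",
         "Who speaks externally, and what can be said with what you know now?"),
    (60, "Your insurer's approved forensic firm has a 48-hour wait list.",
         "Do you wait, escalate with the insurer, or engage another firm at your own cost?"),
]

# Print the run sheet in order for the facilitator.
for minutes, situation, question in injects:
    print(f"T+{minutes:02d} min\n  Inject:   {situation}\n  Decision: {question}\n")
```

Printed in order, this becomes the facilitator's run sheet; varying which injects fire, and when, keeps repeat exercises from becoming predictable.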

The participants should include everyone who would be involved in a real incident, not just the IT security team. The CEO or managing director, because they'll need to make decisions about public communication, regulatory notification, and business continuity. The CFO, because incident response costs money and someone needs to authorize the spending. The head of HR, because incidents involving employees have personnel implications. The head of communications or marketing, because external messaging needs to be managed. Legal counsel, because almost every significant incident has legal implications. If these people aren't in the room for the exercise, they won't be prepared for the real thing.

After the exercise, the debrief is where the real value emerges. What worked? What didn't? Where did the plan break down? Where were roles unclear? What resources were assumed but don't actually exist? What decisions took too long because the authority wasn't defined? The debrief should be documented, and the findings should drive updates to the plan. Then you exercise again, incorporating the lessons learned. This cycle of exercise, debrief, update, and re-exercise is what builds genuine incident response capability.

How often should you exercise? At minimum, annually. Ideally, twice a year with different scenarios. And after any significant change to the IT environment, the organizational structure, or the threat environment. An organization that exercises regularly develops the muscle memory that carries it through the first chaotic hours of a real incident. An organization that exercises once for compliance and files the report has a plan. The difference between the two is the difference between preparation and paperwork.

Ransomware-specific response considerations

Ransomware deserves its own section because the response dynamics are different from other incident types, and because the decisions involved are among the hardest any executive will face.

The first decision in a ransomware incident is whether to contain or to observe. If encryption is actively spreading across the network, containment is urgent: isolate segments, shut down systems if necessary, stop the encryption from reaching critical assets and backup systems. If the encryption has already completed and the threat actor has moved on, the urgency shifts to assessment: what's encrypted, what's still accessible, are the backups intact, and what's the recovery path?

The ransom demand itself is a distraction during the initial response. The decision about whether to pay comes later, after you understand the full scope of the impact and the viability of recovery from backups. Too many organizations fixate on the ransom demand in the first hours when they should be focusing on containment and assessment. The demand will still be there tomorrow. Your window to contain the spread might close in minutes.

Backup assessment is critical and must happen immediately. The threat actor knows that organizations with working backups don't pay ransoms, so modern ransomware operations specifically target backup systems. Check whether the backup infrastructure has been compromised. Verify the integrity of the most recent backups. Determine whether the backups were taken before or after the initial compromise (attackers often dwell in the network for weeks before encrypting, meaning your recent backups may contain the malware). If the backups are clean and complete, you have a recovery path. If they're compromised, encrypted, or incomplete, the situation is significantly worse.
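
The sketch below illustrates that triage logic only, under two assumptions: you have an estimated compromise date from the initial indicators, and a manifest of backups with checksums recorded when they were written. It sorts backups into "likely clean" and "suspect"; real verification still means test restores, not checksum comparison alone.

```python
# Minimal sketch of backup triage during a ransomware incident.
# Assumptions: an estimated date of initial compromise, and a manifest of backups
# with timestamps and previously recorded checksums. This only sorts backups into
# "likely clean" vs "suspect"; actual verification requires test restores.
from datetime import date
from dataclasses import dataclass

@dataclass
class Backup:
    label: str
    taken: date
    checksum_recorded: str   # checksum captured when the backup was written
    checksum_now: str        # checksum recomputed from the backup media today

estimated_compromise = date(2024, 2, 10)   # from initial forensic indicators

backups = [
    Backup("weekly-full-02", date(2024, 2, 4),  "a1b2c3", "a1b2c3"),
    Backup("weekly-full-03", date(2024, 2, 11), "d4e5f6", "d4e5f6"),
    Backup("daily-incr-45",  date(2024, 2, 16), "778899", "000000"),
]

for b in backups:
    tampered = b.checksum_now != b.checksum_recorded
    post_compromise = b.taken >= estimated_compromise
    status = "SUSPECT" if (tampered or post_compromise) else "likely clean"
    print(f"{b.label}: taken {b.taken}, tampered={tampered}, "
          f"post-compromise={post_compromise} -> {status}")
```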

The decision to pay or not pay a ransom is a business decision, not a technical one. It should be made by the board or the CEO with input from legal counsel, the insurer, and law enforcement. The factors include: the viability of recovery from backups, the sensitivity of any data that has been exfiltrated (ransomware operations now routinely steal data before encrypting it, adding double-extortion pressure), the financial impact of extended downtime, the reputational implications of paying versus not paying, and the legal and regulatory constraints on ransom payments. There is no universally right answer. Organizations with good backups and manageable downtime can refuse to pay. Organizations facing existential threats from extended outages or data exposure sometimes conclude that paying is the least-bad option.

Law enforcement engagement is strongly recommended in all ransomware incidents. Organizations hesitate because they fear the investigation will disrupt recovery or attract media attention. In practice, law enforcement agencies (particularly the FBI in the US and the National Crime Agency in the UK) have dedicated ransomware units that understand the urgency and work to support recovery rather than hinder it. They may also have decryption keys from previous operations against the same ransomware group, intelligence about the group's reliability in providing decryption tools after payment, and relationships with other victims of the same campaign that provide useful context.

Insurance notification: timing matters more than you think

Cyber insurance policies have notification requirements that are stricter than most policyholders realize. The typical policy requires notification "as soon as practicable" after discovery of an incident, and many define this as 24 to 72 hours. Miss the notification window and the insurer may reduce or deny coverage. This has happened, in litigated cases, even when the delay was caused by the chaos of responding to the incident itself.

The practical implication is that insurance notification should happen very early in the response. Ideally within the first few hours, even before the full scope is understood. The notification doesn't need to include a complete analysis. It needs to say: "We have discovered what appears to be a security incident. We are investigating. We will provide updates as more information becomes available." That's enough to start the clock in your favor.

Many cyber insurance policies include access to an incident response panel. Pre-approved vendors for forensics, legal counsel, communications, and credit monitoring. Using the panel vendors is often a condition of coverage. Engaging your own forensic firm without checking the policy first can result in a fight with the insurer over costs later. Read the policy before the incident. Know who the panel vendors are. Have their contact details in the incident response plan.

The insurer will also assign a claims adjuster who may want to be involved in response decisions, particularly the decision about whether to pay a ransom. This can feel intrusive during a crisis, but the adjuster's involvement is typically a condition of coverage for ransom payments. Shutting them out of the decision process risks having the payment denied after the fact. Include the insurer as a stakeholder in the response, keep them informed, and document everything. The claims process starts during the incident, not after it.

Board communication during an incident

When a significant security incident occurs, the board needs to be informed. The challenge is communicating effectively when the situation is fluid, the information is incomplete, and the pressure to say something reassuring is enormous.

The first communication to the board should happen within hours of the incident being confirmed, even if you have very little information. The board should not learn about a significant incident from the media, from a regulator, or from a panicked employee. The initial communication should cover: what has happened (to the extent known), what the immediate response actions are, who is leading the response, and when the next update will be provided. It should explicitly state what is not yet known. Board members who are given incomplete but honest information will be far more supportive than board members who feel they were kept in the dark or given prematurely reassuring messages that turned out to be wrong.

Subsequent communications should follow a regular cadence (daily during the acute phase, then reducing to every few days as the situation stabilizes). Each update should cover: what has changed since the last update, what new information has been discovered, what decisions are needed from the board, and what the current priorities are. Avoid the temptation to sugarcoat. If the situation is bad, say it's bad. If you don't know the full scope yet, say you don't know. Board members are experienced business leaders who can handle bad news. What they can't handle (and won't forgive) is being misled during a crisis.

The board's role during an incident is governance, not management. They should be informed, consulted on significant decisions (ransom payments, public disclosures, regulatory notifications), and available to authorize expenditure. They should not be directing the technical response, second-guessing the incident commander, or calling their nephew who "works in cybersecurity." Define these boundaries before the incident, ideally as part of the board's cyber risk governance framework, so that the expectations are clear on both sides when the crisis hits.

Post-incident review: learning without blaming

Every significant incident should be followed by a thorough post-incident review, conducted once the acute response is complete and the organization has returned to normal operations (or whatever passes for normal after an incident). The purpose of the review is learning, not blame. Get this wrong and the review becomes either a whitewash that identifies no meaningful lessons or a witch hunt that identifies scapegoats rather than systemic problems.

The review should reconstruct the timeline of the incident from initial compromise through detection, response, and recovery. It should identify what worked well, what didn't work, and why. It should examine the root cause. Not just the technical vulnerability that was exploited, but the organizational factors that allowed the vulnerability to exist. Was it a patching failure? Why was patching delayed? Was it a resource constraint? A prioritization decision? A process gap? Following the chain of causation to its root usually reveals systemic issues rather than individual failures.

The output of the review should be a prioritized set of improvements. To the incident response plan, to the security controls, to the organizational processes that contributed to the incident. Each improvement should have an owner and a timeline. And the list should be realistic: ten well-defined improvements that actually get implemented are more valuable than fifty aspirational recommendations that gather dust.

Share the findings broadly. Not the sensitive details of the incident itself, but the lessons learned and the improvements being made. This serves two purposes: it demonstrates to the organization that incidents lead to improvement rather than punishment (which encourages future reporting), and it educates people across the organization about real threats and real responses (which is more effective than any security awareness training program).

Some organizations resist post-incident reviews because they fear the findings will be used against them in litigation. This is a legitimate concern, and legal counsel should be involved in structuring the review to protect privilege where appropriate. But fear of litigation should not prevent honest examination of what went wrong. An organization that responds to an incident, conducts a thorough review, and implements improvements is in a far stronger legal and regulatory position than one that sweeps the incident under the carpet and hopes nobody notices.

Building muscle memory

The organizations that respond well to incidents share a common characteristic: they've practiced. Not once, not as a compliance exercise, but regularly and realistically. Their incident response capability isn't a document in a folder. It's a set of skills, relationships, and reflexes that have been built through repeated exercise.

This doesn't require enormous investment. A tabletop exercise takes half a day and costs nothing beyond the time of the participants. A technical exercise where the IT team practices isolating a network segment or restoring from backup takes a few hours. A communication exercise where the executive team practices drafting an external statement under time pressure takes an hour. These small, regular investments in preparation pay extraordinary dividends when a real incident occurs.

The alternative is to wait for the real incident and hope the plan works. That hope is rarely rewarded. The organization that discovers its plan's weaknesses during a tabletop exercise can fix them at leisure. The organization that discovers them during a real incident pays for the lesson in downtime, data loss, regulatory penalties, and reputational damage. The choice between those two outcomes is not a difficult one. It just requires the discipline to invest in preparation before the crisis rather than after it.

Start with a tabletop. Schedule it this quarter. Invite the executives, not just the IT team. Use a realistic scenario. Prepare to be uncomfortable with what it reveals. Then fix what's broken and do it again. That cycle (exercise, learn, improve, repeat) is the difference between an organization that has an incident response plan and an organization that has incident response capability. The plan is a piece of paper. The capability is what saves you.

Want to test your incident response plan before a real incident does?

Let's talk