All databases suddenly began running dramatically slowly, causing very high variance in latency, which in turn caused several core systems and numerous end-user applications to halt. Several million transactions were lost or corrupted. A full disaster recovery process was initiated.
The situation was detected by internal NOC technicians, but it was later understood that it had started much earlier than first thought: before the incident was detected, the contact center had received several calls from end-clients complaining about low system performance and data loss.
In the final incident report, the root cause was defined as a hardware failure in a microcontroller.
IT and telecom companies have a huge disadvantage when it comes to incident management: everything they produce is produced in real time. Unlike in other production industries, data and transactions cannot be stocked and used as a buffer when the production system fails.
If we then move on to crisis management, time becomes a very scarce resource, and you will always feel the shortage breathing down your neck as you try to handle the situation at hand. Real client experience shows that in the IT and telecom industry, incidents are often detected by end-users and customers before your own organization detects them, which increases the stress factor dramatically.
ITIL/ITSM vs. Crisis Management
In short, ITIL describes processes, procedures, tasks, and checklists that can be applied to establish and maintain your organization’s value creation process. ITSM refers to the activities that are performed to design, plan, deliver, operate and control service delivery to customers and end-users. These activities are usually directed by internal policies and organized and structured in processes and supporting procedures.
So, ITIL and ITSM can be seen as a toolset for defining your service creation and service delivery processes. Crisis management, on the other hand, is all about what you should do when you have a severe failure in one or more of your core value creation processes. Here ITIL and ITSM have little to offer, whereas Business Continuity Management (BCM) gives you a solid foundation for preparing your organization for critical incidents.
Business Continuity Management and Crisis Management
Business continuity management (BCM) is a framework (ISO 22301:2012) for identifying an organization’s risk of exposure to internal and external threats. The goal of BCM is to give the organization the ability to respond effectively to those threats. BCM includes disaster recovery, business recovery, crisis management, incident management, emergency management and contingency planning, where contingency planning is the creation of your strategy for getting through a crisis. For many, the result of your contingency planning is your corporate Emergency Response Plan.
Emergency Response Plan (ERP)
The plan defines the procedures, roles, responsibilities and actions of the various organizations, departments and key personnel for all relevant IT scenarios. You need a defined set of roles staffed with qualified personnel, and you need procedures covering a set of scenarios that are rehearsed regularly.
For many IT and telecom companies, the common approach of defining one scenario per value chain is not a valid way forward, as the number of different value chains may simply be too large to handle individually. Here, the alternative is to implement a crisis management process that, in short, becomes your generic method for handling any IT-related critical incident.
A common process is divided into the following distinct steps:
This first step is all about verifying the situation and deciding whether to a) await the situation, b) establish the Emergency Response Team or c) take no further action.
This step is equivalent to establishing your Emergency Response Team and, from a broad perspective, understanding implications, consequences and immediate actions. This step is repeated throughout your handling until the situation is under control; the frequency will vary depending on criticality and the need for team coordination.
This is your normalization process containing all activities needed to get from a crisis to normal operations.
This is your final step and contains all post-emergency activities such as debriefing, reporting and filing of the incident.
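As a rough illustration, the four steps described above can be modelled as a small state machine. This is only a sketch under stated assumptions: the step labels (VERIFY, COORDINATE, NORMALIZE, CLOSE), the DONE end state, and the `under_control` and `normalized` flags are hypothetical names chosen for the example, and the strictly linear ordering is a simplification of how a real incident may unfold.

```python
from enum import Enum, auto


class Step(Enum):
    # Hypothetical labels for the four steps described in the text.
    VERIFY = auto()      # step 1: verify the situation and decide
    COORDINATE = auto()  # step 2: team established, repeated status cycle
    NORMALIZE = auto()   # step 3: get from crisis to normal operations
    CLOSE = auto()       # step 4: debrief, report, file the incident
    DONE = auto()        # assumed end state: no further handling


class Decision(Enum):
    AWAIT = auto()           # a) await the situation
    ESTABLISH_TEAM = auto()  # b) establish the Emergency Response Team
    NO_ACTION = auto()       # c) take no further action


def next_step(step, decision=None, under_control=False, normalized=False):
    """Return the step that follows `step` in this simplified model."""
    if step is Step.VERIFY:
        if decision is Decision.ESTABLISH_TEAM:
            return Step.COORDINATE
        if decision is Decision.NO_ACTION:
            return Step.DONE
        return Step.VERIFY  # AWAIT, or no decision yet: keep verifying
    if step is Step.COORDINATE:
        # Step 2 repeats until the situation is under control.
        return Step.NORMALIZE if under_control else Step.COORDINATE
    if step is Step.NORMALIZE:
        return Step.CLOSE if normalized else Step.NORMALIZE
    return Step.DONE  # CLOSE (and DONE) lead to the end state


# Example walk-through of one incident.
step = next_step(Step.VERIFY, decision=Decision.ESTABLISH_TEAM)
step = next_step(step)                      # still coordinating
step = next_step(step, under_control=True)  # move on to normalization
print(step.name)
```

The point of the model is that the same handful of transitions applies to any IT-related critical incident, regardless of which value chain is affected.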
Emergency Response Design – how to build your plan!
Too many organizations jump too quickly to action, that is, nominating scenarios, building the team and creating checklists, before analyzing how this should operate as an entity within the existing organization. By following the few steps below, you will end up with a solid, operational and corporate-aligned Emergency Response Plan.
Emergency Response Design phase
The objective of the design is to ensure that your Emergency Response structure is aligned with your corporate governance.
Emergency Response Organization phase
Here you will assess how to structure your emergency response organization. A three-level emergency structure is very common: Strategic, Tactical and Operational.
Emergency Response Team phase
The Emergency Response Team is the aggregation of all functions and roles that may or will contribute during an incident. Each function or role should have a mandate, areas of responsibility and tasks, some of which will be mandatory.
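One possible way to record such a role definition is as a small data structure. The sketch below is illustrative only: the class and field names, as well as the example role and its tasks, are assumptions made for this example and not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    description: str
    mandatory: bool = False  # some tasks must always be carried out


@dataclass
class Role:
    name: str
    mandate: str
    responsibilities: List[str] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)

    def mandatory_tasks(self) -> List[Task]:
        """Return only the tasks flagged as mandatory for this role."""
        return [task for task in self.tasks if task.mandatory]


# Hypothetical example role; the content is invented for illustration.
comms = Role(
    name="Communications lead",
    mandate="Approve all external statements during an incident",
    responsibilities=["Customer communication", "Internal status updates"],
    tasks=[
        Task("Publish first status message", mandatory=True),
        Task("Brief the contact center"),
    ],
)

for task in comms.mandatory_tasks():
    print(task.description)
```

Keeping the mandatory flag on the task itself makes it straightforward to generate a role-specific checklist that separates what must be done from what may be done.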
For more detailed information on Emergency Response Team, Alerting the team, initial response procedures and more, see our previous post on Accountable Managers.