Safety

Abnormal-Situation Management in Offshore Operations

The Abnormal Situation Management (ASM) Consortium performed a root-cause analysis on 32 incident reports gathered from public documents and member companies. The analysis identified common failure categories and manifestations in these incidents. Consequently, the consortium completed three case studies on potential deployment solutions: one supporting better shift-handover communication, a second supporting better response to alarm-flood situations, and a third supporting better situation awareness through overview displays that use qualitative gauges.

Introduction

The ASM Consortium has been working for almost 20 years, with an emphasis on human-factors engineering, to address process-safety and operational challenges for downstream hydrocarbon-processing and petrochemicals companies. Consortium members have jointly invested in research and development to create knowledge, tools, and products designed to prevent, detect, and mitigate abnormal situations that affect process safety in the control/operations environment. The consortium has placed a continual emphasis on the people who use automation and technology to operate the production processes, focusing extensively on human factors, human reliability, and process safety. Recently, the consortium has expanded its focus to upstream enterprises, because many of its operating-company members have extensive oil and gas businesses.

What Are Abnormal Situations?

The ASM Consortium has defined abnormal situations as “undesired plant disturbances or incidents with which the control system is not able to cope, requiring a human to intervene to supplement the actions of the control system.” An abnormal situation could be a simple upset condition that is quickly recognized and rectified by operator action, or it could escalate to a critical process-safety incident in which a safety system must be engaged for an emergency shutdown and evacuation is required. The objective of ASM is to bring the process back to normal before safety-shutdown-control systems or other safety-protection systems are engaged.

This definition is specifically used to distinguish between normal, abnormal, and emergency situations from the perspective of console operations. Often, operator error is blamed for causing incidents, and companies rush to increase training as a panacea. The work of the consortium has shown that incidents are caused by a multitude of factors, and solutions to address incidents need to consider the human operator’s role, the technology involved, and the system design, as well as the environment.

ASM Consortium Effective Operations Practices and Process Safety

The ASM Consortium’s mission is to empower and enable operating teams to manage their plants proactively to maximize safety and minimize environmental impact while allowing them to operate their plants optimally. Prevention, early detection, and mitigation are key elements to managing abnormal situations in order to reduce unplanned outages and process variability that increase production, safety, and environmental risk to plant employees and local communities. While the consortium recognizes that mechanical integrity is important to process safety, its focus on the operations team has led to an emphasis on operational integrity and human reliability to reduce the likelihood of abnormal situations and ultimately improve process safety as well as plant performance.

The relation between ASM and process safety can be illustrated with a safety pyramid (Fig. 1). At the bottom of the pyramid, unsafe behaviors can lead to near-miss events that have the potential to become process-safety incidents.

Fig. 1—Illustration of the safety pyramid in process-safety management.

Operational Failures and Human Reliability

In an effort to improve the understanding of the impact of ineffective operations practices and management systems on operations and process safety, the ASM Consortium conducted a systemic root-cause analysis of existing major-incident reports. Thirty-two incident reports were gathered from public documents and from consortium member companies. The study team defined an operations failure as a practice flaw that, if corrected, could have prevented the incident or mitigated its impact. While several of the reports were from the oil and gas sector, the majority of the incident reports were from the downstream and chemicals sectors.

Table 1 shows the top 10 common failure modes across all 32 incidents. Each operations failure was categorized as a failure of one of the effective operations practices, such as ineffective first-line leadership. The failure modes (first column) are phrased as methods of redressing the problems and are ranked by frequency of occurrence across the incident reports, with the most frequent at the top and the least frequent at the bottom. The second column lists the observed frequency of each failure mode. The last column shows the percent contribution of each failure mode relative to all observed failure modes across the sample of incident reports. The top 10 failure modes accounted for 70% of the total number of failure modes across all incident reports. The top three failure modes (hazard analysis and hazard communication, first-line leadership, and continuous-improvement systems) accounted for 38% of the failures identified.

Table 1. Top 10 common failure modes across the 32 incident reports.
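
The ranking and percent-contribution columns described for Table 1 can be illustrated with a minimal Python sketch. The function name and the sample observations below are hypothetical and are not the study's data; the sketch assumes only that each coded observation names one failure mode.

```python
from collections import Counter


def rank_failure_modes(observations, top_n=10):
    """Rank failure modes by frequency and compute each mode's share.

    Returns (mode, count, percent) tuples, most frequent first. The
    percent is relative to ALL observations, not only the top_n modes.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    return [(mode, n, 100.0 * n / total) for mode, n in counts.most_common(top_n)]


# Hypothetical coded observations for illustration only (NOT the study's data).
observations = (
    ["hazard analysis and communication"] * 5
    + ["first-line leadership"] * 3
    + ["continuous-improvement systems"] * 2
)

table = rank_failure_modes(observations)
for mode, n, pct in table:
    print(f"{mode:<36} {n:>3} {pct:5.1f}%")
```

Because the percentage is taken over all observed failure modes, the shares of a top-10 list need not sum to 100%, which is consistent with the 70% figure reported above.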

Case Studies

Effective Communications: Structured Checklist for Effective Shift Handover. In this case study, the benefits of a structured shift handover were evaluated, and minimum requirements for an effective handover were identified. Specifically, the study examined the relative effectiveness of operator communications at shift handover when using an integrated checklist and structured logbook compared with using a structured logbook alone. The main difference between the two logbooks evaluated was that the checklist-integrated logbook had an additional 16 subcategory items (i.e., checklist items) within the main logbook categories for operators to consider when making logbook entries, compared with the two additional subcategories found in the current site semistructured logbook. The study concluded that using the checklist-integrated logbook to organize the shift handover increases the effectiveness of communication.

Effective Alarm-Flooding Response: Strategies. Another study evaluated user-interface techniques for improving operators’ ability to handle plant disturbances that generate alarm floods. Specifically, the research study compared operator performance in responding to alarm-flood situations with three different alarm summary displays and an algorithm for adjusting the rate of alarm presentation (i.e., alarm-load throttling). The three summary displays differed in the visual presentation of alarm information in the summary pane of the display. Operators were asked to interpret alarms in several flood scenarios generated from the site historical alarm database. The alarm-flood scenarios generated 85 to 316 alarms (consisting of 13 to 28 unique alarms) within a 10-minute period. After 2 hours of familiarization practice, all operators responded to four alarm-flood scenarios with either the traditional alarm summary display or one of the two ASM alarm summary displays. For each display type, the operator had one scenario with alarm-load throttling and one without.
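
The paper summary does not describe how the throttling algorithm works, so the following Python sketch is only one plausible reading of "adjusting the rate of alarm presentation": alarms are presented in arrival order, but never faster than an assumed maximum rate. The function name, the `max_rate_per_s` parameter, and the sample times are hypothetical and should not be taken as the ASM technique itself.

```python
def throttle_presentation(arrival_times, max_rate_per_s=1.0):
    """Return the time at which each alarm would be presented.

    Alarms keep their arrival order. An alarm is shown at its arrival
    time unless the previous alarm was shown less than 1/max_rate_per_s
    seconds earlier, in which case it is delayed until that gap elapses.
    """
    min_gap = 1.0 / max_rate_per_s
    presented = []
    next_free = float("-inf")  # earliest time the next alarm may appear
    for t in sorted(arrival_times):
        shown_at = max(t, next_free)
        presented.append(shown_at)
        next_free = shown_at + min_gap
    return presented


# Hypothetical burst: three alarms within 0.2 s, then one isolated alarm.
arrivals = [0.0, 0.1, 0.2, 5.0]
shown = throttle_presentation(arrivals, max_rate_per_s=1.0)
print(shown)
```

In this sketch the burst is spread out to one alarm per second, while the isolated alarm at 5.0 s is presented without delay because the backlog has already cleared.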

Operator performance in response to the alarm-flood scenarios was evaluated through two metrics: (1) awareness of the location of an alarm in terms of the process area (i.e., orienting) and (2) understanding of the underlying abnormal process condition (i.e., evaluating). For both of these metrics, overall accuracy was calculated by use of a weighted composite score of hits, misses, and false positives relative to an expert’s analysis of the alarm-flood scenario. The analysis of the effectiveness of the operator-interface techniques used an objective measure of alarm-response performance. A consistent, significant positive effect was found across all measures for the alarm-load-throttling technique. When the ASM technique of alarm-load throttling was used to adjust the rate of alarm presentation, operator alarm-response performance was more accurate: awareness of the location of alarm conditions improved by 190%, and understanding of the underlying abnormal condition associated with the alarms improved by 46%.
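
The weights and the exact form of the composite score are not given in the paper summary. As a rough sketch under stated assumptions, the Python example below scores an operator's answers against an expert's list by rewarding hits and penalizing misses and false positives, then normalizes by the best achievable score. The function name, the default weights, and the process-area names are all hypothetical.

```python
def composite_accuracy(operator_items, expert_items,
                       w_hit=1.0, w_miss=1.0, w_fp=0.5):
    """Weighted composite of hits, misses, and false positives.

    The weights are illustrative assumptions, not the study's values.
    A perfect match with the expert's analysis scores 1.0; the score
    can go negative when errors outweigh hits.
    """
    operator_items = set(operator_items)
    expert_items = set(expert_items)
    hits = len(operator_items & expert_items)        # named by both
    misses = len(expert_items - operator_items)      # expert only
    false_pos = len(operator_items - expert_items)   # operator only
    raw = w_hit * hits - w_miss * misses - w_fp * false_pos
    best = w_hit * len(expert_items)
    return raw / best if best else 0.0


# Hypothetical process areas for the "orienting" metric.
expert = {"compressor", "feed drum", "reboiler"}
operator = {"compressor", "reboiler", "flare"}
score = composite_accuracy(operator, expert)
print(round(score, 3))
```

Here the operator has two hits, one miss, and one false positive, so the raw score is 2 - 1 - 0.5 = 0.5 out of a best possible 3.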

Ultimately, it was found that operators were more successful, regardless of user-interface design, when they followed a particular response strategy: using the summary view to determine which equipment area required attention, selecting just that area for viewing in the alarm list, evaluating the pattern of alarms in that area, and then acknowledging all alarms shown (as opposed to acknowledging individual alarms).

Effective Operator Situation Awareness: Designing At-a-Glance Span of Control-Overview Displays. A final case study concerned effective design of overview displays, focusing on the value of applying the ASM interaction-requirement (IR) analysis methodology to the design of an overview display relative to the traditional approach. Specifically, the research study compared operator performance using three overview displays that differed in design on the basis of two dimensions: (1) the method of identifying critical variables to present and (2) the method of presenting variables on displays. For the first dimension, two methods were used to identify critical variables. The traditional (TR) method is based on the industry standard practice of having an engineer or an operator design an overview display around the variables they believe are critical, given process design or operating experience. The IR method stresses human-centered design methods of eliciting the critical-information needs of the operator for the task of proactive monitoring, based on a range of experience.

The second display-design dimension was based on the dynamic shapes and the layout of those shapes in the display itself. There were two approaches for the dynamic shapes—either a numerical indicator (N) typically used in industry for most operating displays, or qualitative gauges (Q) inspired by the ASM Visual Thesaurus design research. Three overview-display designs were developed: numeric traditional, numeric IR-based, and qualitative IR-based (QIR). The study compared the abnormal-situation responses of six operators using the three overview-display types on a simulation-based console with four screens, where the top two screens presented the overview displays and the bottom two screens were used to access their existing operating-schematic displays.

Operator performance was faster with the QIR overview display than with the two TR overview displays, as shown in the 29% reduction in abnormal-situation detection time. Operator performance was also more accurate, as shown in the 100% improvement in the likelihood of detecting the plant upset in the same comparison. Most of the benefits of faster abnormal-situation detection derived from the ASM method of presenting variables (Q), and one-third of the benefits derived from the ASM method of identifying critical variables.

This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 166584, “Abnormal-Situation Management and Its Relevance to Process Safety for Offshore Operations,” by Dal Vernon Reising and Peter Bullemer, Human Centered Solutions, and Bruce Colgate (retired), prepared for the 2013 SPE Offshore Europe Oil and Gas Conference and Exhibition, Aberdeen, 3–6 September. The paper has not been peer reviewed.