Reliability-Centered Maintenance
© Copyright Meridium, Inc. 2008. All rights reserved. Document: RCM Fundamentals Training.doc
Reliability-Centered Maintenance
Version: RCM Fundamentals Training.doc
Copyright © Meridium, Inc. 2008. All Rights Reserved.
This training material is provided under a license agreement containing restrictions on use and disclosure. All rights, including reproduction by photographic or electronic process and translation into other languages, are reserved by Meridium. Meridium is a registered trademark of Meridium, Inc.
Table of Contents
Foreword
Reliability-centered Maintenance
RCM-DO-01 Fundamentals of Managing Maintenance
   The Expectations of Maintenance
   Understanding Failure
   The Objective of Maintenance
   What is RCM?
   The RCM Structure
   Functions
   The FMEA
   Consequences
   Failure Management Strategies
   Default Actions
RCM-DO-02 Preparing for Analysis
RCM-DO-03 Functions and Functional Failures
   Operating Context
   Writing Functions
   Performance Standards
   Exercises
   Secondary Functions
RCM-DO-03b Air Conditioner
   Functional Failures
   Failed States
   Exercise
   Exercises
RCM-DO-04 Failure Modes and Effects
   Reasonably Likely
   Causality
   Writing a Failure Mode
   Types of Failures
   The Problem with Data
   Effects
RCM-DO-05 Consequences and Effectiveness
   Hidden or Evident?
   Safety
   Environmental
   Operational
   Repair Only
RCM-HO-05a Assigning Consequences
   Applicable and Effective
   Tolerable Levels of Risk
   Hidden Failures
   The Famous Pump Example
   Exercise 1
   Exercise 2
   Exercise 3
   Case Study – BP Refinery Incident
   Managing Safety and Environmental Consequences
   Economic Consequences
RCM-DO-06 Applicability and Task Selection
   Types of Maintenance
   Preventive Maintenance (PMs)
   Predictive Maintenance
   Detective Maintenance
   Exercise 1 – Task Categories
   Exercise 2 – Which type of maintenance?
   The Basis of Task Preference
RCM-DO-06c Uses of MTBF
   What can MTBF tell us?
   At what level can we apply MTBF?
   How can MTBF add value to Reliability Initiatives?
   Summary
RCM-DO-06d Advanced Detective Maintenance Techniques
   Exercise 1 – Steam Turbine
   Exercise 2 – Steel Plant
   Common Cause Failure Modes
   Exercise 4 – Hoist
   Options for Redesign
   Multiple Redundant Devices
   Exercise 5 – Pumps and PSVs
   Managing Risk in Hidden Failures
   Voting Systems
   Economic Consequences
   Exercise 6 – Economic Hidden Failures
RCM-DO-07 The Value of RCM
   The Cashable Results of RCM
   The Non-cashable Results of RCM
   The Principal Barrier to Value Realization
   The Role of the RCM Facilitator/Analyst
Foreword
The Reliability-Centered Maintenance (RCM) approach was first documented in the detailed book on the subject by F. Stanley Nowlan, Director, Maintenance Analysis, and Howard F. Heap, Manager, Maintenance Program Planning, both of United Airlines [1]. The book was sponsored by the Office of the Assistant Secretary of Defense (Manpower, Reserve Affairs and Logistics) and was published in 1978. From that book:

For years maintenance was a craft learned through experience and rarely examined analytically. As new performance requirements led to increasingly complex equipment, however, maintenance costs grew accordingly. By the late 1950s the volume of these costs in the airline industry had reached a level that warranted a new look at the entire concept of preventive maintenance. By that time studies of actual operating data had also begun to contradict certain basic assumptions of traditional maintenance practice.

One of the underlying assumptions of maintenance theory has always been that there is a fundamental cause-and-effect relationship between scheduled maintenance and operating reliability. This assumption was based on the intuitive belief that because mechanical parts wear out, the reliability of any equipment is directly related to operating age. It therefore followed that the more frequently equipment was overhauled, the better protected it was against the likelihood of failure. The only problem was in determining what age limit was necessary to assure reliable operation.

In the case of aircraft it was also commonly assumed that all reliability problems were directly related to operating safety. Over the years, however, it was found that many types of failures could not be prevented no matter how intensive the maintenance activities. Moreover, in a field subject to rapidly expanding technology it was becoming increasingly difficult to eliminate uncertainty.
Equipment designers were able to cope with this problem, not by preventing failures, but by preventing such failures from affecting safety. In most aircraft, essential functions are protected by redundancy features which ensure that, in the event of a failure, the necessary function will still be available from some other source. Although fail-safe and "failure-tolerant" design practices have not entirely eliminated the relationship between safety and reliability, they have dissociated the two issues sufficiently that their implications for maintenance have become quite different.

A major question still remained, however, concerning the relationship between scheduled maintenance and reliability. Despite the time-honored belief that reliability was directly related to the intervals between scheduled overhauls, searching studies based on actuarial analysis of failure data suggested that the traditional hard-time policies were, apart from their expense, ineffective in controlling failure rates. This was not because the intervals were not short enough, and surely not because the teardown inspections were not sufficiently thorough. Rather, it was because, contrary to expectations, for many items the likelihood of failure did not in fact increase with increasing operating age. Consequently a maintenance policy based exclusively on some maximum operating age would, no matter what the age limit, have little or no effect on the failure rate.

In 1960 a task force of FAA and airline personnel was formed to investigate scheduled maintenance; its work resulted in an FAA/Industry Reliability Program in 1961. Building upon this work, in 1965 United Airlines developed a rudimentary decision-diagram technique. This technique was refined and embodied in the 747 Maintenance Steering Group (MSG) Handbook: Maintenance Evaluation and Program Development (MSG-1), issued by the Air Transport Association in 1968. MSG-1 was used to develop the maintenance program for the Boeing 747, the first maintenance program to apply RCM concepts. Subsequent improvements led to MSG-2, which was used to develop the maintenance programs for the Lockheed 1011 and the Douglas DC-10. A similar document, the European Maintenance System Guide, served as the basis for development of the initial programs for the Concorde and the Airbus A-300.

The objective of the approach outlined in MSG-1 and MSG-2 was to develop a scheduled maintenance program that assured the maximum safety and reliability of equipment at the lowest cost. An example of the success of this approach can be seen by comparing the Douglas DC-8, which had a scheduled overhaul of 339 items in a traditional maintenance program, with the DC-10, based upon MSG-2, which had only seven items to be overhauled. The latest commercial aircraft maintenance guidance is based upon MSG-3 (Rev 2) for the Boeing 757 and 767 aircraft.

[1] F. Stanley Nowlan and Howard F. Heap, Reliability Centered Maintenance, United Airlines and Dolby Press, sponsored and published by the Office of Assistant Secretary of Defense, 1978.
In the early 1970s this work attracted the attention of the Office of the Secretary of Defense. The Navy was the first military organization to apply RCM to both new-design and in-service aircraft. Also in the early 1970s, the Navy embarked on a major program to change the way nuclear submarines were maintained. Over the next 20 years the Navy would virtually eliminate scheduled overhaul on nuclear submarines, based upon an aggressive Condition Monitoring Program and other technical advances to the ship systems. RCM is currently being used on all new ship designs.

The RCM methodology has subsequently been applied in a wide variety of commercial and military applications. The Electric Power Research Institute (EPRI) has tested the methodology at nuclear power sites of Florida Power & Light, Duke Power, and Southern California Edison. Puget Sound Power and Light Co. has been using RCM since 1991 in both substation and line maintenance. NASA has long used RCM in analyzing the Space Shuttle and Shuttle Support Systems. In the early 1990s, NASA began basing its approach to facilities maintenance on RCM. And in 1995, Boeing Commercial Airplane Group embraced RCM as one of the tools in implementing a more robust and standardized facilities maintenance program [2]. This was significant in that one of the key groups in promoting RCM in complex systems (Boeing Aircraft) was now applying the approach to common industrial facilities equipment.

More recently, issues surrounding RCM seem more focused on applying the technique and less on proving its value. Must a group perform a classical/rigorous analysis, or is a more streamlined approach acceptable? An excellent article regarding the variations in the methodology was presented at the 2003 International Maintenance Conference [3]. Regardless of the approach selected, the outcome of RCM analysis is focused on selecting the most effective maintenance strategy and, when maintenance cannot deliver the needed reliability, identifying redesign requirements.
[2] Dennis Westbrook (Boeing Commercial Airplane Group) and William H. Closser (C&A Consulting), “Transition of an Organization to a Reliability Based Culture”, Proceedings of the 14th Annual International Maintenance Conference, August 3-7, 1997, Atlanta, GA.
[3] Jack R. Nicholas, “The Controversy about Reliability Centered Maintenance Methodology, Its Variants and Derivatives”, Proceedings of the 18th International Maintenance Conference, Dec. 7-10, 2003, Clearwater, FL.
Reliability-centered Maintenance
RCM-DO-01 Fundamentals of Managing Maintenance
The Expectations of Maintenance
• Productivity – How much are we producing?
• Cost-Effectiveness – What is it costing us to do so?
• Safety & Environment – Are we hurting anybody or damaging the environment in the process?
• Quality – Are we consistently producing at a high level of quality?
• Corporate Learning – How can we make sure that we will be able to sustain and improve this into the future?
Understanding Failure
The “Wear-out” Curve
The belief that all assets have a “life”: a period of few random failures followed by a wear-out zone.
Eventually people started to believe that many assets actually suffered early-life failures. The “bathtub” curve forms the basis of many engineers’ beliefs about asset performance.
The “Bathtub” Curve
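The two curves described above can be written as hazard (instantaneous failure rate) functions. The sketch below assumes Weibull-shaped failure behaviour, which the source does not specify, and every parameter value is hypothetical, chosen only to make the shapes visible. A Weibull shape parameter below 1 gives a falling rate (early-life failures), exactly 1 gives a constant rate (random failures), and above 1 gives a rising rate (wear-out).

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    if t <= 0:
        raise ValueError("age t must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


# Hypothetical parameters in operating hours; not taken from the source.
EARLY_LIFE = dict(beta=0.5, eta=2_000.0)   # falling rate
RANDOM = dict(beta=1.0, eta=10_000.0)      # constant rate
WEAR_OUT = dict(beta=4.0, eta=20_000.0)    # rising rate


def wear_out_curve(t: float) -> float:
    """'Life' model: few random failures, followed by a wear-out zone."""
    return weibull_hazard(t, **RANDOM) + weibull_hazard(t, **WEAR_OUT)


def bathtub_curve(t: float) -> float:
    """Bathtub model: early-life failures added in front of the wear-out curve."""
    return weibull_hazard(t, **EARLY_LIFE) + wear_out_curve(t)


if __name__ == "__main__":
    for age in (100, 1_000, 5_000, 15_000, 30_000):
        print(f"{age:>6} h  wear-out {wear_out_curve(age):.2e}  bathtub {bathtub_curve(age):.2e}")
```

Summing independent hazards is simply one convenient way to build the composite shapes; the point is the falling, flat and rising segments, not the particular numbers.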
The 6 Failure Patterns
Figure: the six age-reliability patterns, with the share of items following each: A. 4%, B. 2%, C. 5%, D. 7%, E. 14%, F. 68%.
• Only 11% of failures were related to age… 89% had no direct correlation with the age of the assets at all!
• And only 6% showed a wear-out curve.
• So what?
   – If our maintenance schedule has been developed on the principle of “life”, then what are we achieving…? Or worse…
   – 68% of failures were infant mortality failures.
• 14% of all failures were seen as random; therefore we are often doing absolutely nothing to manage these!
• Do different assets fail differently?
   – Complex assets…
   – Simple assets have dominant failure modes (wear, erosion, corrosion, evaporation, etc.).
• Regardless of the current situation in your industry, the share of failures that are not age-related will increase as automation, mechanization and asset complexity increase.
# Reliability-Centered Maintenance (Nowlan and Heap), Exhibit 2:13, Age-Related Patterns
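Why a “life”-based schedule does little for the 89% of failures that are not age-related can be shown with conditional probability. The sketch below assumes Weibull-distributed times to failure (an illustrative assumption; the source fits no distribution) and asks: given that an item has survived to a certain age, how likely is it to fail within the next interval? With a constant failure rate (shape 1, as in pattern E) the answer does not depend on age, so overhauling at a fixed age cannot lower it; with a rising rate (shape 3, a hypothetical wear-out case) it grows with age.

```python
import math


def reliability(age: float, beta: float, eta: float) -> float:
    """Weibull survival probability R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((age / eta) ** beta))


def conditional_failure_probability(age: float, interval: float, beta: float, eta: float) -> float:
    """P(failure before age + interval | item has survived to age)."""
    return 1.0 - reliability(age + interval, beta, eta) / reliability(age, beta, eta)


if __name__ == "__main__":
    eta = 1_000.0     # hypothetical characteristic life, hours
    interval = 100.0  # hypothetical look-ahead window, hours
    for beta, label in ((1.0, "constant rate (random)"), (3.0, "rising rate (wear-out)")):
        probs = [conditional_failure_probability(a, interval, beta, eta) for a in (0, 500, 900)]
        print(label, [round(p, 4) for p in probs])
```

Only in the rising-rate case does the chance of failure depend on age, which is the situation an age limit or hard-time overhaul is designed for.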
The 6 Failure Patterns
Pattern   UAL 1968   Broberg 1973   MSP 1982
A         4%         3%             3%
B         2%         1%             17%
C         5%         4%             3%
D         7%         11%            6%
E         14%        15%            42%
F         68%        66%            29%
# U.S. Navy Analysis of Submarine Maintenance Data and the Development of Age and Reliability Profiles – Timothy M. Allen, Department of the Navy
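The headline figures on the previous slide can be checked against this table. A small sketch, assuming (as the UAL numbers imply, since 4 + 2 + 5 = 11 and 4 + 2 = 6) that patterns A, B and C are the age-related ones and A and B the ones with a distinct wear-out zone. It reproduces the 11% / 89% / 6% split for UAL and shows how different the 1982 MSP data are.

```python
# Shares of items (%) following each failure pattern, copied from the table above.
STUDIES = {
    "UAL 1968": {"A": 4, "B": 2, "C": 5, "D": 7, "E": 14, "F": 68},
    "Broberg 1973": {"A": 3, "B": 1, "C": 4, "D": 11, "E": 15, "F": 66},
    "MSP 1982": {"A": 3, "B": 17, "C": 3, "D": 6, "E": 42, "F": 29},
}

AGE_RELATED = ("A", "B", "C")  # failure likelihood rises with age
WEAR_OUT_ZONE = ("A", "B")     # distinct wear-out zone at end of life


def summarize(shares: dict) -> dict:
    """Group the six pattern shares into the categories used on the slide."""
    total = sum(shares.values())
    age_related = sum(shares[p] for p in AGE_RELATED)
    return {
        "total": total,
        "age_related": age_related,
        "not_age_related": total - age_related,
        "wear_out_zone": sum(shares[p] for p in WEAR_OUT_ZONE),
    }


if __name__ == "__main__":
    for name, shares in STUDIES.items():
        print(name, summarize(shares))
```

The "total" field doubles as a sanity check that each column of the table adds up to 100%.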
The Objective of Maintenance
Figure: Performance, showing Initial Capability (what it can do) above Desired Performance (what its users want it to do); the gap between the two is the Margin for Deterioration.
• So, if the objective of maintenance is to keep the asset performing between what it “can” do and what its users “want” it to do, then we must:
   – First, define what the users want the asset to do in its present operating context;
   – Second, determine whether the asset is able to meet these requirements;
   – Third, determine the maintenance interventions required.
# SAE JA1012 Figure 2
# SAE JA1012 Section 6.2 Performance Standards
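The figure’s logic reduces to a comparison between two numbers expressed in the same units. The sketch below is a hypothetical model (class and field names are illustrative, not taken from SAE JA1012) and assumes a parameter where higher is better, such as flow rate. It covers the first two steps listed above: the desired performance comes from the users, the asset can only meet it if its initial capability is at least that high, and the difference is the margin for deterioration that maintenance has to protect.

```python
from dataclasses import dataclass


@dataclass
class PerformanceStandard:
    """One performance parameter; assumes higher values are better (e.g. flow)."""
    parameter: str
    initial_capability: float   # what the asset can do
    desired_performance: float  # what its users want it to do, same units

    def can_meet_requirement(self) -> bool:
        # RCM premise: maintenance can preserve or restore capability but not
        # add to it, so an asset that falls short when new needs another remedy.
        return self.initial_capability >= self.desired_performance

    def margin_for_deterioration(self) -> float:
        """Room the asset has to deteriorate before a functional failure occurs."""
        return self.initial_capability - self.desired_performance


if __name__ == "__main__":
    # Hypothetical pump: capable of 1,000 L/min, users need 800 L/min.
    pump = PerformanceStandard("flow (L/min)", initial_capability=1_000, desired_performance=800)
    print(pump.can_meet_requirement(), pump.margin_for_deterioration())
```

For a parameter where lower is better (temperature, leakage), the comparison and the subtraction would simply be reversed.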
What is RCM?
“RCM is a process to ensure that assets continue to meet their user requirements in their present operating context.” ~ John Moubray
“RCM applies to any equipment where there is a need to realise maximum operating reliability at the lowest cost.” ~ Stan Nowlan and Howard Heap
The RCM Structure
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
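The seven questions build on each other: each function can fail in several ways, each functional failure can have several causes, and each cause carries its own effect, consequence and task. A hypothetical sketch of that hierarchy as Python data classes (all names are illustrative and are not the schema of any particular RCM software), with a helper that lists failure modes still missing an answer to question 5 or to both questions 6 and 7.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureMode:
    cause: str                            # Q3: what causes the functional failure
    effect: str                           # Q4: what happens when it occurs
    consequence: Optional[str] = None     # Q5: in what way it matters
    task: Optional[str] = None            # Q6: proactive task and interval
    default_action: Optional[str] = None  # Q7: used when no suitable task exists


@dataclass
class FunctionalFailure:
    description: str                      # Q2: how the function is not fulfilled
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    statement: str                        # Q1: function plus performance standard
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def open_questions(fn: Function) -> List[str]:
    """List failure modes that still lack a consequence, or both a task and a default action."""
    gaps = []
    for ff in fn.functional_failures:
        for fm in ff.failure_modes:
            if fm.consequence is None:
                gaps.append(f"{fm.cause}: consequence not assessed")
            if fm.task is None and fm.default_action is None:
                gaps.append(f"{fm.cause}: no task or default action")
    return gaps


if __name__ == "__main__":
    # Hypothetical example for illustration only.
    pump = Function(
        statement="To transfer water at not less than 800 L/min",
        functional_failures=[FunctionalFailure(
            description="Transfers less than 800 L/min",
            failure_modes=[FailureMode(cause="Impeller worn", effect="Flow falls gradually",
                                       consequence="Operational")],
        )],
    )
    print(open_questions(pump))
```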
Functions
The Seven Questions of RCM (SAE JA1011, 5a–5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
   → All functions
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
© Copyright Meridium, Inc. 2007
The FMEA
The Seven Questions of RCM (SAE JA1011, 5a–5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
   → Questions 2–4: all failed states, causes of failure, and the effects of each failure
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
Consequences
The Seven Questions of RCM (SAE JA1011, 5a–5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
   → How it matters
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
HN
HO
HE No
Does the failure have a direct adverse effect on operational capability?
No
Yes
Yes Predictive Task
Is a Preventive Restoration task technically feasible and effective?
HO2 HN2
Yes No
HN3
Yes Preventive Replacement Task
[Figure: RCM Decision Algorithm, based on Example 2 of SAE JA1012.]
The algorithm first asks: "Will the loss of function caused by this failure mode on its own become evident to the operating crew under normal circumstances?" A Yes leads to the evident branch (E). A No leads to the hidden branch (H). Each branch then sorts the consequences with three questions, in order:
• Is there an intolerable risk that the failure could kill or injure someone? (S)
• Is there an intolerable risk that the failure could breach a known environmental standard or regulation? (E)
• Does the failure have a direct adverse effect on operational capability? (O)
If all three answers are No, the consequences are non-operational (N). For each consequence category (ES, EE, EO, EN, HS, HE, HO, HN), the algorithm then asks in turn, stopping at the first Yes, whether a Predictive task, a Preventive Restoration task or a Preventive Replacement task is technically feasible and effective. On the hidden branch it also asks whether a Detective task to detect the failure is technically feasible and effective. On the evident branch, safety and environmental failure modes are also checked for a combination of tasks. If no suitable task is found for a safety or environmental failure mode, redesign is compulsory. Operational and non-operational failure modes may be left to run to failure, and redesign may be desirable.
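The RCM decision algorithm (SAE JA1012, Example 2) can be sketched as a short function. This is a simplified, illustrative reading of the diagram, not the SAE JA1012 text. The names `FailureMode`, `consequence_code` and `select_strategy` are hypothetical, each "technically feasible and effective" judgement is assumed to have already been made by the analysis group and recorded as a flag, and the diagram's separate run-to-fail and redesign outcomes on the operational branches are folded into one result.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """Hypothetical record of the analysis group's answers for one failure mode."""
    evident: bool                 # loss of function evident to the operating crew on its own?
    safety_risk: bool             # intolerable risk that it could kill or injure someone?
    environmental_risk: bool      # intolerable risk of breaching an environmental standard?
    operational_effect: bool      # direct adverse effect on operational capability?
    predictive_ok: bool = False   # each *_ok flag: task judged technically feasible and effective
    restoration_ok: bool = False
    replacement_ok: bool = False
    detective_ok: bool = False    # only asked on the hidden branch
    combination_ok: bool = False  # only asked for evident safety/environmental failures


def consequence_code(fm: FailureMode) -> str:
    """Return the two-letter branch code, e.g. 'ES' or 'HN'."""
    branch = "E" if fm.evident else "H"
    if fm.safety_risk:
        return branch + "S"
    if fm.environmental_risk:
        return branch + "E"
    if fm.operational_effect:
        return branch + "O"
    return branch + "N"


def select_strategy(fm: FailureMode) -> str:
    """Walk the task questions in diagram order and return the first match."""
    code = consequence_code(fm)
    if fm.predictive_ok:
        return "Predictive task"
    if fm.restoration_ok:
        return "Preventive Restoration task"
    if fm.replacement_ok:
        return "Preventive Replacement task"
    if code[0] == "H" and fm.detective_ok:
        return "Detective task"
    if code in ("ES", "EE") and fm.combination_ok:
        return "Combination of tasks"
    if code[1] in ("S", "E"):
        return "Redesign is compulsory"
    return "Run-to-fail (redesign may be desirable)"


example = FailureMode(evident=False, safety_risk=True, environmental_risk=False,
                      operational_effect=False, detective_ok=True)
print(consequence_code(example), "->", select_strategy(example))  # HS -> Detective task
```

Keeping the consequence classification separate from the task selection mirrors the diagram: the consequences are settled first, and only then is each task type tested in a fixed order.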
Failure Management Strategies
The Seven Questions of RCM (SAE JA1011, 5a-5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals) Each task must be applicable and effective.
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
What types of maintenance are there?
Predictive Maintenance (PTIVE)
• Alternative terms: On-Condition Maintenance; Condition Based Maintenance (CBM); Condition Monitoring (CM); Inspections
• What it is: Check an item for signs of potential failure and leave it in place on the condition that it will make it to its next inspection interval.
Preventive Restoration (PRES)
• Alternative terms: Overhaul; Scheduled Restoration; Restorative tasks; Rework
• What it is: A task to restore an asset's original resistance to failure prior to its failure. This is a preventive task.
Preventive Replacement (PREP)
• Alternative terms: Replacement; also Overhauls
• What it is: A task to replace an asset prior to its failure. This is a preventive task.
Detective Maintenance (DTIVE)
• Alternative terms: Failure finding; Function testing
• What it is: A task to detect whether an item has failed or not.
Default Actions
The Seven Questions of RCM (SAE JA1011, 5a-5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
Determine the actions to be taken if routine maintenance cannot be performed.
RCM-DO-02 Preparing for Analysis
What asset or system…?
• Before we know what a system or sub-system can do, we need to know exactly what the system contains.
• If we go too high, we risk de-motivating the team and creating superfluous analyses.
• If we go too low, we risk paralysis by analysis.
• RCM is best performed at a system level. However, it can be performed at an equipment level in special circumstances.
[Figure: example asset hierarchy with the levels Plant; Process 1 to Process 4; Electrical System, Mechanical Assets, Instrumentation, Fixed Equipment; and Centrifugal Pump, AC 3 phase motor, Hydraulic Motor, Chain Conveyor, Rotary Valves.]
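One way to make the choice of analysis level concrete is to hold the asset hierarchy as a tree and list the nodes at the level chosen for analysis. This is a minimal sketch. The `Node` class, the level names and the placement of equipment under particular systems and processes are hypothetical, because the figure does not show which items belong where.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    level: str                                   # e.g. "plant", "process", "system", "equipment"
    children: list["Node"] = field(default_factory=list)


def nodes_at_level(root: Node, level: str) -> list[str]:
    """Depth-first walk returning the names of every node at the given level."""
    found = [root.name] if root.level == level else []
    for child in root.children:
        found.extend(nodes_at_level(child, level))
    return found


# Hypothetical hierarchy: only one process branch is filled in, for illustration.
plant = Node("Plant", "plant", [
    Node("Process 1", "process", [
        Node("Mechanical Assets", "system", [
            Node("Centrifugal Pump", "equipment"),
            Node("Chain Conveyor", "equipment"),
        ]),
        Node("Electrical System", "system", [
            Node("AC 3 phase motor", "equipment"),
        ]),
    ]),
])

print(nodes_at_level(plant, "system"))  # ['Mechanical Assets', 'Electrical System']
```

Choosing "system" as the level gives the candidate analyses. Choosing "equipment" gives the longer list that the "too low" warning refers to.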
RCM-DO-03 Functions and Functional Failures
The Seven Questions of RCM (SAE JA1011, 5a-5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
We will cover:
• Operating context
• Types/categories of functions
• How to write a function statement
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
SAE JA1011 5.1.2: "All functions of the asset/system shall be identified (all primary and secondary functions, including the functions of all protective devices)."
Operating Context
1. Duty cycles
2. Weather and the immediate environment
3. Applicable regulations and laws
4. Asset configuration
5. Remoteness
6. How it is managed
7. Public perceptions
8. Budget constraints
9. Skills available
10. Any other factor that determines how we use the asset(s) or system
Our car is a Ford Focus. Great car… we maintain it to the manufacturer's specifications… but they don't! Why?
The operating context of any asset tells you how that asset is operated. This will influence how we maintain it. It doesn't tell you what the asset can do, or what we want it to do.
Writing Functions
SAE JA1011, 5.1.3: All functions shall contain a verb, an object and a performance standard (quantified in every case where this is done).
[Figure: the pump can deliver up to 1000 l/minute; the off-take from the tank is 800 l/minute.]
We accept that "time's arrow" means that assets will deteriorate. Performance standards tell us the minimum level of performance acceptable to the users or owners of the asset.
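The pump example can be put into numbers. In this sketch the figures come from the slide (a pump that can deliver up to 1000 l/minute, an off-take of 800 l/minute), while the function name `deterioration_margin` is hypothetical. The margin is the difference between what the asset can do and the minimum performance its users need.

```python
def deterioration_margin(capability: float, required: float) -> tuple[float, float]:
    """Return the margin for deterioration as an absolute value and as a
    fraction of the asset's capability."""
    if capability <= 0:
        raise ValueError("capability must be positive")
    margin = capability - required
    return margin, margin / capability


margin, fraction = deterioration_margin(capability=1000.0, required=800.0)
print(f"{margin:.0f} l/minute ({fraction:.0%} of capability)")  # 200 l/minute (20% of capability)
```

Once the pump has deteriorated below 800 l/minute, it no longer meets the performance standard, even though it is still running.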
Performance Standards
[Figure (SAE JA1012 Figure 2): what the asset can do (its performance) compared with what its users want it to do. The difference between the two is the margin for deterioration.]
Types of performance standard (SAE JA1012 Section 6.2, Performance Standards):
1. Between limits
2. Specific
3. Varying (up to)
4. Total
5. Multiple: one or more criteria for performance, e.g. up to 800 l/minute at 100 bar
6. Open
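A "Multiple" standard is met only when every one of its criteria is met. This is a minimal sketch, assuming each criterion can be expressed as an optional lower and upper limit on one measured quantity. The `Criterion` class and the quantity names are hypothetical. The example reads "up to 800 l/minute at 100 bar" as an upper limit on flow plus a minimum delivery pressure, which is only one possible interpretation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    quantity: str
    lower: Optional[float] = None   # None means no lower limit
    upper: Optional[float] = None   # None means no upper limit

    def met_by(self, value: float) -> bool:
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


def standard_met(criteria: list[Criterion], readings: dict[str, float]) -> bool:
    """A multiple-criteria standard is met only if every criterion is met."""
    return all(c.met_by(readings[c.quantity]) for c in criteria)


# "Up to 800 l/minute at 100 bar", read (as an assumption) as two criteria.
pump_standard = [
    Criterion("flow_l_per_min", upper=800.0),
    Criterion("pressure_bar", lower=100.0),
]

print(standard_met(pump_standard, {"flow_l_per_min": 750.0, "pressure_bar": 102.0}))  # True
print(standard_met(pump_standard, {"flow_l_per_min": 750.0, "pressure_bar": 95.0}))   # False
```

The same `Criterion` shape also covers the simpler types. For example, a "Between limits" standard is one criterion with both limits set.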
These are NOT functions! (Why?)
• To be safe…
• To be reliable…
• To comply with environmental standards…
• To comply with IE2314356XXX (etc.)…
Performance standards need to be quantified where possible to avoid ambiguity. E.g. what is reliable, and who says so?
Exercises
Primary Function Statements
The primary function is the reason why an asset is purchased in the first place.
SAE JA1011, 5.1.3: All functions shall contain a verb, an object and a performance standard (quantified in every case where this is done).
• A light fitting in an office…
• An office chair…
• A projector used in presentations…
• A pushbike for you to ride to work on…
Secondary Functions (SAE JA1012 6.2.2)
Secondary functions are all the other requirements we have of the asset(s) that are not covered by the primary function:
• Environmental integrity
• Safety / structural integrity
• Control / containment / comfort
• Appearance
• Protective devices and systems
• Economy and efficiency
• Superfluous
The primary function of an office chair was given as "To support a person weighing up to 150 kilograms in a seated position". What are the secondary functions?
RCM-DO-03b Air Conditioner
An office is located in an extremely hot environment where the average annual temperature ranges between 30°C (86°F) and 41°C (105°F). The company has installed an air-conditioning system that will, at maximum output, maintain a temperature differential of 20°C (±0.56°C) between the outside ambient air and the inside office air. It will also dehumidify the air to a level of 45% (±4%). The office is approximately 279 m² (3000 ft²). The air conditioning unit will provide six BTU (British Thermal Units) of cooling.
Operational Description
The system is very simple and consists of a reciprocating piston compressor, a condenser, a thermal expansion valve and an evaporator. A three-phase electric squirrel cage motor drives the compressor via four parallel V-belts. A guard is in place to stop people touching the belts while they are in use.
Setting air conditioning temperatures can be very individual and is almost never without complaints. Over the years the company has determined that a temperature in the range of 19°C (~66°F) to 23°C (~73°F) is the most comfortable to work at, and causes the fewest arguments. The thermostat is set to 21°C (~70°F), and the company would like the temperature not to exceed 23°C or fall below 19°C.
The compressor is oil lubricated and compresses a standard refrigerant gas, which is a known greenhouse gas. Any release of the refrigerant breaches a number of environmental regulations. The compressor takes low-pressure superheated gas from the evaporator, compresses it to high-pressure superheated gas, and pushes it through the condenser. A draft over the condenser coils comes from a three-phase electric fan, which removes the heat and changes the high-pressure vapor to a high-pressure liquid. When the condenser is working well there is a temperature differential of 3.1°C (10°F) across the condenser. De-superheated high-pressure liquid leaves the condenser in the liquid line to the thermal expansion valve (TX valve).
The TX valve regulates the flow of high-pressure liquid refrigerant into the evaporator coil. It is designed to open just enough to let refrigerant flow while maintaining a high pressure differential from its inlet to its outlet. The pressure at the exit of the expansion valve is low enough that it initiates a phase change in the liquid refrigerant to a vapor.
A three-phase motor forces draft air over the evaporator coils and superheats the vapor. This creates the cooling effect. Both the evaporator fan and the condenser fan have lightweight steel cowls to stop foreign objects from damaging the fan blades. The refrigerant then leaves the evaporator as a superheated gas and begins the cycle again at the compressor. Any failure of the evaporator means that there is a possibility of liquid entering the compressor, destroying the internal components. When the evaporator is working well there is a temperature differential of 3.1°C (10°F) across its coils.
The electric motor drives of the compressor and the evaporator have thermal overloads that will trip the circuit if the current reaches 125% of full load amps (FLA). The condenser fan has protection set at 115% of FLA.
The company has local research reports that show that bacteria, viruses and fungi tend to thrive in that part of the world when the humidity is greater than 47%. Similar "wellness" reports have shown that workers in an office environment are most comfortable at a humidity between 30% and 44%. If the humidity is too low, workers often suffer from dry eyes and increased static, and it feels colder than it is. If it is too high, workers feel very uncomfortable and it feels hotter than it is.
The air conditioner typically needs to run for 8-10 minutes before the dehumidification process can commence. At its present design capacity, it will run for 100% of the time in summer, and 40-50% of the time during other seasons in this climate. However, if the thermostat fails and stops the compressor at temperatures above its set point, this will cause short run times and will not allow the unit to dehumidify the air in the office space.
The company using this unit has other similar systems installed in other offices and finds them to be reliable and economical to install and run. However, discussions with the manufacturer and a study of the history of similar systems have produced the following list of common failures:
a) Condenser fins flattened, preventing forced airflow over the condenser coils (installation errors).
b) Evaporator fins flattened, preventing forced airflow over the evaporator coils (installation errors).
c) Clogging of the TX valve, causing a total failure of the system (normally occurs every 2 years).
d) Wear out of the valves within the compressor (normally once every 5 years).
e) Failure of the thermostat, meaning it will not trip at all (once every 4 years), or it will trip at temperatures greater than the set point (once every 6 years).
While these are common failure modes, they do not include all of the likely failure modes. For example, the drive motors for the compressor, the condenser and the evaporator are all standard three-phase squirrel cage electric motors and suffer from the failure modes that generally occur in these types of motors.
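The case study's point about short run times can be checked against a log of compressor run durations. This is a minimal sketch. It assumes run times are recorded in minutes, takes the lower figure of the quoted 8-10 minutes as the threshold, and assumes dehumidification continues for the rest of any run that passes it. The function name `dehumidifying_runs` and both sample logs are hypothetical.

```python
def dehumidifying_runs(run_minutes: list[float], threshold: float = 8.0) -> tuple[int, float]:
    """Count compressor runs long enough for dehumidification to begin and
    return (count, total minutes spent dehumidifying across those runs)."""
    long_runs = [m for m in run_minutes if m >= threshold]
    dehumidifying_time = sum((m - threshold for m in long_runs), 0.0)
    return len(long_runs), dehumidifying_time


# Hypothetical logs: a healthy thermostat vs one that stops the compressor early.
healthy = [25.0, 30.0, 28.0]
short_cycling = [5.0, 6.0, 4.0, 7.0, 5.0, 6.0]
print(dehumidifying_runs(healthy))        # (3, 59.0)
print(dehumidifying_runs(short_cycling))  # (0, 0.0)
```

The short-cycling log runs the compressor for 33 minutes in total but never long enough to dehumidify, which is the hidden effect of the thermostat failure described above.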
Functional Failures
The Seven Questions of RCM (SAE JA1011, 5a-5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
Failed States
Functional failures indicate failed states: "how" the asset is unable to do what we want it to.
• We need to define all of the failed states for every function.
– Failed states are derived directly from the function statements and their performance standards.
– They generally cover too much, too little (partial) and not at all (total).
• To pump water from tank A to tank B at up to 800 l/minute (Varying)
– Unable to pump at all
– Pumps at more than 800 l/minute (?)
• To pump water from tank A to tank B at between 800 l/minute and 1000 l/minute (Between limits)
– Unable to pump at all
– Pumps at less than 800 l/minute
– Pumps at more than 1000 l/minute
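Because failed states are derived directly from the performance standard, a first list can be generated mechanically from the limits. This is a minimal sketch. The function `failed_states` and the wording template it produces are hypothetical. It emits the total failure, then a partial failure for each limit the standard actually defines. The analysis group still has to judge whether each generated state really is a failure (compare the "(?)" on the varying example).

```python
from typing import Optional


def failed_states(verb: str, unit: str,
                  lower: Optional[float] = None,
                  upper: Optional[float] = None) -> list[str]:
    """List failed states: total failure first, then 'too little' and
    'too much' for whichever limits the performance standard defines."""
    states = [f"Unable to {verb} at all"]                       # total failure
    if lower is not None:
        states.append(f"{verb.capitalize()}s at less than {lower:g} {unit}")
    if upper is not None:
        states.append(f"{verb.capitalize()}s at more than {upper:g} {unit}")
    return states


for state in failed_states("pump", "l/minute", lower=800, upper=1000):
    print(state)
# Unable to pump at all
# Pumps at less than 800 l/minute
# Pumps at more than 1000 l/minute
```

Calling `failed_states("pump", "l/minute", upper=800)` gives the two states listed for the "up to 800 l/minute" function.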
Exercise
The primary function of a grinding machine may be listed as: "To grind bearing journals in a cycle time of 3.00 minutes ± 3 seconds, to a diameter of 75 mm ± 0.1 mm, with a surface finish of no greater than Ra 0.2."
[Figure (SAE JA1012 Section 7.2): tolerance bands around the 75 mm diameter, marked in 0.05 mm steps, and around the 3-minute cycle time, marked at 2.54, 2.57, 3.00, 3.03 and 3.06 (minutes.seconds).]
Exercises
"To start pumping water from tank A to tank B at a rate of 800 l/minute, at a pressure of 100 bar, when the water level is at the low level switch and to stop when it reaches the high level switch."
[Figure: tank with high level and low level switches; pump delivering 800 l/minute at 100 bar.]
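The start/stop behaviour in this function statement is a two-position control with a dead band between the two switches. This is a minimal sketch. It assumes the switches sit in the receiving tank B and each reports a simple on/off signal. The `PumpController` class and its `update` method are hypothetical.

```python
class PumpController:
    """Start the pump at the low level switch and stop it at the high level
    switch; between the two switches, keep whatever state it was last in."""

    def __init__(self) -> None:
        self.running = False

    def update(self, at_low_switch: bool, at_high_switch: bool) -> bool:
        if at_high_switch:
            self.running = False      # stop at the high level switch (takes priority)
        elif at_low_switch:
            self.running = True       # start at the low level switch
        # Otherwise the level is between the switches: no change.
        return self.running


controller = PumpController()
readings = [(True, False), (False, False), (False, True), (False, False), (True, False)]
print([controller.update(low, high) for low, high in readings])
# [True, True, False, False, True]
```

Holding the last state between the switches is what stops the pump from starting and stopping repeatedly around a single level. It also shows that the function has a control element (start and stop on the switches) as well as the flow and pressure standards.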
RCM-DO-04 Failure Modes and Effects
The Seven Questions of RCM (SAE JA1011, 5a-5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
Reasonably Likely
• Pump struck by lightning: North West Australia, reasonably likely; the Atacama Desert in Chile, highly unlikely.
• Pump stolen: Mexico, reasonably likely; the USA, unlikely.
• Supply cable insulation deteriorated due to sun exposure: Saudi Arabia, reasonably likely; the UK, not likely.
Levels of reasonableness are determined by the analysis group. If no agreement is possible, then the organization that owns the assets must make a decision.
Causality
Level 1?
Unable to pump water at all
1. Motor fails
2. Pump fails
3. Pipes fail
4. Inlet to tank B blocked
5. Outlet from tank A blocked
… or Level 2?
Unable to pump water at all
1. Motor fails due to stator earth fault
2. Motor fails due to short between the coils
3. Motor fails due to fan end bearing failure
4. Motor fails due to drive end bearing failure
5. Motor fails due to overheating
6. Motor fails due to loose connections
… or Level 3?
Unable to pump water at all
1. Drive end bearing fails due to ingress of water
2. Drive end bearing fails due to lack of adequate grease
3. Drive end bearing fails due to misalignment
How far is far enough?
[Figure: “Motor stops” (Level 1) traced through a causal chain down to Level 7. Level 2: due to failed drive end bearing. The deeper levels branch through causes such as lack of grease, the wrong grease, misalignment during installation, inadequate training of the lubrication technician, poor installation procedures, incorrect procedure writing procedures, inadequate tools, improper purchasing controls and lack of communication between maintenance and purchasing. Level 7: due to former differences between department managers.]
Writing a Failure Mode
• Failure modes are the reasons why something is in a failed state.
• When defining failure modes, we first need to understand how the asset has failed (the functional failure); then we determine why it has failed.
• Avoid verbs such as breaks, fails or malfunctions.
• Use the “due to” convention, with at least a noun and a verb (a guide, not a rule).
• Only one cause per failure mode.
Failure modes are normally written something like this:
Function: To pump water from tank A to tank B at 800 l/minute
Functional failure: Unable to pump water from tank A to tank B
Failure modes:
• Drive end motor bearing failed due to lack of grease
• Short in motor windings due to insulation degrading over time
• Drive end motor bearing seized due to misalignment on installation
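The function, functional failure and failure mode hierarchy on this slide maps naturally onto nested records. The Python sketch below is a hypothetical illustration, not a Meridium or SAE data model: the class names, the `guideline_warnings` heuristic and its word list are assumptions. Because the slide calls the “due to” convention a guide rather than a rule, the check only reports warnings. Since the slide's own examples use “failed … due to”, the vague verbs are flagged only when no cause follows, and an “and” or “or” inside the cause clause is treated as a hint that more than one cause has been packed into one failure mode.

```python
import re
from dataclasses import dataclass, field

# Hypothetical word list, taken from the verbs named on the slide.
VAGUE_VERBS = {"breaks", "fails", "malfunctions"}


@dataclass
class FailureMode:
    description: str

    def guideline_warnings(self) -> list[str]:
        """Heuristic checks against the slide's writing guide (warnings, not errors)."""
        text = self.description.lower()
        warnings: list[str] = []
        if "due to" not in text:
            warnings.append("no 'due to' cause stated")
            vague = [w for w in re.findall(r"[a-z]+", text) if w in VAGUE_VERBS]
            if vague:
                warnings.append("vague verb without a cause: " + ", ".join(vague))
        else:
            cause = text.split("due to", 1)[1]
            if re.search(r"\b(and|or)\b", cause):
                warnings.append("cause clause may contain more than one cause")
        return warnings


@dataclass
class FunctionalFailure:
    description: str
    failure_modes: list[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    description: str
    functional_failures: list[FunctionalFailure] = field(default_factory=list)


pump = Function(
    "To pump water from tank A to tank B at 800 l/minute",
    [
        FunctionalFailure(
            "Unable to pump water from tank A to tank B",
            [
                FailureMode("Drive end motor bearing failed due to lack of grease"),
                FailureMode("Short in motor windings due to insulation degrading over time"),
                FailureMode("Motor fails"),  # deliberately vague, to show a warning
            ],
        )
    ],
)

for ff in pump.functional_failures:
    for fm in ff.failure_modes:
        print(f"{fm.description!r}: {fm.guideline_warnings() or 'OK'}")
```

Keeping each failure mode as its own record also keeps the one-cause-per-failure-mode rule visible in the structure itself.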
Types of Failures
[Figure: three comparisons of “What its users want it to do” against “What it can do” (SAE JA1012, Figure 2)]
• Wear and tear, degradation of the asset: Maintenance
• Incorrect use (often deliberate), overloading: Operations
• Not fit for purpose: Engineering / Purchasing
Who’s responsible for reliability? Reliability is a process… not a department!
The Problem with Data
Where does the data come from?
“One of the most important contributions of the Reliability-Centered Maintenance Program is its explicit recognition that certain types of information … are, in principle, as well as in practice, unobtainable.”
H.L. Resnikoff, Mathematical Aspects of Reliability Centered Maintenance
Where does the data come from?
“This means … in practice and in principle, the policy must be designed without using experiential data which will arise from the failures the policy is meant to avoid.”
H.L. Resnikoff, Mathematical Aspects of Reliability Centered Maintenance
[Figure: Data 30%, Knowledge 70%]
Effects
Effects and Consequences
• Effects are the direct outcome of a failure mode (what happens).
• The primary role of the effects statement is to inform us of the consequences (why it matters):
– When do we know about it? What evidence is there that it has failed?
– Safety implications
– Implications for environmental standards and regulations
– Operational implications: cost of repair; what is required to restore the function; time to repair (TTR)
– Any other implications, such as reputation, news headlines, etcetera
• SAE JA1011, 5.4.1: “Failure effects shall describe what would happen if no specific task is done to anticipate, prevent, or detect the failure.”
• Effects describe the typical worst-case scenario… not the extreme worst-case scenario.
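One way to make sure an effects statement covers each point in the list above is to record the points as separate fields. The Python sketch below is a hypothetical record: the field names, the currency-free repair cost, the hours unit for TTR and the example text are assumptions, not part of SAE JA1011 or any Meridium product. In line with JA1011 5.4.1, the entries should describe what would happen if no specific task were done.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class FailureEffect:
    """Hypothetical checklist for one failure mode's effects statement."""
    evidence: str = ""          # when do we know about it; what shows it has failed?
    safety: str = ""            # safety implications
    environmental: str = ""     # environmental standards and regulations
    operational: str = ""       # operational implications
    restoration: str = ""       # what is required to restore the function?
    repair_cost: Optional[float] = None       # cost of repair (currency not specified)
    time_to_repair_h: Optional[float] = None  # time to repair (TTR), hours (assumed unit)
    other: str = ""             # reputation, news headlines, etc.

    def missing(self) -> list[str]:
        """Return the names of elements left blank (empty text or unknown number)."""
        return [f.name for f in fields(self) if getattr(self, f.name) in ("", None)]


# Hypothetical entry for "Drive end motor bearing failed due to lack of grease".
effect = FailureEffect(
    evidence="Motor stops; tank B level falls",
    safety="None identified",
    environmental="None identified",
    operational="No transfer from tank A to tank B until the bearing is replaced",
    restoration="Replace drive end motor bearing",
    time_to_repair_h=4.0,
)
print(effect.missing())  # ['repair_cost', 'other']
```

Writing “None identified” explicitly, rather than leaving a field blank, records that the team considered the point and found nothing.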
RCM-DO-05 Consequences and Effectiveness
A Hierarchy of Consequences
[Figure: decision flow starting from “Evident or Hidden?”, then “Safety?”, “Environment?” and “Operational?”. Each branch asks in turn “On-Condition Task?” and “Preventive Restoration or Preventive Replacement?”, followed by “Failure Finding Task?” or “Combination of Tasks?”. Where no suitable task is found, the outcome is either “Redesign is Compulsory” or “No scheduled maintenance” with “Redesign may be desirable”.]
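The hierarchy above can be read as an ordered search: on each branch, candidate tasks are tried in a fixed order, and the default action applies only if none is suitable. The Python sketch below encodes one reading of the slide, with four branches to match the four “On-Condition Task?” boxes. The branch names, `select_policy` and the `is_suitable` callback (which stands in for the team's judgment that a task is technically feasible and worth doing) are assumptions for illustration.

```python
from enum import Enum
from typing import Callable


class Branch(Enum):
    """Hypothetical grouping of the four branches of the consequence hierarchy."""
    HIDDEN_SAFETY_ENV = "hidden: safety or environmental"
    HIDDEN_ECONOMIC = "hidden: operational or non-operational"
    EVIDENT_SAFETY_ENV = "evident: safety or environmental"
    EVIDENT_ECONOMIC = "evident: operational or non-operational"


ON_CONDITION = "on-condition task"
RESTORE_OR_REPLACE = "preventive restoration or preventive replacement"
FAILURE_FINDING = "failure finding task"
COMBINATION = "combination of tasks"

# Candidate tasks in the order they are considered on each branch.
CANDIDATES: dict[Branch, list[str]] = {
    Branch.HIDDEN_SAFETY_ENV: [ON_CONDITION, RESTORE_OR_REPLACE, FAILURE_FINDING],
    Branch.HIDDEN_ECONOMIC: [ON_CONDITION, RESTORE_OR_REPLACE, FAILURE_FINDING],
    Branch.EVIDENT_SAFETY_ENV: [ON_CONDITION, RESTORE_OR_REPLACE, COMBINATION],
    Branch.EVIDENT_ECONOMIC: [ON_CONDITION, RESTORE_OR_REPLACE],
}

# Default action when no candidate task is suitable.
DEFAULTS: dict[Branch, str] = {
    Branch.HIDDEN_SAFETY_ENV: "redesign is compulsory",
    Branch.HIDDEN_ECONOMIC: "no scheduled maintenance (redesign may be desirable)",
    Branch.EVIDENT_SAFETY_ENV: "redesign is compulsory",
    Branch.EVIDENT_ECONOMIC: "no scheduled maintenance (redesign may be desirable)",
}


def select_policy(branch: Branch, is_suitable: Callable[[str], bool]) -> str:
    """Return the first suitable candidate task, or the branch's default action."""
    for task in CANDIDATES[branch]:
        if is_suitable(task):
            return task
    return DEFAULTS[branch]


# Hidden safety failure where (hypothetically) only failure finding is judged suitable.
print(select_policy(Branch.HIDDEN_SAFETY_ENV, lambda task: task == FAILURE_FINDING))
# Evident safety failure where no task is judged suitable.
print(select_policy(Branch.EVIDENT_SAFETY_ENV, lambda task: False))
```

Placing failure finding on the hidden branches only, and the combination of tasks on the evident safety and environmental branch only, is an interpretation of the slide; check it against the decision diagram your organization uses.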
Hidden or Evident?
[Figure: pressure vessel with lines labelled “To Process”]
• Hidden failure: Failure of the pressure release valve on a high pressure vessel in a gas plant
• Multiple failure event: Dangerous build-up of gas pressure within the pressure vessel
• Consequence: Explosion of the pressure vessel when under high pressure conditions
Source: The Maintenance Scorecard, Daryl Mather, Industrial Press, ISBN 0831131810
Hidden Failures
Consequence decision diagram:
Will the loss of function caused by this failure mode on its own become evident to the operating crew under normal circumstances?
• No (hidden failure); ask in turn:
– HS: Is there an intolerable risk that the multiple failure could kill or injure someone?
– HE: Is there an intolerable risk that the multiple failure could breach a known environmental standard or regulation?
– HO: Does the failure have a direct adverse effect on operational capability?
– HN: none of the above (non-operational consequences)
• Yes (evident failure); ask in turn:
– ES: Is there an intolerable risk that the failure could kill or injure someone?
– EE: Is there an intolerable risk that the failure could breach a known environmental standard or regulation?
– EO: Does the failure have a direct adverse effect on operational capability?
– EN: none of the above (non-operational consequences)
• RCM begins by separating hidden and evident consequences.
• By themselves, hidden failures have no consequences; an additional failure must occur before they have any tangible impact.
• The ultimate consequences of such a multiple failure are often severe.
• Hidden failure consequences can be separated into safety, environmental and operational consequences.
• Hidden failures generally occur in devices that provide protection for safety, the environment or operations, such as high-high level switches, over-speed switches and standby equipment.
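The decision questions above form a short, ordered sequence, so the category code follows mechanically once the team has answered each question. The Python sketch below is a hypothetical helper (the function name and keyword arguments are assumptions). It follows the order shown: hidden or evident first, then safety, environment and operations, with the first “yes” deciding the code. For a hidden failure, the three flags refer to the multiple failure.

```python
def consequence_code(*, evident: bool, safety: bool,
                     environmental: bool, operational: bool) -> str:
    """Return HS/HE/HO/HN for hidden failures or ES/EE/EO/EN for evident ones.

    Each flag is the team's answer to the corresponding question; for a hidden
    failure the answers describe the multiple failure. Questions are taken in
    order and the first "yes" decides the code.
    """
    prefix = "E" if evident else "H"
    if safety:
        return prefix + "S"
    if environmental:
        return prefix + "E"
    if operational:
        return prefix + "O"
    return prefix + "N"  # non-operational: economic consequences only


# Pressure release valve on a gas plant vessel: hidden and, assuming the team
# judges the explosion an intolerable risk to people, a safety consequence.
print(consequence_code(evident=False, safety=True, environmental=False, operational=True))  # HS
```

Keyword-only arguments are a deliberate choice here: with four booleans, positional calls are easy to get in the wrong order.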
Safety Consequences
[Figure: consequence decision diagram (HS, HE, HO, HN, ES, EE, EO, EN), repeated from the Hidden Failures slide]
• Once the failure has been categorized as hidden or evident, the first consideration in evaluating any failure possibility is safety to life and limb.
• The team is asked to determine whether there is an intolerable risk of death or injury.
• Failures with safety consequences never default to run-to-failure; some action is always required.
Environmental Consequences
[Figure: consequence decision diagram (HS, HE, HO, HN, ES, EE, EO, EN), repeated from the Hidden Failures slide]
• Environmental consequences gained prominence through the 1980s with growing concern about global warming and increased environmental awareness.
• They deal with an intolerable risk of breaching environmental standards, regulations or laws (internal or external).
• Failures with environmental consequences never default to run-to-failure; some action is always required.
Operational Consequences
[Figure: consequence decision diagram (HS, HE, HO, HN, ES, EE, EO, EN), repeated from the Hidden Failures slide]
• Any failure consequence that has a direct or secondary negative effect on operations.
• Task selection is determined partly by cost-effectiveness trade-off calculations, rather than by levels of tolerable risk.
• Includes “other” cost implications such as damage to reputation, adverse newspaper coverage and other PR-related issues.
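For operational consequences, the slide says task selection rests partly on a cost-effectiveness trade-off. One common way to frame it is to compare the expected annual cost of each option: the cost of doing the task, plus the expected number of failures multiplied by the cost of each failure (operational consequences plus repair). The Python sketch below assumes that framing; the `Option` record, the function names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    task_cost_per_year: float  # cost of doing the task, incl. planned corrective work (0 if none)
    failures_per_year: float   # expected functional failures per year under this option


def annual_cost(option: Option, cost_per_failure: float) -> float:
    """Expected yearly cost: task cost plus expected failures times cost per failure."""
    return option.task_cost_per_year + option.failures_per_year * cost_per_failure


def cheapest(options: list[Option], cost_per_failure: float) -> Option:
    """Return the option with the lowest expected yearly cost."""
    return min(options, key=lambda o: annual_cost(o, cost_per_failure))


# Hypothetical figures, for illustration only.
operational_cost = 20_000.0  # lost production per failure
repair_cost = 5_000.0        # repair cost per failure
per_failure = operational_cost + repair_cost

options = [
    Option("run to failure", task_cost_per_year=0.0, failures_per_year=0.5),
    Option("on-condition task", task_cost_per_year=4_000.0, failures_per_year=0.1),
]
for o in options:
    print(f"{o.name}: {annual_cost(o, per_failure):,.0f} per year")
print("cheapest:", cheapest(options, per_failure).name)
```

With these made-up numbers, run-to-failure costs 12,500 per year and the on-condition task 6,500, so the task is worth doing. Setting `operational_cost` to zero gives the non-operational case, where only repair and secondary-damage costs count. The comparison applies only where consequences are economic; safety and environmental consequences are judged against tolerable risk.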
Repair Only
Non-operational Consequences
[Figure: consequence decision diagram (HS, HE, HO, HN, ES, EE, EO, EN), repeated from the Hidden Failures slide]
• Economic consequences only
• Costs of repair and secondary damage
RCM-HO-05a Assigning Consequences
Hidden or Evident? Consequence codes: HS, HE, HO, HN, ES, EE, EO, EN
a) A household circuit breaker continuously trips when there is no fault present.
b) A hydraulic positioning unit moves a train into position under the feed hopper of a large ammonia plant. Since its installation some ten years ago, the high pressures and extreme heat of the working environment have caused numerous leaks, each resulting in some downtime. Each leak is also a potential fire risk. Underneath the positioning unit is a concrete bund, there to stop hydraulic oil seeping into the ground below and breaching a number of environmental regulations. However, due to errors in the pouring of the concrete, the bund allows small quantities of hydraulic oil to pass through it every time there is a leak. What is the consequence of the failure of the concrete?
c) Vibration sensors protect a forced draft fan. Their role is to protect the fan from the high secondary damage that stems from unplanned bearing failure. Because of the critical nature of this asset, the company keeps a spare fan assembly; in case of any failure of the fan, the quickest way to restore the function is to replace the entire assembly. This particular fan has no safety consequences associated with bearing failure. The sensors are set at 7 mm/second and provide a warning light for operators so that they can shut the fan down immediately. Due to a failure of the indicating bulb at the control panel, the alarm goes unnoticed when vibration reaches the alarm level.
d) A wastewater plant has turbidity meters to measure the relative clarity of the effluent leaving the plant into the local river system. High percentages of microscopic particles make the effluent excessively “cloudy”; the turbidity meter then adjusts the dosing earlier in the process to reduce the impact on the environment. Over time, the calibration of this meter has drifted, so much so that the effluent leaving the plant contains a high percentage of microscopic solids, breaching a number of environmental laws and regulations and adversely affecting the wildlife in the area.
e) Over time, the brake pads in a car wear down, so that the car will not stop when required. The result was an accident when the driver attempted to stop at a red light.
© Copyright Meridium, Inc. 2008. All rights reserved. Document: RCM Fundamentals Training.doc
Page 58
Reliability-Centered Maintenance
f) A speed sensor protects a turbine from over-speed, which would otherwise let it speed up to destruction, sending debris in every direction. The sensor has failed in such a way that it will not trip the turbine on over-speed.
g) A standby motor that drives a pumping system has developed false brinelling (flat spots) of the bearings. This means that when it is called on to run, it will run for a short while before tripping the motor on overload. If it runs continually in this fashion, it could also cause secondary damage to the motor shaft.
h) When the level in a tank reaches the low level, the low-level switch starts a pump. Because of vibration in the surrounding area, one of the terminals comes loose, and the switch will not work when it is required to.
i) A pumping system has a duty pump and a standby pump. The standby pump takes over the function if the duty pump should ever fail. Over time, the resistance of the insulation within the duty motor breaks down, and it suffers an earth fault.
j) Due to a pinhole leak, the spare tire in your car has lost its air pressure.
k) Each aircraft is equipped with life preserver jackets for passenger use in case of a water landing. One of these has developed a failure that prevents it from inflating when required.
l) An electrically driven "pony" pump primes a lubrication system on start-up; at a specified pressure, the main pump takes over to run the system at operating pressure. This is an effort to minimize the energy usage of the plant, and the main pump could easily start up under full load with no consequence aside from increased energy usage. The pony pump has a failure of the mechanical seal and will be unserviceable for a time.
m) An air-conditioning system has had its condenser fins flattened by vandalism. As a result, the airflow through the condenser is not sufficient to reduce the temperature before the refrigerant gas travels to the evaporator, and the system will not reduce the room temperature below the 35 °C ambient temperature. This affects the health of the people working in the room and results in two people suffering from heatstroke.
n) The high-high level switch on a tank trips the pump when there is a high-high level. The pump then needs a manual reset. At present, this switch has spurious trips that cause the pump to stop when there is no high level.
o) A large-scale screening facility gets its supply from a conveyor running the length of the building, some four stories above the ground. Along the side of the conveyor are walkways with handrails. One of the handrails has a crack in it that is not visible to the naked eye. However, if somebody were to use it, it would give way, and the person would fall four stories to their death.
p) An IT data center houses all of the servers containing the corporate IT information. The data center's cooling system must keep the rooms continuously at a temperature of between 20 °C and 25 °C and a relative humidity of between 40% and 60%. A failure of the power supply could lead to outright server failure, or at the very least increase the failure rates of electronic components. This would have a catastrophic effect on business continuity. For these reasons, a diesel generator set is on permanent standby protecting the power supply to the coolers and humidifiers; an uninterruptible power supply (UPS) further protects this. The diesel generator set has developed a failure in the starter circuit due to corroded battery terminals, meaning it will not be able to start when required.
q) An operating company used a tank farm to store flammable liquid raw material. A pressure safety valve (PSV) set at the tank's maximum allowable working pressure (MAWP) of 100 psig protected one of the tanks, which contained a highly reactive material. The previous PHA (process hazard analysis) identified plugging of the PSV inlet as a potential concern, and the PSV's annual inspection reports verified plugging, substantiating this concern. The PHA team recommended installing a rupture disc upstream of the PSV. A month later, an overpressure event (triggered by contamination) caused the tank pressure to reach 180 psig before the rupture disc blew and vented the tank contents. The ensuing incident investigation revealed that the rupture disc had developed a pinhole leak and that the space between the rupture disc and the PSV had pressurized to the normal tank pressure of 80 psig.
Applicable and Effective
(Based on diagram 17 of SAE JA1012)
Applicable: Before selecting any failure management policy, analysts first need to determine whether or not the task is actually possible!
Effective: Then they need to determine whether the task will be worthwhile in terms of either cost or risk (based on the consequences).
Within RCM, NO task can be applied to any failure mode without first establishing that it is actually possible to do the task and, secondly, without ensuring that it will adequately manage the consequences.
© Copyright Meridium, Inc. 2007
Tolerable Levels of Risk
What is risk, and how tolerable should it be?
[Figure: ideal versus reality]
Risk is the likelihood of an unwanted event.
• "People often forget to fear those things that rarely happen… particularly in the face of productivity challenges, market share opportunities and competitive necessities." (James Reason, Human Error)
Example Tolerable Risk Levels

Government-set tolerable risk criteria (individual risk per year)

Criterion                           UK          Hong Kong   The Netherlands   Australia
Individual risk minimum (worker)    1 × 10⁻⁵    Not used    Not used          Not used
Individual risk minimum (public)    1 × 10⁻⁶    Not used    1 × 10⁻⁶          Not used
Individual risk maximum (worker)    1 × 10⁻³    Not used    Not used          Not used
Individual risk maximum (public)    1 × 10⁻⁴    1 × 10⁻⁵    1 × 10⁻⁶          1 × 10⁻⁶

Survey of U.S. corporate tolerable risk criteria

Criterion                                       High range   Low range
Minimum individual risk (worker)                10⁻⁵         10⁻⁹
Maximum individual risk                         10⁻³         10⁻⁶
Individual risk target per SIF (safety
instrumented function)                          10⁻³         10⁻⁶

Source: E. M. Marszal, "Survey of Process Plant Risk Tolerance Criteria and Third Party Liability Settlements", exida.com, Philadelphia, 2000.
Hidden Failures
A hidden functional failure, on its own, will not become evident to the operators under normal operating circumstances.
The five main categories of hidden failures
• The majority of hidden failures occur on protective devices. These are devices that:
  • Warn of abnormal conditions
  • Shut down equipment in case of a failure
  • Eliminate or alleviate abnormal conditions caused by failure
  • Take over from a function that has failed
  • Prevent dangerous situations from arising
Most protective devices can fail in two ways:
• By acting when they are not needed
• By ceasing to provide protection
The Famous Pump Example
If the ultimate high level switch fails closed, it is evident (it shuts the pump off). If the ultimate high level switch fails open, nobody knows it has failed.
[Figure: pump and tank level switches; flow rates shown are 1000 l/m and 800 l/m.
• Low level switch: turns on the pump until the level reaches the high level switch.
• High level switch: shuts off the pump until the low level switch turns it back on again.
• Ultimate high level switch (normally open): shuts the pump off until manually reset.
• Low low level switch: turns off the pump until manually reset.]
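Why one failure mode of the same switch is evident and the other hidden can be explored with a small simulation. This is a minimal sketch under invented assumptions: the tank capacity and switch levels are hypothetical, the pump is assumed to deliver 1000 l/m into a tank that is drawn down continuously at 800 l/m, and the low low level switch is left out. `TankSystem` and `run` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class TankSystem:
    # Hypothetical settings for illustration only (litres, litres per minute).
    capacity: float = 10_000.0
    low_level: float = 2_000.0
    high_level: float = 8_000.0
    ultimate_high_level: float = 9_000.0
    inflow: float = 1_000.0   # assumed pump delivery while running
    outflow: float = 800.0    # assumed continuous draw from the tank
    high_switch_failed: bool = False             # high level switch never stops the pump
    ultimate_switch_failed_open: bool = False    # hidden failure: never trips
    ultimate_switch_failed_closed: bool = False  # evident failure: trips at once


def run(system: TankSystem, minutes: int = 1_000) -> dict:
    """Step the tank minute by minute and report what an operator would notice."""
    level = system.low_level
    pump_on = True
    # The ultimate high trip latches until a manual reset (never reset here).
    tripped = system.ultimate_switch_failed_closed
    max_level = level
    for _ in range(minutes):
        running = pump_on and not tripped
        level = max(level + (system.inflow if running else 0.0) - system.outflow, 0.0)
        max_level = max(max_level, level)
        if level >= system.capacity:
            return {"overflow": True, "tripped": tripped, "max_level": max_level}
        if level >= system.ultimate_high_level and not system.ultimate_switch_failed_open:
            tripped = True
        if level >= system.high_level and not system.high_switch_failed:
            pump_on = False
        elif level <= system.low_level:
            pump_on = True
    return {"overflow": False, "tripped": tripped, "max_level": max_level}


scenarios = {
    "all switches working": TankSystem(),
    "ultimate switch failed open": TankSystem(ultimate_switch_failed_open=True),
    "ultimate switch failed closed": TankSystem(ultimate_switch_failed_closed=True),
    "high switch failed": TankSystem(high_switch_failed=True),
    "high switch failed + ultimate failed open": TankSystem(
        high_switch_failed=True, ultimate_switch_failed_open=True
    ),
}
for name, system in scenarios.items():
    print(f"{name}: {run(system)}")
```

The first two scenarios print identical results, which is the point of the example: while the high level switch does its job, a failed-open ultimate switch changes nothing the operators can see. The failed-closed switch stops the pump straight away (evident), and the tank only overflows when the high level switch also fails, that is, in the multiple failure.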
Function 1: To pump between tank A and tank B at up to 800 l/m
  Functional failure 1A: Unable to pump between tank A and tank B at up to 800 l/m
    Failure modes and effects (evident):
    1. Pump blocked
    2. Pipes blocked
    3. Ultimate high level switch fails closed
Function 2: To stop the pump on ultimate high level
  Functional failure 2A: Unable to stop the pump on ultimate high level
    Failure modes and effects (hidden):
    1. Ultimate high level switch fails open
The probability that the protected function will fail in any one cycle is given by its failure rate.
[Figure: one-year timeline for the protected function (B) and the protective device (C), marking where B fails and where C fails]
If the failure rate is once in four years, then the probability that it will fail in one year is 1 in 4. (This corresponds to a mean time between failures of four years.)
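The "1 in 4" above treats the yearly failure probability as simply 1 / MTBF. The sketch below compares that rule of thumb with the probability you get if you additionally assume a constant failure rate (an exponential model, which is an assumption not stated on the slide). The gap shrinks as the MTBF grows relative to the one-year period, which is why the rule of thumb is usually good enough for rarely failing functions.

```python
import math


def annual_probability_rule_of_thumb(mtbf_years: float) -> float:
    """Probability of failure in one year taken as 1 / MTBF, as on the slide."""
    return 1.0 / mtbf_years


def annual_probability_exponential(mtbf_years: float, period_years: float = 1.0) -> float:
    """Same probability assuming a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-period_years / mtbf_years)


for mtbf in (4.0, 10.0, 100.0):
    print(mtbf, annual_probability_rule_of_thumb(mtbf),
          round(annual_probability_exponential(mtbf), 4))
```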
The probability that the protective device will be in a failed state at any point in time is given by its downtime (if it conforms to a random failure pattern).
[Figure: one-year timeline for the protected function (B) and the protective device (C), marking where B fails and where C fails]
If the downtime is 33%, then the probability that it will be in a failed state at any point in time is 1 in 3. (This corresponds to an availability of 67%.)
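The link between downtime and "probability of being failed at any point in time" can be checked by sampling random instants on a timeline. This is a minimal sketch with a hypothetical year in which the device fails after eight months and stays failed to the end of the year (a downtime of one third); the function names are illustrative.

```python
import random


def downtime_fraction(failed_intervals, horizon=1.0):
    """Share of the horizon during which the device sits in a failed state."""
    return sum(end - start for start, end in failed_intervals) / horizon


def probability_found_failed(failed_intervals, horizon=1.0, samples=100_000, seed=1):
    """Estimate P(device is failed at a random instant) by sampling instants."""
    rng = random.Random(seed)
    hits = sum(
        any(start <= t < end for start, end in failed_intervals)
        for t in (rng.uniform(0.0, horizon) for _ in range(samples))
    )
    return hits / samples


# Hypothetical year: the device fails after 8 months and stays failed to year end.
intervals = [(8 / 12, 1.0)]
print(downtime_fraction(intervals))         # about 0.333
print(probability_found_failed(intervals))  # also about 0.333, i.e. 1 in 3
```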
The Probability of a Multiple Failure
[Figure: one-year timeline for protected function B and protective device C]
Protected function B: mean time between failures = 4 years
Protective device C: availability = 67%, downtime = 33%
The probability that B will fail while C is in a failed state is 1 in 4 × 1 in 3 = 1 in 12. In other words, there is a one-in-twelve chance that the multiple failure will occur in any one year.
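The multiplication above can be reproduced with exact fractions. The sketch assumes, as the slide's arithmetic implicitly does, that failures of B and C are independent of each other; `multiple_failure_probability` is an illustrative name.

```python
from fractions import Fraction


def multiple_failure_probability(p_function_fails: Fraction, downtime_device: Fraction) -> Fraction:
    """P(B fails in the year) x P(C is in a failed state when it does).

    Assumes the two failures are independent.
    """
    return p_function_fails * downtime_device


p = multiple_failure_probability(Fraction(1, 4), Fraction(1, 3))
print(p)  # 1/12
```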
When developing a failure management policy for a hidden function, the first stage is to decide what probability we are prepared to tolerate for a multiple failure.
[Figure: one-year timeline for protected function B and protective device C]
Protected function B: mean time between failures = 4 years
Protective device C: availability = 67%, downtime = 33%
The probability that B will fail while C is in a failed state is 1 in 4 × 1 in 3 = 1 in 12. We are prepared to accept 1 in 1000.
Reduce the probability of failure of the protected function (by applying a suitable failure management policy).
[Figure: one-year timeline for protected function B and protective device C]
Protected function B: mean time between failures = 10 years
Protective device C: availability = 99%, unavailability = 1%
And/or increase the availability of the protective device:
- by preventing the failure of the protective device,
- by periodically checking whether the protective device is still working and repairing it if it has failed, or
- by modifying the system in some way.
The probability that B will fail while C is in a failed state is now 1 in 10 × 1 in 100 = 1 in 1000.
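The "and/or" in the slide means the target can be reached by improving the protected function, the protective device, or both. This sketch checks each option against the tolerable 1 in 1000; the "improve C only" option (an unavailability of 1 in 250 with B unchanged) is a hypothetical case added for comparison, not from the slides.

```python
from fractions import Fraction

# Accepted probability of the multiple failure in any one year (from the slides).
TOLERABLE = Fraction(1, 1000)

# (probability that B fails in a year, unavailability of C).
# "improve C only" is a hypothetical extra option, not from the slides.
options = {
    "as is": (Fraction(1, 4), Fraction(1, 3)),
    "improve B and C": (Fraction(1, 10), Fraction(1, 100)),
    "improve C only": (Fraction(1, 4), Fraction(1, 250)),
}

results = {name: p_b * u_c for name, (p_b, u_c) in options.items()}
for name, p_multiple in results.items():
    verdict = "tolerable" if p_multiple <= TOLERABLE else "not tolerable"
    print(f"{name}: {p_multiple} -> {verdict}")
```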
• 6 identical PSVs have each been checked once a year for 5 years (FFI = 1 year).
• So the devices have been in service for a total of 30 years.
• In that time, 3 were found to be in a failed state.
• So the MTBF of the devices (MTBFdevice) is 30 years / 3 failures = 10 years.
• We know that the failed devices failed some time during the year before the checks, but not when.
• It seems reasonable to assume that each failed device was down for an average of 6 months (half the check interval).
[Figure: PSV schematic (from process, to process) and check record for PSVs 1 to 6 over years 1 to 5]
1. So the total downtime was 3 × 6 months = 1.5 years out of 30, giving DTdevice = 5%.
2. So, on the basis of these figures, it appears that: FFI = 2 × DTdevice × MTBFdevice (check: 2 × 0.05 × 10 years = 1 year).
3. This is generally true if DTdevice < 5% and the device conforms to a random failure pattern.
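The PSV arithmetic can be reproduced step by step. This sketch simply follows the slide, including its assumption that each failed valve had been down, on average, for half a check interval; `psv_estimates` is an illustrative name.

```python
def psv_estimates(devices, years, failures_found, check_interval):
    """Reproduce the slide's PSV arithmetic (all times in years)."""
    device_years = devices * years                  # 6 x 5 = 30 years in service
    mtbf_device = device_years / failures_found     # 30 / 3 = 10 years
    # Slide's assumption: each failed valve was down, on average, for half a
    # check interval before the check found it.
    total_downtime = failures_found * check_interval / 2  # 1.5 years
    dt_device = total_downtime / device_years             # 1.5 / 30 = 5%
    return mtbf_device, dt_device


mtbf, dt = psv_estimates(devices=6, years=5, failures_found=3, check_interval=1.0)
ffi = 2 * dt * mtbf
print(mtbf, dt, ffi)  # 10.0 0.05 1.0
```

Recovering an FFI of 1 year, the interval actually used, is what suggests the general relationship FFI = 2 × DTdevice × MTBFdevice.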
Detective Task Frequency and Availability
[Figure: four-year timeline with a first and a second inspection two years apart. Maximum potential unavailability time = 2 years.]
[Figure: four-year timeline with first, second and third inspections one year apart. Maximum potential unavailability time = 1 year.]
Risk management of hidden failures involves managing unavailability to within levels accepted by the company.
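Shortening the inspection interval shortens the time a failed device can sit undetected, and so reduces its average unavailability. The Monte Carlo sketch below illustrates this under two assumptions not stated on the slide: the device fails randomly (exponentially), and every check finds and repairs a failed device. It also compares the simulated unavailability with the closed form for that model and with the rule of thumb FFI / (2 × MTBFdevice) implied by FFI = 2 × DTdevice × MTBFdevice.

```python
import math
import random


def simulated_unavailability(mtbf_device, ffi, intervals=200_000, seed=7):
    """Monte Carlo estimate of the fraction of time a hidden device sits failed.

    Assumes random (exponential) failures and that every check at the end of an
    interval finds and repairs a failed device. Because exponential lifetimes
    are memoryless, each check interval can be simulated independently.
    """
    rng = random.Random(seed)
    down = 0.0
    for _ in range(intervals):
        time_to_failure = rng.expovariate(1.0 / mtbf_device)
        if time_to_failure < ffi:
            down += ffi - time_to_failure  # failed, undetected until the next check
    return down / (intervals * ffi)


def exact_unavailability(mtbf_device, ffi):
    """Closed-form unavailability for the same model."""
    return 1.0 - (mtbf_device / ffi) * (1.0 - math.exp(-ffi / mtbf_device))


for ffi in (2.0, 1.0, 0.5):
    print(ffi, round(simulated_unavailability(10.0, ffi), 4),
          round(exact_unavailability(10.0, ffi), 4), ffi / (2 * 10.0))
```

Halving the interval roughly halves the unavailability, and the rule of thumb drifts further from the exact value as the unavailability grows, consistent with the earlier caveat that the relationship holds for small DTdevice.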
[Figure: one-year timeline in which protected function B (mean time between failures = 10 years) fails while protective device C has failed: 1 in 10 × 1 in 100 = 1 in 1000]
Step One: Decide what probability we will tolerate for the multiple failure.
Step Two: Determine or estimate how often the protected function is likely to need the protective device.
Step Three: Calculate what unavailability of the protective device enables us to achieve Step One, given Step Two.
If:
DTdevice = unavailability of the protective device
MTBFfunction = mean time between failures of the protected function
MTBFmultiple = mean time between multiple failures
then (1/MTBFfunction) × DTdevice = 1/MTBFmultiple
or DTdevice = MTBFfunction / MTBFmultiple
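The three steps map directly onto a few lines of code. This sketch uses the slide's figures (a tolerable multiple failure once in 1000 years and a protected function failing about once every 10 years); `required_unavailability` is an illustrative name.

```python
def required_unavailability(mtbf_function: float, mtbf_multiple: float) -> float:
    """Step Three: DT_device = MTBF_function / MTBF_multiple (same time units)."""
    return mtbf_function / mtbf_multiple


# Step One: tolerate the multiple failure once in 1000 years (the slide's target).
mtbf_multiple = 1000.0
# Step Two: the protected function fails about once every 10 years.
mtbf_function = 10.0
# Step Three:
dt_device = required_unavailability(mtbf_function, mtbf_multiple)
print(dt_device)  # 0.01, i.e. 1% unavailability
# Cross-check: (1 / MTBF_function) x DT_device should equal 1 / MTBF_multiple.
print((1 / mtbf_function) * dt_device)
```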
We have seen that:
$$\mathrm{FFI} = 2 \times \mathrm{DT}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{device}} \qquad (1)$$
and that:
$$\mathrm{DT}_{\mathrm{device}} = \frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}} \qquad (2)$$
Therefore, substituting (2) into (1) gives:
$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{function}} \times \mathrm{MTBF}_{\mathrm{device}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$
where:
• FFI = failure-finding task interval
• DTdevice = unavailability of the protective device
• MTBFdevice = MTBF of the protective device
• MTBFfunction = MTBF of the protected function
• MTBFmultiple = MTBF of the multiple failure
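The combined formula can be wrapped in a small helper. As a design choice in this sketch (not something the slides prescribe), the helper refuses inputs whose implied device downtime, MTBFfunction / MTBFmultiple, is 5% or more, because the slides say the underlying approximation is only generally true below that level. The example inputs are hypothetical: a protected function and a device that each fail about once every 10 years, with a tolerable multiple failure once in 1000 years.

```python
def failure_finding_interval(mtbf_function: float, mtbf_device: float,
                             mtbf_multiple: float) -> float:
    """FFI = 2 x MTBF_function x MTBF_device / MTBF_multiple (same time units).

    Raises ValueError when the implied device downtime (MTBF_function /
    MTBF_multiple) is 5% or more, where the approximation weakens.
    """
    implied_downtime = mtbf_function / mtbf_multiple
    if implied_downtime >= 0.05:
        raise ValueError(f"implied downtime {implied_downtime:.1%} is not below 5%")
    return 2 * mtbf_function * mtbf_device / mtbf_multiple


# Hypothetical inputs, in years.
ffi_years = failure_finding_interval(10.0, 10.0, 1000.0)
print(ffi_years)  # 0.2 years, roughly every 10 weeks
```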
Exercise 1
A small chemical plant has an eye bath to enable people to wash their eyes if dangerous chemicals contaminate them. When asked what checks had been done on the eye bath in the past, the maintenance department said "that's the production department's job". However, production thought the safety officer was doing it, and the safety officer in turn thought it was "looked after by the preventive maintenance system". As a result, it appears that the eye bath has never been checked, at least not on a routine basis.
The eye bath has been in place for eight years. A quick check now reveals that it is actually in working order, so the only data we have about the reliability of this eye bath is that it has not failed in eight years. Further investigation reveals that someone has needed to use it in an emergency on two occasions since it was installed.
The plant manager has asked you to set up a checking routine for this eye bath as a matter of urgency. How often should the check be done?
The safety committee has decided that they do not want the eye bath to be inoperable when it is needed more than once in 1,000,000 years. A series of phone calls to other companies reveals 60 eye baths that have been installed for a total of 720 years between them. Two of these have been found to be in a failed state in that period.
$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$
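One way to check a hand calculation for this exercise is to plug the data into the formula in code. The mapping below is an assumed reading of the exercise: MTBFdevice comes from the other companies' pooled data rather than the plant's own eight failure-free years, and MTBFfunction from the two emergency uses in eight years.

```python
def failure_finding_interval(mtbf_device, mtbf_function, mtbf_multiple):
    """FFI = 2 x MTBF_device x MTBF_function / MTBF_multiple (same time units)."""
    return 2 * mtbf_device * mtbf_function / mtbf_multiple


# Assumed reading of the exercise data (all in years):
mtbf_device = 720 / 2        # 60 eye baths, 720 bath-years in total, 2 found failed
mtbf_function = 8 / 2        # needed in an emergency twice in eight years
mtbf_multiple = 1_000_000    # safety committee's tolerable multiple failure

ffi_years = failure_finding_interval(
    mtbf_device=mtbf_device, mtbf_function=mtbf_function, mtbf_multiple=mtbf_multiple
)
print(f"FFI = {ffi_years:.5f} years = {ffi_years * 365 * 24:.1f} hours")
```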
Exercise 2
A tank used to store diesel is enclosed in a concrete bund. The bund is intended to prevent anything that might escape from the tank from seeping into the ground and breaching a variety of environmental regulations. The review group decides that they would not like this to happen more than once every 10,000 years.
A review of a number of similar systems and discussions with users suggest that a significant quantity of liquid is likely to escape into the bund no more than once every 150 years on average, usually due to leaks in pipeline flanges or seals.
The integrity of the bund itself has never been checked until now, but it can be checked in a number of ways. One is to fill the bund with water to a depth of (say) 100 millimeters and check whether the water level drops by more than the rate of evaporation over a period of (say) two days. Such a check is carried out on the bund and reveals that it is still intact. So, in the absence of any hard data at all, and after considerable discussion, the group decides that in any one year the chance of the bund springing an invisible leak (due to subsidence, latent construction defects or whatever) is "1 in 100".
$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$
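The same check works for the bund. The mapping is again an assumed reading of the exercise: the bund is the protective device (its "1 in 100" chance per year taken as an MTBF of 100 years), the tank's containment is the protected function (a significant escape into the bund about once every 150 years), and the review group's limit sets MTBFmultiple.

```python
def failure_finding_interval(mtbf_device, mtbf_function, mtbf_multiple):
    """FFI = 2 x MTBF_device x MTBF_function / MTBF_multiple (same time units)."""
    return 2 * mtbf_device * mtbf_function / mtbf_multiple


# Assumed reading of the exercise data (all in years):
mtbf_device = 100        # group's estimate: bund springs a hidden leak "1 in 100" per year
mtbf_function = 150      # significant spill into the bund about once every 150 years
mtbf_multiple = 10_000   # tolerable escape to ground once every 10,000 years

ffi_years = failure_finding_interval(mtbf_device, mtbf_function, mtbf_multiple)
print(f"Check the bund about every {ffi_years:g} years")
```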
Exercise 3
A steel-producing plant needs many product-handling assets to move raw iron ore around prior to processing. As part of this asset base, it has 10 large conveyors. Each of these has 4 e-stops: one on either side of the head end and one on either side of the tail end of the conveyor. Management has tasked the maintenance team with determining a frequency for testing the function of each of these e-stops, to make sure that they will work when they are needed.
After some discussion, the team consulted the relevant specifications and determined that these e-stops should meet the company's SIL-2 classification. For this company, that means a likelihood of 1 in 100,000 (1 in 10⁵) that any one of them would have a failure in any one year.
They found that their own plant had never experienced a failure of one of the emergency stops. However, on consulting a commercial data store, they found the following information:
• A population was tested over a time period of 10⁶ hours.
• During this time, the item was found to have failed 8 times in an undetected and unsafe manner,
• and 60 times in a detected, safe fashion.
The e-stops were installed on all of the conveyors at roughly the same time, 20 years ago. After conversations with a few of the longer-serving people, the team was able to ascertain that an e-stop had been required, either to protect people or to protect life, approximately 15 times.
How often will the detective task need to be done to maintain the level of risk that the company has deemed tolerable?
This is the multiple failure: the result of a protected function failing while the protective device is in a failed state.
[Figure: protected Function and protective Device]
Failure of the protective device has no consequences by itself; the protected function must also fail before the consequences of a hidden failure are realised. Therefore, for a detective maintenance task to be technically feasible we need to:
1. Ensure that the task will not increase the probability of a multiple failure
2. Determine whether it is practical to do the task at the desired interval
Case Study - BP Refinery Incident
• Indicator not designed to measure above 10 feet (Not fit for purpose)
• High/High level alarm did not work (Hidden Failure)
• Indicator not accurate: filled to 13 feet when indicating 10
• Valve for liquid flow left closed (Human Error)
• Reached 138 feet; indicator told operators 10 feet and falling (Hidden Failure)
• Pressure controlling valve didn’t work (Hidden Failure)
• High level alarm on the blowdown drum did not work, the last line of defense… (Hidden Failure)
Managing Safety and Environmental Consequences
Effectiveness for Safety and Environmental Consequences
• Proactive tasks are considered first: Predictive (PTIVE), Preventive Restoration (PRES) and Preventive Replacement (PREP).
• Hidden safety or environmental consequences: the failure management policy must reduce the risk of the multiple failure to a tolerable level. If a suitable proactive task cannot be found, the first option is to seek a failure-finding task (DTIVE) that reduces the probability of the multiple failure to a tolerable level.
• Evident safety or environmental consequences: the failure management policy must reduce the risk of failure to a tolerable level. If no single proactive task achieves this: is a combination of tasks technically feasible and worth doing?
• Redesign: if no such task or combination can be found, redesign is compulsory.
© Copyright Meridium, Inc. 2007
Economic Consequences
The Economic Consequences of Failure
• A failure has economic consequences if it has a direct adverse effect on operational capability:
– Reduced output
– Product quality considerations
– Poor customer service
– Increased operating costs
– Costs in terms of legal or regulatory charges
– Reputation costs
The Economic Consequences of Failure
• Two issues need to be considered when assessing the consequences of failure:
– How much the failure costs each time it occurs
– How often it would occur if no attempt was made to prevent it
• As a result, the consequences should always be evaluated over a reasonable period of time
Effectiveness for Economic Consequences
• Proactive tasks are considered first: Predictive (PTIVE), Preventive Restoration (PRES) and Preventive Replacement (PREP).
• Hidden economic consequences: over a period of time, the failure management policy must reduce the probability of a multiple failure (and associated total costs) to an acceptable minimum.
• Evident economic consequences: over a period of time, the failure management policy must cost less than the cost of the operational consequences (if any) plus the total cost of repair.
• Run to Failure: if there is no effective routine task, then the initial default is run-to-failure.
• Redesign: if run-to-failure is not an option due to the frequency of failure or other implications, then redesign may be desirable.
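The two issues above (cost per failure and frequency) combine into the evident economic test: over a chosen period, compare what the task costs with what the failures it avoids would cost. The sketch below is a hypothetical simplification: it assumes failures occur on average once per MTBF, and that the task prevents every failure over the period. All function names and figures are illustrative.

```python
def failure_cost_over_period(period: float, mtbf: float,
                             operational_cost: float, repair_cost: float) -> float:
    """Expected cost of letting the failure mode occur over `period`.

    Assumes failures happen on average once per `mtbf` (same time unit as `period`).
    """
    expected_failures = period / mtbf
    return expected_failures * (operational_cost + repair_cost)


def task_cost_over_period(period: float, task_interval: float,
                          cost_per_task: float) -> float:
    """Cost of doing a routine task every `task_interval` over `period`."""
    return (period / task_interval) * cost_per_task


def worth_doing(period: float, task_interval: float, cost_per_task: float,
                mtbf: float, operational_cost: float, repair_cost: float) -> bool:
    """True if the task costs less than the failures it is assumed to prevent."""
    return (task_cost_over_period(period, task_interval, cost_per_task)
            < failure_cost_over_period(period, mtbf, operational_cost, repair_cost))


# Hypothetical figures: 10-year period, MTBF of 2 years,
# 50,000 lost production + 10,000 repair per failure,
# task every 3 months (0.25 years) costing 2,000 each time.
print(failure_cost_over_period(10, 2, 50_000, 10_000))  # 300000.0
print(task_cost_over_period(10, 0.25, 2_000))           # 80000.0
print(worth_doing(10, 0.25, 2_000, 2, 50_000, 10_000))  # True
```

A fuller comparison would add the cost of any failures the task does not prevent and the cost of the corrective work triggered when the task finds a problem.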
RCM-DO-06 Applicability and Task Selection
The Seven Questions of RCM (SAE JA1011 5a to 5g, 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
RCM Decision Algorithm (based on Example 2, SAE JA1012)
The codes in the chart (for example HS1 or EO3) identify the branch (HS, HE, HO, HN, ES, EE, EO, EN) and the step number within it.

Consequence evaluation
1. Will the loss of function caused by this failure mode on its own become evident to the operating crew under normal circumstances? Yes: Evident (E). No: Hidden (H).
2. Is there an intolerable risk that the failure could kill or injure someone? Yes: Safety (HS or ES).
3. If not, is there an intolerable risk that the failure could breach a known environmental standard or regulation? Yes: Environmental (HE or EE).
4. If not, does the failure have a direct adverse effect on operational capability? Yes: Operational (HO or EO). No: Non-operational (HN or EN).

Task selection (ask the questions in order; the first "Yes" selects the task)
1. Is a Predictive (on-condition) task technically feasible and effective? Yes: Predictive Task.
2. Is a Preventive Restoration task technically feasible and effective? Yes: Preventive Restoration Task.
3. Is a Preventive Replacement task technically feasible and effective? Yes: Preventive Replacement Task.
4. Hidden failures: Is a Detective task to detect the failure technically feasible and effective? Yes: Detective Task (HS4, HE4, HO4, HN4).
   Evident safety or environmental failures: Is a combination of tasks technically feasible and effective? Yes: Combination of tasks (ES4, EE4).

Default actions (Run-to-Fail or a Combination of Tasks)
• Hidden safety and environmental consequences: if no failure-finding task is feasible, redesign is compulsory (HS5, HE5).
• Evident safety and environmental consequences: if no combination of tasks is feasible, redesign is compulsory (ES5, EE5).
• Hidden operational and non-operational consequences: Run-to-Fail (HO5, HN5); redesign may be desirable (HO6, HN6) if the economic consequences justify it.
• Evident operational and non-operational consequences: Run-to-Fail (EO4, EN4); redesign may be desirable (EO5, EN5) if the economic consequences justify it.

Applicability criteria (is the task technically feasible?)
• Predictive tasks: Is there a clear potential failure condition? What is it? What is the P-F interval? Is the interval long enough for action to be taken to avoid or minimise the consequences of failure? Is the P-F interval reasonably consistent? Is it practical to do the task at intervals less than the P-F interval?
• Preventive Restoration tasks: Is there an age at which there is an increase in the conditional probability of failure (a Life)? What is this age? Do enough items survive to this age to satisfy the effectiveness criteria? Will the task restore the original resistance to failure? When there are safety or environmental consequences, all items need to survive to this age.
• Preventive Replacement tasks: Is there an age at which there is an increase in the conditional probability of failure (a Life)? What is this age? Do enough items survive to this age to satisfy the effectiveness criteria? When there are safety or environmental consequences, all items need to survive to this age.
• Detective tasks: Is it possible to check whether the item has failed without significantly increasing the risk of a multiple failure? Is it practical to do the task at the required interval?

Effectiveness criteria (is the task worth doing?)
• Hidden safety and environmental consequences: the failure management policy must reduce the risk of the multiple failure to a tolerable level.
• Evident safety and environmental consequences: the failure management policy must reduce the risk of the failure to a tolerable level.
• Hidden economic consequences: over a period of time, the failure management policy must reduce the risk of a multiple failure (and associated total costs) to an acceptable minimum.
• Evident economic consequences: over a period of time, the failure management policy must cost less than the cost of the operational consequences (if any) plus the total cost of repair.

Before selecting any failure management policy, analysts first need to determine whether or not the task is actually possible (Applicable). Then they need to determine whether the task will be worthwhile in terms of either cost or risk, based on the consequences (Effective).
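The order of the questions in the decision algorithm can be made explicit as a short function. This is a simplified sketch, not an implementation of SAE JA1012: each feasibility answer is passed in as a boolean that the review group has already decided, and the function and parameter names are hypothetical.

```python
def consequence_category(evident: bool, safety: bool, environmental: bool,
                         operational: bool) -> str:
    """Return the two-letter branch code, e.g. 'HS' or 'EO'."""
    first = "E" if evident else "H"
    if safety:
        second = "S"
    elif environmental:
        second = "E"
    elif operational:
        second = "O"
    else:
        second = "N"
    return first + second


def select_policy(category: str, predictive: bool, restoration: bool,
                  replacement: bool, detective: bool = False,
                  combination: bool = False) -> str:
    """Walk the task-selection questions in order and return the first answer."""
    # Steps 1-3 are the same for every branch.
    for feasible, task in ((predictive, "Predictive task"),
                           (restoration, "Preventive restoration task"),
                           (replacement, "Preventive replacement task")):
        if feasible:
            return task
    hidden = category[0] == "H"
    safety_or_env = category[1] in ("S", "E")
    # Step 4: failure finding applies to hidden failures only;
    # a combination of tasks is sought for evident safety/environmental ones.
    if hidden and detective:
        return "Detective task"
    if not hidden and safety_or_env and combination:
        return "Combination of tasks"
    # Default actions.
    if safety_or_env:
        return "Redesign is compulsory"
    return "Run-to-fail (redesign may be desirable)"


branch = consequence_category(evident=False, safety=True,
                              environmental=False, operational=False)
print(branch)  # HS
print(select_policy(branch, predictive=False, restoration=False,
                    replacement=False, detective=True))  # Detective task
```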
Technically Feasible
A routine task is applicable if it is physically possible for the task to reduce the consequences of the failure mode to an acceptable level.
Types of Maintenance
Each RCM term is listed with its colloquial terms, what it is, and its abbreviation.
• Predictive Maintenance (PTIVE)
– Colloquial terms: On-Condition Maintenance, Condition Based Maintenance (CBM), Condition Monitoring (CM), Inspections
– What it is: Check an item for signs of potential failure and leave it in place on the condition that it will make it to its next inspection. (Planned)
• Preventive Restoration (PRES)
– Colloquial terms: Overhaul, Scheduled Restoration, Rework
– What it is: A task to restore an asset’s original resistance to failure prior to its failure; this is a preventive task. (Planned)
• Preventive Replacement (PREP)
– Colloquial terms: Replacement, Overhauls (also)
– What it is: A task to replace an asset prior to its failure; this is a preventive task. (Planned)
• Detective Maintenance (DTIVE)
– Colloquial terms: Failure finding, Function testing
– What it is: A task to detect whether an item has failed or not. (Planned)
• Corrective Maintenance (CTIVE)
– Colloquial terms: Corrective, Run to failure (RTF)
– What it is: A task to correct failing or failed assets. (Planned)
• Reactive Maintenance (Reactive)
– Colloquial terms: Breakdown, Shutdown
– What it is: A task to restore the function of an asset that is failing or has failed. (Unplanned)
Wherever a suitable maintenance or operational activity exists, RCM directs maintainers to choose it over a redesign, as it is almost always the most cost-effective means of managing failure.
Preventive Maintenance (PMs)
Preventive maintenance tasks (PMs) are routine actions that are taken to prevent failures. Within reliability-centred maintenance there are two types of preventive tasks:
• Preventive Restoration – tasks to restore an item’s original resistance to failure.
• Preventive Replacement – tasks to replace an asset.
Characteristics of Preventive tasks…
• There must be an age where the conditional probability of failure will increase dramatically (a Life)
• The majority of items must survive until this point (only a few “random” failures)
• If the task is a restoration task, then it needs to restore the item’s original resistance to failure…
[Figure: Life]
Characteristics of Preventive tasks…
• There must be an age where the conditional probability of failure will increase dramatically (a Life)
• If the failure mode has safety or environmental consequences, then all items must survive to this age!
• If the task is a restoration task, then it needs to restore the item’s original resistance to failure…
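The survival requirement for preventive tasks can be sketched from a record of failure ages. The helpers below are hypothetical and deliberately simple: they count the share of observed items that reached a candidate life without failing, ignoring items still in service (a proper life analysis would treat those as censored data), and they do not test whether the conditional probability of failure actually rises at that age. The required share for economic consequences is left as an input because the slides only say "the majority".

```python
def survival_fraction(failure_ages: list[float], life: float) -> float:
    """Share of the observed items that failed at or after the candidate life."""
    if not failure_ages:
        raise ValueError("need at least one observed failure age")
    survivors = sum(1 for age in failure_ages if age >= life)
    return survivors / len(failure_ages)


def preventive_task_applicable(failure_ages: list[float], life: float,
                               safety_or_environmental: bool,
                               required_fraction: float) -> bool:
    """Safety/environmental: all items must survive to the life.
    Otherwise: at least `required_fraction` of them must."""
    fraction = survival_fraction(failure_ages, life)
    if safety_or_environmental:
        return fraction == 1.0
    return fraction >= required_fraction


# Hypothetical failure ages in years.
ages = [4.2, 6.1, 6.4, 6.8, 7.0, 7.3, 7.5, 7.9, 8.2, 8.6]
print(survival_fraction(ages, life=6.0))  # 0.9
print(preventive_task_applicable(ages, 6.0, True, required_fraction=0.85))   # False
print(preventive_task_applicable(ages, 6.0, False, required_fraction=0.85))  # True
```

The single early failure at 4.2 years is enough to rule out a 6-year life when the consequences are safety or environmental, even though 90% of items reach it.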
How important is this…?
The 6 Failure Patterns
Only 11% of failures in the original Nowlan and Heap report were related to age! The other 89% were not related to age!
In later published studies this number has ranged from 8% to 23% of all failures!
Yet, despite knowing these facts, many people are reluctant to let go of time-based maintenance (such as many scheduled shutdowns).
Why?
Predictive Maintenance
• Nearly all failures give a warning that they are about to occur or are in the process of occurring.
• These warnings are known as potential failures.
• Operators see potential failures (warning signs) all the time… but they are not always sure what the warnings mean.
Predictive Maintenance tasks
Items are checked for potential failures, and they are left in service on the condition that they continue to meet satisfactory performance standards
Predictive Maintenance tasks
[Figure: the P-F curve, marking three points]
• The point where failure starts to occur
• The point where we can detect it (Potential Failure)
• The point where it no longer does what we want it to do (Functional Failure)
Predictive Maintenance tasks
[Figure: a P-F interval of 1 month, with inspections P1, P2 and P3 carried out every 2 weeks. Source: Captured by Data, 2003, Daryl Mather]
Inspection interval = less than the P-F interval
Predictive Maintenance
For a predictive task to be technically feasible:
• There is a clear potential failure condition (in other words, there is a clear warning of the onset of failure)
• The P-F interval is long enough for action to be taken to avoid, eliminate or minimise the consequences of failure
• The P-F interval is reasonably consistent
• A task can be done at intervals less than the P-F interval
[Figure: The P-F Interval. Resistance to failure plotted against time or task intervals, from the point where the potential failure is identified to the point where the functional failure occurs.]
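The four criteria can be expressed as a simple check. In this hypothetical sketch the first and third criteria are judgments passed in as booleans, the P-F interval is taken as the shortest value expected (a conservative choice), and "long enough for action to be taken" is tested with the net P-F interval: the P-F interval minus the inspection interval, which is the least warning time left if the potential failure appears just after an inspection. The parameter `time_to_act` (how long the response takes) is an assumption not named on the slide.

```python
def net_pf_interval(pf_interval: float, inspection_interval: float) -> float:
    """Least warning time left after detection: P-F interval minus inspection interval."""
    return pf_interval - inspection_interval


def predictive_task_feasible(clear_potential_failure: bool,
                             pf_reasonably_consistent: bool,
                             pf_interval: float,
                             inspection_interval: float,
                             time_to_act: float) -> bool:
    """Check the four criteria; all times must use the same unit."""
    # Criteria 1 and 3 are engineering judgments supplied by the review group.
    if not (clear_potential_failure and pf_reasonably_consistent):
        return False
    # Criterion 4: the task must be done at intervals less than the P-F interval.
    if not 0 < inspection_interval < pf_interval:
        return False
    # Criterion 2: enough warning must remain to act before functional failure.
    return net_pf_interval(pf_interval, inspection_interval) >= time_to_act


# Hypothetical case in weeks: the 1-month P-F interval approximated as 4 weeks,
# inspected every 2 weeks, with a response that takes 1 week to organise.
print(net_pf_interval(4, 2))                          # 2
print(predictive_task_feasible(True, True, 4, 2, 1))  # True
print(predictive_task_feasible(True, True, 4, 2, 3))  # False
```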
Predictive Maintenance tasks
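[Figure: two P-F intervals compared, each with inspections P1, P2 and P3. One has a P-F interval of 10 months with an inspection interval of 3 months; the other has a P-F interval of 1 month with an inspection interval of 2 weeks. Source: Captured by Data, 2003, Daryl Mather]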
Condition Monitoring
Product Quality Monitoring
The Human Senses
Detective Maintenance Tasks
A multiple failure is the result of a protected function failing while its protective device is in a failed state.
The consequences of a hidden failure are realised only when the protected function also fails.
Failure of the protective device has no consequences by itself. Therefore, for a detective maintenance task to be technically feasible we need to:
1. Ensure that the task will not increase the probability of a multiple failure.
2. Determine whether it is practical to do the task at the desired interval.
Exercise 1 – Task Categories
1. Once every three years the flow rate of a wastewater pumping station is checked to see whether it has deteriorated. If it has, the team plans to replace the impeller within two weeks.
2. The drive end bearing of a motor is greased every 3 weeks.
3. A hydraulic oil system provides pressure to drive the hydraulic motor that powers an apron feeder. Every so often there is a differential pressure alarm, which signals when the filter is no longer able to filter to the correct level and rate. When this occurs, the maintenance team cleans the filter.
4. A weight meter in a product handling plant is routinely calibrated to ensure that the production (profitability) of the plant is accurately measured.
5. A large DC motor regularly requires the commutator to be skimmed to prevent flashovers between the brush holders via the commutator.
6. A tank contains corrosive acid which, by law, must not seep into the ground. A task has been scheduled to perform a seepage test on this tank every 4 years.
Exercise 2 – Which type of maintenance?
The bearing in the housing of a generator has failed once in the past, causing a two-day production outage at a cost of several million dollars. To avoid a repeat, the maintenance department is tasked with finding a task that will predict or prevent this failure in the future. They decide to strip down the generator once every two years and perform a dye-penetrant check on the bearing, primarily to search for cracks or fissures on the races. We know that once cracks can be detected by the dye-penetrant test, the bearing usually has around 3 months left before total failure.
Q1. What type of task are they suggesting?
Q2. Will it solve their problem?
The Basis of Task Preference
• On-condition Tasks – Identify failure at the potential failure stage. They reduce the likelihood of safety, environmental and operational consequences. They also reduce operational costs by allowing equipment to realise most of its useful life.
• Preventive Restoration – When directed at specific components and parts, this leads to a reduction in the overall failure rate of items that have a dominant failure mode.
• Preventive Replacement – Least favoured of the three. It can reduce safety-related consequences in some failure modes, but it costs more to execute (reduced cost effectiveness).
• Detective Tasks – If none of the other three can be applied, this is the best option for hidden failures. It is the selected option if three conditions hold: the frequency is practical, the task can be done safely, and the task itself does not substantially increase the risk of failure. Unlike the other three tasks, this option leaves the function unavailable for a period.
RCM-DO-06c Uses of MTBF
Mean Time Between Failures, or MTBF[4], is one of the most widely used metrics in physical asset management. Companies generally use it as a guide to the performance of their physical assets, helping them to identify assets or processes that are causing lost revenue or cost-related issues. However, although widely applied, MTBF is still the subject of some confusion. It is also useful for a range of different purposes, giving organizations greater ability to increase the net present value of their physical asset base. When companies first look at implementing MTBF, they tend to ask three fundamental questions:
1) What can MTBF tell us about our assets?
2) At what levels can it be applied?
3) How can MTBF be used to add value to our reliability initiatives?
What can MTBF tell us?
The standard use for MTBF in industry is to tell us the performance of the primary function of an asset or system.
Figure 1 - Example System
[Figure: duty and standby pumps feeding a tank fitted with High-High, High, Low and Low-Low level switches, with an 800 l/m off-take.]
For example, a pumping system consists of a duty/standby pump arrangement, a pressure relief valve, piping, and the tank and associated level switches. The primary function for this system is to pump water to tank B at a rate of between 900 l/minute and 1000 l/minute. In this case, a failure occurs when the pump system is unable to pump water at the required rate for whatever reason.
[4] This module deals with MTBF in isolation and does not discuss other metrics such as MTTF (Mean Time To Failure) or MTTR (Mean Time To Repair).
Here, we can calculate the MTBF as follows:
$$\mathrm{MTBF} = \frac{\text{Total Time Required}}{\text{Number of Failures}}$$
Suppose the pump system was required to deliver this function for (say) 5 years, and we had 4 failures in that time. The average time between failures would then be 5/4 = 1.25 years.

If this is the mean time between failures, then the failure rate for one year is 1/MTBF, or 1/1.25, which is 0.8. That is an 80% likelihood of experiencing a failure of the primary function in one year.

To convert this into months, we first convert the MTBF to months (1.25 years = 15 months). We then determine the likelihood of a failure in one month: 1/15, or about 0.067. This means there is roughly a 6.7% likelihood of experiencing a failure resulting in the loss of the primary function in any given month. We could do the same for a week, a day or any other period.
The above example shows that MTBF gives us the average time between failures[5] over a given time period. This can then be manipulated to give us a failure rate[6] for any specified period of time. Thus, from one measurement of MTBF we are able to calculate the following information:
• MTBF of the Primary Function = 1.25 years
  o Likelihood of a failure in one year = 1/1.25 years (80% or 8 x 10^-1)
  o Likelihood of a failure in one month = 1/15 months (6.7% or 6.7 x 10^-2)
  o Likelihood of a failure in one day = 1/456.25 days (0.22% or 2.2 x 10^-3)
  o Likelihood of a failure in one hour = 1/10,950 hours (0.009% or 9 x 10^-5)
At all times the formula takes into account the total time of the function, not of the asset itself. This means that regardless of the number or type of assets in the system, the calculation always uses the total time required of the function, or 5 years in this example.
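The conversions above can be reproduced with a short Python sketch. It follows the text in treating period/MTBF as the likelihood of a failure in that period.

For comparison, it also prints 1 - exp(-t/MTBF). This is the probability of at least one failure in a period t when failures occur at a constant (random) rate. That second figure is an assumption added here, not part of the original method. The two figures agree closely only when t is small relative to the MTBF. All function names are hypothetical.

```python
import math

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def mtbf(total_time_required: float, number_of_failures: int) -> float:
    """MTBF = total time the function was required / number of failures."""
    if number_of_failures == 0:
        raise ValueError("MTBF is undefined when no failures were recorded")
    return total_time_required / number_of_failures


def rate_per_period(mtbf_years: float, period_years: float) -> float:
    """Likelihood as used in the text: period length divided by MTBF."""
    return period_years / mtbf_years


def prob_at_least_one(mtbf_years: float, period_years: float) -> float:
    """Probability of >= 1 failure in the period, assuming a constant failure rate."""
    return 1.0 - math.exp(-period_years / mtbf_years)


system_mtbf = mtbf(total_time_required=5.0, number_of_failures=4)  # 1.25 years

# Period lengths expressed in years.
periods = {
    "year": 1.0,
    "month": 1.0 / 12,
    "day": 1.0 / DAYS_PER_YEAR,
    "hour": 1.0 / (DAYS_PER_YEAR * HOURS_PER_DAY),
}
for name, length in periods.items():
    print(f"one {name}: rate-based {rate_per_period(system_mtbf, length):.2e}, "
          f"constant-rate probability {prob_at_least_one(system_mtbf, length):.2e}")
```

For one year the rate-based figure is 0.8, while the constant-rate probability is lower (about 0.55). For a day or an hour the two are practically identical.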
At what level can we apply MTBF?
Like many other metrics in physical asset management, MTBF is applicable at any level throughout the asset base. However, for performance measurement there are two rules for its application:
1. It is always used to measure the function of the asset where it is being applied.
2. It always uses the total time required of the function at the level where it is applied.
For instance, in the example given above we determined that the MTBF for the pumping system was 1.25 years, and we were then able to derive failure rates for various other periods. We can also apply this to the assets in the system, as demonstrated in Table 1.
[5] MTBF = Total Time Required / Number of Failures
[6] Failure rate = 1 / MTBF
Table 1 - Component MTBF

| Asset | Function | Total Time Required | Number of Failures | MTBF | Annual Failure Rate |
|---|---|---|---|---|---|
| Duty Pump | To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | 5 years | 7 | 0.714 years | 140% |
| Stand By Pump | To maintain 800 l/minute to 1000 l/minute if the duty pump fails | 5 years | 2 | 2.5 years | 40% |
| Piping | To provide clear access for 800 l/minute to 1000 l/minute from the pump sets to the tank | 5 years | 1 | 5 years | 20% |
| High-High Level Switch | To trip the pumping system when water reaches the high-high level | 5 years | 1 | 5 years | 20% |
| High Level Switch | To shut off the pump when the tank level reaches the high level | 5 years | 1 | 5 years | 20% |
| Low Level Switch | To turn on the pump when the tank has been drained to the low level | 5 years | 1 | 5 years | 20% |
| Low-Low Level Switch | To alarm when the tank level has been drained to the low-low level | 5 years | 0 | 5 years | 20% |
| Tank | To contain up to 250,000 liters of water | 5 years | 0 | 5 years | 20% |
| Pumping System | To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | 5 years | 4 | 1.25 years | 80% |

For the two rows with no recorded failures, the table conservatively uses the full 5-year observation period as the MTBF. Strictly, the MTBF is only known to exceed 5 years.
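Table 1 can be regenerated from the raw failure counts. The following Python sketch takes the failure counts and the 5-year period from the table.

It returns None for rows with no failures, rather than the table's 5-year figure. This is an assumption made here to keep the undefined case visible. The helper names are hypothetical.

```python
from typing import Optional

OBSERVATION_YEARS = 5.0

# Failure counts per function, taken from Table 1.
failures = {
    "Duty Pump": 7,
    "Stand By Pump": 2,
    "Piping": 1,
    "High-High Level Switch": 1,
    "High Level Switch": 1,
    "Low Level Switch": 1,
    "Low-Low Level Switch": 0,
    "Tank": 0,
    "Pumping System": 4,
}


def component_mtbf(n_failures: int, years: float = OBSERVATION_YEARS) -> Optional[float]:
    """MTBF in years, or None when no failures were observed (MTBF undefined)."""
    return years / n_failures if n_failures else None


def annual_failure_rate(n_failures: int, years: float = OBSERVATION_YEARS) -> float:
    """Failures per year of required time (1/MTBF); 0.0 when none were observed."""
    return n_failures / years


for asset, n in failures.items():
    m = component_mtbf(n)
    m_text = f"{m:.3g} years" if m is not None else "undefined (> 5 years observed)"
    print(f"{asset:24s} MTBF {m_text:32s} rate {annual_failure_rate(n):.0%}")

# Component failures, excluding the system-level row, as discussed in the text.
print(sum(n for asset, n in failures.items() if asset != "Pumping System"))  # 13
```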
Table 1 contains some information that should immediately provoke questions. For example, we counted four failures in our system-level MTBF, yet the component rows of the table contain 13 failures (not counting the system failures). To understand this we need to review the functions of each of the components.

For example, the function of the High-High Level Switch is to trip the pumping system when water reaches the high-high level. A failure that prevents this asset from performing its function will not prevent the system from pumping water. We know of one failure of this switch in the period.
Another obvious issue is that the Duty Pump failed seven times, while the Standby Pump, which performs a dormant function, had only two known failures during the same period. As this system has redundancy built into it, we can only experience a loss of the primary function if the Duty Pump and the Standby Pump are failed at the same time. The four failures causing the loss of function at the system level were:
• One multiple failure of the Duty and Standby pumps
• One failure of the High Level switch, meaning the level reached the High-High level once during the 5-year period
• One failure of the Low Level switch, resulting in the Low-Low level tripping the downstream process
• One failure of the piping, causing downtime
Figure 2 - MTBF at Different Levels
[Figure: System: MTBF 1.25 years, 8 x 10^-1 in any year. Multiple Pump Failure: 1:4.17 in any year, or 2.4 x 10^-1, from the Duty Pump (MTBF 1.67 years, 6 x 10^-1 in any year) and the Standby Pump (MTBF 2.50 years, 4 x 10^-1 in any year). High Level Switch, Low Level Switch and Piping: MTBF 5 years each, 2 x 10^-1 in any year.]
All the other failures mentioned were either hidden from the operations team until revealed by inspection, or occurred in functions protected by other assets (as with the Duty Pump failures). As Figure 2 shows, MTBF is useful at any level throughout an asset base. However, it must be applied to the functions of the assets, using the total time required of each function at each level of performance measurement.
How can MTBF add value to reliability initiatives?
In the hands of a skilled RCM facilitator, the measurement and manipulation of MTBF can serve three purposes:
• to set the performance expectations of the physical asset base
• to provide a basis for evaluating strategies
• to indicate the overall performance of assets, not just the performance of their functions.
This helps organizations in the change process because they begin to think about what the assets do rather than what they are: an appreciation of functional performance as opposed to asset performance. For example, we can break the system described in Figure 1 down into its functions and begin to assign performance expectations to each of these.[7]
Function 1 – To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute
Functional Failure 1.A – Does not pump water at all
The water pump in this example provides, say, the cooling water for a petrochemical plant. If the system is unable to pump water, there will be a loss of production. The tank contains enough water to keep the plant running for a minimum of 2 hours and a maximum of 6 hours. A multiple failure of both pumps would nominally result in a loss of production of, say, USD $2,000,000.

In this case the asset owner would like to keep the likelihood of this occurring to a reasonably low level. After some discussion he decides on a level of 1:10,000 years, or an annual rate of 10^-4. This means that all failure modes causing this consequence (an adverse impact on operational capability) must be managed to the same overall level of likelihood.
Function 2 – To trip the pumping system when water reaches the high-high level
Functional Failure 2.A – Does not trip when the water reaches the high-high level
In the case of the water system, an overflow of the tank would result in water in the surrounding area. This is a slip hazard for employees sent to correct the issue, but the asset owner does not regard it as a serious hazard, and it will not damage any additional equipment.

The failure mode is dormant, meaning it will only have consequences when both the High Level Switch and the High-High Level Switch have failed. In this particular case, the asset owner is comfortable accepting a higher risk of occurrence: say, once in every 100 years, or a likelihood of 10^-2 in any one year.
Function 3 – To alarm when the tank level is at the low-low level
Functional Failure 3.A – Does not alarm when the tank is at the low-low level
As with the High-High protection, this alarm is only required once there has already been a failure of some sort; in this case, notably, a failure of the Low Level Switch. If this were to occur and the tank consequently ran dry, the results would be catastrophic in financial terms. The downstream equipment would run dry, and the plant would be without cooling water. This would force a loss of production estimated at around 3 days, or USD $6,000,000 in this case. There would also be damage to producing assets conservatively estimated at USD $1,500,000.

The asset owner sees this as the worst possible outcome of a failure of this system. As a result, he would like to keep the likelihood of failure at 1:100,000 years, or 10^-5 per year.

The resulting performance expectations of the failure modes are shown in Table 2 below. The failure rates of all the failure modes contributing to a loss of function must sum to no more than the desired failure rate, or risk, at the level above (assuming these are all the relevant failure modes).
[7] Full details of how to construct a risk profile based on performance expectations are contained in module RCM-DO-05a Tolerable Levels of Risk (A Study of Industry).
Table 2 - Functional MTBF

| Function | Failure Mode | Desired Annual Failure Rate | Existing Annual Failure Rate |
|---|---|---|---|
| To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | Multiple Pump Failure | 2.5 x 10^-5 | 1:1.67 x 1:2.5 = 1:4.17, or 2.4 x 10^-1 |
| | Piping | 2.5 x 10^-5 | 1:5 |
| | High Level Switch | 2.5 x 10^-5 | 1:5 |
| | Low Level Switch | 2.5 x 10^-5 | 1:5 |
| To trip the pumping system when water reaches the high-high level | High-High Level Switch | 10^-2 (1:100) | 1:5 x 1:5 = 1:25 |
| To alarm when the tank level has been drained to the low-low level | Low-Low Level Switch | 10^-5 (1:100,000) | 1:5 x 1:5 = 1:25 |

The desired failure rate for the first function is 10^-4. With four failure modes sharing that budget, each must be managed to at least 2.5 x 10^-5 to ensure this level is reached.
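The budget split and the products in Table 2 can be checked with a short Python sketch. It makes three assumptions, taken from the text and the table:
- The annual rates of the failure modes under one function add up.
- The annual likelihood of a multiple failure is taken as the product of the two component rates.
- The 10^-4 budget for the first function is split equally across its four failure modes.

Variable and function names are hypothetical.

```python
# Existing annual rates, written as 1 / N for the "1 in N" figures in Table 2.
duty_pump = 1 / 1.67
standby_pump = 1 / 2.5
piping = high_level = low_level = high_high = low_low = 1 / 5


def multiple_failure(rate_a: float, rate_b: float) -> float:
    """Annual likelihood of a multiple failure, taken as the product of two rates."""
    return rate_a * rate_b


# Function 1: equal share of the desired budget for each of its failure modes.
function1_budget = 1e-4
function1_modes = {
    "Multiple Pump Failure": multiple_failure(duty_pump, standby_pump),
    "Piping": piping,
    "High Level Switch": high_level,
    "Low Level Switch": low_level,
}
share = function1_budget / len(function1_modes)
print(f"Target per failure mode: {share:.1e}")  # 2.5e-05
for mode, existing in function1_modes.items():
    print(f"{mode:22s} existing {existing:.2e} -> 1 in {1 / existing:.3g}")

# Functions 2 and 3: each hidden switch only matters if its partner switch fails too.
hidden = {
    "High-High (with High Level failure)": (multiple_failure(high_level, high_high), 1e-2),
    "Low-Low (with Low Level failure)": (multiple_failure(low_level, low_low), 1e-5),
}
for mode, (existing, desired) in hidden.items():
    print(f"{mode}: existing 1 in {1 / existing:.0f}, desired 1 in {1 / desired:.0f}")
```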
Table 2 sets out the desired failure rate for each function and translates it into a performance requirement for each failure mode. We can also record actual MTBF measures against these to see how effective we have been in managing the failures of this asset to the desired levels of performance. However, this would only be a guide, because the measured MTBF reflects only the period since measurement began.

The best use of this approach is to provide valuable input for RCM analysts, as well as for other applications within the reliability field. It would also give asset owners a predetermined risk envelope within which they require their assets to work. This increases their control over asset performance, and hence over corporate profitability.
Summary
MTBF is an exceptionally useful metric in the field of physical asset management, and it can be applied at any level throughout the physical asset base. The principal benefit of wide-ranging use of MTBF is that it starts to focus a company on how the assets work to fulfil a function, rather than on what those assets actually are. This is one of the fundamental concepts of Reliability-centered Maintenance.
As such, at whatever level it is applied, MTBF measures the function performed by that asset, asset system, or entire process. It is also useful for proactively establishing the performance expectations of the asset base, particularly in the areas of the Efficiency function.
RCM-DO-06d Advanced Detective Maintenance Techniques
[Figure: high-pressure vessel in a gas plant, with lines to process. Hidden failure: failure of the pressure release valve on a high-pressure vessel in a gas plant. Multiple failure event: dangerous build-up of gas pressure within the pressure vessel. Consequence: explosion of the pressure vessel under high-pressure conditions.]
# The Maintenance Scorecard, Daryl Mather, Industrial Press, ISBN 0831131810
If the ultimate high level switch fails closed, the failure is evident. If it fails open, nobody knows it has failed.
[Figure: tank filled at 1000 l/m with an 800 l/m off-take, controlled by four switches:
• The high level switch shuts off the pump until the low level switch turns it back on.
• The low level switch turns on the pump until the level reaches the high level switch.
• The ultimate high level switch (normally open) shuts the pump off until manually reset.
• The low-low level switch turns off the pump until manually reset.]
The probability that the protected function will fail in any one cycle is given by its failure rate.
[Figure: one-year timeline for the protected function (B) and the protective device (C), marking when B fails and when C fails.]
If the failure rate is once in four years, then the probability that it will fail in one year is 1 in 4. (This corresponds to a mean time between failures of four years.)
The probability that the protective device will be in a failed state at any point in time is given by its downtime (if it conforms to a random failure pattern).
[Figure: one-year timeline for protected function B and protective device C, marking when each fails.]
If the downtime is 33%, then the probability that it will be in a failed state at any point in time is 1 in 3. (This corresponds to an availability of 67%.)
The Probability of a Multiple Failure
[Figure: one-year timeline for protected function B and protective device C.]
Protected function B: Mean Time Between Failures = 4 years
Protective device C: Availability = 67%, Downtime = 33%
The probability that B will fail while C is in a failed state is 1 in 4 x 1 in 3 = 1 in 12. (In other words, there is a one in twelve chance that the multiple failure will occur in any one year.)
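The slide multiplies two quantities. The first is the chance that protected function B fails in the year (period/MTBF). The second is the chance that protective device C is already in a failed state when that happens (its downtime).

The following minimal Python sketch uses exact fractions and assumes the random-failure conditions stated above. The function name is hypothetical. The second call uses the improved figures (10-year MTBF, 1% unavailability) from the "Reducing the Probability of a Multiple Failure" slide.

```python
from fractions import Fraction


def multiple_failure_probability(mtbf_function_years: int,
                                 downtime: Fraction,
                                 period_years: int = 1) -> Fraction:
    """Chance per period that B fails while C is already in a failed state.

    period / MTBF of B gives the chance that B fails in the period;
    the downtime of C gives the chance that C is failed when that happens.
    """
    return Fraction(period_years, mtbf_function_years) * downtime


p = multiple_failure_probability(mtbf_function_years=4, downtime=Fraction(1, 3))
print("1 in", 1 / p)  # 1 in 12

q = multiple_failure_probability(mtbf_function_years=10, downtime=Fraction(1, 100))
print("1 in", 1 / q)  # 1 in 1000
```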
When developing a failure management policy for a hidden function, the first stage is to decide what probability we are prepared to tolerate for a multiple failure.
[Figure: one-year timeline for protected function B (Mean Time Between Failures = 4 years) and protective device C (Availability = 67%, Downtime = 33%).]
The probability that B will fail while C is in a failed state is 1 in 4 x 1 in 3 = 1 in 12, but we are prepared to accept only 1 in 1000.
Reducing the Probability of a Multiple Failure
We can reduce the probability of a multiple failure by reducing the probability of failure of the protected function (by applying a suitable failure management policy), and/or by increasing the availability of the protective device:
- by preventing the failure of the protective device, or
- by periodically checking whether the protective device is still working and repairing it if it has failed, or
- by modifying the system in some way.
[Figure: one-year timeline; protected function B now has a Mean Time Between Failures of 10 years, and protective device C has an availability of 99% (unavailability 1%).]
The probability that B will fail while C is in a failed state is now 1 in 10 x 1 in 100 = 1 in 1000.
• 6 identical PSVs have each been checked once a year for 5 years (FFI = 1 year).
• So the devices have been in service for a total of 30 years.
• In that time, 3 were found to be in a failed state.
• So the MTBF of the devices (MTBF_device) is 30 years / 3 failures = 10 years.
• We know that the failed devices failed some time during the year before the checks, but not when.
• It seems reasonable to assume that each failed device was down for an average of 6 months.
[Figure: six PSVs (1 to 6) on lines to and from the process, checked at the end of each of Years 1 to 5.]
1. So the total downtime (DT_device) was 1.5 years out of 30, or 5%.
2. So, on the basis of these figures, it appears that:
$$\mathrm{FFI} = 2 \times \mathrm{DT}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{device}}$$
3. This is generally true if DT_device < 5% and the device fails at random.
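The PSV figures can be reproduced in a few lines of Python. The sketch also adds a cross-check that is not part of the original slide. It computes the exact mean unavailability of a device with a constant failure rate λ = 1/MTBF_device that is tested, and restored if found failed, every FFI years: 1 - (1 - e^(-λ·FFI))/(λ·FFI).

The first-order approximation of that expression is λ·FFI/2, which rearranges to the slide's rule FFI = 2 x DT x MTBF. The exact value shows why the rule is only trusted for small downtimes. The function name is hypothetical, and perfect tests with negligible repair time are assumed.

```python
import math

# Figures from the slide.
n_devices = 6
years_checked = 5
failures_found = 3
ffi_years = 1.0

device_years = n_devices * years_checked          # 30 device-years in service
mtbf_device = device_years / failures_found       # 10 years
downtime_years = failures_found * ffi_years / 2   # each failure assumed down half an interval
dt_device = downtime_years / device_years         # 1.5 / 30 = 0.05


def mean_unavailability(mtbf: float, ffi: float) -> float:
    """Exact mean unavailability between tests for a constant failure rate 1/mtbf."""
    x = ffi / mtbf
    return 1.0 - (1.0 - math.exp(-x)) / x


print(f"MTBF_device = {mtbf_device:g} years, DT_device = {dt_device:.1%}")
print(f"Rule of thumb: FFI = 2 x DT x MTBF = {2 * dt_device * mtbf_device:g} years")
print(f"Exact mean unavailability at FFI = {ffi_years:g} year: "
      f"{mean_unavailability(mtbf_device, ffi_years):.2%}")
```

At a 5% downtime the exact figure is slightly below the approximation. The gap widens as the test interval grows relative to the device MTBF.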
[Figure: one-year timeline; protected function B (Mean Time Between Failures = 10 years) fails while protective device C is in a failed state: 1 in 10 x 1 in 100 = 1 in 1000.]
Step One: Decide what probability we tolerate for the multiple failure.
Step Two: Determine or estimate how often the protected function is likely to need the protective device.
Step Three: Calculate what unavailability of the protective device enables us to achieve the result of Step One, given the result of Step Two.
If
DT_device = unavailability of the protective device
MTBF_function = MTBF of the protected function
MTBF_multiple = MTBF of the multiple failure
then
$$\frac{1}{\mathrm{MTBF}_{\mathrm{function}}} \times \mathrm{DT}_{\mathrm{device}} = \frac{1}{\mathrm{MTBF}_{\mathrm{multiple}}}$$
or
$$\mathrm{DT}_{\mathrm{device}} = \frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$
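Steps One to Three reduce to a single division. The following minimal Python sketch uses exact fractions and the figures on this slide: a protected function MTBF of 10 years and a tolerable multiple failure of 1 in 1000 years. The function name is hypothetical.

```python
from fractions import Fraction


def required_unavailability(mtbf_function: int, mtbf_multiple: int) -> Fraction:
    """Step Three: DT_device = MTBF_function / MTBF_multiple.

    Rearranged from (1 / MTBF_function) * DT_device = 1 / MTBF_multiple.
    """
    return Fraction(mtbf_function, mtbf_multiple)


# Step One: tolerate the multiple failure once in 1000 years.
# Step Two: the protected function fails (needs the device) once in 10 years.
dt = required_unavailability(mtbf_function=10, mtbf_multiple=1000)
print(dt, f"= {float(dt):.0%}")                           # 1/100 = 1%
print(f"Required availability: {float(1 - dt):.0%}")      # 99%
```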
We have seen that
$$\mathrm{FFI} = 2 \times \mathrm{DT}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{device}} \qquad (1)$$
and that
$$\mathrm{DT}_{\mathrm{device}} = \frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}} \qquad (2)$$
Therefore, substituting (2) into (1) gives
$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{function}} \times \mathrm{MTBF}_{\mathrm{device}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$
where:
• FFI = failure-finding task interval
• DT_device = unavailability of the protective device
• MTBF_device = MTBF of the protective device
• MTBF_function = MTBF of the protected function
• MTBF_multiple = MTBF of the multiple failure
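The combined formula can be wrapped as a small helper. The Python sketch below inherits the assumptions behind FFI = 2 x DT x MTBF: random device failures and a device downtime below about 5%. It therefore warns when the implied downtime exceeds 5%.

The helper name and the warning check are hypothetical additions. As an illustration only, the usage line combines this module's 10-year protected function and 1-in-1000 target with the 10-year PSV MTBF. This gives an interval of 0.2 years.

```python
import warnings


def failure_finding_interval(mtbf_function: float,
                             mtbf_device: float,
                             mtbf_multiple: float) -> float:
    """FFI = 2 * MTBF_function * MTBF_device / MTBF_multiple (same time unit throughout)."""
    for value in (mtbf_function, mtbf_device, mtbf_multiple):
        if value <= 0:
            raise ValueError("all MTBF values must be positive")
    # Implied unavailability of the device: DT = MTBF_function / MTBF_multiple.
    dt_device = mtbf_function / mtbf_multiple
    if dt_device > 0.05:
        warnings.warn(f"implied device downtime {dt_device:.1%} exceeds 5%; "
                      "the FFI approximation may not hold")
    return 2 * mtbf_function * mtbf_device / mtbf_multiple


# Illustrative combination of figures from this module (years).
print(failure_finding_interval(mtbf_function=10, mtbf_device=10, mtbf_multiple=1000))
```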
Exercise 1 – Steam Turbine
The function of a speed sensor on a large (680 MW) steam turbine is to measure the rotational speed of the turbine and to shut off the steam supply if the speed exceeds a specified limit. The multiple failure that could occur if this mechanism does not work when required is that the turbine could speed up to the point where centrifugal forces cause it to disintegrate.
The electric utility that operates the turbine decides that it will accept a probability of the multiple failure of once in 100,000 years for any one turbine. The utility has twenty similar turbines in operation for an average of ten years each, giving a total of 200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out due to overspeeding during this period. This corresponds to an MTBF of the protected function of 100 years for any one turbine.
The utility has never found one of the overspeed mechanisms to be in a failed state when it has carried out failure-finding checks on its own machines, but data from a commercial data bank indicate an MTBF of 500 years.
How often should the utility perform a failure-finding task on the overspeed mechanism in order to reduce the probability of the multiple failure to the desired level?

$$FFI = \frac{2 \times MTBF_{device} \times MTBF_{function}}{MTBF_{multiple}}$$
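For checking a hand calculation, the exercise figures can be plugged straight into the formula. This sketch assumes every quantity is expressed in years; the variable names are illustrative.

```python
mtbf_multiple = 100_000  # tolerable: once in 100,000 years per turbine
mtbf_function = 200 / 2  # 200 turbine-years of experience, 2 overspeed trips
mtbf_device = 500        # commercial data bank figure for the overspeed mechanism

ffi_years = 2 * mtbf_device * mtbf_function / mtbf_multiple
print(f"FFI = {ffi_years:.2f} years")
```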
Exercise 2 – Steel Plant
A steel-producing plant needs many product-handling assets to move the raw iron ore around prior to processing. As part of this asset base, it has 15 large conveyors. Each of these has 4 e-stops: one on either side of the head end, and one on either side of the tail end of the conveyor.
Management has tasked the maintenance team with determining a frequency for testing each of these e-stops, to make sure that they will work when needed. After some discussion, the team consulted the relevant specifications and determined that the e-stops should meet their SIL-2 classification. For this company, that means a likelihood of 1:1,000,000 (1:10^6) that any one e-stop would have a failure in any one year.
The team found that their own plant had never experienced a failure of one of the emergency stops. However, on consulting a commercial data store they found the following information:
• A population was tested over a time period of 10^8 hours.
• During this time the item was found to have failed 14 times in an undetected and unsafe manner,
• and 60 times in a detected, safe fashion.
The e-stops were installed on all of the conveyors at roughly the same time, 18 years ago. After conversations with a few of the longer-serving people, the team was able to ascertain that an e-stop had been needed, either to protect people or to protect life, approximately 12 times.
At what frequency will they need to perform the detective task to maintain the level of risk that the company has deemed tolerable?
Common Cause Failure Modes (Calculation 2)
[Figure: U1, Failure Mode 1 … Failure Mode n]
• Managing more than one hidden failure mode:
  – Any failure mode could take out the protective function
  – Any failure modes that can be managed via predictive or preventive routines should be managed that way
  – All failure modes can be managed via one detective maintenance task
  – The detective task does not increase the likelihood of a multiple failure
  – It is practical to do the task at the required interval
We saw previously that (Calculation 2):

$$\frac{1}{MTBF_{multiple}} = \frac{1}{MTBF_{function}} \times DT_{total}$$

[Figure: timeline labelled B Fails, C Fails, U1, U2, U3]

Therefore:

$$DT_{total} = DT_1 + DT_2 + DT_3$$

$$\frac{MTBF_{function}}{MTBF_{multiple}} = DT_1 + DT_2 + DT_3$$

We can deduce from what we also saw previously that:

$$DT_{device} = \frac{FFI}{2 \times MTBF_{device}}$$

If we call the MTBF of each of the three failure modes $MD_1$, $MD_2$ and $MD_3$ respectively, then:

$$\frac{MTBF_{function}}{MTBF_{multiple}} = \frac{FFI}{2 \times MD_1} + \frac{FFI}{2 \times MD_2} + \frac{FFI}{2 \times MD_3}$$

Therefore:

$$FFI = \frac{2 \times MTBF_{function}}{MTBF_{multiple} \times \left(\frac{1}{MD_1} + \frac{1}{MD_2} + \frac{1}{MD_3}\right)}$$
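The three-mode formula generalises naturally to any number of failure modes. The sketch below assumes, as the derivation does, that the downtimes of the individual modes simply add (reasonable only while each is small), so the mode failure rates $1/MD_i$ are summed. The function name is hypothetical.

```python
from typing import Sequence


def ffi_multiple_modes(mtbf_function: float, mtbf_multiple: float,
                       mode_mtbfs: Sequence[float]) -> float:
    """FFI when several failure modes (MD1, MD2, ...) can each disable the
    protective device and one detective task finds all of them."""
    # Each mode's downtime is FFI / (2 x MD_i), so the rates 1/MD_i add up.
    total_rate = sum(1.0 / md for md in mode_mtbfs)
    return 2 * mtbf_function / (mtbf_multiple * total_rate)


# Hypothetical figures (years): two modes of 100 years each behave like a
# single mode of 50 years.
print(ffi_multiple_modes(10, 1_000, [100, 100]))
print(ffi_multiple_modes(10, 1_000, [50]))
```

With a single mode the function reduces to the one-device formula, which is a quick way to sanity-check an implementation.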
Exercise 4 – Hoist
A speed sensor on the hoist drum of a crane used in a machine shop is designed to activate the emergency brake on the main hoist if the drum starts turning too fast. If any part of the emergency braking system does not work when required and the hoist drum runs away, industry statistics suggest that there is a 5% chance that someone could be badly hurt or killed as a result. The group performing the review decides that it would like to reduce the probability of this happening to once in 200,000 years.
If there is only a 1 in 20 chance (5%) that the multiple failure of the overspeeding drum and failed emergency brake will hurt or kill someone, an overall probability of 1 death or injury in 200,000 years for this reason can be achieved if the probability of the multiple failure itself is reduced to 1 in 10,000 years.
This is a new system, so the users of the crane have no historical data about its performance. However, the suppliers of the speed sensor advise that it has an MTBF in this context of 300 years, and the emergency brake an MTBF in this context of 100 years. No information is available about the reliability of the electrical circuit between the two, but the behavior of similar circuits on similar cranes suggests an MTBF of 200 years. The circumstances under which the drum overspeeds and needs the emergency brake occur on average once every 50 years.
You are asked to determine how often the emergency braking system should be tested to reduce the multiple failure probability to the required level.

$$FFI = \frac{2 \times MTBF_{function}}{MTBF_{multiple} \times \left(\frac{1}{MD_1} + \frac{1}{MD_2} + \frac{1}{MD_3}\right)}$$
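A hand calculation for this exercise can be checked with the sketch below. It assumes all figures are in years and treats the sensor, circuit and brake as the three failure modes $MD_1$ to $MD_3$ of one braking system; the dictionary keys are illustrative labels.

```python
# Figures from Exercise 4, all in years
mtbf_function = 50       # drum overspeed demand, once every 50 years
mtbf_multiple = 10_000   # 200,000 years x 5% chance of injury
mode_mtbfs = {
    "speed sensor": 300,
    "electrical circuit": 200,  # estimate from similar cranes
    "emergency brake": 100,
}

total_rate = sum(1 / md for md in mode_mtbfs.values())
ffi_years = 2 * mtbf_function / (mtbf_multiple * total_rate)
print(f"FFI = {ffi_years:.3f} years (about {ffi_years * 12:.1f} months)")
```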
Options for redesign
What if we redid the speed sensor example with different figures (a stricter tolerable risk and a higher device failure rate)?
…The electric utility that operates the turbine decides that it will accept a probability of the multiple failure of once in (say) 1,000,000 years for any one turbine. The utility has twenty similar turbines in operation for an average of ten years each, giving a total of 200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out due to overspeeding during this period. This corresponds to an MTBF of the protected function of 100 years for any one turbine. The utility has never found one of the overspeed mechanisms to be in a failed state when it has carried out failure-finding checks on its own machines, but data from a commercial data bank indicate an MTBF of 100 years. How often should the utility perform a failure-finding task on the overspeed mechanism in order to reduce the probability of the multiple failure to the desired level?

$$FFI = \frac{2 \times 100 \times 100}{1{,}000{,}000} = 0.02 \text{ years} \approx 7.3 \text{ days}$$

We can either make the function evident somehow, or provide additional layers of protection.
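The 7.3-day result in the redesign example is what makes redesign worth considering. The sketch below, assuming a 365-day year and keeping the example's function and device MTBFs fixed, shows how the required interval shrinks as the tolerable multiple-failure rate tightens; the helper name is hypothetical.

```python
MTBF_FUNCTION = 100  # years between overspeed demands (2 trips in 200 turbine-years)
MTBF_DEVICE = 100    # years, commercial data bank figure in the redesign example
DAYS_PER_YEAR = 365  # assumption used for the conversion


def ffi_days(mtbf_multiple: float) -> float:
    """Single-device failure-finding interval, converted from years to days."""
    return 2 * MTBF_DEVICE * MTBF_FUNCTION / mtbf_multiple * DAYS_PER_YEAR


for mtbf_multiple in (100_000, 1_000_000, 10_000_000):
    print(f"once in {mtbf_multiple:,} years -> test every {ffi_days(mtbf_multiple):.2f} days")
```

Each tenfold tightening of the tolerable risk cuts the interval tenfold, which is why a practical limit is quickly reached with a single device.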
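[Figure: one-year timeline of protected function B and protective device C (labels: Mean Time Between Failures = 5 years, Availability = 75%, Downtime = 25%); second panel shows the function protected by Device 1 and Device 2]

With one protective device:

$$10^{-2} \times 10^{-2} = 10^{-4} \quad (1 \text{ in } 10{,}000)$$

With two protective devices (Device 1 and Device 2):

$$10^{-2} \times 10^{-2} \times 10^{-2} = 10^{-6} \quad (1 \text{ in } 1{,}000{,}000)$$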
In this case (two protective devices), the unavailability of the devices should be squared in the failure-finding formula (Formula 3):

$$\frac{1}{MTBF_{multiple}} = \frac{1}{MTBF_{function}} \times DT_{device}^{2}$$

[Figure: timeline labelled B Fails, C Fails, U1, U2]

Therefore, if $MTBF_{function}$ and $MTBF_{multiple}$ are given:

$$DT_{device} = \left(\frac{MTBF_{function}}{MTBF_{multiple}}\right)^{1/2}$$

and, since $FFI = 2 \times DT_{device} \times MTBF_{device}$:

$$FFI = 2 \times MTBF_{device} \times \left(\frac{MTBF_{function}}{MTBF_{multiple}}\right)^{1/2}$$
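To see the effect of squaring the unavailability, the sketch below compares Formula 1 (one device) with Formula 3 (two devices) for the same inputs. It assumes both devices have the same MTBF and are tested at the same interval but at independent times; the figures and function names are hypothetical.

```python
import math


def ffi_single(mtbf_device, mtbf_function, mtbf_multiple):
    """Formula 1: one protective device."""
    return 2 * mtbf_device * mtbf_function / mtbf_multiple


def ffi_duplicated(mtbf_device, mtbf_function, mtbf_multiple):
    """Formula 3: two identical devices, so DT_device enters squared."""
    dt_device = math.sqrt(mtbf_function / mtbf_multiple)
    return 2 * mtbf_device * dt_device


# Hypothetical figures (years), for illustration only
args = (50, 10, 100_000)
print(f"one device:  FFI = {ffi_single(*args):.3f} years")
print(f"two devices: FFI = {ffi_duplicated(*args):.3f} years")
```

Because each device only needs an unavailability equal to the square root of the ratio, the allowable interval grows by orders of magnitude when the ratio is small.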
What if we redid the speed sensor example with different figures (a stricter tolerable risk and a higher device failure rate), but now with two sensors?
…The electric utility that operates the turbine decides that it will accept a probability of the multiple failure of once in (say) 1,000,000 years for any one turbine. The utility has twenty similar turbines in operation for an average of ten years each, giving a total of 200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out due to overspeeding during this period. This corresponds to an MTBF of the protected function of 100 years for any one turbine. The utility has never found one of the overspeed mechanisms to be in a failed state when it has carried out failure-finding checks on its own machines, but data from a commercial data bank indicate an MTBF of 100 years. How often should the utility perform a failure-finding task on the overspeed mechanism in order to reduce the probability of the multiple failure to the desired level?

$$FFI = 2 \times MTBF_{device} \times \left(\frac{MTBF_{function}}{MTBF_{multiple}}\right)^{1/2} = 2 \times 100 \times \left(\frac{100}{1{,}000{,}000}\right)^{1/2}$$

FFI = 2 years!
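The jump from days to years can be reproduced with the example's own figures. This sketch assumes a 365-day year and that the two sensors are tested at independent times, as Formula 3 does.

```python
import math

mtbf_multiple = 1_000_000  # tolerable multiple failure: once in 1,000,000 years
mtbf_function = 100        # two overspeed trips in 200 turbine-years
mtbf_device = 100          # commercial data bank figure per sensor

one_sensor = 2 * mtbf_device * mtbf_function / mtbf_multiple
two_sensors = 2 * mtbf_device * math.sqrt(mtbf_function / mtbf_multiple)

print(f"one sensor:  test every {one_sensor * 365:.1f} days")
print(f"two sensors: test every {two_sensors:.1f} years")
```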
Multiple Redundant Devices
• We can do this for any number of devices ($n$) tested randomly:

$$FFI = 2 \times MTBF_{device} \times \left(\frac{MTBF_{function}}{MTBF_{multiple}}\right)^{1/n}$$

• If all are tested together, the formula becomes:

$$FFI = MTBF_{device} \times \left(\frac{(n+1) \times MTBF_{function}}{MTBF_{multiple}}\right)^{1/n}$$
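Both redundant-device formulas can be sketched side by side. The sketch reads "tested randomly" as each device being tested at its own, unsynchronised time, and assumes identical devices; the function names and figures are hypothetical.

```python
def ffi_random(n, mtbf_device, mtbf_function, mtbf_multiple):
    """n redundant devices, each tested at its own (unsynchronised) time."""
    return 2 * mtbf_device * (mtbf_function / mtbf_multiple) ** (1 / n)


def ffi_together(n, mtbf_device, mtbf_function, mtbf_multiple):
    """n redundant devices, all tested at the same time."""
    return mtbf_device * ((n + 1) * mtbf_function / mtbf_multiple) ** (1 / n)


# Figures from the two-sensor turbine example (years)
for n in (1, 2, 3):
    print(n, round(ffi_random(n, 100, 100, 1_000_000), 3),
          round(ffi_together(n, 100, 100, 1_000_000), 3))
```

With $n = 1$ both reduce to the single-device formula. For $n \ge 2$ the "tested together" interval comes out shorter, since $(n+1)^{1/n} < 2$: synchronised testing means the devices age together, so their failed periods overlap more.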
Exercise 5 – Pumps and PSVs
A hydraulic system is protected from overpressure by four Pressure Relief Valves (PRVs). One is placed in the line directly from the duty and standby pumps, and there is one PSV in each of the supply lines to the accumulators. If the pressure exceeds the safe working pressure, the PRVs relieve the pressure in the lines back to the hydraulic oil tank. All PSVs are set to the same pressure level.
The unit operates under extremely high pressures, and if the safe working pressure is exceeded there is a chance of a pipe rupturing, exposing people in the surrounding areas to pressures likely to cause serious injuries. Risk-ranking structures set up by the corporate safety department have deemed this a high-criticality asset. This means that it will need to be managed to a tolerable probability of failure of 1:1,000,000.
[Figure: hydraulic circuit showing PSV 1, PSV 2, PSV 3 and the accumulators]
In the 12 years that the hydraulic system has been installed, it has never once required any PRV to relieve the pressure within the hydraulic circuit to the accumulators. For this system, the team was unable to find failure rate information in commercial data banks.
However, a quick call to the company's 5 other plants showed that there were 4 such systems in the company, with a combined operating life of 80 years. Incident records show that the PRVs have been used to relieve the pumps 10 times. Evidence from the manufacturer suggests that the PRVs have a failure rate of 1:100.
Given that all three will be tested at the same time, what is the failure-finding frequency required to achieve the tolerable probability of a multiple failure?

$$FFI = MTBF_{device} \times \left(\frac{(n+1) \times MTBF_{function}}{MTBF_{multiple}}\right)^{1/n}$$
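A hand calculation for this exercise can be checked as below. The sketch takes $n = 3$ because the question says all three valves are tested together, uses the company-wide demand history, and reads the manufacturer's "1:100" as one failure per 100 years; all three readings are assumptions.

```python
n = 3                      # the question says all three valves are tested together
mtbf_function = 80 / 10    # 80 system-years across the company, 10 relief demands
mtbf_device = 100          # assumption: "1:100" read as one failure per 100 years
mtbf_multiple = 1_000_000  # tolerable probability of 1:1,000,000 per year

ffi_years = mtbf_device * ((n + 1) * mtbf_function / mtbf_multiple) ** (1 / n)
print(f"FFI = {ffi_years:.2f} years")
```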
Managing Risk in a Hidden Failure Environment
[Figure: decision sequence Safety → Predictive → Preventive Restoration → Preventive Replacement → Failure Finding → Redesign]
First, establish whether there is an intolerable risk or not.
Second, determine if a predictive task is applicable and effective.
Third, determine if a preventive task (restoration or replacement) is applicable and effective.
Fourth, determine if a detective (failure-finding) task is applicable and effective.
Fifth, if none of these is, the protection is inadequate and redesign is required.
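The sequence above can be sketched as a simple ordered search. This is a hypothetical illustration, not a tool from the course: it assumes the first step (the risk assessment) has already been done, and that each candidate strategy has been judged applicable and/or effective beforehand.

```python
from typing import Dict, Tuple

# Candidate strategies in the order the sequence considers them.
SEQUENCE = (
    "predictive",
    "preventive restoration",
    "preventive replacement",
    "failure finding",
)


def select_strategy(assessment: Dict[str, Tuple[bool, bool]]) -> str:
    """Return the first strategy that is both applicable and effective.

    `assessment` maps a strategy name to (applicable, effective); any
    strategy missing from it is treated as not applicable."""
    for strategy in SEQUENCE:
        applicable, effective = assessment.get(strategy, (False, False))
        if applicable and effective:
            return strategy
    # Fifth step: no task manages the risk, so the protection is inadequate.
    return "redesign"


print(select_strategy({"predictive": (True, False), "failure finding": (True, True)}))
```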
Voting Systems (k-out-of-n Systems), Formula 6
If $r$ = the number of units that need to be in a failed state before the entire system would fail, then:

$$r = n - k + 1$$

[Figure: timeline labelled B Fails, C Fails, U1, U2, U3]

Therefore, if FFI is a very small fraction of $MTBF_{device}$, it can be shown that:

$$FFI = MTBF_{device} \times \left(\frac{(n-r)! \times r! \times (r+1) \times MTBF_{function}}{n! \times MTBF_{multiple}}\right)^{1/r}$$

where ! denotes the factorial (used a lot in combinatorics and other probability and statistical formulae), for example 5! = 1 × 2 × 3 × 4 × 5.
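A minimal sketch of Formula 6, using $(n-r)!$ as written above. As a consistency check, with $k = 1$ (so $r = n$) it reduces to the "all tested together" formula for redundant devices. The function name and figures are hypothetical.

```python
from math import factorial


def ffi_k_out_of_n(k, n, mtbf_device, mtbf_function, mtbf_multiple):
    """Formula 6 sketch for a k-out-of-n voting arrangement."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    r = n - k + 1  # units that must be failed before the whole system fails
    ratio = (factorial(n - r) * factorial(r) * (r + 1) * mtbf_function
             / (factorial(n) * mtbf_multiple))
    return mtbf_device * ratio ** (1 / r)


# Hypothetical figures (years): 2-out-of-3 voting
print(f"{ffi_k_out_of_n(2, 3, 100, 8, 1_000_000):.3f} years")
```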
Economic Consequences
But what about economic consequences?
• Operational and economic-only consequences are purely economic.
  – In other words, the only consequence of a multiple failure that does not affect safety or the environment is that it costs money.
• But doing a failure-finding task also costs money.
  – So in this case, we need to determine the failure-finding task interval that reduces total costs to a minimum, and then ask whether the minimum total cost is acceptable.
We saw previously that:

$$FFI = \frac{2 \times MTBF_{device} \times MTBF_{function}}{MTBF_{multiple}}$$

Therefore, the probability of a multiple failure in any one year is:

$$\frac{1}{MTBF_{multiple}} = \frac{FFI}{2 \times MTBF_{device} \times MTBF_{function}}$$

The annualized cost of multiple failures will be:

$$\frac{C_{multiple} \times FFI}{2 \times MTBF_{device} \times MTBF_{function}}$$

The annualized cost of doing a failure-finding task is:

$$\frac{C_{FF}}{FFI}$$

If FFI is a fairly small fraction of $MTBF_{function}$, the annualized cost of repairing the failed protective device will be approximately:

$$\frac{C_{device}}{MTBF_{device}}$$

Likewise for the function:

$$\frac{C_{function}}{MTBF_{function}}$$
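[Figure: cost versus interval between failure-finding tasks (FFI), showing the annualised cost of a multiple failure and the annualised cost of failure finding]

$$C_{total} = \frac{C_{multiple} \times FFI}{2 \times MTBF_{device} \times MTBF_{function}} + \frac{C_{FF}}{FFI} + \frac{C_{device}}{MTBF_{device}} + \frac{C_{function}}{MTBF_{function}}$$

$C_{total}$ is at a minimum when:

$$\frac{dC_{total}}{dFFI} = 0$$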
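The trade-off in the figure can be explored numerically before any calculus. This sketch evaluates the four annualised cost terms for a grid of candidate intervals and picks the cheapest; all cost and MTBF figures are hypothetical and in consistent units (dollars, years).

```python
def annual_total_cost(ffi, *, c_multiple, c_ff, c_device, c_function,
                      mtbf_device, mtbf_function):
    """Annualised total cost for a failure-finding interval `ffi` (years)."""
    multiple_failures = c_multiple * ffi / (2 * mtbf_device * mtbf_function)
    failure_finding = c_ff / ffi
    device_repairs = c_device / mtbf_device        # independent of ffi
    function_repairs = c_function / mtbf_function  # independent of ffi
    return multiple_failures + failure_finding + device_repairs + function_repairs


def cheapest_interval(candidates, **inputs):
    """Brute-force search: the candidate interval with the lowest total cost."""
    return min(candidates, key=lambda ffi: annual_total_cost(ffi, **inputs))


# Hypothetical inputs, for illustration only
inputs = dict(c_multiple=4_000, c_ff=50, c_device=200, c_function=100,
              mtbf_device=20, mtbf_function=10)
grid = [i / 100 for i in range(10, 1001)]  # 0.10 to 10.00 years
best = cheapest_interval(grid, **inputs)
print(f"lowest total cost at an FFI of about {best:.2f} years")
```

The two repair terms shift the whole curve up or down but do not move the minimum, which is why they vanish when the total cost is differentiated.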
Differentiating the annualised total cost:

$$C_{total} = \frac{C_{multiple} \times FFI}{2 \times MTBF_{device} \times MTBF_{function}} + \frac{C_{FF}}{FFI} + \frac{C_{device}}{MTBF_{device}} + \frac{C_{function}}{MTBF_{function}}$$

$$\frac{dC_{total}}{dFFI} = \frac{C_{multiple}}{2 \times MTBF_{device} \times MTBF_{function}} - \frac{C_{FF}}{FFI^{2}} = 0$$

$$FFI^{2} = \frac{2 \times MTBF_{device} \times MTBF_{function} \times C_{FF}}{C_{multiple}}$$

$$FFI = \left(\frac{2 \times MTBF_{device} \times MTBF_{function} \times C_{FF}}{C_{multiple}}\right)^{1/2}$$

where:
• $C_{multiple}$ = cost of one multiple failure
• $C_{FF}$ = cost of one failure-finding task
• $MTBF_{device}$ = MTBF of the protective device
• $MTBF_{function}$ = MTBF of the protected function
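The closed-form result can be checked by confirming that the slope of the total cost is zero at the computed interval. This is a minimal sketch with hypothetical inputs; the function names are illustrative.

```python
import math


def optimum_ffi(mtbf_device, mtbf_function, c_ff, c_multiple):
    """Closed-form failure-finding interval that minimises annualised cost."""
    return math.sqrt(2 * mtbf_device * mtbf_function * c_ff / c_multiple)


def cost_slope(ffi, mtbf_device, mtbf_function, c_ff, c_multiple):
    """dC_total/dFFI; the repair-cost terms do not depend on FFI and drop out."""
    return c_multiple / (2 * mtbf_device * mtbf_function) - c_ff / ffi ** 2


# Hypothetical inputs, for illustration only
ffi = optimum_ffi(mtbf_device=20, mtbf_function=10, c_ff=50, c_multiple=4_000)
print(f"optimum FFI = {ffi:.3f} years")
```

A useful property follows from setting the slope to zero: at the optimum, the annualised cost of multiple failures equals the annualised cost of failure finding.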
Exercise 6 – Economic Hidden Failures
A hydraulic motor is used to drive an agitator on a reactor vessel in a chemical plant. The oil tank on the hydraulic system contains a low-level alarm, which is used to remind the operators when to fill the tank with oil. An ultimate low-level switch is designed to shut down the hydraulic system if the system runs low on oil and the upper switch fails to warn the operators. If both switches fail and the oil runs out, the motor could be severely damaged and the reactor down for up to 5 hours. This would cost the company $1,500 in lost production and $525 to repair the motor, a total cost of $2,025.
The company has three such reactor vessels, each driven by its own hydraulic system, and the operators can only recall two occasions on which an ultimate low-level switch has needed to stop a motor over a period of twelve years. This means that the mean time between failures of the protected function ($MTBF_{function}$) is 18 years.
Until now the low-level switches have never been checked, nor have they been in a failed state when called upon to work. In the absence of any other information, and after careful study of the configuration of the switches, the RCM Facilitator decides that the MTBF of the ultimate low-level switch ($MTBF_{device}$) is likely to be about twice that of the low-level alarm, or 36 years.
It is difficult to reach the switches, so a full functional check of the ultimate switches requires lowering the level of the tanks under controlled conditions and checking whether the motors cut out. This task takes about an hour per tank at a cost of $25 per task ($C_{FF}$).
In the light of this information, you are asked to determine the optimum failure-finding interval for these switches.

$$FFI = \left(\frac{2 \times MTBF_{device} \times MTBF_{function} \times C_{FF}}{C_{multiple}}\right)^{1/2}$$
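For checking a hand calculation, the exercise figures can be substituted directly. This sketch assumes years and dollars throughout and takes $C_{multiple}$ as the stated total of lost production plus motor repair.

```python
import math

mtbf_function = 3 * 12 / 2   # 3 reactors x 12 years, 2 demands -> 18 years
mtbf_device = 36             # RCM Facilitator's estimate for the ultimate switch
c_ff = 25                    # cost of one failure-finding task ($)
c_multiple = 1_500 + 525     # lost production + motor repair ($2,025)

ffi_years = math.sqrt(2 * mtbf_device * mtbf_function * c_ff / c_multiple)
print(f"optimum FFI = {ffi_years:.1f} years")
```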
RCM-DO-07 The Value of RCM
As a cornerstone of the maintenance discipline, RCM can achieve benefits in a vast number of areas, depending on where and how it is applied. When properly implemented, Reliability-Centered Maintenance provides companies with a tool for achieving the lowest asset Net Present Cost (NPC) for a given level of performance and risk. This implies a cashable impact across a multitude of economic activities, covering both OPEX⁸ and CAPEX⁹. However, RCM will also provide companies with a range of non-cashable advantages that will have a positive impact throughout the enterprise.
This document contains only a brief list of potential areas of benefit, not the entire range of potential uses of RCM. Along with these areas, the author has previously used RCM:
• in capital submissions in regulated industries,
• to reduce the risk of legal ramifications in the management of environmental integrity,
• to establish a tool for contract negotiations related to outsourced maintenance,
• to reduce a company's carbon footprint,
• and as a means of developing troubleshooting guides.
The information in this module is intended to alleviate some of the anxiety about benefits that often surfaces in the early implementation stages of large-scale RCM projects, and to provide guidelines for trainee RCM Analysts.
The Cashable Results of RCM
Direct cashable benefits from implementing RCM can emerge in every area where maintenance and operations have an impact. These can include such disparate areas as increased uptime, decreased energy usage, reductions in chemical utilization, or reductions in inventory holdings and routine maintenance spending. Instead of trying to cover all the potential areas where the method can deliver financial impacts, this section focuses on how RCM influences the profit and loss of an enterprise. This is evident in two principal areas:
• an increase in potential revenue, and
• direct cost reductions.

⁸ OPEX: Operational Expenditure
⁹ CAPEX: Capital Expenditure
Direct Cost Reductions The main noticeable result of Reliability Centered Maintenance is a dramatic change to the maintenance regimes that are in place.
John Moubray, a pioneer in this field until his recent passing, regularly stated that RCM would achieve “a reduction of between 20% and 70% in routine maintenance where there is an existing scheduled maintenance program.” Based on the experience of the author, this leads primarily to an increased level of cost-effectiveness of maintenance, particularly in industries that are very asset intensive.10 The team is able to claim benefits in these areas where there is a calculable reduction in the cost of labor, materials or consumables to perform maintenance11 over a reasonable amount of time. (Usually a year) Logically, these are only potential benefits at the completion of the analysis, as it will take until the first omitted routine, or the first breakdown requiring reduced resources, before savings begin to accrue. However, once implemented they can easily be counted through direct calculation. For this to be accurate there is a need to quantify both the routine maintenance costs as well as the corrective maintenance costs. There are some real world limitations on attempting to forecast cost reductions purely through accumulated data. The first issue the team can face is that current maintenance regimes often do not exist in the company’s ERP or CMMS program, or they group them at a high level. Data losses, poor ERP management, and distrust of technology means that experienced technicians often keep the knowledge of existing maintenance outside of corporate systems. Further compounding the issue is the disparate way that maintenance routines are stored. At times, they are at an asset level, a maintainable item level, and still other times they can be at higher system or unit levels. A second limitation is that on the occasions when RCM proposes a more rigorous policy, there is a tendency to overlook the change in reactive and corrective maintenance.12 Still, some direct cost reduction cases are obvious and do not require a detailed activity analysis. 
Every task in an RCM analysis must be both applicable, meaning it is physically possible to do the task, and effective, meaning it is worthwhile doing in terms of cost and/or risk, before it is selected as an adequate failure management strategy.
10 Asset-intensive – industries where asset maintenance and asset replacement form major parts of OPEX and CAPEX.
11 Maintenance refers to both routine and corrective or reactive activities.
12 The issues surrounding RCM and Whole-of-Life (WoL) asset management are covered in more detail in “RCM-DO-10 RCM and Whole-of-Life (WoL) Asset Management”.
When maintenance is developed using an unstructured method, common errors can occur.
Ineffective Maintenance
One of the great misleading statistics in asset maintenance today is the calculation of average life for bearings. Its effect is to support the outdated and almost mystical belief in a link between age and failure.
Based on this way of thinking, it is still common to find maintenance departments carrying out hard-time bearing replacement programs as a means of managing risk. However, it has been the experience of the author that hard-time bearing replacement policies can increase, rather than decrease, the likelihood of failure, while at the same time increasing direct maintenance costs. This flies in the face of popular belief and is an example of how RCM thinking can drive reductions in routine maintenance levels. The original Nowlan and Heap report13 specifically discussed bearings when addressing failure in complex items. A complex item, as opposed to a simple item, is one that is subject to many failure modes; as a result, its failure processes may involve a dozen different stress and resistance considerations. Even in complex items, failures related to age will concentrate about an average age for each individual mode. However, bearings have many failure modes. Where there is no dominant failure mode,14 as is the case for complex items such as most bearings, the average ages of the individual failure modes are widely dispersed along the entire exposure axis.15 Failure is therefore unrelated to operating age. This is a unique feature of complex items. When deciding maintenance policy for bearings, this issue is further exacerbated by the L10 life that manufacturers provide. This number represents the point at which 10% of the items may have failed, meaning that 90% will have survived. Lieblein and Zelen, in their seminal work on bearing life,16 found that the characteristic life, the point by which statistically 63.2% of the items will have failed, was roughly 5 times the L10 life. They also found that the “life” forecasts had a median Weibull beta value of 1.4, indicating a conditional probability of failure that is close to constant.
This means that the likelihood of failure at any point in the life of the bearings in their study increased only marginally as the bearings aged. Other published analyses have quoted a beta of 1.3 for ball and roller bearings, and a beta of 1 for sleeve bearings.17
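The relationship between the L10 life and the characteristic life can be checked with the two-parameter Weibull distribution. The sketch below is an illustration rather than a reproduction of Lieblein and Zelen’s analysis: it assumes a pure Weibull model with the scale (characteristic life, eta) normalized to 1, and the function names are hypothetical. For beta = 1.4 the characteristic life works out at roughly 5 times the L10 life, consistent with the figure quoted above. The hazard ratio shows how much the conditional probability of failure changes between those two ages, with beta = 1 included as the constant (random) reference case.

```python
import math


def weibull_life_at(fraction_failed: float, beta: float, eta: float = 1.0) -> float:
    """Age by which `fraction_failed` of the population is expected to have failed.

    Inverts the Weibull CDF F(t) = 1 - exp(-(t / eta) ** beta).
    """
    return eta * (-math.log(1.0 - fraction_failed)) ** (1.0 / beta)


def weibull_hazard(t: float, beta: float, eta: float = 1.0) -> float:
    """Conditional probability of failure (hazard rate) at age t."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


# beta = 1.0 is the constant (random) case; 1.3 and 1.4 are the values quoted in
# the text; 3.0 is an arbitrary strongly age-related case added for comparison.
for beta in (1.0, 1.3, 1.4, 3.0):
    l10 = weibull_life_at(0.10, beta)      # eta = 1, so this is L10 / eta
    eta_over_l10 = 1.0 / l10
    hazard_ratio = weibull_hazard(1.0, beta) / weibull_hazard(l10, beta)
    print(f"beta={beta:.1f}  eta/L10={eta_over_l10:5.2f}  h(eta)/h(L10)={hazard_ratio:4.2f}")
```

The closer beta is to 1, the flatter the hazard curve and the weaker the link between age and failure.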
13 Reliability-centered Maintenance, F.S. Nowlan et al, United Airlines, San Francisco, December 1978
14 Dominant failure mode – the most common cause of failure
15 Reliability-centered Maintenance, F.S. Nowlan et al, United Airlines, San Francisco, December 1978
16 Statistical Investigation of the Fatigue Life of Deep Groove Bearings, J. Lieblein and M. Zelen, Journal of Research of the National Bureau of Standards, Vol 57, No 5, November 1956
In process manufacturing industries, contaminated oil is one of the most frequent reasons for early life failures. However, this is only one of the multitude of stresses that bearings face as complex assets. Others can include poor storage leading to false brinelling and early corrosion, excessive heat and pressure, overloading, exposure to vibration, abrasion and cracks. All of these can contribute to either early life failures or premature wear-out.
Often, the L10 life is mistaken for the end-of-life point of a bearing and is therefore used as the reference interval for replacement tasks. However, as the information above shows, it is not the end of life but rather the minimum life that 90% of bearings are expected to achieve under specific load conditions. This is in line with Nowlan and Heap’s findings and shows that in many cases we are, at best, wasting a large portion of the bearing’s useful life, making this an ineffective use of maintenance resources.18
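Under the same Weibull assumption (beta = 1.4, as quoted above), the sketch below expresses the L50 (median) life, the mean life and the characteristic life as multiples of the L10 life. It is a hypothetical illustration of the “wasted life” argument, not a bearing-selection calculation, and it ignores the load and lubrication conditions on which the L10 rating depends. With these assumptions, the mean life is roughly four and a half times the L10 life, so replacing every bearing at L10 would consume only around a fifth of the average life of the population.

```python
import math


def life_multiples_of_l10(beta: float) -> dict:
    """L50, mean and characteristic life of a Weibull population, as multiples of L10.

    The Weibull scale parameter cancels out of every ratio, so it is set to 1.
    """
    l10 = (-math.log(0.9)) ** (1.0 / beta)   # 10% failed
    l50 = math.log(2.0) ** (1.0 / beta)      # 50% failed (median)
    mean = math.gamma(1.0 + 1.0 / beta)      # mean life = eta * Gamma(1 + 1/beta)
    characteristic = 1.0                     # 63.2% failed
    return {
        "L50": l50 / l10,
        "mean": mean / l10,
        "characteristic": characteristic / l10,
    }


multiples = life_multiples_of_l10(1.4)
for name, multiple in multiples.items():
    print(f"{name:>14}: {multiple:.2f} x L10")

# Share of the population's mean life consumed if every bearing is replaced at L10.
print(f"replacing at L10 uses about {100.0 / multiples['mean']:.0f}% of the mean life")
```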
[Figure: Conditional probability of failure for a complex asset such as a bearing, marking the L10 life, L50 life, average life and characteristic life (63.2%). Complex assets have no dominant failure mode; instead, many different stresses lead to failure. These failures are distributed along the stress axis, giving a constant/random conditional probability of failure that is unrelated to age. This is unique to complex assets.]
Increased bearing life and decreased labor costs are not the only potential savings. Frequently replacing bearings on, say, motor shafts introduces a range of additional failure modes. For example, failures associated with installation and frequent change-out include:
• Wear of the motor shaft, reducing the interference fit and leading to bearings spinning on the shaft (a failure of the motor, not of the bearing)
• Overheating of the bearing, leading to early life failures and distortion of the inner race
• Excessive force (e.g. hammers instead of bearing pullers), damaging the races of the bearings and leading to early life failures
• Bearing misalignment
• Wrong bearing selection
• Pre-failed bearings due to poor storage techniques
17 Bloch, Heinz P. and Fred K. Geitner, 1994, Practical Machinery Management for Process Plants, Volume 2: Machinery Failure Analysis and Troubleshooting, 2nd Edition, Gulf Publishing Company, Houston, TX
18 Over one machine, this appears to be a very small maintenance cost item. However, when applied throughout a plant, or on the so-called “critical” assets, it amounts to a significant maintenance cost.
While we can manage some of these, others are a direct result of frequent bearing changes. Therefore, if we use hard-time bearing replacement as a maintenance policy, then we are:
a) reducing the maximum used life of the bearing, and
b) increasing the likelihood of failure through the introduction of several additional failure modes.
In the Meridium RCM decision algorithm,19 a failure management policy for an evident failure mode with operational or non-operational consequences must comply with the following:
“Over a period of time, the failure management policy must cost less than the cost of the operational consequences (if any) plus the total cost of repair.”
Ineffective maintenance is more common than most professionals think. It can also include maintenance out of context, where maintenance regimes are not aligned with how the asset is used, or practices that reduce an asset’s operating efficiency.
Using the decision algorithm in RCM, the first option available to the team is Predictive Maintenance. Where this is both applicable and effective, it will increase the effectiveness of maintenance in a range of areas:
• Predictive Maintenance detects the signs of the onset of failure. As such, it provides the capability to manage all failures, including random failures.
• It can be done in situ and often without interfering with the normal operation of the process.
• It ensures that the asset uses all of its economically useful life (as opposed to hard-time replacements).
Inapplicable Maintenance
The mistaken belief that there is always a relationship between age and failure leads maintenance departments to adopt all sorts of policies that, in practice, achieve nothing.
Often these occur during maintenance turnarounds. The opportunity to access items that are normally running drives people to inspect them just in case an age-related failure mode has developed. Once again, this is a particularly common activity in bearing management. For example, suppose a turbine turnaround occurs once every 3 years for other failure management reasons.
19 The Meridium RCM Decision Algorithm is based on Figure 17 – A Second Decision Diagram Example, page 49, SAE JA1012, 2002-01
The maintenance department has taken this opportunity to perform a dye penetrant check on the bearing to see whether any cracks are starting to form that would require action. On the face of it, this appears to be a perfectly valid, even wise, use of the opportunity. However, on applying the RCM logic a little more closely, this perception changes dramatically. For the sake of this example, we will say that the P-F interval is about 3 months, meaning that once we detect cracks in this particular bearing, we have around three months before functional failure. If we test the bearing on a hard-time basis every three years, and the P-F interval is three months, then the following logic applies:
a) The dye penetrant test is only useful if the bearing failure is already developing at the time of inspection.
b) This means the failure must have started developing less than 3 months before the housing is opened. As we shut down every 36 months, the likelihood of this occurring (given the randomness of bearing failure) is around 1 in 12.
Moreover, the likelihood of it not occurring is around 11 in 12. This task does not satisfy the RCM applicability criteria and is a waste of resources. In addition, by opening the bearing housing and interfering with a bearing that is presumably operating fine, we again introduce the possibility of human error.20
[Figure: Turnaround interval = 3 years; P-F interval = 3 months; likelihood of detection 1 in 12; likelihood of non-detection 11 in 12.]
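The 1 in 12 figure follows from comparing the P-F interval with the inspection interval. The sketch below assumes, as the example does, that the onset of the potential failure is equally likely at any time between turnarounds (a random failure pattern) and that the bearing is inspected only once per turnaround; the function names are hypothetical. It gives the closed-form probability and checks it with a simple Monte Carlo simulation.

```python
import random


def detection_probability(pf_interval: float, inspection_interval: float) -> float:
    """Probability that one inspection falls between P and F.

    Assumes the potential failure (P) starts at a uniformly random time within
    an inspection cycle and that functional failure (F) follows pf_interval later.
    """
    if inspection_interval <= pf_interval:
        return 1.0  # an inspection always falls inside the P-F interval
    return pf_interval / inspection_interval


def simulated_detection(pf_interval: float, inspection_interval: float,
                        trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        p_time = rng.uniform(0.0, inspection_interval)  # onset of the potential failure
        # The next inspection happens at the end of the cycle; it detects the
        # failure only if it occurs before F = p_time + pf_interval.
        if inspection_interval - p_time <= pf_interval:
            detected += 1
    return detected / trials


pf_months, turnaround_months = 3.0, 36.0
p = detection_probability(pf_months, turnaround_months)
print(f"detection {p:.3f} (1 in {1 / p:.0f}), non-detection {1 - p:.3f}")
print(f"simulated detection {simulated_detection(pf_months, turnaround_months):.3f}")
```

Testing at intervals shorter than the P-F interval (for example, every 2 months) would raise the probability to 1 under these assumptions, which is why the practicality of such intervals decides whether the task is applicable.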
It is difficult to categorize this maintenance practice directly, but the closest match in RCM is Predictive Maintenance (PTIVE).
In the Meridium RCM decision algorithm, this means the team needs to answer all of the following questions before the task can be considered applicable:
• Is there a clear potential failure condition? What is it?
• What is the P-F interval?
• Is the interval long enough to take action to avoid or minimize the consequences of failure?
• Is the P-F interval reasonably consistent?
• Is it practical to do the task at intervals less than the P-F interval?
The team would be able to answer all of the above questions positively except for the last one. For dye penetrant testing, it is not practical to do the task at intervals less than the P-F interval; therefore, the task is not applicable.
20 Human error is discussed in detail in module RCM-DO-06a Introducing Human Error.
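The applicability questions can be expressed as a simple checklist. The sketch below is hypothetical and is not the Meridium implementation: the field names, the treatment of “long enough to take action” as a comparison with an assumed lead time, and the 1-month lead time in the example are all illustrative assumptions. Applied to the dye penetrant example, only the last question fails.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PredictiveTaskCandidate:
    """Hypothetical answers to the applicability questions for one task."""
    potential_failure_condition: Optional[str]  # None if there is no clear condition
    pf_interval_months: float
    pf_interval_consistent: bool
    lead_time_to_act_months: float         # assumed time needed to plan and act
    practical_task_interval_months: float  # shortest interval at which the task is practical


def failed_applicability_questions(task: PredictiveTaskCandidate) -> List[str]:
    """Return the questions the task fails; an empty list means it is applicable."""
    failures = []
    if not task.potential_failure_condition:
        failures.append("no clear potential failure condition")
    if task.pf_interval_months <= task.lead_time_to_act_months:
        failures.append("P-F interval too short to act on")
    if not task.pf_interval_consistent:
        failures.append("P-F interval not reasonably consistent")
    if task.practical_task_interval_months >= task.pf_interval_months:
        failures.append("task cannot practically be done at intervals less than the P-F interval")
    return failures


dye_penetrant = PredictiveTaskCandidate(
    potential_failure_condition="surface cracks visible under dye penetrant",
    pf_interval_months=3.0,
    pf_interval_consistent=True,
    lead_time_to_act_months=1.0,
    practical_task_interval_months=36.0,  # only possible at each turnaround
)
print(failed_applicability_questions(dye_penetrant))
```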
Inapplicable maintenance practices are widespread and, in the experience of the author, often reflect the underlying belief in a consistent relationship between age and failure.
Increases in Revenue
There are two specific areas where an RCM team can claim revenue benefits:
a) Where an asset, or system, has a history of failures leading to lost production opportunities. Principally, this refers to unplanned shutdowns, overrun turnarounds, and start-up issues of an asset or system.
b) Where an asset, or system, has a history of failures leading to reduced production output. This includes areas such as utilization, quality, and reduced availability.
For example:
a. Reduced turnaround times
b. Increased yield (quality)
c. Increased availability for full production rates
[Figure: Production losses between planned capacity and the best achievable rate. Downtime losses: unplanned shutdowns, shutdown overruns, startup failures. Uptime losses: off-spec production, under-performance, production slow-down.]
The RCM team can claim these savings only where they can prove they have isolated the cause of the lost, or reduced, production and have recommended a strategy that will mitigate it or prevent it in the future.
These benefits are potential because it will take a reasonable amount of time, nominally one year, before measurement can prove reduced production losses. However, there are often noticeable increases in available uptime after RCM maintenance policies are implemented. Calculating benefits in this case requires estimating the value of the additional uptime, throughput or yield, as well as the reduced costs of labor and materials.
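The estimate described above can be sketched as follows. The sketch assumes that lost production can be valued at a contribution margin per hour and off-spec product at a margin per unit; these valuation choices, the names and all figures are hypothetical, and each company will have its own basis for valuing production.

```python
from dataclasses import dataclass


@dataclass
class AnnualProductionLoss:
    """Hypothetical annual losses attributable to one asset or system."""
    downtime_hours: float       # unplanned shutdowns, overruns, start-up failures
    off_spec_units: float       # production lost to quality problems
    labor_and_materials: float  # direct cost of the associated repairs


def annual_revenue_benefit(before: AnnualProductionLoss, after: AnnualProductionLoss,
                           margin_per_hour: float, margin_per_unit: float) -> float:
    """Value of recovered uptime and yield plus reduced repair costs."""
    uptime_value = (before.downtime_hours - after.downtime_hours) * margin_per_hour
    yield_value = (before.off_spec_units - after.off_spec_units) * margin_per_unit
    cost_saving = before.labor_and_materials - after.labor_and_materials
    return uptime_value + yield_value + cost_saving


before = AnnualProductionLoss(downtime_hours=120, off_spec_units=500, labor_and_materials=80_000)
after = AnnualProductionLoss(downtime_hours=40, off_spec_units=200, labor_and_materials=50_000)
print(annual_revenue_benefit(before, after, margin_per_hour=2_000, margin_per_unit=150))
```

As noted above, a figure like this remains a potential benefit until around a year of operating data confirms the reduced losses.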
As these are historic failures, information such as the quantity of lost production, direct maintenance costs, and the frequency of failure is relatively easy to find. Alternatively, the team can use sophisticated forecasting techniques such as Crow-AMSAA. This is a time-proven, accurate method for forecasting failure rates, enabling the team to calculate the savings from changes to asset maintenance. It is also a valid method for forecasting savings in direct costs.
Other Cashable Benefits
It is the experience of the author that CAPEX, as opposed to OPEX, benefits often represent the largest cashable advantages of implementing RCM. These include:
• A delayed use of capital, compared to the pre-RCM scenario, allowing the capital to be deployed elsewhere in the enterprise. This occurs through life extension and through higher-confidence decision making.
• A reduction in operating losses over the life of the asset base, attributable to the correct timing of capital refurbishment and replacement tasks
• A potential reduction in the cost of capital and the cost of insuring assets, due to increased confidence in decision making
• The incorporation of risk into the budgeting process. The benefits of this are effectively incalculable, as they depend on how the organization uses this information in the marketplace.
• A calculable reduction in inventory holdings based on the RCM approach
While there are other cashable benefits, the items listed above are the most common and the least debated within the reliability community.
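Of the items listed, the value of delaying capital is one of the easier to put a number on. A common finance approach, assumed here rather than taken from the RCM method, is to compare the present value of spending the same amount now and later at the company’s cost of capital. The sketch ignores any extra running or maintenance cost of keeping the older asset in service, which a real case would subtract; the function name and figures are hypothetical.

```python
def deferral_value(capex: float, years_deferred: float, discount_rate: float) -> float:
    """Present-value benefit of postponing a capital outlay.

    Compares spending `capex` today with spending the same amount
    `years_deferred` years from now, discounted at `discount_rate`
    (for example, the company's cost of capital as a fraction).
    """
    if years_deferred < 0 or discount_rate <= -1:
        raise ValueError("years_deferred must be >= 0 and discount_rate > -1")
    present_value_later = capex / (1.0 + discount_rate) ** years_deferred
    return capex - present_value_later


# Illustrative: a 1,000,000 refurbishment deferred by 3 years at 8%.
print(f"{deferral_value(1_000_000, 3, 0.08):,.0f}")
```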
The Non-cashable Results of RCM
RCM will often substantially increase the team’s awareness of the limitations and operational requirements of the physical assets they study. This results in the following intangible benefits:
• A reduction in the risk of failure modes affecting safety and environmental integrity
• Increased knowledge of the assets, their functions and their failures
• An increased ability to troubleshoot failed assets
• Changes to P&IDs specifically, and at times to other process drawings
• Changes to operating procedures, training, purchasing, work practices and other related areas
• A tangible increase in the quality and integrity of asset data because of the focus of RCM
However, it is often difficult, if not impossible, to measure the extent of these impacts or to link them to changes in the profitability of the enterprise. At times, the effort to do so can actually distort or obscure the achievement itself. (Attempting to equate a reduction in the risk of loss of life with a monetary value is an example of this.)
However, it is possible to represent some non-cashable benefits in monetary terms. The most common of these is cost avoidance.
Risk Mitigation
When the mitigated risk is economic, it is often termed cost avoidance.
Where the team has implemented a policy for a reasonably likely21 failure mode for which there was an inadequate existing strategy in place, the team is justified in claiming this as a potential benefit of RCM, even though the failure has not occurred previously. These benefits count as non-cashable for a number of reasons:
1. They will never appear as part of the profit and loss of any enterprise, nor will they cause a change to maintenance budgets or revenues.
2. The team requires estimates to calculate the cost avoidance benefit. Some failure modes may have similar consequences, affect similar assets, and have overlapping impacts on production. If these overlapping benefits are simply added together, RCM teams can find themselves presenting benefits of several times the value of the entire installation. If not explained correctly, this is a false representation, which can erode the credibility of RCM and of the team attempting to implement it.
They are nevertheless valid and important benefits for the RCM team to claim. Note the emphasis on “an inadequate existing strategy”. RCM did not invent maintenance, and often there are adequate existing failure management policies in place. As an output, the team will find that some maintenance regimes will disappear, some will remain, and they will add some new, more sophisticated, regimes.
[Figure: Redundancy. Pre-RCM existing routines compared with the post-RCM net maintenance tasks, which consist of the remaining pre-RCM routines plus new routines.]
This occurs because some of the maintenance policies in place are redundant, some are either inapplicable or ineffective, yet others are adequate means of managing failure. Thus, there is no justification for claiming benefits where there is an adequate existing strategy to manage the failure mode.
Nor is there any justification for claiming benefits where failure modes are not reasonably likely. Other areas of risk mitigation involve failure modes that would affect safety or environmental integrity.
In many cases, these will have direct economic consequences through regulatory penalties, or through secondary economic damages caused by the failure. Where this is the case, the team can calculate the value of the cost avoided using a method similar to that used for purely economic consequences.22 Where the failure mode will not have significant economic consequences, the delta between the discovered risk and the managed risk can represent the benefit of risk mitigation.
21 What constitutes reasonably likely is specific to each company, and often to each RCM analysis. Methods for determining reasonableness are not included in this module.
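The cost-avoidance rules described in this section can be sketched in a few lines. This is a hypothetical illustration rather than the method in Handout RCM-DO-07a: it assumes that each failure mode has an estimated failure frequency before and after the RCM strategy and a single cost per failure, counts only modes that are reasonably likely and had an inadequate existing strategy, and flags a total that exceeds the value of the installation as a sign of overlapping consequences.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureModeEstimate:
    """Hypothetical cost-avoidance inputs for one failure mode."""
    name: str
    reasonably_likely: bool
    existing_strategy_adequate: bool
    failures_per_year_before: float  # with the pre-RCM strategy
    failures_per_year_after: float   # with the RCM strategy
    cost_per_failure: float          # e.g. repair, lost production, penalties


def claimable_cost_avoidance(modes: List[FailureModeEstimate]) -> float:
    """Annual cost avoided, counting only modes the team may justifiably claim."""
    total = 0.0
    for mode in modes:
        if not mode.reasonably_likely or mode.existing_strategy_adequate:
            continue  # no justification for claiming a benefit
        reduction = mode.failures_per_year_before - mode.failures_per_year_after
        total += reduction * mode.cost_per_failure
    return total


def overlap_warning(total: float, installation_value: float) -> Optional[str]:
    """Flag totals that are not credible, e.g. because consequences were double counted."""
    if total > installation_value:
        return "claimed cost avoidance exceeds the installation value; check for overlapping consequences"
    return None


modes = [
    FailureModeEstimate("seal leak", True, False, 0.5, 0.25, 200_000),
    FailureModeEstimate("coupling wear", True, True, 1.0, 0.5, 40_000),
    FailureModeEstimate("foundation crack", False, False, 0.01, 0.0, 5_000_000),
]
total = claimable_cost_avoidance(modes)
print(total, overlap_warning(total, installation_value=2_000_000))
```

Only the first mode is counted here: the second already has an adequate strategy and the third is not reasonably likely.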
The Principal Barrier to Value Realization
The benefits of RCM are obvious to anybody who has studied it, or to any maintenance practitioner who can relate to the concepts espoused in the method. Each level within the corporation generally sees different advantages in RCM, and there is rarely a lack of motivation for improvement. Implementation problems arise from fundamental misunderstandings about maintenance and the functions of physical asset management.23 These misunderstandings lead maintenance departments to see increased risk where it does not exist.
[Figure: Benefits of RCM. Cashable: reduced costs, increased revenue. Non-cashable: risk mitigation, knowledge increases.]
For example, a maintenance manager could face any of the following recommendations, among others:
• Elimination of hard-time replacement policies, where applicable and effective
• Elimination of invasive inspections carried out simply because the opportunity arises on planned turnarounds
There is often a reluctance to accept such changes because they are perceived as risky, and instead of implementing the policy changes, things stay as they are. The result is more of the same:
• the risk of unplanned failure stays provably higher, and
• the effectiveness of maintenance stays provably lower.
Moreover, resources remain stretched performing maintenance that is not required, or repairing problems caused by the activities that are supposed to prevent them. It is clear that before we can successfully implement the strategy outcomes of RCM, we first need to ensure that there is a deep understanding of modern reliability principles within the company.
22 Cost avoidance calculation methods are available in Handout RCM-DO-07a Calculating Costs Avoided, inspired by the work of Steve Soos on this subject.
23 The Role of the Maintenance Manager, Daryl Mather, 2008:
• Design effective maintenance policies
• Execute them as efficiently as possible
• Collect relevant data for higher-confidence decisions in the future
The Role of the RCM Facilitator/Analyst
In a time of continual change, the ability to implement is one of the most prized and sought-after skill sets. In module RCM-DO-08 Implementation and Execution, we highlight the importance of momentum and the vital role of benefit awareness in creating it. RCM often requires the cooperation of a range of departments, including purchasing/stores, human resources/training, operations, maintenance and engineering. In the experience of the author, initiatives are not successful over the medium to long term when companies try to order change. If you want to change the way an organization works fundamentally, then people have to want to change. For this to happen, they need to understand the logic behind RCM, and they must understand what the benefits are to them in their present role. One useful tool for engaging people is a solid, fact-based benefits case for every analysis that is completed. To be effective, this task should commence during the analysis itself, and the case should be presented before implementation.