3. Rcm Fundamentals- Meridium

Reliability-Centered Maintenance

© Copyright Meridium, Inc. 2008. All rights reserved. Document: RCM Fundamentals Training.doc


Reliability-Centered Maintenance

Version: RCM Fundamentals Training.doc
Copyright © Meridium, Inc. 2008. All Rights Reserved.

This training material is provided under a license agreement containing restrictions on use and disclosure. All rights, including reproduction by photographic or electronic process and translation into other languages, are reserved by Meridium. Meridium is a registered trademark of Meridium, Inc.


Table of Contents

Foreword
Reliability-centered Maintenance
RCM-DO-01 Fundamentals of Managing Maintenance
The Expectations of Maintenance
Understanding Failure
The Objective of Maintenance
What is RCM?
The RCM Structure
Functions
The FMEA
Consequences
Failure Management Strategies
Default Actions
RCM-DO-02 Preparing for Analysis
RCM-DO-03 Functions and Functional Failures
Operating Context
Writing Functions
Performance Standards
Exercises
Secondary Functions
RCM-DO-03b Air Conditioner
Functional Failures
Failed States
Exercise
Exercises
RCM-DO-04 Failure Modes and Effects
Reasonably Likely
Causality
Writing a Failure Mode
Types of Failures
The Problem with Data
Effects
RCM-DO-05 Consequences and Effectiveness
Hidden or Evident?
Safety
Environmental
Operational
Repair Only
RCM-HO-05a Assigning Consequences
Applicable and Effective
Tolerable levels of Risk
Hidden Failures
The Famous Pump Example
Exercise 1
Exercise 2
Exercise 3
Case Study - BP refinery Incident
Managing Safety and Environmental Consequences
Economic Consequences
RCM-DO-06 Applicability and Task Selection
Types of Maintenance
Preventive Maintenance (PM’s)
Predictive Maintenance
Detective Maintenance
Exercise 1 – Task Categories
Exercise 2 – Which type of maintenance?
The Basis of Task Preference
RCM-DO-06c Uses of MTBF
What MTBF can tell us?
At what level can we apply MTBF?
How can MTBF add value to Reliability Initiatives?
Summary
RCM-DO-06d Advanced Detective Maintenance Techniques
Exercise 1 – Steam Turbine
Exercise 2 – Steel Plant
Common Cause Failure Modes
Exercise 4 - Hoist
Options for redesign
Multiple Redundant Devices
Exercise 5 – Pumps and PSV’s
Managing Risk in Hidden Failures
Voting Systems
Economic Consequences
Exercise 6 – Economic Hidden Failures
RCM-DO-07 The Value of RCM
The Cashable Results of RCM
The Non-cashable Results of RCM
The Principal Barrier to Value Realization
The Role of the RCM Facilitator/Analyst


Foreword

The Reliability-Centered Maintenance (RCM) approach was first documented in the detailed book on the subject by F. Stanley Nowlan, Director, Maintenance Analysis, and Howard F. Heap, Manager, Maintenance Program Planning, both of United Airlines.[1] The book was sponsored by the Office of the Assistant Secretary of Defense (Manpower, Reserve Affairs and Logistics) and was published in 1978. From that book:

“For years maintenance was a craft learned through experience and rarely examined analytically. As new performance requirements led to increasingly complex equipment, however, maintenance cost grew accordingly. By the late 1950's the volume of these costs in the airline industry had reached a level that warranted a new look at the entire concept of preventive maintenance. By that time studies of actual operating data had also begun to contradict certain basic assumptions of traditional maintenance practice.

“One of the underlying assumptions of maintenance theory has always been that there is a fundamental cause-and-effect relationship between scheduled maintenance and operating reliability. This assumption was based on the intuitive belief that because mechanical parts wear out, the reliability of any equipment is directly related to operating age. It therefore followed that the more frequently equipment was overhauled, the better protected it was against the likelihood of failure. The only problem was in determining what age limit was necessary to assure reliable operation.

“In the case of aircraft it was also commonly assumed that all reliability problems were directly related to operating safety. Over the years, however, it was found that many types of failures could not be prevented no matter how intensive the maintenance activities. Moreover, in a field subject to rapidly expanding technology it was becoming increasingly difficult to eliminate uncertainty. Equipment designers were able to cope with this problem, not by preventing failures, but by preventing such failures from affecting safety. In most aircraft essential functions are protected by redundancy features which ensure that, in the event of a failure, the necessary function will still be available from some other source. Although fail-safe and "failure-tolerant" design practices have not entirely eliminated the relationship between safety and reliability, they have dissociated the two issues sufficiently that their implications for maintenance have become quite different.

“A major question still remained, however, concerning the relationship between scheduled maintenance and reliability. Despite the time-honored belief that reliability was directly related to the intervals between scheduled overhauls, searching studies based on actuarial analysis of failure data suggested that the traditional hard-time policies were, apart from their expense, ineffective in controlling failure rates. This was not because the intervals were not short enough, and surely not because the teardown inspections were not sufficiently thorough. Rather, it was because, contrary to expectations, for many items the likelihood of failure did not in fact increase with increasing operation age. Consequently a maintenance policy based exclusively on some maximum operating age would, no matter what the age limit, have little or no effect on the failure rate.”

In 1960 a task force of FAA and airline personnel was formed to investigate scheduled maintenance, which resulted in an FAA/Industry Reliability Program in 1961. Building upon this work, in 1965 United Airlines developed a rudimentary decision-diagram technique. This technique was refined and embodied in the 747 Maintenance Steering Group (MSG) Handbook: Maintenance Evaluation and Program Development (MSG-1) from the Air Transport Association in 1968. MSG-1 was used to develop the maintenance program for the Boeing 747, the first maintenance program to apply RCM concepts. Subsequent improvements led to MSG-2, which was used to develop the maintenance programs for the Lockheed L-1011 and the Douglas DC-10. A similar document, European Maintenance System Guide, served as the basis for development of the initial programs for the Concorde and the Airbus A-300.

The objective of the approach outlined in MSG-1 and MSG-2 was to develop a scheduled maintenance program that assured the maximum safety and reliability of equipment at the lowest cost. An example of the success of this approach can be seen by comparing the Douglas DC-8, which had 339 items scheduled for overhaul in a traditional maintenance program, with the DC-10, based upon MSG-2, which had only seven items to be overhauled. The latest commercial aircraft maintenance guidance is based upon MSG-3 (Rev 2) for the Boeing 757 and 767 aircraft.

In the early 1970's this work attracted the attention of the Office of the Secretary of Defense. The Navy was the first military organization to apply RCM to both new design and in-service aircraft. Also in the early 1970's, the Navy embarked on a major program to change the way nuclear submarines were maintained. Over the next 20 years the Navy would virtually eliminate scheduled overhaul on the nuclear submarine based upon an aggressive Condition Monitoring Program and other technical advances to the ship systems. RCM is currently being used on all new ship designs.

The RCM methodology has subsequently been applied in a wide variety of commercial and military applications. The Electric Power Research Institute (EPRI) has tested the methodology at several nuclear power utility sites of Florida Power & Light, Duke Power, and Southern California Edison. Puget Sound Power and Light Co. has been using RCM since 1991 in both substations and line maintenance. NASA has long used RCM in analyzing Space Shuttle and Shuttle Support Systems. In the early 1990's, NASA embarked on a process of basing the approach to facilities maintenance on RCM. And in 1995, Boeing Commercial Airplane Group embraced RCM as one of the tools in implementing a more robust and standardized facilities maintenance program.[2] This was significant in that one of the key groups in fostering RCM in complex systems (Boeing Aircraft) was now applying the approach to common industrial facilities equipment.

More recently, issues surrounding RCM seem more focused on applying the technique and less on proving its value. Must a group perform a classical/rigorous analysis, or is a more streamlined approach acceptable? An excellent article regarding the variations in the methodology was presented at the 2003 International Maintenance Conference.[3] Regardless of the approach selected, the outcome of RCM analysis is focused on selecting the most effective maintenance strategy and, when maintenance cannot deliver the needed reliability, identifying redesign requirements.

[1] F. Stanley Nowlan and Howard F. Heap, Reliability Centered Maintenance, United Airlines and Dolby Press, sponsored and published by the Office of Assistant Secretary of Defense, 1978.

[2] Westbrook, Dennis, Boeing Commercial Airplane Group, and William H. Closser, C&A Consulting, “Transition of an Organization to a Reliability Based Culture”, Proceedings of 14th Annual International Maintenance Conference, August 3-7, 1997, Atlanta, GA.

[3] Nicholas, Jack R., “The Controversy about Reliability Centered Maintenance Methodology, Its Variants and Derivatives”, Proceedings of the 18th International Maintenance Conference, Dec. 7-10, 2003, Clearwater, FL.


Reliability-centered Maintenance


RCM-DO-01 Fundamentals of Managing Maintenance

The Expectations of Maintenance

• Productivity – How much are we producing?

• Cost-Effectiveness – What is it costing us to do so?

• Safety & Environment – Are we hurting anybody or damaging the environment in the process?

• Quality – Are we producing at a consistently high level of quality?

• Corporate Learning – How can I make sure that I will be able to sustain/improve this into the future?


Understanding Failure

The “Wear-out” Curve

The belief that all assets have a “life”: that is, a period of few random failures followed by a wear-out zone.

The “Bathtub” Curve

Eventually people started to believe that many assets actually suffered early-life failures. The “bathtub” curve forms the basis of many engineers’ beliefs about asset performance.




The 6 Failure Patterns

A. 4%
B. 2%
C. 5%
D. 7%
E. 14%
F. 68%

• Only 11% of failures were related to age… 89% had no direct correlation with the age of the assets at all!
  – And only 6% had a wear-out curve
• So what?
  – If our maintenance schedule has been developed based on principles of “life” then we are achieving…?
  – Or worse… 68% of failures were infant mortality failures.
• 14% of all failures were seen as random, therefore we are often doing absolutely nothing to manage these!
• Do different assets fail differently?
  – Complex assets…
  – Simple assets have dominant failure modes (wear, erosion, corrosion, evaporation, etc.)
• Regardless of the status in your industry, it will increase as automation, mechanization and asset complexity increase.

# Reliability-Centered Maintenance (Nowlan and Heap), Exhibit 2:13 Age related Patterns


The 6 Failure Patterns

Pattern   UAL 1968   Broberg 1973   MSP 1982
A         4%         3%             3%
B         2%         1%             17%
C         5%         4%             3%
D         7%         11%            6%
E         14%        15%            42%
F         68%        66%            29%

# U.S. Navy Analysis of Submarine Maintenance Data and the Development of Age and Reliability Profiles - Timothy M. Allen, Department of the Navy
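The age-related and non-age-related shares quoted for the failure patterns can be checked with a few lines of arithmetic. The sketch below is illustrative only: the percentages are copied from the three studies tabulated above, while the grouping of patterns A to C as "age-related" and D to F as "not age-related" is an assumption inferred from the 11% / 89% remark, and the names `FAILURE_PATTERNS`, `share` and `summarize` are hypothetical.

```python
# Percentages copied from the failure-pattern table; each study sums to 100.
FAILURE_PATTERNS = {
    "UAL 1968":     {"A": 4, "B": 2,  "C": 5, "D": 7,  "E": 14, "F": 68},
    "Broberg 1973": {"A": 3, "B": 1,  "C": 4, "D": 11, "E": 15, "F": 66},
    "MSP 1982":     {"A": 3, "B": 17, "C": 3, "D": 6,  "E": 42, "F": 29},
}

# Assumed grouping, inferred from the "11% related to age" remark (A + B + C).
AGE_RELATED = ("A", "B", "C")
NOT_AGE_RELATED = ("D", "E", "F")


def share(study, patterns):
    """Sum the percentage of failures falling in the given patterns."""
    return sum(FAILURE_PATTERNS[study][p] for p in patterns)


def summarize():
    """Return the age-related and non-age-related share for every study."""
    return {
        study: {
            "age_related": share(study, AGE_RELATED),
            "not_age_related": share(study, NOT_AGE_RELATED),
        }
        for study in FAILURE_PATTERNS
    }


if __name__ == "__main__":
    for study, shares in summarize().items():
        print(study, shares)
```

Under this grouping the UAL 1968 column reproduces the 11% and 89% figures, and the other two studies can be compared on the same basis.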


The Objective of Maintenance

[Figure: Performance scale showing Initial Capability (what it can do), Desired Performance (what its users want it to do), and the Margin for Deterioration between them.]

• So, if the objective of maintenance is to keep the asset running between what it “can” do and what the users “want” it to do, then we must:
  – First, define what the users want the asset to do in its present operating context
  – Second, determine if the asset is able to meet these requirements
  – Third, determine the maintenance interventions required

# SAE JA1012 Figure 2
# SAE JA1012 Section 6.2 Performance Standards
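The relationship between initial capability, desired performance and the margin for deterioration can be sketched in a few lines. This is a minimal illustration, not part of the course material: the class and function names are hypothetical, performance is assumed to be a single number where higher is better, and the pump figures are invented.

```python
from dataclasses import dataclass


@dataclass
class PerformanceStandard:
    initial_capability: float   # what the asset can do
    desired_performance: float  # what its users want it to do

    def margin_for_deterioration(self) -> float:
        """Room the asset has to deteriorate before users are affected."""
        return self.initial_capability - self.desired_performance

    def can_meet_requirement(self) -> bool:
        """Second step: is the asset able to meet the users' requirement?"""
        return self.initial_capability >= self.desired_performance


def is_functionally_failed(standard: PerformanceStandard, current: float) -> bool:
    """True once current performance drops below what the users want."""
    return current < standard.desired_performance


# Hypothetical pump: capable of 1000 L/min, users want at least 800 L/min.
pump = PerformanceStandard(initial_capability=1000.0, desired_performance=800.0)
print(pump.margin_for_deterioration())      # 200.0
print(pump.can_meet_requirement())          # True
print(is_functionally_failed(pump, 750.0))  # True
```

If `can_meet_requirement()` is false, no maintenance intervention closes the gap in this simple model, which is why the capability check comes before task selection.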


What is RCM?

RCM is a process to ensure that assets continue to meet their user requirements in their present operating context.
~ John Moubray

RCM applies to any equipment where there is a need to realise maximum operating reliability at the lowest cost.
~ Stan Nowlan and Howard Heap


The RCM Structure

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
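One way to picture how the seven questions nest is as a small record structure: a function has functional failures, each functional failure has failure modes, and each failure mode carries its effect, consequence and chosen action. The sketch below is only an illustration under that assumption; the class names, field names and the pump example are hypothetical and do not come from any RCM software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureMode:
    cause: str                             # Q3: what causes the functional failure
    effect: str                            # Q4: what happens when it occurs
    consequence: str                       # Q5: in what way it matters
    proactive_task: Optional[str] = None   # Q6: task to predict or prevent it
    task_interval: Optional[str] = None    # Q6: how often the task is done
    default_action: Optional[str] = None   # Q7: used when no proactive task suits


@dataclass
class FunctionalFailure:
    description: str                       # Q2: a way the function can be lost
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    statement: str                         # Q1: function plus performance standard
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def unresolved_failure_modes(function: Function) -> List[str]:
    """Causes that have neither a proactive task (Q6) nor a default action (Q7)."""
    return [
        mode.cause
        for failure in function.functional_failures
        for mode in failure.failure_modes
        if mode.proactive_task is None and mode.default_action is None
    ]


# Hypothetical example content, invented purely for illustration.
pump = Function(
    statement="To pump water at not less than 800 litres per minute",
    functional_failures=[
        FunctionalFailure(
            description="Pumps less than 800 litres per minute",
            failure_modes=[
                FailureMode("Impeller worn", "Flow falls gradually", "Operational",
                            proactive_task="Monitor flow rate",
                            task_interval="Weekly"),
                FailureMode("Suction strainer blocked", "Flow falls", "Operational"),
            ],
        )
    ],
)
print(unresolved_failure_modes(pump))
```

The helper lists the failure modes for which questions 6 and 7 are still open, which is a quick completeness check on an analysis.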


Functions

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions) (All Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)


The FMEA

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)

Questions 2 to 4: all failed states, causes of failure, and the effects of each failure.


Consequences

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences) How it matters
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)


[Figure: RCM Decision Algorithm, based on Example 2 of SAE JA1012.
The diagram starts with the question "Will the loss of function caused by this failure mode on its own become evident to the operating crew under normal circumstances?" and splits into evident (E) and hidden (H) branches.
Each branch is then classified by consequence:
• Safety (ES, HS): "Is there an intolerable risk that the failure could kill or injure someone?"
• Environmental (EE, HE): "Is there an intolerable risk that the failure could breach a known environmental standard or regulation?"
• Operational (EO, HO): "Does the failure have a direct adverse effect on operational capability?"
• Non-operational (EN, HN)
Within each category the diagram asks, in turn, whether a Predictive (On-Condition) task, a Preventive Restoration task and a Preventive Replacement task are technically feasible and effective. The hidden branches also ask whether a Detective task is technically feasible and effective, and the diagram includes the question "Is a combination of tasks technically feasible and effective?".
End points: Predictive Task, Preventive Restoration Task, Preventive Replacement Task, Detective Task, Combination of tasks, Run-to-Fail, "Redesign is compulsory" (safety and environmental nodes) and "Redesign may be desirable" (operational and non-operational nodes).]
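The decision diagram can be sketched in code. The following is a minimal illustration only, assuming one reading of the node labels that survive in the figure (for example, that the Detective task question applies only to hidden failures and the combination-of-tasks question only to evident safety and environmental failures). The names `Consequence`, `PROACTIVE_TASKS` and `select_strategy` are hypothetical and do not come from SAE JA1012 or from Meridium.

```python
from enum import Enum


class Consequence(Enum):
    SAFETY = "S"
    ENVIRONMENTAL = "E"
    OPERATIONAL = "O"
    NON_OPERATIONAL = "N"


# Proactive tasks, in the order the diagram asks about them.
PROACTIVE_TASKS = (
    "Predictive Task",
    "Preventive Restoration Task",
    "Preventive Replacement Task",
)


def select_strategy(evident, consequence, feasible):
    """Return a failure management strategy for one failure mode.

    evident     -- True if the loss of function becomes evident on its own
    consequence -- a Consequence member
    feasible    -- set of task names judged technically feasible and effective
    """
    # The first proactive task that qualifies is selected.
    for task in PROACTIVE_TASKS:
        if task in feasible:
            return task

    # Assumption: only hidden failures are tested for a Detective task.
    if not evident and "Detective Task" in feasible:
        return "Detective Task"

    serious = consequence in (Consequence.SAFETY, Consequence.ENVIRONMENTAL)
    if serious:
        # Assumption: a combination of tasks is considered on the evident branch.
        if evident and "Combination of tasks" in feasible:
            return "Combination of tasks"
        return "Redesign is compulsory"

    # Default action for operational and non-operational consequences.
    return "Run-to-Fail (redesign may be desirable)"
```

For example, `select_strategy(False, Consequence.SAFETY, {"Detective Task"})` selects the Detective Task, while the same call with an empty set falls through to "Redesign is compulsory".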


Failure Management Strategies

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
   Each task must be applicable and effective.
7. What should be done if a suitable proactive task cannot be found? (Default Actions)


What types of maintenance are there?

RCM Term: Predictive Maintenance
Alternative Terms: On-Condition Maintenance; Condition Based Maintenance (CBM); Condition Monitoring (CM); Inspections
What it is: Check an item for signs of potential failure and leave it in place on the condition that it will make it to its next inspection interval.
Abbreviation: PTIVE

RCM Term: Preventive Restoration
Alternative Terms: Overhaul; Scheduled Restoration; Restorative tasks; Rework
What it is: A preventive task to restore an asset's original resistance to failure before it fails.
Abbreviation: PRES

RCM Term: Preventive Replacement
Alternative Terms: Replacement; Overhauls (also)
What it is: A preventive task to replace an asset before it fails.
Abbreviation: PREP

RCM Term: Detective Maintenance
Alternative Terms: Failure finding; Function testing
What it is: A task to detect whether an item has failed or not.
Abbreviation: DTIVE


Default Actions

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
   Determine the actions to be taken if routine maintenance cannot be performed.


RCM-DO-02 Preparing for Analysis

What asset or system…?

• Before we know what a system or sub-system can do, we need to know exactly what the system contains.
• If we go too high, we risk de-motivating the team and creating superfluous analyses.
• If we go too low, we risk paralysis by analysis.

RCM is best performed at a system level. However, it can be performed at an equipment level in special circumstances.

[Figure: example asset hierarchy, top level to bottom level.
Plant
Process 1, Process 2, Process 3, Process 4
Electrical System, Mechanical Assets, Instrumentation, Fixed Equipment
Centrifugal Pump, AC 3-phase motor, Hydraulic Motor, Chain Conveyor, Rotary Valves]


RCM-DO-03 Functions and Functional Failures

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
   We will cover…
   • Operating context
   • Types/Categories of Functions
   • How to write a function statement
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)

SAE JA1011 5.1.2: "All functions of the asset/system shall be identified (all primary and secondary functions, including the functions of all protective devices)"


Operating Context

1. Duty Cycles…
2. Weather and the immediate Environmental…
3. Applicable regulations and laws…
4. Asset Configuration…
5. Remoteness…
6. How it is managed…
7. Public perceptions…
8. Budget restraints…
9. Skills available…
10. Any other factors that determine how we use the asset(s) or system

Our car is a Ford Focus. Great car… we maintain it to the manufacturer's specifications…
…but they don't!
Why?

The Operating Context of any asset tells you how that asset is operated. This will influence how we maintain it. It doesn't tell you what the asset can do, or what we want it to do.


Writing Functions

SAE JA1011, 5.1.3 - All functions shall contain a verb, an object and a performance standard (quantified in every case where this can be done)

[Figure: pump and tank, with points labelled X and Y. Pump can deliver up to 1000 l/minute. Off-take from tank: 800 l/minute.]

We accept that "time's arrow" means that assets will deteriorate.
Performance standards tell us the minimum level of performance acceptable to the users or owners of the asset.
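The pump and tank figures above can be turned into a small worked example. This is a sketch only: the class `FunctionStatement` and its field names are hypothetical, and it assumes the 1000 l/minute pump capability and the 800 l/minute off-take are the two levels being compared.

```python
from dataclasses import dataclass


@dataclass
class FunctionStatement:
    verb: str          # what the asset does, e.g. "pump"
    obj: str           # what it acts on, e.g. "water from tank A to tank B"
    required: float    # performance standard: what the users want it to do
    capability: float  # what the asset can do
    unit: str

    def margin_for_deterioration(self):
        # Gap between what the asset can do and what its users need.
        return self.capability - self.required

    def is_functionally_failed(self, current_performance):
        # Failed once performance drops below the required standard.
        return current_performance < self.required


pump = FunctionStatement(
    verb="pump",
    obj="water from tank A to tank B",
    required=800.0,     # off-take from the tank (from the slide)
    capability=1000.0,  # pump can deliver up to this (from the slide)
    unit="l/minute",
)

print(pump.margin_for_deterioration(), pump.unit)
```

With these numbers the pump can lose 200 l/minute of capability before it no longer meets the 800 l/minute standard.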


Performance Standards

[Figure: performance shown as two levels, "What it can do" above "What its users want it to do", with the gap between them labelled "Margin for Deterioration". Source: SAE JA1012 Figure 2.]

Types of performance standard (SAE JA1012 Section 6.2 Performance Standards):
1. Between Limits
2. Specific
3. Varying – Up To
4. Total
5. Multiple
6. Open

One or more criteria for performance. Example from the slide: up to 800 l/minute, at 100 bar.


These are NOT Functions! (Why?)

• To be safe…
• To be reliable…
• To comply with environmental standards…
• To comply with IE2314356XXX (etc)…

Performance standards need to be quantified where possible to avoid ambiguity. E.g. what is reliable, and who says so?


Exercises

Primary Function Statements

The reason why an asset is purchased in the first place.

SAE JA1011, 5.1.3 - All functions shall contain a verb, an object and a performance standard (quantified in every case where this can be done)

• A light fitting in an office…
• An office chair…
• A projector used in presentations…
• A pushbike for you to ride to work on…


Secondary Functions (SAE JA1012 6.2.2)

Secondary functions are all the other requirements we have of the asset(s) that are not covered by the primary function.

• Environmental Integrity
• Safety / Structural Integrity
• Control / Containment / Comfort
• Appearance
• Protective Devices and Systems
• Economy and Efficiency
• Superfluous

The primary function of an office chair was given as "To support a person weighing up to 150 kilograms in a seated position".
What are the secondary functions?


RCM-DO-03b Air Conditioner

An office is located in an extremely hot environment where the average annual temperature ranges between 30°C (86°F) and 41°C (105°F). They have installed an air-conditioning system that will, at maximum output, maintain a temperature differential of 20°C (+/-0.56°C) between the outside ambient air and the inside office air. It will also dehumidify the air to a level of 45% (+/-4%). The office is approximately 914 m² (3000 ft²); the air conditioning unit will provide six BTU (British Thermal Units) of cooling.

Operational Description

The system is very simple and consists of a reciprocating piston compressor, a condenser, a thermal expansion valve, and an evaporator. A three-phase electric squirrel cage motor drives the compressor via four parallel v-belts. A guard is in place to stop people touching the belts while they are in use.

Setting air conditioning temperatures can be very individual and is almost never without complaints. Over the years the company has determined that a temperature in the range of 19°C (~66°F) to 23°C (~73°F) is the most comfortable to work at, and causes the fewest arguments. The thermostat is set to 21°C (~70°F), and they would like the temperature not to exceed 23°C or to go below 19°C.

The compressor is oil lubricated, and compresses a standard refrigerant gas, which is a known greenhouse gas. Any release of the refrigerant breaches a number of environmental regulations. The compressor takes low-pressure superheated gas from the evaporator, compresses it to high-pressure superheated gas, and pushes it through the condenser. A draft over the condenser coils comes from a three-phase electric fan, which removes the heat and changes the high-pressure vapor to a high-pressure liquid. When the condenser is working well there is a temperature differential of 3.1°C (10°F) across the condenser. De-superheated high-pressure liquid leaves the condenser in the liquid line to the thermal expansion valve (TX valve).

The TX valve regulates the flow of high-pressure liquid refrigerant into the evaporator coil. It is designed to open just enough to let refrigerant flow while maintaining a high pressure differential from its inlet to its outlet. The pressure at the exit of the expansion valve is low enough that it initiates a phase change in the liquid refrigerant to a vapor.


A three-phase motor forces draft air over the evaporator coils and superheats the vapor. This creates the cooling effect. Both the evaporator fan and the condenser fan have lightweight steel cowls to stop foreign objects from damaging the fan blades. The refrigerant then leaves the evaporator as a superheated gas and restarts the process at the compressor. Any failure of the evaporator means that there is a possibility of liquid entering the compressor, destroying the internal components. When the evaporator is working well there is a temperature differential of 3.1°C (10°F) across its coils.

The electric motor drives of the compressor and the evaporator have thermal overloads that will trip the circuit if the current reaches 125% of full load amps (FLA); the condenser fan has protection at 115% of FLA.

The company has local research reports that show that bacteria, viruses and fungi tend to thrive in that part of the world when the humidity is greater than 47%. Similar "wellness" reports have shown that workers in an office environment are most comfortable between 30% and 44%. If the humidity is too low, workers often suffer from dry eyes and increased static, and it feels colder than it is. If it is too high, workers feel very uncomfortable and it feels hotter than it is.

The air conditioner typically needs to run for 8-10 minutes before the dehumidification process can commence. At its present design capacity, it will run for 100% of the time in summer, and 40-50% of the time during other seasons in this climate. However, if the thermostat fails and stops the compressor at temperatures above its set point, this will cause short run times and will not allow the unit to dehumidify the air in the office space.

The company using this unit has other similar systems installed in other offices and finds them to be reliable and economical to install and to run. However, discussions with the manufacturer and a study of the history of similar systems have produced the following list of common failures.

a) Condenser fins flattened, preventing forced airflow over the condenser coils. (Installation errors)
b) Evaporator fins flattened, preventing forced airflow over the evaporator coils. (Installation errors)
c) Clogging of the TX valve, causing a total failure of the system. (Normally occurs every 2 years)
d) Wear out of the valves within the compressor. (Normally once every 5 years)
e) Failure of the thermostat, meaning it will not trip at all (once every 4 years), or it will trip at temperatures greater than the set point (once every 6 years).
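The failure intervals quoted in the list can be combined into a rough yearly figure. The sketch below is an illustration only: it assumes each "once every N years" can be read as an average interval between failures, that the failure modes are independent, and that rates simply add. The dictionary keys and variable names are hypothetical. The installation errors in a) and b) have no interval in the text and are left out.

```python
# Average interval between failures, in years, as quoted in the case study.
intervals_years = {
    "TX valve clogged": 2,
    "Compressor valves worn out": 5,
    "Thermostat does not trip at all": 4,
    "Thermostat trips above the set point": 6,
}

# Assumption: failures per year is the reciprocal of the quoted interval.
rates_per_year = {mode: 1.0 / years for mode, years in intervals_years.items()}

# Assumption: independent failure modes, so the rates can be summed.
total_rate = sum(rates_per_year.values())

for mode, rate in rates_per_year.items():
    print(f"{mode}: {rate:.2f} failures/year")
print(f"All listed modes: {total_rate:.2f} failures/year")
```

Under these assumptions the four listed modes together give a little over one failure per year, which is a useful sanity check when the analysis group later discusses task intervals.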


While these are common failure modes, they do not include all of the likely failure modes. For example, the drive motors for the compressor, the condenser, and the evaporator are all standard three-phase squirrel cage electric motors and suffer from the failure modes that generally occur in these types of motors.


Functional Failures

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)


Failed States

Functional Failures indicate failed states: "how" the asset is unable to do what we want it to.

• We need to define all of the Failed States for every function.
  – Failed states are derived directly from the function statements and their performance standards
  – They generally cover too much, too little (partial) and not at all (total)

• To pump water from tank A to tank B at up to 800 l/minute (Varying)
  – Unable to pump at all
  – Pumps at more than 800 l/minute (?)

• To pump water from tank A to tank B at between 800 l/minute and 1000 l/minute (Multiple)
  – Unable to pump at all
  – Pumps at less than 800 l/minute
  – Pumps at more than 1000 l/minute
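The way failed states follow from a performance standard can be sketched in code. This is a minimal illustration for the "between 800 l/minute and 1000 l/minute" function, taking 800 as the lower limit and 1000 as the upper limit. The function name `failed_state` and its parameters are hypothetical.

```python
def failed_state(flow_l_per_min, low=800.0, high=1000.0):
    """Return the failed state for a measured flow, or None if the function is met."""
    if flow_l_per_min <= 0:
        return "Unable to pump at all"                     # total failure
    if flow_l_per_min < low:
        return f"Pumps at less than {low:g} l/minute"      # too little (partial)
    if flow_l_per_min > high:
        return f"Pumps at more than {high:g} l/minute"     # too much
    return None                                            # within the standard


for flow in (0, 650, 900, 1100):
    print(flow, "->", failed_state(flow))
```

Each branch corresponds to one failed state, so a function with a lower and an upper limit yields three failed states (not at all, too little, too much).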


Exercise

The primary function of a grinding machine may be listed as:
"To grind bearing journals in a cycle time of 3.00 minutes ± 3 seconds, to a diameter of 75 mm ± 0.1 mm, with a surface finish of no greater than Ra 0.2."

[Figure: tolerance bands for the exercise. Diameter axis: 75 mm, marked in steps of 0.05. Cycle time axis: 3.06, 3.03, 3 minutes, 2.57, 2.54. Source: SAE JA1012 Section 7.2.]
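One way to approach the exercise is to test each performance standard in the function statement separately, because each one can produce its own failed states. The sketch below is an illustration, not a model answer: the function name `grinding_failed_states` and its parameters are hypothetical, and the limits are taken from the function statement (cycle time 180 s ± 3 s, diameter 75 mm ± 0.1 mm, surface finish no greater than Ra 0.2).

```python
def grinding_failed_states(cycle_seconds, diameter_mm, roughness_ra):
    """List the failed states shown by one ground journal (empty list if none)."""
    states = []

    # Cycle time: 3.00 minutes +/- 3 seconds.
    if cycle_seconds > 183:
        states.append("Cycle time longer than 3 minutes 3 seconds")
    elif cycle_seconds < 177:
        states.append("Cycle time shorter than 2 minutes 57 seconds")

    # Diameter: 75 mm +/- 0.1 mm.
    if diameter_mm > 75.0 + 0.1:
        states.append("Diameter greater than 75.1 mm")
    elif diameter_mm < 75.0 - 0.1:
        states.append("Diameter less than 74.9 mm")

    # Surface finish: no greater than Ra 0.2 (upper limit only).
    if roughness_ra > 0.2:
        states.append("Surface finish rougher than Ra 0.2")

    return states


print(grinding_failed_states(180, 75.0, 0.15))
print(grinding_failed_states(190, 74.8, 0.3))
```

A total failure ("unable to grind at all") would be listed alongside these partial failed states; it is left out of the sketch because it does not depend on a measured value.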


Exercises

To start pumping water from tank A to tank B at a volume of 800 l/minute, at a pressure of 100 bar, when the water level is at the low level switch and to stop when it reaches the high level switch.

[Figure: tank with High Level and Low Level switches; pump delivering 800 l/minute at 100 bar.]


RCM-DO-04 Failure Modes and Effects

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)


Reasonably Likely

Pump struck by lightning
• North West Australia – Reasonably likely
• The Atacama Desert in Chile – Highly unlikely

Pump stolen
• Mexico – Reasonably likely
• The USA – Unlikely

Supply cable insulation deteriorated due to sun exposure
• Saudi Arabia – Reasonably likely
• The UK – Not likely

Levels of reasonableness are determined by the analysis group. If no agreement is possible, then the organization that owns the assets must make a decision.


Causality

Level 1? Unable to pump water at all
1. Motor fails
2. Pump fails
3. Pipes fail
4. Inlet to tank B blocked
5. Outlet from tank A blocked

… or Level 2? Unable to pump water at all
1. Motor fails due to stator earth fault
2. Motor fails due to short between the coils
3. Motor fails due to fan end bearing failure
4. Motor fails due to drive end bearing failure
5. Motor fails due to overheating
6. Motor fails due to loose connections

… or Level 3? Unable to pump water at all
1. Drive end bearing fails due to ingress of water
2. Drive end bearing fails due to lack of adequate grease
3. Drive end bearing fails due to misalignment


How far is far enough?

[Figure: cause-and-effect tree that traces one failure through seven levels of causality]

Level 1: Motor stops
Level 2: Due to failed drive end bearing
Level 3: Due to lack of grease; due to the wrong grease; due to misalignment during installation
Levels 4 to 7 (successively deeper causes):
• Due to inadequate training of the lubrication technician
• Due to improper purchasing controls
• Due to poor installation procedures
• Due to incorrect procedure writing procedures
• Due to inadequate tools
• Due to poor purchasing controls
• Due to lack of communication between maintenance and purchasing
• Due to former differences between department managers


Writing a Failure Mode

• Failure modes are the reasons why something is in a failed state.
• When defining failure modes, first understand how the asset has failed (the functional failure), then determine why it has failed.
• Avoid verbs like breaks, fails, malfunctions.
• Use the “due to” convention and at least a noun and a verb. (Not a rule – a guide)
• Only one cause per failure mode.

Normally written something like this:

Function: To pump water from tank A to tank B at 800 l/minute

Functional Failure: Unable to pump water from tank A to tank B

Failure Modes:
• Drive end motor bearing failed due to lack of grease
• Short in motor windings due to insulation degrading over time
• Drive end motor bearing seized due to misalignment on installation
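The writing guidance above can be turned into a rough checklist. The sketch below is only an illustration: the function name `check_failure_mode`, the word list, and the warning texts are assumptions made for this example and are not part of SAE JA1011 or of any Meridium product. It flags a statement that lacks the “due to” convention, uses one of the verbs the guidance suggests avoiding, or appears to chain more than one cause.

```python
import re

# Verbs the guidance suggests avoiding; the exact word list is an assumption.
VAGUE_VERBS = {"breaks", "fails", "malfunctions"}


def check_failure_mode(statement: str) -> list[str]:
    """Return style warnings for one failure mode statement (empty list = no warnings)."""
    warnings = []
    lowered = statement.lower()
    words = set(re.findall(r"[a-z]+", lowered))

    # Convention: state the cause with "due to".
    if "due to" not in lowered:
        warnings.append('missing "due to"')

    # Guide: avoid vague verbs such as breaks, fails, malfunctions.
    vague = sorted(words & VAGUE_VERBS)
    if vague:
        warnings.append("vague verb: " + ", ".join(vague))

    # Crude heuristic for "only one cause per failure mode":
    # a second "due to" suggests two causes chained together.
    if lowered.count("due to") > 1:
        warnings.append("more than one cause")

    return warnings


print(check_failure_mode("Drive end motor bearing seized due to misalignment on installation"))
print(check_failure_mode("Motor fails"))
```

A check like this cannot judge whether a failure mode is reasonably likely or written at the right level of causality; those remain decisions for the analysis group.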


Types of Failures

[Figure: three diagrams comparing “What it can do” with “What its users want it to do” (SAE JA1012 Figure 2)]

• Wear and tear, degradation of the asset: Maintenance
• Incorrect use, often deliberate; overloading: Operations
• Not fit for purpose: Engineering / Purchasing

Who’s responsible for reliability? Reliability is a process… not a department!


The Problem with Data

Where does the data come from?

“One of the most important contributions of the Reliability-Centered Maintenance Program is its explicit recognition that certain types of information … are, in principle, as well as in practice, unobtainable.”
Mathematical Aspects of Reliability Centered Maintenance, H.L. Resnikoff


Where does the data come from?

“This means … in practice and in principle, the policy must be designed without using experiential data which will arise from the failures the policy is meant to avoid.”
Mathematical Aspects of Reliability Centered Maintenance, H.L. Resnikoff

[Figure: Data 30%, Knowledge 70%]


Effects

The Seven Questions of RCM (SAE JA1011 5a-5g, 2002), question 4:

What happens when each failure occurs? (Failure Effects)


Effects and Consequences

• Effects are the direct outcome of a failure mode. (What happens)
• The primary role of the effects statement is to inform us of the consequences. (Why it matters)
  – When do we know about it; what evidence is there that it has failed?
  – Safety implications
  – Implications for environmental standards and regulations
  – Operational implications
    • Cost of repair
    • What is required to restore the function?
    • Time to repair (TTR)
  – Any other implications such as reputation, news headlines, etcetera.
• SAE JA1011, 5.4.1: “Failure effects shall describe what would happen if no specific task is done to anticipate, prevent, or detect the failure.”
• They are the typical worst case scenario… not the extreme worst case scenario.
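One way to make sure an effects statement covers every item in the list above is to record it in a fixed structure. The following is a minimal sketch, not a Meridium data model: the class name `FailureEffect`, its field names, and the `missing_items` helper are all hypothetical, and the example text is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureEffect:
    """One effects statement, with a field for each item the list above asks about."""
    failure_mode: str
    evidence: str = ""           # when do we know, and what evidence is there?
    safety: str = ""             # safety implications
    environmental: str = ""      # environmental standards and regulations
    operational: str = ""        # operational implications
    restore_actions: str = ""    # what is required to restore the function
    repair_cost: float = 0.0     # cost of repair (currency left to the user)
    time_to_repair_h: float = 0.0  # time to repair (TTR), in hours
    other: str = ""              # reputation, news headlines, etc.

    def missing_items(self) -> list[str]:
        """Names of the descriptive fields that have not been filled in yet."""
        text_fields = ["evidence", "safety", "environmental",
                       "operational", "restore_actions", "other"]
        return [name for name in text_fields if not getattr(self, name).strip()]


# Hypothetical, partly completed record.
effect = FailureEffect(
    failure_mode="Drive end motor bearing seized due to lack of grease",
    evidence="Motor trips on overload",
)
print(effect.missing_items())
```

Writing "None" in a field records that the team considered the item and found no implication, which is different from leaving it blank.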


RCM-DO-05 Consequences and Effectiveness

The Seven Questions of RCM (SAE JA1011 5a-5g, 2002), question 5:

In what way does each failure matter? (Failure Consequences)


A Hierarchy of Consequences

[Figure: decision diagram. The first question is “Evident or Hidden?”; each branch then asks Safety? Environment? Operational? and leads to the tasks and default actions listed below.]

• Hidden, safety or environment: On-Condition Task? Preventive Restoration or Preventive Replacement? Failure Finding Task? Otherwise: Redesign is Compulsory
• Hidden, operational: On-Condition Task? Preventive Restoration or Preventive Replacement? Failure Finding Task? Otherwise: No scheduled maintenance; Redesign may be desirable
• Evident, safety or environment: On-Condition Task? Preventive Restoration or Preventive Replacement? Combination of Tasks? Otherwise: Redesign is Compulsory
• Evident, operational: On-Condition Task? Preventive Restoration or Preventive Replacement? Otherwise: No scheduled maintenance; Redesign may be desirable
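The order in which the diagram considers tasks and falls back to a default action can be sketched as a small function. This is a reading of the flattened diagram, not a definitive statement of the method: the assignment of tasks to branches is reconstructed, and the function name `select_strategy`, its parameters, and the task labels are assumptions made for this example.

```python
# Proactive task types that every branch of the diagram considers first, in order.
PROACTIVE_TASKS = ["on-condition task", "preventive restoration or replacement"]


def select_strategy(hidden: bool, safety_or_environment: bool, feasible: set[str]) -> str:
    """Return the first feasible task for the branch, or the branch's default action.

    `feasible` holds the task types the analysis group judged to be technically
    feasible and worth doing for this failure mode (an input, not computed here).
    """
    candidates = list(PROACTIVE_TASKS)
    if hidden:
        # Assumed from the diagram: hidden branches also consider failure finding.
        candidates.append("failure-finding task")
    elif safety_or_environment:
        # Assumed from the diagram: evident safety/environment branch considers combinations.
        candidates.append("combination of tasks")

    for task in candidates:
        if task in feasible:
            return task

    # Default actions when no suitable task is found.
    if safety_or_environment:
        return "redesign is compulsory"
    return "no scheduled maintenance (redesign may be desirable)"


print(select_strategy(hidden=True, safety_or_environment=True, feasible=set()))
print(select_strategy(hidden=False, safety_or_environment=False,
                      feasible={"preventive restoration or replacement"}))
```

The point the sketch makes is the asymmetry in defaults: safety and environmental branches never end in "no scheduled maintenance".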


Hidden or Evident?

[Figure: three views of a pressure vessel feeding a process]

• Hidden failure: Failure of pressure release valve on high pressure vessel in a gas plant
• Multiple failure event: Dangerous build-up of gas pressure within the pressure vessel
• Consequence: Explosion of the pressure vessel when under high pressure conditions

# The Maintenance Scorecard, Daryl Mather, Industrial Press, ISBN 0831131810


Hidden Failures

Will the loss of function caused by this failure mode on its own become evident to the operating crew under normal circumstances?

If no, the failure is hidden:
• HS: Is there an intolerable risk that the multiple failure could kill or injure someone?
• HE: Is there an intolerable risk that the multiple failure could breach a known environmental standard or regulation?
• HO: Does the failure have a direct adverse effect on operational capability?
• HN: otherwise (non-operational consequences)

If yes, the failure is evident:
• ES: Is there an intolerable risk that the failure could kill or injure someone?
• EE: Is there an intolerable risk that the failure could breach a known environmental standard or regulation?
• EO: Does the failure have a direct adverse effect on operational capability?
• EN: otherwise (non-operational consequences)



• RCM begins by separating hidden and evident consequences.
• By themselves, hidden failures have no consequences; an additional failure is required before they have any tangible impact.
• The ultimate consequences of failure are often severe.
• Can be separated into Safety, Environmental and Operational consequences.
• Generally devices that provide protection for safety, the environment or operations, such as high-high level switches, over-speed switches, standby equipment, etc.
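The questions in the diagram lead to one of eight codes: H or E for hidden or evident, followed by S, E, O or N. The sketch below assumes the questions are asked in the order safety, environment, operations, and that the analysis group supplies the yes/no answers; the function name `classify_consequence` and its parameters are hypothetical. For a hidden failure, the safety and environmental answers refer to the multiple failure.

```python
def classify_consequence(evident: bool, safety: bool,
                         environmental: bool, operational: bool) -> str:
    """Map the answers to the decision diagram's questions onto a two-letter code.

    evident:       does the loss of function become evident to the operating
                   crew under normal circumstances?
    safety:        intolerable risk of killing or injuring someone?
    environmental: intolerable risk of breaching an environmental standard?
    operational:   direct adverse effect on operational capability?
    """
    prefix = "E" if evident else "H"
    if safety:
        return prefix + "S"
    if environmental:
        return prefix + "E"
    if operational:
        return prefix + "O"
    return prefix + "N"  # non-operational: cost of repair only


# A protective device that fails without anyone noticing, where the
# multiple failure could injure someone:
print(classify_consequence(evident=False, safety=True,
                           environmental=False, operational=False))
```

Because safety is tested first, a failure that is both a safety and an operational concern is recorded under safety.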


Safety Consequences

[Figure: consequence decision diagram (HS, HE, HO, HN, ES, EE, EO, EN)]

• Once the failure has been categorized as Hidden or Evident, the first consideration in evaluating any failure possibility is safety to life and limb.
• Asks the team to determine whether there is an intolerable risk of death or injury.
• Will not default to run-to-failure under any circumstances; at all times there is a need to take some action.


Environmental Consequences

[Figure: consequence decision diagram (HS, HE, HO, HN, ES, EE, EO, EN)]

• Gained prominence through the 1980s with the onset of global warming and increased environmental awareness.
• Deal with an intolerable risk of breaking environmental standards, regulations or laws. (Internal or external)
• Will not default to run-to-failure under any circumstances; at all times there is a need to take some action.


Operational Consequences

[Figure: consequence decision diagram (HS, HE, HO, HN, ES, EE, EO, EN)]

• Any failure consequence that has a direct, or secondary, negative effect on operations.
• Task selection is, in part, determined by cost-effectiveness trade-off calculations as opposed to levels of tolerable risk.
• Includes “other” cost implications such as reputation, adverse newspaper coverage and other PR-related issues.
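A cost-effectiveness trade-off of the kind mentioned above can be sketched as a comparison of annual costs. The formula here is a simplifying assumption chosen for illustration, not a calculation prescribed by the course or by SAE JA1011: it compares the expected yearly cost of failures without the task against the yearly cost of doing the task plus the failures that still occur. All names and figures are hypothetical.

```python
def annual_cost_of_failures(failures_per_year: float, repair_cost: float,
                            downtime_hours: float, loss_per_hour: float) -> float:
    """Expected yearly cost of failures: repair plus lost production per failure."""
    return failures_per_year * (repair_cost + downtime_hours * loss_per_hour)


def task_is_cost_effective(task_cost: float, tasks_per_year: float,
                           cost_without_task: float,
                           residual_failure_cost: float) -> bool:
    """True if doing the task costs less per year than letting the failures occur."""
    cost_with_task = tasks_per_year * task_cost + residual_failure_cost
    return cost_with_task < cost_without_task


# Hypothetical figures: one failure every two years, 4000 to repair,
# 8 hours of downtime at 1000 per hour.
without_task = annual_cost_of_failures(0.5, 4000, 8, 1000)

# A monthly inspection costing 200 that leaves 600 per year of residual failure cost.
print(without_task, task_is_cost_effective(200, 12, without_task, 600))
```

This comparison applies only to operational and non-operational consequences; for safety and environmental consequences the test is tolerable risk, not cost.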


Repair Only

Non-operational consequences

[Figure: consequence decision diagram (HS, HE, HO, HN, ES, EE, EO, EN)]

• Economic consequences only
• Costs of repair and secondary damages


RCM-HO-05a Assigning Consequences

[Figure: consequence decision diagram: Hidden or Evident? Categories HS, HE, HO, HN, ES, EE, EO, EN]

a) A household circuit breaker continuously trips when there is no fault present.

b) A hydraulic positioning unit moves a train into position under the feed hopper of a large ammonia plant. Since its installation, some ten years ago, the high pressures and extreme heat of the working environment have caused numerous leaks, each resulting in some downtime. (In addition, all are potential fire risks.) Underneath the positioning unit is a concrete bund, there to stop any hydraulic oil seeping into the ground below, which would breach a number of environmental regulations. However, due to errors in the pouring of the concrete, it allows small quantities of hydraulic oil to pass through every time there is a leak. What is the consequence of the failure of the concrete?

c) Vibration sensors protect a forced draft fan. Their role is to protect the fan from high secondary damage stemming from unplanned bearing failure. Due to the critical nature of this asset, the company keeps a spare fan assembly. In case of any failure of the fan, the quickest way to restore the function is to replace the entire assembly. This particular fan does not have any safety consequences associated with bearing failure. The sensors are set at 7 mm/second and provide a warning light for operators so they can shut the fan down immediately. Due to a failure of the indicating bulb at the control panel, the alarm goes unnoticed when vibration reaches the alarm level.

d) A wastewater plant has turbidity meters to measure the relative clarity of the effluent leaving the plant into the local river system. High percentages of microscopic particles will cause the effluent to be excessively “cloudy”; the turbidity meter then adjusts the dosing earlier in the process to reduce the impact on the environment. Over time, the calibration of this meter has drifted, so much so that the effluent leaving the plant contains a high percentage of microscopic solids, breaching a number of environmental laws and regulations, as well as adversely affecting the wildlife in the area.

e) Over time, the brake pads in a car wear down, meaning that the car will not stop when required. The result was an accident when attempting to stop at a red light.


f) A speed sensor protects a turbine from over-speed, preventing it from speeding up to destruction and sending debris in every direction. The sensor has failed in such a way that it will not trip the turbine on over-speed.

g) A standby motor to drive a pumping system has developed false brinelling (flat spots) of the bearings. This means that when it is called on to run, it will run for a short while before tripping the motor on overload. If it runs continually in this fashion, it could also cause secondary damage to the motor shaft.

h) When the level in a tank reaches the low level, the low-level switch starts a pump. Because of vibration in the surrounding area, one of the terminals comes loose and the switch will not work when it is required to.

i) A pumping system has a duty and a standby pump. The standby pump takes over the function if ever the duty pump should fail. Over time, the resistance of the insulation within the duty motor breaks down, and it suffers an earth fault.

j) Due to a pinhole leak, the air pressure has gone out of the spare tire in your car.

k) Each aircraft is equipped with life preserver jackets for passenger use in case of a water landing. One of these has developed a failure, preventing it from inflating when required.

l) An electrically driven “pony” pump primes a lubrication system on start-up; at a specified pressure the main pump takes over to run the system at operating pressure. This is an effort to minimize the energy usage of the plant, and the main pump could easily start up under full load with no consequence aside from increased energy usage. The pony pump has a failure of the mechanical seal and will be unserviceable for a time.

m) An air-conditioning system has had the condenser fins flattened out by vandalism; the result is that the airflow through the condenser is not sufficient to reduce the temperature prior to the refrigerant gas travelling to the evaporator. As a result, the system will not reduce room temperature below the 35 °C ambient temperature. This affects the health of the people working in the room and results in two people suffering from heatstroke.

n) The high-high level switch on a tank trips the pump when there is a high-high level. This then needs a manual reset. At present, this switch has spurious trips that cause the pump to stop when there is no high level.

o) A large-scale screening facility gets its supply from a conveyor running the length of the building some four stories above the ground. Along the side of the conveyor are walkways with handrails. One of the handrails has a crack in it that is not visible to the naked eye. However, if somebody were to use it, it would give way, leaving the person to fall four stories to their death.

p) An IT data center houses all of the servers containing the corporate IT information. The cooling system of the data center must keep the rooms continuously at a temperature of between 20 °C and 25 °C and a humidity of between 40% and 60%. A failure of the power supply could lead to outright server failure, or at the very least increase failure rates of electronic components. This would have a catastrophic effect on business continuity. For these reasons, a diesel generator set is on permanent standby protecting the power supply to the coolers and humidifiers; an uninterruptible power supply (UPS) further protects this. The diesel generator set has developed a failure in the starter circuit due to corroded battery terminals, meaning it will not be able to start when required.

q) An operating company used a tank farm to store flammable liquid raw material. A pressure safety valve (PSV) set at the tank maximum allowable working pressure (MAWP) of 100 psig protected one of the tanks containing a highly reactive material. The previous PHA identified the plugging of the PSV inlet as a potential concern. The PSV’s annual inspection reports verified plugging, substantiating this concern. The PHA team recommended the installation of a rupture disc upstream of the PSV. A month later, an overpressure event (triggered by contamination) caused the tank pressure to reach 180 psig before the rupture disc blew and vented the tank contents. The ensuing Incident Investigation revealed that the rupture disc had developed a pinhole leak and the space between the rupture disc and PSV had pressurized to the normal tank pressure of 80 psig.


Applicable and Effective

(Based on diagram 17 of SAE JA1012)

Applicable: Before selecting any failure management policy, analysts first need to determine whether or not the task is actually possible!

Effective: Then they need to determine whether the task will be worthwhile in terms of either cost or risk. (Based on the consequences)

Within RCM, NO task can be applied to any failure mode without first establishing that it is actually possible to do the task, and secondly without ensuring that it will adequately manage the consequences.


Tolerable Levels of Risk

What is risk, and how tolerable should it be…?

Risk is the likelihood of an unwanted event.

[Figure: “Ideal” versus “Reality”]

People often forget to fear those things that rarely happen… particularly in the face of productivity challenges, market share opportunities and competitive necessities.

Source: Human Error, James Reason


Example Tolerable Risk Levels

Government Set Tolerable Risk Criteria

                                    UK          Hong Kong   The Netherlands   Australia
Individual risk minimum (Worker)    1 × 10⁻⁵    Not Used    Not Used          Not Used
Individual risk minimum (Public)    1 × 10⁻⁶    Not Used    1 × 10⁻⁶          Not Used
Individual risk maximum (Worker)    1 × 10⁻³    Not Used    Not Used          Not Used
Individual risk maximum (Public)    1 × 10⁻⁴    1 × 10⁻⁵    1 × 10⁻⁶          1 × 10⁻⁶

Survey of U.S. Corporate Tolerable Risk Criteria

                                    High Range   Low Range
Minimum individual risk (Worker)    10⁻⁵         10⁻⁹
Maximum individual risk             10⁻³         10⁻⁶
Individual SIF risk target          10⁻³         10⁻⁶

Source: E.M. Marszal, Survey of process plant risk tolerance criteria and third party liability settlements, exida.com, Philadelphia, 2000.


Hidden Failures

A hidden functional failure, on its own, will not become evident to the operators under normal operating circumstances.


The five main categories for hidden failures…

The majority of hidden failures occur on protective devices. These are devices that:

• Warn of abnormal conditions
• Shut down equipment in case of a failure
• Eliminate or alleviate abnormal conditions caused by failure
• Take over from a function that has failed
• Prevent dangerous situations from arising


Most protective devices can fail in two ways…

• By acting when they are not needed…
• By ceasing to provide protection…


The Famous Pump Example

[Figure: pumping system with level switches; flow rates of 1000 l/m and 800 l/m are marked]

• Low level switch: turns on the pump until the level reaches the high level switch
• High level switch: shuts off the pump until the low level switch turns it back on again
• Low low level switch: turns off the pump until manually reset
• Ultimate high level switch (normally open): shuts the pump off until manually reset

If the ultimate high level switch fails closed, it is evident…

If the ultimate high level switch fails open, then nobody knows it has failed…


Function 1: To pump between tank A and tank B at up to 800 l/m

Functional Failure A: Unable to pump between tank A and tank B at up to 800 l/m

Failure Modes and Effects (Evident):
1  Pump blocked
2  Pipes blocked
3  Ultimate high level fails closed

Function 2: To stop the pump on ultimate high level

Functional Failure A: Unable to stop the pump on ultimate high level

Failure Modes and Effects (Hidden):
1  Ultimate high level fails open
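As an illustration only, the worksheet above can be held in a small nested data structure. The class and field names below are hypothetical and are not taken from any RCM software. The sketch also assumes that the "Evident" label applies to all three failure modes of Function 1.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    description: str
    hidden: bool  # True if the failure is not evident to operators on its own


@dataclass
class FunctionalFailure:
    description: str
    failure_modes: list[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    statement: str
    functional_failures: list[FunctionalFailure] = field(default_factory=list)


# Worksheet content copied from the pump example.
worksheet = [
    Function(
        "To pump between tank A and tank B at up to 800 l/m",
        [FunctionalFailure(
            "Unable to pump between tank A and tank B at up to 800 l/m",
            [
                FailureMode("Pump blocked", hidden=False),
                FailureMode("Pipes blocked", hidden=False),
                FailureMode("Ultimate high level fails closed", hidden=False),
            ],
        )],
    ),
    Function(
        "To stop the pump on ultimate high level",
        [FunctionalFailure(
            "Unable to stop the pump on ultimate high level",
            [FailureMode("Ultimate high level fails open", hidden=True)],
        )],
    ),
]

# Collect the failure modes that need a hidden-failure strategy.
hidden_modes = [
    mode.description
    for function in worksheet
    for failure in function.functional_failures
    for mode in failure.failure_modes
    if mode.hidden
]
print(hidden_modes)
```

Keeping the hidden/evident flag on each failure mode, rather than on the function, mirrors the point of the example: the same switch is evident in one failure mode and hidden in the other.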


The probability that the protected function will fail in any one cycle is given by its failure rate.

[Figure: one-year timeline for Protected Function B and Protective Device C, marked “Fails”]

If the failure rate is once in four years, then the probability that it will fail in one year is 1 in 4. (This corresponds to a mean time between failures of four years.)
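The "1 in 4" figure takes the annual probability as the reciprocal of the mean time between failures. As a rough sketch, the comparison below sets that against the value obtained if failures are assumed to follow an exponential (random) pattern. The exponential model is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def annual_probability_simple(mtbf_years: float) -> float:
    """Reciprocal rule used on the slide: one failure every `mtbf_years` years."""
    return 1.0 / mtbf_years


def annual_probability_exponential(mtbf_years: float, period_years: float = 1.0) -> float:
    """Probability of at least one failure in the period, assuming exponential failures."""
    return 1.0 - math.exp(-period_years / mtbf_years)


simple = annual_probability_simple(4.0)
exponential = annual_probability_exponential(4.0)
print(f"simple: {simple:.3f}, exponential: {exponential:.3f}")
# prints "simple: 0.250, exponential: 0.221"
```

Under that assumption the two values move closer together as the mean time between failures grows relative to the period, which is why the reciprocal rule is usually adequate for reliable equipment.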


The probability that the protective device will be in a failed state at any point in time is given by its downtime (if it conforms to a random failure pattern).

[Figure: one-year timeline for Protected Function B and Protective Device C, marked “Fails”]

If the downtime is 33%, then the probability that it will be in a failed state at any point in time is 1 in 3. (This corresponds to an availability of 67%.)


The Probability of a Multiple Failure

One year:
Protected Function B: Mean Time Between Failures = 4 years
Protective Device C: Availability = 67%, Downtime = 33%

The probability that B will fail while C is in a failed state: 1 in 4 x 1 in 3 = 1 in 12

(In other words, there is a one in twelve chance that the multiple failure will occur in any one year.)
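The multiplication on this slide can be checked with exact fractions. This is a minimal sketch that treats the two events as independent, as the slide's arithmetic implies, and rounds the 33% downtime to 1 in 3. The function name is hypothetical.

```python
from fractions import Fraction


def multiple_failure_probability(mtbf_function_years: int,
                                 device_unavailability: Fraction) -> Fraction:
    """Annual chance that the protected function fails while the device is down."""
    annual_failure_probability = Fraction(1, mtbf_function_years)
    return annual_failure_probability * device_unavailability


probability = multiple_failure_probability(4, Fraction(1, 3))
print(probability)  # prints 1/12
```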


When developing a failure management policy for a hidden function, the first stage is to decide what probability we are prepared to tolerate for a multiple failure…

One year:
Protected Function B: Mean Time Between Failures = 4 years
Protective Device C: Availability = 67%, Downtime = 33%

The probability that B will fail while C is in a failed state: 1 in 4 x 1 in 3 = 1 in 12

Prepared to accept: 1 in 1000


Reduce the probability of failure of the protected function (by applying a suitable failure management policy)…

And/or increase the availability of the protective device:
- by preventing the failure of the protective device, or
- by periodically checking whether the protective device is still working and repairing it if it has failed, or
- by modifying the system in some way

One year:
Protected Function B: Mean Time Between Failures = 10 years
Protective Device C: Availability = 99%, Unavailability = 1%

The probability that B will fail while C is in a failed state is now 1 in 10 x 1 in 100 = 1 in 1000


• 6 identical PSVs have each been checked once a year for 5 years (FFI = 1 year)
• So the devices have been in service a total of 30 years
• In that time 3 were found to be in a failed state
• So the MTBF of the devices (MTBFdevice) is 30 years / 3 failures = 10 years
• We know that the failed devices failed some time during the year before the checks, but not when…
• It seems reasonable to assume that each failed device was down for an average of 6 months

[Figure: PSV between “From Process” and “To Process”; timeline of devices 1 to 6 over Year 1 to Year 5]

1. So the total downtime (DTdevice) was 1.5 years out of 30, or 5%

2. So on the basis of these figures it appears that: FFI = 2 x DTdevice x MTBFdevice

3. This is generally true if DTdevice < 5% and the device failures are random
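The arithmetic in this example can be reproduced in a few lines. This is a sketch using only the figures above; the function and variable names are hypothetical, and the half-interval downtime is the same assumption the slide makes.

```python
def summarize_checks(devices: int, years_checked: int,
                     failures_found: int, ffi_years: float):
    """Return (MTBF of the device in years, unavailability as a fraction)."""
    service_years = devices * years_checked        # 6 x 5 = 30 years in service
    mtbf_device = service_years / failures_found   # 30 / 3 = 10 years
    # Slide assumption: each failed device was down for half the checking interval.
    downtime_years = failures_found * ffi_years / 2
    dt_device = downtime_years / service_years     # 1.5 / 30 = 0.05
    return mtbf_device, dt_device


mtbf_device, dt_device = summarize_checks(6, 5, 3, 1.0)
ffi_check = 2 * dt_device * mtbf_device
print(mtbf_device, dt_device, ffi_check)
```

The last line recovers the one-year interval the devices were actually checked at, which is the consistency check behind the formula in point 2.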


Detective Task Frequency and Availability

[Figure: two inspection timelines over four years]

• Two inspections over four years: maximum potential unavailability time = 2 years
• Three inspections over four years: maximum potential unavailability time = 1 year

Risk management of hidden failures involves the management of unavailability to within levels accepted by the company…
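To see how the checking interval drives average (rather than maximum) unavailability, the sketch below compares the rule of thumb FFI = 2 x DTdevice x MTBFdevice, rearranged for DTdevice, with a closed-form value. The closed form assumes exponential failures and an instant repair at each check; both are assumptions made here, and the 10-year MTBF is only an illustrative value.

```python
import math


def mean_unavailability(interval_years: float, mtbf_device_years: float) -> float:
    """Average fraction of time in a failed state between checks.

    Assumes exponential failures and instant repair at every check.
    """
    ratio = interval_years / mtbf_device_years
    return 1.0 - (1.0 - math.exp(-ratio)) / ratio


def rule_of_thumb(interval_years: float, mtbf_device_years: float) -> float:
    """Rearranged workbook formula: DTdevice = FFI / (2 x MTBFdevice)."""
    return interval_years / (2.0 * mtbf_device_years)


for interval in (1.0, 2.0):
    exact = mean_unavailability(interval, 10.0)
    approx = rule_of_thumb(interval, 10.0)
    print(f"interval {interval:.0f} y: exact {exact:.4f}, rule of thumb {approx:.4f}")
```

Under these assumptions the rule of thumb slightly overstates unavailability, and the gap widens as the interval grows relative to the MTBF, which is consistent with restricting the formula to low unavailability values.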


One year:
Protected Function B (“Fails”): Mean Time Between Failures = 10 years
Protective Device C (“Failed”)

1 in 10 x 1 in 100 = 1 in 1000

Step One: Decide what probability we tolerate for the multiple failure

Step Two: Determine / estimate how often the protected function is likely to need the protective device

Step Three: Calculate what unavailability of the protective device enables us to achieve 1 given 2

If:
DTdevice = unavailability of the protective device
MTBFfunction = mean time between failures of the protected function
MTBFmultiple = mean time between multiple failures

then (1/MTBFfunction) x DTdevice = 1/MTBFmultiple

or DTdevice = MTBFfunction / MTBFmultiple
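Step Three reduces to a single division. A minimal sketch with the figures from this slide (a protected function that fails once in 10 years and a tolerated multiple failure of 1 in 1000 years); the function name is hypothetical.

```python
def required_unavailability(mtbf_function_years: float,
                            mtbf_multiple_years: float) -> float:
    """Step Three: DTdevice = MTBFfunction / MTBFmultiple."""
    return mtbf_function_years / mtbf_multiple_years


dt_device = required_unavailability(10.0, 1000.0)
print(f"{dt_device:.1%}")  # prints "1.0%"
```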


We have seen that…

FFI = 2 x DTdevice x MTBFdevice   …1

…and that…

DTdevice = MTBFfunction / MTBFmultiple   …2

Therefore, substituting 2 into 1 gives…

FFI = (2 x MTBFfunction x MTBFdevice) / MTBFmultiple

Where:
• FFI = failure-finding task interval
• DTdevice = unavailability of the protective device
• MTBFdevice = MTBF of the protective device
• MTBFfunction = MTBF of the protected function
• MTBFmultiple = MTBF of the multiple failure
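The combined formula can be wrapped in a small helper that also reports whether the implied unavailability stays under the 5% limit mentioned earlier. This is a sketch only: the names are hypothetical, and the sample values combine figures from earlier slides (a 10-year device MTBF, a 10-year function MTBF and a 1 in 1000 target) purely for illustration.

```python
from typing import NamedTuple


class FfiResult(NamedTuple):
    interval_years: float
    dt_device: float
    within_five_percent: bool


def failure_finding_interval(mtbf_device: float, mtbf_function: float,
                             mtbf_multiple: float) -> FfiResult:
    """FFI = 2 x MTBFfunction x MTBFdevice / MTBFmultiple (all in years)."""
    dt_device = mtbf_function / mtbf_multiple   # equation 2
    interval = 2.0 * dt_device * mtbf_device    # equation 1
    return FfiResult(interval, dt_device, dt_device < 0.05)


result = failure_finding_interval(mtbf_device=10.0, mtbf_function=10.0,
                                  mtbf_multiple=1000.0)
print(result)
```

With these inputs the interval comes out at roughly 0.2 years. Returning the unavailability alongside the interval makes it easy to spot cases where the linear formula is being pushed beyond its stated range.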


Exercise 1

A small chemical plant has an eye bath to enable people to wash their eyes if dangerous chemicals contaminate them. When asked what checks have been done on the eye bath in the past, the maintenance department said “that’s the production department’s job”. However, production thought the safety officer was doing it, who in turn thought it was “looked after by the preventive maintenance system”. As a result, it appears that the eye bath has never been checked, at least on a routine basis.

The eye bath has been in place for eight years. A quick check now reveals that the eye bath is actually in working order, so the only data we have about the reliability of this bath is that it has not failed in eight years. Further investigation reveals that someone needed to use it in an emergency on two occasions since it was installed.

The plant manager has asked you to set up a checking routine for this eye bath as a matter of urgency. How often should the check be done? The safety committee decided that they do not want the eye bath to be inoperable when it is needed more than once in 1,000,000 years. A series of phone calls to other companies reveals 60 eye baths that have been installed for a total of 720 years between them. Two of these have been found to be in a failed state in that period.

FFI = (2 x MTBFdevice x MTBFfunction) / MTBFmultiple
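One possible reading of the data, offered as a sketch rather than as the workbook's model answer: take MTBFdevice from the pooled experience of the other companies, and MTBFfunction from how often the eye bath has actually been needed. Both choices are assumptions, and the variable names are hypothetical.

```python
# Pooled field data from other companies: 720 service years, 2 found failed.
mtbf_device = 720 / 2          # years between eye bath failures

# Demand on the protective device: needed twice in eight years.
mtbf_function = 8 / 2          # years between demands

# Tolerable rate set by the safety committee.
mtbf_multiple = 1_000_000      # years

ffi_years = 2 * mtbf_device * mtbf_function / mtbf_multiple
ffi_days = ffi_years * 365     # ignoring leap years

print(f"FFI = {ffi_years:.5f} years, about {ffi_days:.2f} days")
```

On this reading the interval works out to roughly one day, which would itself be a prompt to discuss whether the tolerable level or the design of the eye bath should be revisited.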


Exercise 2 A tank is used to store diesel and is enclosed in a concrete bund. This is intended to prevent anything which might escape from the tank seeping into the ground and breaching a variety of environmental regulations. The review group decides that they would not like this to happen more than once every 10,000 years. A review of a number of similar systems and discussions with users suggest that a significant quantity of liquid is likely to escape into the bund no more than once every 150 years on average, usually due to leaks in pipeline flanges or seals. The integrity of the bund itself has never been checked until now, but it can be done in a number of ways. One is to fill the bund with water to a depth of (say) 100 millimeters, and check whether the water level drops by more than the rate of evaporation over a period of (say) two days. Such a check is carried out on the bund, and reveals that it is still intact. So in the absence of any hard data at all, and after considerable discussion, the group decides that in any one year the chance of the bund springing an invisible leak (due to subsidence, latent construction, defects or whatever) is “1 in 100”.

$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\text{device}} \times \mathrm{MTBF}_{\text{function}}}{\mathrm{MTBF}_{\text{multiple}}}$$
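As a worked illustration of the arithmetic, and assuming the exercise data is read as follows: the “1 in 100” annual chance of an invisible leak gives a device (bund) MTBF of 100 years, the escape of liquid once every 150 years gives the MTBF of the protected function, and the review group's 10,000 years is the tolerable interval between multiple failures.

```latex
\[
\mathrm{FFI}
  = \frac{2 \times \mathrm{MTBF}_{\text{device}} \times \mathrm{MTBF}_{\text{function}}}
         {\mathrm{MTBF}_{\text{multiple}}}
  = \frac{2 \times 100 \times 150}{10\,000}
  = 3 \text{ years}
\]
```

A different reading of the data (for example, a more pessimistic leak rate) would change the result proportionally.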


Exercise 3

A steel-producing plant needs many product-handling assets to move raw iron ore prior to processing. As part of this asset base, it has 10 large conveyors. Each of these has 4 e-stops: one on either side of the head end and one on either side of the tail end of the conveyor.

Management has tasked the maintenance team with determining a frequency for testing the function of each of these e-stops, to make sure that they will work when needed. After some discussion, the team consulted the relevant specifications and determined that they wanted these e-stops to meet their SIL-2 classification. For this company that means a likelihood of 1:100,000 (1 in 10^5) that any one would have a failure in any one year.

They found that on their own plant they had never experienced a failure of one of the emergency stops. However, on consulting a commercial data store they found the following information:

• A population was tested over a time period of 10^6 hours
• During this time the item was found to have failed 8 times in an undetected and unsafe manner
• And 60 times in a detected safe fashion

The e-stops were installed on all of the conveyors at roughly the same time, 20 years ago. After conversations with a few of the longer-serving people, the team was able to ascertain that an e-stop had needed to be used, either to protect people or to protect life, approximately 15 times.

How frequently will they need to perform the detective task to maintain the level of risk that the company has deemed tolerable?
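A first step in this exercise is to turn the data-store figures into an MTBF for the e-stop as a protective device. The sketch below rests on assumptions that the exercise does not state outright: that the 10^6 hours is the cumulative service time of the tested population, that only the 8 undetected failures count as hidden failures, and that a year is taken as 8,760 hours.

```latex
\[
\mathrm{MTBF}_{\text{device}}
  \approx \frac{10^{6}\ \text{hours}}{8\ \text{undetected failures}}
  = 125\,000\ \text{hours}
  \approx 14.3\ \text{years}
\]
```

The demand rate on each e-stop still has to be judged from the 15 recorded uses over 20 years before the FFI formula can be applied.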


This is the multiple failure. The result of a protected function failing while the protective device is in a failed state…

It needs a failure of the function before the consequences of a hidden failure are realised!

[Figure: the protected Function and the protective Device]

Failure of the device has no consequences by itself…

Therefore, for a detective maintenance task to be technically feasible we need to:

1. Ensure that the task will not increase the probability of a multiple failure
2. Determine whether it is practical to do the task at the desired intervals


Case Study - BP refinery Incident



• Indicator not designed to measure above 10 feet (Not fit for purpose)
• High/High level alarm did not work (Hidden Failure)
• Indicator not accurate, filled to 13 feet when indicating 10
• Valve for liquid flow left closed (Human Error)
• Reached 138 feet, indicator told operators 10 feet and falling (Hidden Failure)
• Pressure controlling valve didn’t work (Hidden Failure)
• High level alarm on the blow down drum did not work, the last line of defense… (Hidden Failure)


Managing Safety and Environmental Consequences

Effectiveness for Safety and Environmental Consequences

Proactive task options: PTIVE, PRES, PREP

Hidden Safety or Environmental Consequences: The failure management policy must reduce the risk of the multiple failure to a tolerable level.

Evident Safety or Environmental Consequences: The failure management policy must reduce the risk of failure to a tolerable level.

DTIVE: If a suitable proactive task cannot be found then the first option is to seek a failure-finding task that reduces the probability of multiple failure to a tolerable level.

Combination: Is a combination of tasks technically feasible and worth doing?

Redesign: Redesign is compulsory.


Economic Consequences

The Economic Consequences of Failure

• A failure has economic consequences if it has a direct adverse effect on operational capability
  – Reduced output
  – Product quality considerations
  – Poor customer service
  – Increased operating costs
  – Costs in terms of legal or regulatory charges
  – Reputation costs


The Economic Consequences of Failure

• Two issues need to be considered when assessing the consequences of failure
  – How much the failure costs each time it occurs
  – How often it would occur if no attempt was made to prevent it
• As a result the consequences should always be evaluated over a reasonable period of time
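The two issues above (cost per failure and how often it occurs) combine into a cost over a period, which can then be compared with the cost of a routine task. The following is a minimal sketch of that comparison on an annual basis. The function names and all monetary figures are hypothetical and purely illustrative; they do not come from the training material.

```python
def annual_cost_of_failure(operational_cost, repair_cost, failures_per_year):
    """Cost per failure (operational consequences plus repair) times how often it occurs."""
    return (operational_cost + repair_cost) * failures_per_year


def task_is_worth_doing(task_cost, tasks_per_year, operational_cost, repair_cost,
                        failures_per_year_without_task, failures_per_year_with_task=0.0):
    """True if the routine task costs less per year than letting the failure occur.

    The cost of the policy includes any failures that still slip through.
    """
    cost_without = annual_cost_of_failure(
        operational_cost, repair_cost, failures_per_year_without_task)
    cost_with = (task_cost * tasks_per_year
                 + annual_cost_of_failure(
                     operational_cost, repair_cost, failures_per_year_with_task))
    return cost_with < cost_without


# Illustrative numbers only (assumed, not from the source):
worthwhile = task_is_worth_doing(
    task_cost=2_000, tasks_per_year=4,
    operational_cost=50_000, repair_cost=10_000,
    failures_per_year_without_task=0.5,
    failures_per_year_with_task=0.05,
)
print("Routine task worth doing:", worthwhile)
```

With these assumed figures the failure costs about 30,000 per year if nothing is done, against about 11,000 per year with the task, so the task would pass the economic test.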


Effectiveness of Economic Consequences

Proactive task options: PTIVE, PRES, PREP

Hidden Economic Consequences: Over a period of time, the failure management policy must reduce the probability of a multiple failure (and associated total costs) to an acceptable minimum.

Evident Economic Consequences: Over a period of time, the failure management policy must cost less than the cost of the operational consequences (if any) plus the total cost of repair.

Run to Failure: If there is no effective routine task then the initial default is Run-to-Failure.

Redesign: If Run-to-Failure is not an option due to frequency of failure or other implications, then redesign may be desirable.


RCM-DO-06 Applicability and Task Selection

The Seven Questions of RCM (SAE JA1011, 5a–5g, 2002)

1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)


RCM Decision Algorithm (based on Example 2, SAE JA1012)

[Figure: decision diagram. Nodes are labelled HS, HE, HO, HN on the Hidden side and ES, EE, EO, EN on the Evident side, each followed by a step number: HS1 to HS5, HE1 to HE5, HO1 to HO6, HN1 to HN6, ES1 to ES5, EE1 to EE5, EO1 to EO5, EN1 to EN5.]

Consequence questions

• Will the loss of function caused by this failure mode on its own become evident to the operating crew under normal circumstances? (Yes: Evident. No: Hidden.)
• Is there an intolerable risk that the failure could kill or injure someone? (HS, ES)
• Is there an intolerable risk that the failure could breach a known environmental standard or regulation? (HE, EE)
• Does the failure have a direct adverse effect on operational capability? (Yes: HO, EO. No: HN, EN.)

Task selection questions

1. Is a Predictive (On-Condition) task technically feasible and effective? Yes: Predictive Task.
2. Is a Preventive Restoration task technically feasible and effective? Yes: Preventive Restoration Task.
3. Is a Preventive Replacement task technically feasible and effective? Yes: Preventive Replacement Task.
4. Hidden failures: Is a Detective task to detect the failure technically feasible and effective? Yes: Detective Task.
5. Evident safety and environmental consequences: Is a combination of tasks technically feasible and effective? Yes: Combination of tasks.

Default actions: Run-to-Fail or a Combination of Tasks

• For Hidden Safety & Environmental consequences, if no Failure Finding Task is feasible then redesign is compulsory.
• For Evident Safety & Environmental consequences, if no combination of tasks is feasible then redesign is compulsory.
• For Operational & Non-Operational consequences, redesign may be desirable rather than Run-to-Fail if the economic consequences justify this.

Technical Feasibility (Applicability) Criteria

Predictive Tasks
Is there a clear potential failure condition? What is it? What is the P-F interval? Is the interval long enough for action to be taken to avoid or minimise the consequences of failure? Is the P-F interval reasonably consistent? Is it practical to do the task at intervals less than the P-F interval?

Preventive Restoration Tasks
Is there an age at which there is an increase in the conditional probability of failure? (Life?) What is this age? Do enough items survive to this age to satisfy the effectiveness criteria? Will the task restore the original resistance to failure? When there are safety or environmental consequences, all items need to survive to this age.

Preventive Replacement Tasks
Is there an age at which there is an increase in the conditional probability of failure? (Life?) What is this age? Do enough items survive to this age to satisfy the effectiveness criteria? When there are safety or environmental consequences, all items need to survive to this age.

Detective Tasks
Is it possible to check whether the item has failed without significantly increasing the risk of a multiple failure? Is it practical to do the task at the required interval?

Effectiveness Criteria

Hidden Safety and Environmental Consequence
To be effective: the failure management policy must reduce the risk of the multiple failure to a tolerable level.

Hidden Economic Consequence
To be effective: over a period of time, the failure management policy must reduce the risk of a multiple failure (and associated total costs) to an acceptable minimum.

Evident Safety and Environmental Consequence
To be effective: the failure management policy must reduce the risk of the failure to a tolerable level.

Evident Economic Consequences
To be effective: over a period of time, the failure management policy must cost less than the cost of the operational consequences (if any) plus the total cost of repair.

Applicable: Before selecting any failure management policy, analysts first need to determine whether or not the task is actually possible!

Effective: Then they need to determine whether the task will be worthwhile in terms of either cost or risk. (Based on the consequences)
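The task-selection order and default actions of the decision algorithm can be sketched in code. This is a simplified illustration under stated assumptions, not the SAE JA1012 diagram itself: the names `Consequence`, `select_policy` and the `worthwhile` set are hypothetical, and the analyst is assumed to have already judged which task types are both technically feasible and effective.

```python
from enum import Enum


class Consequence(Enum):
    SAFETY = "S"
    ENVIRONMENTAL = "E"
    OPERATIONAL = "O"
    NON_OPERATIONAL = "N"


# Proactive tasks are considered in this order for every consequence category.
PROACTIVE_ORDER = ("Predictive", "Preventive Restoration", "Preventive Replacement")


def select_policy(hidden, consequence, worthwhile):
    """Return a failure management policy.

    hidden      -- True if the failure is not evident to the operating crew
    consequence -- a Consequence member
    worthwhile  -- set of task names already judged feasible AND effective
    """
    for task in PROACTIVE_ORDER:
        if task in worthwhile:
            return f"{task} Task"

    risk_based = consequence in (Consequence.SAFETY, Consequence.ENVIRONMENTAL)

    if hidden:
        if "Detective" in worthwhile:
            return "Detective Task"
        if risk_based:
            return "Redesign is compulsory"
        return "Run-to-Fail (redesign may be desirable)"

    if risk_based:
        if "Combination" in worthwhile:
            return "Combination of tasks"
        return "Redesign is compulsory"
    return "Run-to-Fail (redesign may be desirable)"


print(select_policy(True, Consequence.SAFETY, {"Detective"}))
print(select_policy(False, Consequence.OPERATIONAL, set()))
```

The sketch deliberately leaves the hard part, deciding what belongs in `worthwhile`, to the review group; the code only captures the order in which the options are considered.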


Technically Feasible

A routine task is technically feasible (applicable) if it is physically possible for the task to reduce the consequences of the failure mode to an acceptable level.


Types of Maintenance

RCM Term: Predictive Maintenance (abbreviation: PTIVE)
Colloquial terms: On-Condition Maintenance, Condition Based Maintenance (CBM), Condition Monitoring (CM), Inspections
What it is: Check an item for signs of potential failure and leave it in place on the condition that it will make it to its next inspection interval. (Planned)

RCM Term: Preventive Restoration (abbreviation: PRES)
Colloquial terms: Overhaul, Scheduled Restoration, Rework
What it is: A task to restore an asset’s original resistance to failure prior to its failure; this is a preventive task. (Planned)

RCM Term: Preventive Replacement (abbreviation: PREP)
Colloquial terms: Replacement, Overhauls (also)
What it is: A task to replace an asset prior to its failure; this is a preventive task. (Planned)

RCM Term: Detective Maintenance (abbreviation: DTIVE)
Colloquial terms: Failure finding, Function testing
What it is: A task to detect whether an item has failed or not. (Planned)

RCM Term: Corrective Maintenance (abbreviation: CTIVE)
Colloquial terms: Corrective, Run to failure (RTF)
What it is: A task to correct failing or failed assets. (Planned)

RCM Term: Reactive Maintenance (abbreviation: Reactive)
Colloquial terms: Breakdown, Shutdown
What it is: A task to restore the function of an asset that is failing or has failed. (Unplanned)

RCM will always direct maintainers to choose a maintenance or operational activity over a redesign as it is almost always the most cost effective means of managing failure.


Preventive Maintenance (PMs)

Preventive maintenance tasks (PMs) are routine actions that are taken to prevent failures. Within reliability-centred maintenance there are two types of preventive tasks.

• Preventive Restoration – tasks to restore an item’s original resistance to failure.
• Preventive Replacement – tasks to replace an asset.

Characteristics of Preventive tasks

• There must be an age where the conditional probability of failure will increase dramatically (a Life).
• The majority of items must survive until this point (only a few “random” failures).
• If the task is a restoration task then it needs to restore the item’s original resistance to failure.

[Figure: curve with the Life marked]


Characteristics of Preventive tasks where the failure mode has safety or environmental consequences

• There must be an age where the conditional probability of failure will increase dramatically (a Life).
• If the failure mode has safety or environmental consequences then all items must survive to this age!
• If the task is a restoration task then it needs to restore the item’s original resistance to failure.

[Figure: curve with the Life marked]
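The survival characteristic can be checked against failure records. The sketch below is illustrative only: the function names, the sample failure ages and the 90% threshold used to stand in for “the majority of items” are all assumptions, not values from the training material.

```python
def fraction_surviving(failure_ages, life):
    """Share of items whose recorded failure age is at or beyond the proposed life."""
    if not failure_ages:
        raise ValueError("at least one failure age is required")
    survivors = sum(1 for age in failure_ages if age >= life)
    return survivors / len(failure_ages)


def life_limit_is_feasible(failure_ages, life, safety_or_environmental,
                           min_fraction=0.9):
    """Apply the survival test for an age-based preventive task.

    min_fraction is a hypothetical stand-in for "the majority of items".
    With safety or environmental consequences, every item must survive.
    """
    share = fraction_surviving(failure_ages, life)
    if safety_or_environmental:
        return share == 1.0
    return share >= min_fraction


# Assumed failure ages in months; one early "random" failure at 12 months.
ages = [48, 52, 55, 60, 12, 58, 61, 50, 57, 54]
print(life_limit_is_feasible(ages, life=45, safety_or_environmental=False))
print(life_limit_is_feasible(ages, life=45, safety_or_environmental=True))
```

With this sample the 45-month life passes for economic consequences but fails for safety or environmental consequences, because one item did not survive to that age.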


How important is this…?

The 6 Failure Patterns

Only 11% of failures in the original Nowlan and Heap report were related to age!

In later published studies this number has ranged from 8 to 23 % of all failures!

89% of failures were not related to age!

Yet despite knowing these facts, many people are reluctant to let go of time-based maintenance (such as many scheduled shutdowns).

Why?


Predictive Maintenance

• Nearly all failures give a warning that they are about to occur or are in the process of occurring.
• These warnings are known as potential failures.
• Operators see potential failures (warning signs) all the time… but they are not sure what they are warning them of…


Predictive Maintenance tasks

Items are checked for potential failures, and they are left in service on the condition that they continue to meet satisfactory performance standards


Predictive Maintenance tasks

[Figure: P-F curve marking three points]

• The point where failure starts to occur
• The point where we can detect it (Potential Failure)
• The point where it no longer does what we want it to do (Functional Failure)


Predictive Maintenance tasks

[Figure: P-F curve with potential failure points P1, P2 and P3, a P-F interval of 1 month and inspections every 2 weeks. Source: Captured by Data, 2003, Daryl Mather]

Inspection Interval = Less than the P-F Interval


Predictive Maintenance

For a predictive task to be technically feasible:

• There is a clear potential failure condition (in other words, there is a clear warning of the onset of failure)
• The P-F interval is long enough for action to be taken to avoid, eliminate or minimise the consequences of failure
• The P-F interval is reasonably consistent
• A task can be done at intervals less than the P-F interval

[Figure: The P-F Interval. Resistance to Failure plotted against Time or Task Intervals, marking Potential Failure Identified and Functional Failure Occurs.]


Predictive Maintenance Tasks

[Figure: two P-F curves, each marking potential failure points P1, P2 and P3. Source: Captured by Data, 2003, Daryl Mather]

• P-F Interval (10 Months): Inspection Interval (3 months)
• P-F Interval (1 Month): Inspection Interval (2 weeks)
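The relationship between the P-F interval and the inspection interval in the two cases above can be checked with simple arithmetic. The sketch below is illustrative only: the function name `warning_time` and the choice of weeks as the unit (with one month taken as roughly 4 weeks) are assumptions, not part of the course material. It returns the shortest time that could remain between detecting a potential failure and the functional failure.

```python
def warning_time(pf_interval, inspection_interval):
    """Worst-case time left to act after a potential failure is detected.

    Both arguments must use the same unit. The worst case is a potential
    failure that appears just after an inspection, so it is only found one
    full inspection interval later.
    """
    if inspection_interval >= pf_interval:
        raise ValueError("inspection interval must be less than the P-F interval")
    return pf_interval - inspection_interval


# Units are weeks; one month is approximated as 4 weeks (an assumption).
print(warning_time(pf_interval=4, inspection_interval=2))    # 1-month P-F, 2-week checks
print(warning_time(pf_interval=40, inspection_interval=12))  # 10-month P-F, 3-month checks
```

Whether the remaining time is long enough to plan and carry out the corrective action is a judgement for the analysis team; the code only makes the arithmetic explicit.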


Condition Monitoring

Product Quality Monitoring

The Human Senses


Detective Maintenance

Detective Maintenance Tasks

[Figure: a protected function and its protective device]

Failure of the device has no consequences by itself. It needs a failure of the function before the consequences of a hidden failure are realised. This is the multiple failure: the result of a protected function failing while the protective device is in a failed state.

Therefore, for a detective maintenance task to be technically feasible we need to:

1. Ensure that the task will not increase the probability of a multiple failure
2. Determine whether it is practical to do the task at the desired intervals


Exercise 1 – Task Categories

1. Once every three years the flow rate of a wastewater pumping station is checked to see if it has deteriorated at all. If it has, the team plans to replace the impeller within two weeks.
2. The drive end bearing of a motor is greased every 3 weeks.
3. A hydraulic oil system provides pressure to drive the hydraulic motor that powers an apron feeder. Every so often there is a differential pressure alarm, which signals when the filter is no longer able to filter to the correct level and rate. When this occurs, the maintenance team cleans the filter.
4. A weight meter in a product handling plant is routinely calibrated to ensure that the production (profitability) of the plant is accurately measured.
5. A large DC motor regularly requires the commutator to be skimmed to prevent flashovers between the brush holders via the commutator.
6. A tank contains corrosive acid, which is prohibited by law from seeping into the ground. A task has been scheduled to perform a seepage test on this tank every 4 years.


Exercise 2 – Which type of maintenance?

The bearing in the housing of a generator has failed once in the past, causing a two-day production outage at a cost of several million dollars. To avoid a recurrence, the maintenance department is tasked with finding a task that will predict or prevent this failure in the future. They decide to strip down the generator once every two years and to perform a dye-penetrant check on the bearing, primarily to search for cracks or fissures on the races. We know that once cracks can be detected via the dye-penetrant test, the bearing usually has around 3 months left before total failure.

Q1. What type of task are they suggesting?
Q2. Will it solve their problem?


The Basis of Task Preference

• On-condition Tasks – Identify failure at the potential failure stage. Reduce the likelihood of safety, environmental and operational consequences. Also reduce operational costs by allowing equipment to realise most of its useful life.

• Preventive Restoration – When directed at specific components and parts it will lead to a reduction in the overall failure rate of items that have a dominant failure mode.

• Preventive Replacement – Least favoured of the three. Can reduce safety-related consequences in some failure modes. However, it has a higher cost of execution (reduced cost effectiveness).

• Detective Tasks – If the other three are not able to be applied, then this is the best option for hidden failures. If the frequency is practical, the task can be done safely, and the task itself does not substantially increase the risk of failure, then this is the selected option. Unlike the other three tasks, this option will leave the function in a state of unavailability for a period.


RCM-DO-06c Uses of MTBF

Mean Time Between Failure, or MTBF [4], is one of the most widely used metrics in physical asset management. Generally, companies use it as a guide to the performance of their physical assets, helping them to identify assets or processes that are causing lost revenue or cost-related issues. However, although widely applied, MTBF is still the subject of some confusion. Moreover, MTBF is useful for a range of different purposes, giving organizations greater ability to increase the net present value of their physical asset base.

When companies first look at implementing MTBF, they tend to ask three fundamental questions:

1) What can MTBF tell us about our assets?
2) At what levels can it be applied?
3) How can MTBF be used to add value to our reliability initiatives?

What can MTBF tell us?

The standard use for MTBF in industry is to tell us the performance of the primary function of an asset or system.

Figure 1 - Example System
[Figure: duty and standby pumps supplying a tank fitted with High-High, High-Level, Low Level and Low-Low level switches; off-take of 800 l/m]

For example, a pumping system consists of a duty/standby pump arrangement, a pressure relief valve, piping, and the tank and associated level switches. The primary function for this system is to pump water to tank B at a rate of between 900 l/minute and 1000 l/minute. In this case, a failure occurs when the pump system is unable to pump water at the required rate for whatever reason.

[4] This module deals with MTBF in isolation and does not discuss other metrics such as MTTF (Mean Time To Failure) or MTTR (Mean Time To Repair).


Here, we can calculate the MTBF as follows:

MTBF = Total Time Required / Number of Failures

So if the total time that we required the pump system to deliver this function was (say) 5 years, and we had 4 failures in that time, the average time between failures would be 5/4 = 1.25 years. If this were the mean time between failures, then the failure rate for one year would be 1/MTBF or 1/1.25, which is 0.8, or an 80% likelihood of experiencing a failure of the primary function in one year. If we then wanted to convert this into months we would first convert the MTBF figure to months, 1.25 years = 15 months, then again determine the likelihood of this occurring in one month: 1/15, or 0.067. This means there is a 6.7% likelihood of experiencing a failure resulting in the loss of the primary function in any given month. We could do the same for a week, a day or any other given period.

The above example shows us that initial uses of MTBF can provide us with the average time between failures [5] for a given time period, and that this can then be manipulated to give us a failure rate [6] for any specified period of time. Thus, for one measurement of MTBF we are able to calculate the following information:

• MTBF of the Primary Function = 1.25 years
  o Likelihood of a failure in one year = 1/1.25 years (80% or 8 x 10^-1)
  o Likelihood of a failure in one month = 1/15 months (6.7% or 6.7 x 10^-2)
  o Likelihood of a failure in one day = 1/456.25 days (0.22% or 2.2 x 10^-3)
  o Likelihood of a failure in one hour = 1/10950 hours (0.009% or 9 x 10^-5)

At all times the formula takes into account the total time of the function, not of the asset itself. This means that regardless of the number or type of assets in the system, the calculation always uses the total time required of the function, or 5 years in this example.
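The arithmetic above can be reproduced with a short script. This is a minimal sketch rather than a definitive implementation: the function names `mtbf` and `failure_rate` and the `PERIODS_PER_YEAR` table are illustrative assumptions, and the conversion factors (12 months, 365 days, 8760 hours per year) are the ones implied by the figures in the text.

```python
def mtbf(total_time_required, number_of_failures):
    """MTBF = total time required of the function / number of failures."""
    if number_of_failures <= 0:
        raise ValueError("MTBF cannot be calculated without at least one failure")
    return total_time_required / number_of_failures


def failure_rate(mtbf_value):
    """Failure rate per unit of time = 1 / MTBF (same unit as the MTBF)."""
    return 1.0 / mtbf_value


# Conversion factors implied by the text: 12 months, 365 days, 8760 hours per year.
PERIODS_PER_YEAR = {"year": 1, "month": 12, "day": 365, "hour": 8760}

# 5 years required of the function, 4 functional failures.
mtbf_years = mtbf(total_time_required=5, number_of_failures=4)

for period, per_year in PERIODS_PER_YEAR.items():
    mtbf_in_period = mtbf_years * per_year  # MTBF expressed in this period
    print(f"MTBF = {mtbf_in_period:g} {period}s, "
          f"failure rate per {period} = {failure_rate(mtbf_in_period):.2e}")
```

Note that the total time passed in is the time required of the function, not the accumulated running time of the individual assets.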

At what level can we apply MTBF?

Like many other metrics in physical asset management, MTBF is applicable at any level throughout the asset base. However, for performance measurement there are two rules for its application:

1. It is always used to measure the function of the asset where it is being applied, and
2. It always uses the total time required of the function at the level where it is applied.

For instance, in the example given above we determined that the MTBF for the pumping system was 1.25 years, and we were then able to derive failure rates for various other periods. In addition, we can also apply this to the assets in the system, as demonstrated in Table 1.

[5] MTBF = Total Time Required / Number of Failures

[6] Failure rate = 1 / MTBF


Table 1 - Component MTBF

| Asset | Function | Total Time Required | Number of Failures | MTBF | Annual Failure Rate |
|---|---|---|---|---|---|
| Duty Pump | To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | 5 years | 7 | 0.714 years | 140% |
| Stand By Pump | To maintain 800 l/minute to 1000 l/minute if the duty pump fails | 5 years | 2 | 2.5 years | 40% |
| Piping | To provide clear access for 800 l/minute to 1000 l/minute from the pump sets to the tank | 5 years | 1 | 5 years | 20% |
| High-High Level Switch | To trip the pumping system when water reaches the high-high level | 5 years | 1 | 5 years | 20% |
| High Level Switch | To shut off the pump when the tank level reaches the high level | 5 years | 1 | 5 years | 20% |
| Low Level Switch | To turn on the pump when the tank has been drained to the low level | 5 years | 1 | 5 years | 20% |
| Low-Low Level Switch | To alarm when the tank level has been drained to the low-low level | 5 years | 0 | > 5 years (no failures recorded) | < 20% |
| Tank | To contain up to 250,000 liters of water | 5 years | 0 | > 5 years (no failures recorded) | < 20% |
| Pumping System | To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | 5 years | 4 | 1.25 years | 80% |

Table 1 contains some information that should immediately provoke some questions. For example, we have counted four failures in our system-level MTBF, yet the table contains 13 component failures (not counting the system failures). To understand this we need to review the functions for each of the components mentioned. For example, the function of the High-High Level Switch is to trip the pumping system when water reaches the high-high level. If there is a failure preventing this asset from performing its function, it will not prevent the system from pumping water. We have had one failure on the switch that we know about in this period.
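The component figures in Table 1 can be recomputed from the failure counts alone. The sketch below is an illustration under stated assumptions: the `TABLE_1` list and the `component_mtbf` helper are hypothetical names, and returning `None` for an asset with no recorded failures is a choice made here because the formula has no defined result when the failure count is zero.

```python
# (asset, total time required in years, number of failures) taken from Table 1
TABLE_1 = [
    ("Duty Pump", 5, 7),
    ("Stand By Pump", 5, 2),
    ("Piping", 5, 1),
    ("High-High Level Switch", 5, 1),
    ("High Level Switch", 5, 1),
    ("Low Level Switch", 5, 1),
    ("Low-Low Level Switch", 5, 0),
    ("Tank", 5, 0),
]


def component_mtbf(total_time, failures):
    """Return (MTBF, annual failure rate), or (None, None) with no failures."""
    if failures == 0:
        return None, None  # no failures recorded: MTBF cannot be calculated
    value = total_time / failures
    return value, 1.0 / value


for asset, total_time, failures in TABLE_1:
    value, rate = component_mtbf(total_time, failures)
    if value is None:
        print(f"{asset}: no failures recorded in {total_time} years")
    else:
        print(f"{asset}: MTBF {value:.3f} years, annual failure rate {rate:.0%}")

# Component failures add up to more than the four system-level failures.
total_component_failures = sum(failures for _, _, failures in TABLE_1)
print("Component failures:", total_component_failures)
```

The gap between the component total and the four system-level failures is the point of the discussion that follows: most component failures were hidden or were covered by redundancy.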


Another obvious issue is the fact that we have had seven failures of the Duty Pump. However, during this time we have also had only two failures that we know of on the Standby Pump, which performs a dormant function. As this system has redundancy built into it, we can only experience a loss of the primary function if we have a failure of the Duty Pump and the Standby Pump at the same time. The four failures causing the loss of function at the system level were:

• One multiple failure of the duty and standby pumps
• One failure of the High Level switch, meaning the level reached the High-High level once during the 5-year period
• One failure of the Low Level switch, resulting in the Low-Low level tripping the downstream process
• One failure of the piping causing downtime

Figure 2 - MTBF at Different Levels

System: MTBF 1.25 years; in any year 8 x 10^-1
  • Multiple Pump Failure: 1:4.17 in any year, or 2.4 x 10^-1
      o Duty Pump: MTBF 1.67 years; in any year 6 x 10^-1
      o Standby Pump: MTBF 2.50 years; in any year 4 x 10^-1
  • High Level Switch: MTBF 5 years; in any year 2 x 10^-1
  • Low Level Switch: MTBF 5 years; in any year 2 x 10^-1
  • Piping: MTBF 5 years; in any year 2 x 10^-1

All the other failures mentioned were either hidden from the operations team until revealed by inspection, or their function was protected by other assets (as in the case of the failures on the Duty Pump).

As shown in Figure 2, MTBF is useful at any level throughout an asset base. However, its application must be based on the functions of the assets, and on the total time required of each function, at each level of performance measurement.

How can MTBF add value to Reliability Initiatives?

In the hands of a skilled RCM facilitator, the measurement and manipulation of MTBF can be used to set the performance expectations of the physical asset base, to provide a basis for evaluating strategies, and to indicate the overall performance of assets, not just the performance of their functions.


This helps organizations in the change process because they begin to think about what the assets do, rather than what they are. That is, an appreciation of functional performance as opposed to asset performance. For example, in the system described in Figure 1 we can break the system down into its functions, and begin to assign performance expectations to each of these. [7]

Function 1 - To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute
Functional Failure 1.A – Does not pump water at all

The water pump in this example provides, say, the cooling water for a petrochemical plant. If the system is unable to pump water, there will be a loss of production. The tank contains enough water to keep the plant running for a minimum period of 2 hours, and a maximum period of 6 hours. A multiple failure of both pumps would nominally result in a loss of production equal to, say, USD $2,000,000. In this case the asset owner would like to keep the likelihood of this occurring to a reasonably low level, and after some discussion he decides on a level of 1:10,000 years, or an annual rate of 10^-4. This means managing all failure modes that cause this consequence, an adverse impact on operational capability, to the same level of likelihood.

Function 2 – To trip the pumping system when water reaches the high-high level
Functional Failure 2.A – Does not trip when the water reaches the high-high level

In the case of the water system, an overflow of the tank would result in water in the surrounding area. While this is a slip hazard for employees sent to correct the issue, the asset owner does not regard it as a serious hazard, nor will it result in any damage to additional equipment. The failure mode is dormant, meaning it will only have consequences when there is a failure of both the high-level switch and the high-high level switch. In this particular case, the asset owner is at ease accepting a higher level of risk of occurrence, say, one in every 100 years, or a likelihood of 10^-2 in any one year.

Function 3 – To alarm when the tank level is at the low-low level
Functional Failure 3.A – Does not alarm when the tank is at the low-low level

As with the High-High protection, this alarm is only required once there has already been a failure of some sort, in this case notably a failure of the Low-Level Switch. If this were to occur, and the tank consequently ran dry, the results would be catastrophic in financial terms. The downstream equipment would run dry, and the plant would be without cooling water, forcing a loss of production estimated at around 3 days or USD $6,000,000 in this case. There would also be damages conservatively estimated at USD $1,500,000 to producing assets. The asset owner sees this as the worst possible outcome of a failure of this system. As a result, he would like to keep the likelihood of failure at 1:100,000 years, or 10^-5 per year.

The resulting performance expectations of failure modes are in Table 2 below. We can see that the sum of the failure rates of the failure modes contributing to the loss of function must equal the desired failure rate, or risk, at the level above (assuming these are all the relevant failure modes).

[7] Full details about how to construct a risk profile based on performance expectations are contained in module RCM-DO-05a Tolerable Levels of Risk (A Study of Industry).


Table 2 - Functional MTBF

| Function | Failure | Desired Failure Rate | Existing Annual Failure Rate |
|---|---|---|---|
| To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | (all failure modes) | Desired failure rate is 10^-4, therefore each of the four failure modes underneath must be managed to at least 2.5 x 10^-5 to ensure this level is reached | |
| | Multiple Pump Failure | 2.5 x 10^-5 | 1:1.67 x 1:2.5 = 1:4.17 or 2.4 x 10^-1 |
| | Piping | 2.5 x 10^-5 | 1:5 |
| | High Level Switch | 2.5 x 10^-5 | 1:5 |
| | Low Level Switch | 2.5 x 10^-5 | 1:5 |
| To trip the pumping system when water reaches the high-high level | High-High Level Switch | 10^-2 | 1:5 x 1:5 = 1:25 |
| To alarm when the tank level has been drained to the low-low level | Low-Low Level Switch | 10^-5 | 1:5 x 1:5 = 1:25 |

Here we can see the desired failure rates set out in Table 2 for each function, translated into a performance requirement for each failure mode. We can also record actual MTBF measures against this to see how effective we have been in managing the failures of this asset to the desired levels of performance. However, this would only be a guide, because the measured MTBF reflects only the failures recorded since measurement began. The best use of this approach is to provide valuable input for RCM analysts, as well as for other applications within the reliability field. It would also give asset owners a pre-determined risk envelope that they require their assets to work within, increasing their control over asset performance, and hence over corporate profitability.
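The translation from a function-level target to a requirement for each failure mode is simple division when the target is shared equally, and the existing rates can then be compared against it. The sketch below assumes that equal split; the helper names `per_mode_target` and `combined_rate` are illustrative, and the existing rates are the "1 in N" figures quoted for the pumping function. An equal split of 10^-4 over four failure modes gives 2.5 x 10^-5 per mode.

```python
def per_mode_target(desired_rate, number_of_modes):
    """Split a function-level failure rate equally across its failure modes."""
    return desired_rate / number_of_modes


def combined_rate(*ratios):
    """Multiply '1 in N' ratios, e.g. combined_rate(5, 5) is 1 in 25."""
    rate = 1.0
    for n in ratios:
        rate *= 1.0 / n
    return rate


desired = 1e-4  # tolerated annual rate for loss of the pumping function
existing = {
    "Multiple Pump Failure": combined_rate(1.67, 2.5),  # duty x standby
    "Piping": 1 / 5,
    "High Level Switch": 1 / 5,
    "Low Level Switch": 1 / 5,
}

target = per_mode_target(desired, len(existing))
for mode, rate in existing.items():
    gap = rate / target  # how many times the existing rate exceeds the target
    print(f"{mode}: existing {rate:.2e}, target {target:.2e}, {gap:,.0f}x over")
```

An unequal split is just as valid provided the per-mode targets still sum to the function-level rate; the equal split is used here only because it is the simplest case.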

Summary

MTBF is an exceptionally useful metric in the field of physical asset management and it is possible to apply it at any level throughout the physical asset base. The principal benefit of wide-ranging use of MTBF is that it begins the process of focusing a company on how the assets work to fulfill a function, rather than what those assets actually are. This is one of the fundamental concepts of Reliability-centered Maintenance.


As such, at whatever level it is applied, MTBF measures the function performed by that asset, asset system, or entire process. It is also useful for proactively establishing the performance expectations of the asset base, particularly in the areas of the Efficiency function.


RCM-DO-06d Advanced Detective Maintenance Techniques

[Figure: chain of events leading from a hidden failure to its consequence. Source: The Maintenance Scorecard, Daryl Mather, Industrial Press, ISBN 0831131810]

• Hidden Failure: Failure of pressure release valve on high pressure vessel in a gas plant
• Multiple Failure Event: Dangerous build-up of gas pressure within the pressure vessel
• Consequence: Explosion of the pressure vessel when under high pressure conditions

[Figure: tank level control arrangement, with flows marked 1000 l/m and 800 l/m]

• Ultimate high level switch (normally open): shuts the pump off until manually reset
• High level switch: shuts off the pump until the low level switch turns it back on again
• Low level switch: turns on the pump until the level reaches the high level switch
• Low low level switch: turns off the pump until manually reset

If the ultimate high level switch fails closed, it is evident. If the ultimate high level switch fails open, then nobody knows it has failed.


The probability that the protected function will fail in any one cycle is given by its failure rate.

[Figure: one-year timeline showing the protected function (B) failing and the protective device (C) failing]

If the failure rate is once in four years, then the probability that it will fail in one year is 1 in 4. (This corresponds to a mean time between failures of four years.)


The probability that the protective device will be in a failed state at any point in time is given by its downtime (if it conforms to a random failure pattern).

[Figure: one-year timeline showing the protected function (B) failing and the protective device (C) failing]

If the downtime is 33%, then the probability that it will be in a failed state at any point in time is 1 in 3. (This corresponds to an availability of 67%.)


The Probability of a Multiple Failure

[Figure: one-year timeline of protected function B and protective device C]

• Protected Function B: Mean Time Between Failures = 4 years
• Protective Device C: Availability = 67%, Downtime = 33%

The probability that B will fail while C is in a failed state: 1 in 4 x 1 in 3 = 1 in 12. (In other words, there is a one in twelve chance that the multiple failure will occur in any one year.)


When developing a failure management policy for a hidden function, the first stage is to decide what probability we are prepared to tolerate for a multiple failure…

[Figure: one-year timeline of protected function B and protective device C]

• Protected Function B: Mean Time Between Failures = 4 years
• Protective Device C: Availability = 67%, Downtime = 33%

The probability that B will fail while C is in a failed state: 1 in 4 x 1 in 3 = 1 in 12

Prepared to accept: 1 in 1000


Reducing the Probability of a Multiple Failure

Reduce the probability of failure of the protected function (by applying a suitable failure management policy), and/or increase the availability of the protective device:
- by preventing the failure of the protective device, or
- by periodically checking whether the protective device is still working and repairing it if it has failed, or
- by modifying the system in some way

[Figure: one-year timeline of protected function B and protective device C]

• Protected Function B: Mean Time Between Failures = 10 years
• Protective Device C: Availability = 99%, Unavailability = 1%

The probability that B will fail while C is in a failed state is now 1 in 10 x 1 in 100 = 1 in 1000.
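The two multiple-failure calculations on these slides follow one pattern: the annual failure rate of the protected function multiplied by the unavailability of the protective device. The sketch below restates that arithmetic; the function name `multiple_failure_probability` is an illustrative assumption, and the 33% downtime is entered as 1 in 3 to match the slide.

```python
def multiple_failure_probability(mtbf_function_years, unavailability_device):
    """Annual probability that the protected function fails while the
    protective device is in a failed state."""
    return (1.0 / mtbf_function_years) * unavailability_device


before = multiple_failure_probability(4, 1 / 3)    # 1 in 4 x 1 in 3
after = multiple_failure_probability(10, 1 / 100)  # 1 in 10 x 1 in 100

print(f"Before: 1 in {1 / before:.0f} per year")
print(f"After:  1 in {1 / after:.0f} per year")
```

Either factor can be improved to reach the tolerated level: a longer MTBF for the protected function, a lower unavailability for the protective device, or both.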


• 6 identical PSVs have each been checked once a year for 5 years (FFI = 1 year)
• So the devices have been in service a total of 30 years
• In that time 3 were found to be in a failed state
• So the MTBF of the devices (MTBFdevice) is 30 years / 3 failures = 10 years
• We know that the failed devices failed some time during the year before the checks, but not when
• It seems reasonable to assume that each failed device was down for an average of 6 months

[Figure: a PSV installed between "From Process" and "To Process", and a chart of devices 1 to 6 across Year 1 to Year 5]

1. So the total downtime (DTdevice) was 1.5 years out of 30, or 5%
2. So on the basis of these figures it appears that: FFI = 2 x DTdevice x MTBFdevice
3. This is generally true if DTdevice < 5% and the device fails randomly
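The PSV figures can be worked through step by step to confirm that the formula returns the 1-year interval that was actually used. The sketch below is illustrative and its variable names are assumptions; the only modelling assumption is the one stated on the slide, that a device found failed was down for half an interval on average. Under that assumption DTdevice = FFI / (2 x MTBFdevice), which rearranges to FFI = 2 x DTdevice x MTBFdevice.

```python
devices = 6
years_checked = 5
ffi_years = 1.0   # each device was checked once a year
failed_found = 3  # devices found in a failed state

service_years = devices * years_checked     # 30 device-years in service
mtbf_device = service_years / failed_found  # 10 years

# Slide assumption: each failed device was down for half an interval on average.
downtime_years = failed_found * (ffi_years / 2)  # 1.5 years
dt_device = downtime_years / service_years       # 0.05, i.e. 5%

# The formula should return the interval we started with.
ffi_from_formula = 2 * dt_device * mtbf_device

print(f"MTBFdevice = {mtbf_device:g} years")
print(f"DTdevice   = {dt_device:.0%}")
print(f"FFI        = {ffi_from_formula:g} years")
```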


[Figure: one-year timeline for Protected Function B (Mean Time Between Failures = 10 years, "Fails") and Protective Device C ("Failed"): 1 in 10 x 1 in 100 = 1 in 1000.]

Step One: Decide what probability we will tolerate for the multiple failure

Step Two: Determine / estimate how often the protected function is likely to need the protective device

Step Three: Calculate what unavailability of the protective device enables us to achieve 1 given 2

If
• $\text{DT}_{\text{device}}$ = unavailability of the protective device
• $\text{MTBF}_{\text{function}}$ = MTBF of the protected function
• $\text{MTBF}_{\text{multiple}}$ = MTBF of the multiple failure

then

$$\frac{1}{\text{MTBF}_{\text{function}}} \times \text{DT}_{\text{device}} = \frac{1}{\text{MTBF}_{\text{multiple}}}$$

or

$$\text{DT}_{\text{device}} = \frac{\text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}}$$
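The relationship between the three quantities can be checked both ways round. The sketch below is illustrative only (the function names are assumptions, not taken from the course); it uses the figures from the one-year diagram, where the protected function has an MTBF of 10 years and the protective device is unavailable 1 in 100 of the time.

```python
def multiple_failure_rate(mtbf_function, dt_device):
    """Multiple failures per year = demand rate x device unavailability."""
    return (1 / mtbf_function) * dt_device


def required_unavailability(mtbf_function, mtbf_multiple):
    """Device unavailability that achieves the tolerated multiple-failure MTBF."""
    return mtbf_function / mtbf_multiple


# Forward: 1 in 10 (function) x 1 in 100 (device) = 1 in 1000 (multiple failure)
rate = multiple_failure_rate(mtbf_function=10, dt_device=0.01)

# Backward: tolerating 1 in 1000 years with a 10-year function MTBF
dt_needed = required_unavailability(mtbf_function=10, mtbf_multiple=1000)

print(f"multiple failure rate = 1 in {1 / rate:.0f} years")
print(f"required DT_device    = {dt_needed:.2%}")
```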


We have seen that…

$$\text{FFI} = 2 \times \text{DT}_{\text{device}} \times \text{MTBF}_{\text{device}} \qquad \ldots (1)$$

…and that…

$$\text{DT}_{\text{device}} = \frac{\text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} \qquad \ldots (2)$$

Therefore, substituting 2 into 1 gives…

$$\text{FFI} = \frac{2 \times \text{MTBF}_{\text{function}} \times \text{MTBF}_{\text{device}}}{\text{MTBF}_{\text{multiple}}}$$

Where:
• FFI = failure finding task interval
• $\text{DT}_{\text{device}}$ = unavailability of the protective device
• $\text{MTBF}_{\text{device}}$ = MTBF of the protective device
• $\text{MTBF}_{\text{function}}$ = MTBF of the protected function
• $\text{MTBF}_{\text{multiple}}$ = MTBF of the multiple failure
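As a worked illustration of the combined formula, the sketch below computes the interval and also reports the implied unavailability, so that the "DT below about 5%" condition stated earlier can be checked. The input figures (50, 20 and 10,000 years) are hypothetical and chosen only for illustration; the function name is likewise an assumption.

```python
def failure_finding_interval(mtbf_device, mtbf_function, mtbf_multiple):
    """Return (FFI in years, required device unavailability).

    Implements FFI = 2 x MTBF_function x MTBF_device / MTBF_multiple
    by way of DT_device = MTBF_function / MTBF_multiple.
    """
    dt_device = mtbf_function / mtbf_multiple
    ffi = 2 * dt_device * mtbf_device
    return ffi, dt_device


# Hypothetical inputs, all in years
ffi_years, dt_device = failure_finding_interval(
    mtbf_device=50, mtbf_function=20, mtbf_multiple=10_000
)

# The linear formula is only claimed to hold for small unavailabilities
within_validity = dt_device < 0.05

print(f"FFI = {ffi_years:.2f} years, DT_device = {dt_device:.2%}, "
      f"within validity range: {within_validity}")
```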


Exercise 1 – Steam Turbine

The function of a speed sensor on a large steam turbine (680 MW) is to measure the rotational speed of the turbine and to shut off the steam supply if the speed exceeds a specified limit. The multiple failure which could occur if this mechanism does not work when required is that the turbine could speed up to the point where centrifugal forces cause it to disintegrate. The electric utility which operates the turbine decides that it will accept a probability of the multiple failure of once in 100,000 years for any one turbine.

The utility has twenty similar turbines in operation for an average of ten years each, giving a total of 200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out due to over-speeding during this period. This corresponds to an MTBF of the protected function of 100 years for any one turbine.

The utility has never found one of the over-speed mechanisms to be in a failed state when they have carried out failure finding checks on their own machines, but data from a commercial data bank indicate an MTBF of 500 years. How often should the utility perform a failure finding task on the over-speed mechanism in order to reduce the probability of the multiple failure to the desired level?

$$\text{FFI} = \frac{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}}$$


Exercise 2 – Steel Plant

A steel producing plant needs many product-handling assets to move raw iron ore around prior to processing. As part of this asset base, they have 15 large conveyors. Each of these has 4 e-stops: one on either side of the head end, and one on either side of the tail end of the conveyor. The management has tasked the maintenance team with determining a frequency for testing the function of each of these e-stops, to make sure that when they are needed they will work.

After some discussion, the team consulted relevant specifications and determined that they wanted these e-stops to meet their SIL-2 classification. For this company that means a likelihood of 1:1,000,000 (1 in $10^{6}$) that any one of them would have a failure in any one year. They found that on their own plant they had never experienced a failure of one of the emergency stops. However, on consulting a commercial data store they found the following information:

• A population was tested over a time period of $10^{8}$ hours
• During this time the item was found to have failed 14 times in an undetected and unsafe manner
• And 60 times in a detected, safe fashion

All of the conveyors were installed at roughly the same time, 18 years ago. After conversations with a few of the longer serving people, the team was able to ascertain that an e-stop had been needed, either to protect people or to protect life, approximately 12 times. How frequently will they need to perform the detective task to maintain the level of risk that the company has deemed tolerable?


Common Cause Failure Modes

Calculation 2

• Managing more than one hidden failure…
– Any failure mode could take out the protective function
– Any failure modes that can be managed via predictive or preventive routines should be managed that way
– All failure modes can be managed via one detective maintenance task
– The detective task does not increase the likelihood of a multiple failure
– It is practical to do the task at the required interval

[Figure: protective device U1 with its failure modes (Failure Mode 1 … n).]


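Calculation 2

We saw previously that…

$$\frac{1}{\text{MTBF}_{\text{multiple}}} = \frac{1}{\text{MTBF}_{\text{function}}} \times \text{DT}_{\text{total}}$$

[Figure: diagram labelled "B Fails" and "C Fails", with U1, U2 and U3.]

Therefore…

$$\text{DT}_{\text{total}} = \text{DT}_1 + \text{DT}_2 + \text{DT}_3$$

$$\frac{\text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} = \text{DT}_1 + \text{DT}_2 + \text{DT}_3$$

We can deduce from what we also saw previously…

$$\text{DT}_{\text{device}} = \frac{\text{FFI}}{2 \times \text{MTBF}_{\text{device}}}$$

If we call the MTBF of each of the three failure modes $\text{MD}_1$, $\text{MD}_2$ and $\text{MD}_3$ respectively then…

$$\frac{\text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} = \frac{\text{FFI}}{2 \times \text{MD}_1} + \frac{\text{FFI}}{2 \times \text{MD}_2} + \frac{\text{FFI}}{2 \times \text{MD}_3}$$

Therefore…

$$\text{FFI} = \frac{2 \times \text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}} \times \left( \dfrac{1}{\text{MD}_1} + \dfrac{1}{\text{MD}_2} + \dfrac{1}{\text{MD}_3} \right)}$$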
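The same formula extends naturally to any number of failure modes covered by one detective task. The sketch below is an illustration under that assumption; the function name and the input figures (20, 10,000 and 50/100/100 years) are hypothetical. With a single failure mode it should give the same answer as the single-device formula, which the last lines check.

```python
def ffi_multiple_failure_modes(mtbf_function, mtbf_multiple, mode_mtbfs):
    """FFI = 2 x MTBF_function / (MTBF_multiple x sum(1 / MD_i)).

    mode_mtbfs holds the MTBF of each hidden failure mode (MD_1, MD_2, ...)
    that the single detective task is meant to reveal.
    """
    combined_rate = sum(1 / md for md in mode_mtbfs)   # failures per year
    return 2 * mtbf_function / (mtbf_multiple * combined_rate)


# Hypothetical inputs, all in years
ffi_three_modes = ffi_multiple_failure_modes(20, 10_000, [50, 100, 100])

# With one failure mode the result matches 2 x MTBF_f x MTBF_d / MTBF_m
ffi_one_mode = ffi_multiple_failure_modes(20, 10_000, [50])
ffi_single_formula = 2 * 20 * 50 / 10_000

print(f"three failure modes: FFI = {ffi_three_modes:.3f} years")
print(f"one failure mode:    FFI = {ffi_one_mode:.3f} years")
```

Adding failure modes always shortens the interval, because each extra mode adds to the combined failure rate of the protective system.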


Exercise 4 - Hoist

A speed sensor on the hoist drum of a crane used in a machine shop is designed to activate the emergency brake on the main hoist if the drum starts turning too fast. If any aspect of the emergency braking system does not work when required and the hoist drum runs away, industry standard statistics suggest that there is a 5% chance that someone could get badly hurt or killed as a result. The group performing the review decides that they would like to reduce the probability of this happening to once in 200,000 years.

If there is only a 1 in 20 chance (5%) that the multiple failure of the overspeeding drum and failed emergency brakes will hurt or kill someone, an overall probability of 1 death or injury in 200,000 years for this reason can be achieved if the probability of the multiple failure itself is reduced to 1 in 10,000 years.

This is a new system, so the users of the crane have no historical data about its performance. However, the suppliers of the speed sensor advise that it has an MTBF in this context of 300 years, and the emergency brake an MTBF in this context of 100 years. No information is available about the reliability of the electrical circuit between the two, but the behavior of similar circuits on similar cranes suggests an MTBF of 200 years. The circumstances under which the drum overspeeds and needs the emergency brake occur on average once every 50 years.

You are asked to determine how often the emergency braking system should be tested to reduce the multiple failure probability to the required level.

$$\text{FFI} = \frac{2 \times \text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}} \times \left( \dfrac{1}{\text{MD}_1} + \dfrac{1}{\text{MD}_2} + \dfrac{1}{\text{MD}_3} \right)}$$


Options for redesign

What if we re-did the speed sensor example, but with different figures? (A lower level of tolerable risk and a higher device failure rate?)

…The electric utility which operates the turbine decides that it will accept a probability of the multiple failure of once in (say) 1,000,000 years for any one turbine. The utility has twenty similar turbines in operation for an average of ten years each, giving a total of 200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out due to over-speeding during this period. This corresponds to an MTBF of the protected function of 100 years for any one turbine. The utility has never found one of the over-speed mechanisms to be in a failed state when they have carried out failure finding checks on their own machines, but data from a commercial data bank indicate an MTBF of 100 years. How often should the utility perform a failure finding task on the over-speed mechanism in order to reduce the probability of the multiple failure to the desired level?

$$\text{FFI} = \frac{2 \times 100 \times 100}{1{,}000{,}000} = 0.02 \text{ years} = 7.3 \text{ days}$$

We can…
• Make the function evident somehow
…or…
• Provide additional layers of protection


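[Figure: one-year timelines. With one protective device (Protected Function B, Protective Device C): $10^{-2} \times 10^{-2} = 10^{-4}$, i.e. 1 in $10^{4}$. With two protective devices (Function, Device 1, Device 2): $10^{-2} \times 10^{-2} \times 10^{-2} = 10^{-6}$, i.e. 1 in $10^{6}$. Other labels in the figure: Mean Time Between Failures = 5 years, Availability = 75%, Downtime = 25%.]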


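Formula 3

In this case (two protective devices, both of which must be failed for the multiple failure to occur) the unavailability of these devices should be squared in the failure finding formula…

$$\frac{1}{\text{MTBF}_{\text{multiple}}} = \frac{1}{\text{MTBF}_{\text{function}}} \times \left( \text{DT}_{\text{device}} \right)^2$$

[Figure: diagram labelled "B Fails" and "C Fails", with U1 and U2.]

Therefore, if $\text{MTBF}_{\text{function}}$ and $\text{MTBF}_{\text{multiple}}$ are given…

$$\text{DT}_{\text{device}} = \left( \frac{\text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} \right)^{1/2}$$

Therefore, since $\text{FFI} = 2 \times \text{DT}_{\text{device}} \times \text{MTBF}_{\text{device}}$…

$$\text{FFI} = 2 \times \text{MTBF}_{\text{device}} \times \left( \frac{\text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} \right)^{1/2}$$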


What if we re-did the speed sensor example, but with different figures? (Lower level of tolerable risk, and higher device failure rate?) (Now two sensors)

…The electric utility which operates the turbine decides that it will accept a probability of the multiple failure of once in (say) 1,000,000 years for any one turbine. The utility has twenty similar turbines in operation for an average of ten years each, giving a total of 200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out due to over-speeding during this period. This corresponds to an MTBF of the protected function of 100 years for any one turbine. The utility has never found one of the over-speed mechanisms to be in a failed state when they have carried out failure finding checks on their own machines, but data from a commercial data bank indicate an MTBF of 100 years. How often should the utility perform a failure finding task on the over-speed mechanism in order to reduce the probability of the multiple failure to the desired level?

$$\text{FFI} = 2 \times \text{MTBF}_{\text{device}} \times \left( \frac{\text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} \right)^{1/2}$$

$$\text{FFI} = 2 \times 100 \times \left( \frac{100}{1{,}000{,}000} \right)^{1/2} = 2 \text{ years!}$$
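The one-sensor and two-sensor versions of the turbine example can be compared side by side. The sketch below uses the figures quoted in the two re-worked examples (device and function MTBF both 100 years, tolerable multiple failure once in 1,000,000 years). The function names are illustrative, and a 365-day year is assumed for the conversion to days.

```python
import math


def ffi_one_device(mtbf_device, mtbf_function, mtbf_multiple):
    """Single protective device: FFI = 2 x MTBF_d x MTBF_f / MTBF_m."""
    return 2 * mtbf_device * mtbf_function / mtbf_multiple


def ffi_two_devices(mtbf_device, mtbf_function, mtbf_multiple):
    """Two redundant devices: FFI = 2 x MTBF_d x sqrt(MTBF_f / MTBF_m)."""
    return 2 * mtbf_device * math.sqrt(mtbf_function / mtbf_multiple)


MTBF_DEVICE = 100          # years, from the commercial data bank
MTBF_FUNCTION = 100        # years, 200 turbine-years / 2 trips
MTBF_MULTIPLE = 1_000_000  # years, tolerated by the utility

single_years = ffi_one_device(MTBF_DEVICE, MTBF_FUNCTION, MTBF_MULTIPLE)
single_days = single_years * 365   # assumed 365-day year
dual_years = ffi_two_devices(MTBF_DEVICE, MTBF_FUNCTION, MTBF_MULTIPLE)

print(f"one sensor:  FFI = {single_days:.1f} days")
print(f"two sensors: FFI = {dual_years:.1f} years")
```

Adding the second sensor moves the task from roughly weekly to once every two years, which is the point of the redesign option.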


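Multiple Redundant Devices

• We can do this for any number of devices (n) tested randomly…

$$\text{FFI} = 2 \times \text{MTBF}_{\text{device}} \times \left( \frac{\text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} \right)^{1/n}$$

• If all are tested together then the formula becomes…

$$\text{FFI} = \text{MTBF}_{\text{device}} \times \left( \frac{(n+1) \times \text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} \right)^{1/n}$$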
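Both forms can be written as small functions of n. This is a sketch only; the function names are assumptions, and the inputs reuse the turbine figures (100, 100 and 1,000,000 years) purely for illustration. For n = 1 the two forms should coincide with the single-device formula, which the checks confirm.

```python
def ffi_tested_randomly(n, mtbf_device, mtbf_function, mtbf_multiple):
    """n redundant devices tested at independent times:
    FFI = 2 x MTBF_d x (MTBF_f / MTBF_m) ** (1/n)."""
    return 2 * mtbf_device * (mtbf_function / mtbf_multiple) ** (1 / n)


def ffi_tested_together(n, mtbf_device, mtbf_function, mtbf_multiple):
    """n redundant devices all tested at the same time:
    FFI = MTBF_d x ((n + 1) x MTBF_f / MTBF_m) ** (1/n)."""
    return mtbf_device * ((n + 1) * mtbf_function / mtbf_multiple) ** (1 / n)


args = (100, 100, 1_000_000)   # MTBF_device, MTBF_function, MTBF_multiple (years)

random_n1 = ffi_tested_randomly(1, *args)
together_n1 = ffi_tested_together(1, *args)
random_n2 = ffi_tested_randomly(2, *args)
together_n2 = ffi_tested_together(2, *args)

for n in (1, 2, 3):
    print(n,
          round(ffi_tested_randomly(n, *args), 3),
          round(ffi_tested_together(n, *args), 3))
```

For n of 2 or more, testing all devices together gives a shorter interval than testing them at independent times, because devices checked together tend to be in their failed state during the same part of the interval.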


Exercise 5 – Pumps and PSV’s


A hydraulic system is protected from overpressure by four Pressure Relief Valves (PRV's). One is placed in the line directly from the duty and standby pumping arrangements, and there is one PSV in each of the supply lines to the accumulators. If the pressure exceeds the safe working pressure then the PRV's will relieve the pressure in the lines back to the hydraulic oil tank. All PSV's are set to the same pressure level.

The unit operates under extremely high pressures, and if the safe working pressure is exceeded there is a chance of a pipe rupturing, exposing people in the surrounding areas to pressures likely to cause serious injuries. Risk ranking structures set up by the corporate safety department have deemed this a high criticality asset. This means that it will need to be managed to a tolerable probability of failure of 1:1,000,000.

[Figure: hydraulic circuit showing PSV 1, PSV 2, PSV 3 and the accumulators.]

In the 12 years that the hydraulic system has been installed it has never once required any PRV to relieve the pressure within the hydraulic circuit to the accumulators. For this system they were unable to find failure rate information in commercial databanks.

However, a quick call to the 5 other plants in their company showed them that there were 4 such systems in the company, with a combined operating life of 80 years. Incident records show that the PRV's have been used to relieve the pumps 10 times. Evidence from the manufacturer suggests that the PRV's have a failure rate of 1:100. Given that all three will be tested at the same time, what is the failure finding frequency required to achieve the tolerable probability of a multiple failure?

$$\text{FFI} = \text{MTBF}_{\text{device}} \times \left( \frac{(n+1) \times \text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}} \right)^{1/n}$$


Managing Risk in Hidden Failures

First, establish whether there is an intolerable risk or not.
Second, determine if a Predictive task is applicable and effective.
Third, determine if a Preventive task is applicable and effective.
Fourth, determine if a Detective task is applicable and effective.
Fifth, the protection is inadequate.

[Figure: decision diagram with the labels Environment, Safety, Predictive, Preventive Restoration, Preventive Replacement, Failure Finding and Redesign.]


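Voting Systems: k out of n systems

Formula 6

If "r" = number of units that need to be in a failed state before the entire system would fail, then…

$$r = n - k + 1$$

[Figure: diagram labelled "B Fails" and "C Fails", with U1, U2 and U3.]

Therefore, if FFI is a very small fraction of $\text{MTBF}_{\text{device}}$, it can be shown that:

$$\text{FFI} = \text{MTBF}_{\text{device}} \times \left( \frac{(n-r)! \times r! \times (r+1) \times \text{MTBF}_{\text{function}}}{n! \times \text{MTBF}_{\text{multiple}}} \right)^{1/r}$$

With k = 1 (so that r = n) this reduces to the formula for n redundant devices that are all tested together.

! = factorial (used a lot in combinatorics and in other probability and statistical formulae), e.g. 5! = 1 x 2 x 3 x 4 x 5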
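A voting-system interval can be sketched as follows. The sketch writes the combinatorial factor as (n − r)! × r! / n!, i.e. the reciprocal of the number of ways of choosing r units from n; that form is an assumption, chosen so that a 1-out-of-n system (r = n) reproduces the formula for n redundant devices tested together, and an n-out-of-n system (r = 1) reproduces the formula for n failure modes of equal MTBF. The function name and the inputs (100, 100 and 1,000,000 years) are illustrative.

```python
from math import factorial


def ffi_voting(n, k, mtbf_device, mtbf_function, mtbf_multiple):
    """Failure-finding interval for a k-out-of-n voting system.

    r = n - k + 1 units must be failed before the system fails. Only
    claimed to hold when FFI is a small fraction of MTBF_device.
    """
    r = n - k + 1
    ratio = (factorial(n - r) * factorial(r) * (r + 1) * mtbf_function) / (
        factorial(n) * mtbf_multiple
    )
    return mtbf_device * ratio ** (1 / r)


args = (100, 100, 1_000_000)   # MTBF_device, MTBF_function, MTBF_multiple (years)

ffi_1oo1 = ffi_voting(1, 1, *args)   # single device
ffi_1oo2 = ffi_voting(2, 1, *args)   # two redundant devices tested together
ffi_2oo3 = ffi_voting(3, 2, *args)   # two of three must work
ffi_3oo3 = ffi_voting(3, 3, *args)   # any one failure defeats the system

print(f"1oo1: {ffi_1oo1:.4f}  1oo2: {ffi_1oo2:.4f}  "
      f"2oo3: {ffi_2oo3:.4f}  3oo3: {ffi_3oo3:.4f}")
```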


Economic Consequences

But what about economic consequences…?

• Operational and economic-only consequences are purely economic
– In other words, the only consequence of a multiple failure that does not affect safety or the environment is that it costs money
• But doing a failure finding task also costs money
– So in this case, we need to determine the failure finding task interval that reduces total costs to a minimum, and then ask whether the minimum total cost is acceptable


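We saw previously that…

$$\text{FFI} = \frac{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}}}{\text{MTBF}_{\text{multiple}}}$$

Therefore the probability of a multiple failure in any one year is

$$\frac{1}{\text{MTBF}_{\text{multiple}}} = \frac{\text{FFI}}{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}}}$$

The annualized cost of a multiple failure will be

$$\frac{C_{\text{multiple}} \times \text{FFI}}{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}}}$$

The annualized cost of doing a failure finding task will be

$$\frac{C_{\text{FF}}}{\text{FFI}}$$

If FFI is a fairly small fraction of $\text{MTBF}_{\text{function}}$, the annualized cost of repairing the failed protective device will be approximately:

$$\frac{C_{\text{device}}}{\text{MTBF}_{\text{device}}}$$

Likewise for the function…

$$\frac{C_{\text{function}}}{\text{MTBF}_{\text{function}}}$$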


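The total annualised cost is the sum of these four terms:

$$C_{\text{total}} = \frac{C_{\text{multiple}} \times \text{FFI}}{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}}} + \frac{C_{\text{FF}}}{\text{FFI}} + \frac{C_{\text{device}}}{\text{MTBF}_{\text{device}}} + \frac{C_{\text{function}}}{\text{MTBF}_{\text{function}}}$$

It is at a minimum when

$$\frac{dC_{\text{total}}}{d\,\text{FFI}} = 0$$

[Figure: plot of cost against the interval between failure finding tasks (FFI), showing the annualised cost of a multiple failure and the annualised cost of failure finding.]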


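Starting from the total annualised cost:

$$C_{\text{total}} = \frac{C_{\text{multiple}} \times \text{FFI}}{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}}} + \frac{C_{\text{FF}}}{\text{FFI}} + \frac{C_{\text{device}}}{\text{MTBF}_{\text{device}}} + \frac{C_{\text{function}}}{\text{MTBF}_{\text{function}}}$$

Differentiating with respect to FFI (the last two terms do not depend on FFI) and setting the result to zero:

$$\frac{dC_{\text{total}}}{d\,\text{FFI}} = \frac{C_{\text{multiple}}}{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}}} - \frac{C_{\text{FF}}}{\text{FFI}^2} = 0$$

$$\text{FFI}^2 = \frac{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}} \times C_{\text{FF}}}{C_{\text{multiple}}}$$

$$\text{FFI} = \left( \frac{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}} \times C_{\text{FF}}}{C_{\text{multiple}}} \right)^{1/2}$$

Where:
• $C_{\text{multiple}}$ = cost of one multiple failure
• $C_{\text{FF}}$ = cost of one failure finding task
• $\text{MTBF}_{\text{device}}$ = MTBF of the protective device
• $\text{MTBF}_{\text{function}}$ = MTBF of the protected function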
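The cost trade-off can be explored numerically. The sketch below is illustrative: the function names are assumptions and the figures (a multiple failure costing 10,000, a task costing 50, MTBFs of 20 and 10 years) are hypothetical, in arbitrary currency units. It evaluates the total annualised cost on either side of the closed-form optimum to confirm that the optimum is a minimum.

```python
import math


def annual_cost(ffi, c_multiple, c_ff, mtbf_device, mtbf_function,
                c_device=0.0, c_function=0.0):
    """Total annualised cost for a given failure-finding interval (years)."""
    return (c_multiple * ffi / (2 * mtbf_device * mtbf_function)  # multiple failure
            + c_ff / ffi                                          # failure finding
            + c_device / mtbf_device                              # device repair
            + c_function / mtbf_function)                         # function repair


def optimum_ffi(c_multiple, c_ff, mtbf_device, mtbf_function):
    """FFI = sqrt(2 x MTBF_d x MTBF_f x C_FF / C_multiple)."""
    return math.sqrt(2 * mtbf_device * mtbf_function * c_ff / c_multiple)


# Hypothetical inputs
costs = dict(c_multiple=10_000, c_ff=50, mtbf_device=20, mtbf_function=10)

best = optimum_ffi(**costs)
cost_at_best = annual_cost(best, **costs)
cost_shorter = annual_cost(0.9 * best, **costs)
cost_longer = annual_cost(1.1 * best, **costs)

print(f"optimum FFI = {best:.3f} years, annual cost = {cost_at_best:.2f}")
print(f"cost at 0.9 x optimum = {cost_shorter:.2f}, "
      f"at 1.1 x optimum = {cost_longer:.2f}")
```

The repair-cost terms are left at zero here because they do not depend on FFI and therefore do not move the optimum; they only raise the whole curve, which matters when asking whether the minimum total cost is acceptable.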


Exercise 6 – Economic Hidden Failures

A hydraulic motor is used to drive an agitator on a reactor vessel in a chemical plant. The oil tank on the hydraulic system contains a low level alarm which is used to remind the operators when to fill the tank with oil. There is also an ultimate low level switch, which is designed to shut down the hydraulic system if the system runs low on oil and the upper switch fails to warn the operators. If both switches fail and the oil runs out, the motor could be severely damaged and the reactor could be down for up to 5 hours. This would cost the company $1,500 in lost production and $525 to repair the motor, a total cost of $2,025.

The company has three such reactor vessels, each driven by its own hydraulic system, and the operators can only recall two occasions on which an ultimate low level switch has needed to stop a motor over a period of twelve years. This means that the mean time between failures of the protected function is 18 years ($\text{MTBF}_{\text{function}}$).

Until now the low level switches have never been checked, nor have they been in a failed state when called upon to work. In the absence of any other information, and after careful study of the configuration of the switches, the RCM Facilitator decides that the MTBF of the ultimate low level switch is likely to be about twice that of the low level alarm, or 36 years ($\text{MTBF}_{\text{device}}$).

It is difficult to reach the switches, so a full functional check of the ultimate switches requires lowering the level of the tanks under controlled conditions and checking whether the motors cut out. This task takes about an hour per tank at a cost of $25 per task ($C_{\text{FF}}$).

In the light of this information, you are asked to determine the optimum failure finding interval for these switches.

$$\text{FFI} = \left( \frac{2 \times \text{MTBF}_{\text{device}} \times \text{MTBF}_{\text{function}} \times C_{\text{FF}}}{C_{\text{multiple}}} \right)^{1/2}$$


RCM-DO-07 The Value of RCM

As a cornerstone of the maintenance discipline, RCM can achieve benefits in a vast number of areas depending on where and how it is applied. When properly implemented, Reliability-Centered Maintenance provides companies with a tool for achieving the lowest asset Net Present Costs (NPC) for a given level of performance and risk. This implies a cashable impact across a multitude of economic activities, covering both OPEX[8] and CAPEX[9]. However, RCM will also provide companies with a range of non-cashable advantages that will have a positive impact throughout the enterprise.

This document contains only a brief list of potential areas of benefit, not the entire range of potential uses of RCM. Beyond these areas, the author has previously used RCM to:

• support capital submissions in regulated industries,
• reduce the risk of legal ramifications in the management of environmental integrity,
• establish a tool for contract negotiations related to outsourced maintenance,
• reduce a company's carbon footprint, and
• develop troubleshooting guides.

The information in this module is intended to alleviate some of the benefits anxiety that often surfaces in the early implementation stages of large-scale RCM projects, and to provide guidelines for trainee RCM Analysts.

The Cashable Results of RCM

Direct cashable benefits from implementing RCM can emerge in every area where maintenance and operations have an impact. These include such disparate areas as increased uptime, decreased energy usage, reduced chemical utilization, and reductions in inventory holdings and routine maintenance spending. Instead of trying to cover all the potential areas where the method can deliver financial impacts, this section focuses on how RCM influences the profit and loss of an enterprise. This is evident in two principal areas:

• an increase in potential revenue, and
• direct cost reductions.

[8] OPEX – Operational Expenditure
[9] CAPEX – Capital Expenditure


Direct Cost Reductions

The main noticeable result of Reliability-Centered Maintenance is a dramatic change to the maintenance regimes that are in place.

John Moubray, a pioneer in this field until his recent passing, regularly stated that RCM would achieve “a reduction of between 20% and 70% in routine maintenance where there is an existing scheduled maintenance program.” Based on the experience of the author, this leads primarily to an increased level of cost-effectiveness of maintenance, particularly in industries that are very asset-intensive.[10]

The team is able to claim benefits in these areas where there is a calculable reduction in the cost of labor, materials or consumables to perform maintenance[11] over a reasonable amount of time (usually a year). Logically, these are only potential benefits at the completion of the analysis, as it will take until the first omitted routine, or the first breakdown requiring reduced resources, before savings begin to accrue. However, once implemented they can easily be counted through direct calculation. For this to be accurate, both the routine maintenance costs and the corrective maintenance costs need to be quantified.

There are some real-world limitations on attempting to forecast cost reductions purely through accumulated data. The first issue the team can face is that current maintenance regimes often do not exist in the company’s ERP or CMMS program, or are grouped at a high level. Data losses, poor ERP management, and distrust of technology mean that experienced technicians often keep the knowledge of existing maintenance outside of corporate systems. Further compounding the issue is the disparate way that maintenance routines are stored: at times at an asset level, at times at a maintainable item level, and at other times at higher system or unit levels.

A second limitation is that on the occasions when RCM proposes a more rigorous policy, there is a tendency to overlook the change in reactive and corrective maintenance.[12] Still, some direct cost reduction cases are obvious and do not require a detailed activity analysis.

Every task in an RCM analysis must be both applicable, meaning it is physically possible to do the task, and effective, meaning it is worthwhile doing in terms of cost and/or risk, before selection as an adequate failure management strategy.

[10] Asset-Intensive – Industries where asset maintenance and asset replacement form major parts of OPEX and CAPEX
[11] Maintenance refers to both routine and corrective or reactive activities.
[12] The issues surrounding RCM and WoL asset management are covered in more detail in “RCM-DO-10 RCM and Whole-of-Life (WoL) Asset Management”


When maintenance is developed using an unstructured method, common errors can occur.

Ineffective Maintenance

One of the great misleading statistics in asset maintenance today is the calculation of average life for bearings. The effect of this is to support the outdated and almost mystical belief in a link between age and failure.

Based on this way of thinking, it is still common to find maintenance departments carrying out hard-time bearing replacement programs as a means of managing risk. However, it has been the experience of the author that hard-time bearing replacement policies can increase, rather than decrease, the likelihood of failure while at the same time increasing direct maintenance costs. This flies in the face of popular belief and is an example of how RCM thinking can drive reductions in routine maintenance levels.

The original Nowlan and Heap report[13] specifically spoke about bearings when addressing failure in complex assets. A complex item, as opposed to a simple item, is one that is subject to many failure modes. As a result, the failure processes may involve a dozen different stress and resistance considerations. Even with complex items, failures related to age will concentrate about an average age for that mode. However, bearings have many failure modes. Where there is no dominant failure mode[14], as is the case in complex items such as most bearings, the average lives of the individual failure modes are widely dispersed along the entire exposure axis.[15] Therefore, failure will be unrelated to operating age. This is a unique feature of complex items.

When deciding maintenance policy for bearings, this issue is further exacerbated by the provision of the L10 life by manufacturers. This number represents the point at which 10% of the items may have failed, meaning that 90% will have survived. Lieblein and Zelen, in their seminal work on the subject of bearing life[16], found that the characteristic life, the point where statistically 63.2% of the items will have failed, was roughly 5 times the L10 life. They also found that the “life” forecasts had a median Weibull Beta value of 1.4, indicating a near constant probability of failure. This means that the likelihood of failure at any point in the life of the bearings in their study increased only marginally as the asset aged. Other published analyses have quoted a beta of “1.3” for ball and roller bearings, and a beta of “1” for sleeve bearings.[17]
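The relationship between the L10 life, the characteristic life and the Weibull Beta value can be checked with a short calculation. This is a minimal sketch, assuming bearing lives follow a two-parameter Weibull distribution, F(t) = 1 - exp(-(t / eta)^beta), where eta is the characteristic life; the function names are illustrative and do not come from the cited studies.

```python
import math


def characteristic_to_l10_ratio(beta):
    """Ratio eta / L10 for a two-parameter Weibull distribution.

    L10 is the age at which F(t) = 0.10, so L10 = eta * (-ln 0.90) ** (1 / beta).
    """
    return (-math.log(0.90)) ** (-1.0 / beta)


def hazard_ratio(beta, t1, t2):
    """Ratio h(t2) / h(t1) of Weibull hazard rates; h(t) is proportional to t ** (beta - 1)."""
    return (t2 / t1) ** (beta - 1.0)


ratio = characteristic_to_l10_ratio(1.4)
print(f"eta / L10 for beta = 1.4: {ratio:.2f}")          # 4.99

# How much the conditional probability of failure grows when age doubles.
print(f"beta = 1.0: x{hazard_ratio(1.0, 1.0, 2.0):.2f}")  # 1.00 (constant / random)
print(f"beta = 1.4: x{hazard_ratio(1.4, 1.0, 2.0):.2f}")  # 1.32
```

Under this assumption, a Beta value of 1.4 gives a characteristic life of about 5 times the L10 life, which agrees with the figure quoted above.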

[13] Reliability-centered Maintenance, F.S. Nowlan et al, United Airlines, San Francisco, December 1978
[14] Dominant failure mode – the most common cause of failure
[15] Reliability-centered Maintenance, F.S. Nowlan et al, United Airlines, San Francisco, December 1978
[16] Statistical Investigation of the Fatigue Life of Deep-Groove Ball Bearings, J. Lieblein and M. Zelen, Journal of Research of the National Bureau of Standards, Vol 57, No 5, November 1956.


In process manufacturing industries, we find contaminated oil to be one of the frequent reasons for early life failures. However, this is only one of the multitude of stresses that bearings face as complex assets. Others can include poor storage leading to false brinelling and early corrosion, excessive heat and pressure, overloading, exposure to vibration, abrasions and cracks. All of these could contribute to either early life failures or premature wear-out.

Often, the L10 life is mistaken for an end-of-life point for bearings and is thus used as a reference interval for replacement tasks. However, as can be seen from the information above, it is not the end of life but rather the life that 90% of bearings are expected to reach or exceed under specific load conditions. This is in line with Nowlan and Heap’s findings and shows that in many cases we are at best wasting a large portion of the bearing’s useful life, making this an ineffective use of maintenance resources.[18]

[Figure: conditional probability of failure (likelihood of failure at every point), shown as constant / random, with the L10 life, L50 life, average life and characteristic life (63.2%) marked. Complex assets, such as bearings, do not have a dominant failure mode. Instead, they face many different stresses leading to failure. These failures are distributed along the stress axis, making failure unrelated to age. This is unique to complex assets.]

Increased bearing life and decreased labor costs are not the only potential savings. By frequently replacing bearings on, say, motor shafts, we introduce the likelihood of a range of additional failure modes. For example, installation and frequent change-out failures include:

• Wear of the motor shaft, decreasing the adequacy of the interference fit and leading to bearings spinning on the shaft (a failure of the motor, not of the bearing)
• Overheating of the bearing, leading to early life failures and distortion of the inner race
• Excessive force (i.e. hammers) instead of bearing pullers, damaging the races of the bearings and leading to early life failures
• Bearing misalignment
• Wrong bearing selection
• Pre-failed bearings due to poor storage techniques

While we can manage some of these, others are a direct result of frequent bearing changes. Therefore, if we use hard-time bearing replacement as a maintenance policy then we are:

a) reducing the maximum used life of the bearing, and
b) increasing the likelihood of failure through the introduction of several additional failure modes.

In the Meridium RCM decision algorithm[19], a management policy for an Evident Operational and Non-Operational failure mode must comply with the following:

“Over a period of time, the failure management policy must cost less than the cost of the operational consequences (if any) plus the total cost of repair.”

Ineffective maintenance is more common than most professionals think. It can also include areas such as maintenance out of context, where maintenance regimes are unaligned with how the asset is used, or practices that decrease an asset’s efficient operation.

Using the decision algorithm in RCM, the first option available to the team is Predictive Maintenance. Where this is both applicable and effective, it will increase the effectiveness of maintenance in a range of areas:

• Predictive Maintenance detects the signs of the onset of failure. As such, it provides the capability to manage all failures, including random failures.
• It can be done in situ and often without interfering with the normal operation of the process.
• It will ensure that the asset utilizes all of its economically useful life (as opposed to hard-time replacements).

[17] Bloch, Heinz P. and Fred K. Geitner, 1994, Practical Machinery Management for Process Plants, Volume 2: Machinery Failure Analysis and Troubleshooting, 2nd Edition, Gulf Publishing Company, Houston, TX
[18] Over one machine, this appears to be a very small maintenance cost item. However, when applied throughout a plant, or on the so-called “critical” assets, it amounts to a significant maintenance cost.

Inapplicable Maintenance

This mistaken belief that there is always a relationship between age and failure leads maintenance departments to all sorts of policies that, in practice, are achieving nothing.

Often these occur during maintenance turnarounds. The opportunity to access items that are normally in a running state drives people to inspect items just in case a life related failure mode has developed. In particular, this again is a common activity in relation to bearing management. For example, a turbine turnaround occurs once every 3 years (say) for other failure management reasons.

[19] The Meridium RCM Decision Algorithm is based on Figure 17 – A Second Decision Diagram Example, page 49, SAE JA1012, 2002-01


The maintenance department has taken this opportunity to perform a dye penetrant check on the bearing to see if any cracks are starting to form that would require them to take action. On the face of it, this appears to be a perfectly valid, even wise, use of the opportunity. However, on applying the RCM logic a little more closely, this perception changes dramatically.

For the sake of this example, we will say that the P-F interval is about 3 months, meaning that once cracks become detectable in this particular bearing, we have around three months prior to functional failure. If we test the bearing on a hard-time basis of every three years, and the P-F interval is three months, then the following logic applies:

a) The dye penetrant test is only useful if the bearing failure is developing at the time of inspection.
b) This means it had to start developing less than 3 months prior to opening. As we shut down every 36 months, the likelihood of this occurring (given the randomness of bearing failure) is around 1 in 12.

Moreover, the likelihood of it not occurring is around 11 in 12. This task does not satisfy the RCM applicability criteria and is a waste of resources. In addition, by opening the bearing housing and interfering with the bearing, which presumably is operating fine, we again introduce the possibility of human error.[20]

[Figure: turnaround interval = 3 years; P-F interval = 3 months; likelihood of detection 1 in 12; likelihood of non-detection 11 in 12.]
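The 1 in 12 figure follows from comparing the P-F interval with the inspection interval. This is a minimal sketch, assuming that the onset of a potential failure is equally likely at any point in the turnaround cycle (the randomness assumed in the example); the function name is illustrative.

```python
from fractions import Fraction


def detection_likelihood(pf_interval_months, inspection_interval_months):
    """Chance that a randomly timed potential failure is detectable at the inspection.

    The failure is only detectable if it began within one P-F interval
    before the inspection, so the chance is P-F interval / inspection interval.
    """
    if pf_interval_months >= inspection_interval_months:
        return Fraction(1)
    return Fraction(pf_interval_months, inspection_interval_months)


p_detect = detection_likelihood(3, 36)   # 3-month P-F interval, 36-month turnaround
p_miss = 1 - p_detect
print(p_detect, p_miss)                  # 1/12 11/12
```

The same function shows why the task interval needs to be shorter than the P-F interval: only then does the likelihood of detection reach 1.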

[20] Human error is discussed in detail in module RCM-DO-06a Introducing Human Error.

It is difficult to categorize this maintenance practice directly, but the closest match in RCM is Predictive Maintenance (PTIVE).

In the Meridium RCM decision algorithm, this means the team needs to answer all of the following questions before this task is applicable:

• Is there a clear potential failure condition? What is it?
• What is the P-F interval?
• Is the interval long enough to take action to avoid or minimise the consequences of failure?
• Is the P-F interval reasonably consistent?
• Is it practical to do the task at intervals less than the P-F interval?

The team would be able to answer all of the above questions positively except for the last one. For the task of dye penetrant testing, it is not practical to do the task at intervals less than the P-F interval; therefore, the task is not applicable.
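The applicability questions above can be expressed as a simple checklist. This is a hedged sketch only: the class, field names and the one-month response time are hypothetical and are not part of the Meridium decision algorithm; the 3-month P-F interval and 36-month turnaround come from the dye penetrant example.

```python
from dataclasses import dataclass


@dataclass
class OnConditionTask:
    potential_failure_condition: str       # empty if no clear condition exists
    pf_interval_months: float
    response_time_months: float            # time needed to act once detected (assumed input)
    pf_interval_consistent: bool
    practical_task_interval_months: float  # shortest interval at which the task can be done


def applicability_checks(task):
    """Return each applicability question with a True/False answer."""
    return {
        "clear potential failure condition": bool(task.potential_failure_condition),
        "P-F interval long enough to act": task.pf_interval_months > task.response_time_months,
        "P-F interval reasonably consistent": task.pf_interval_consistent,
        "practical at intervals less than the P-F interval":
            task.practical_task_interval_months < task.pf_interval_months,
    }


# Dye penetrant check that can only be done at the 36-month turnaround.
dye_penetrant = OnConditionTask("surface cracks", 3, 1, True, 36)
checks = applicability_checks(dye_penetrant)
applicable = all(checks.values())
failed = [question for question, passed in checks.items() if not passed]
print(applicable, failed)
```

For the dye penetrant example, only the last question fails, which is enough to make the task inapplicable.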

Inapplicable maintenance practices are widespread and, in the experience of the author, often reflect the underlying belief in a consistent relationship between age and failure.

Increases in Revenue

There are two specific areas where an RCM team can claim savings:

a) Where an asset, or system, has a history of failures leading to lost production opportunities. Principally this refers to unplanned shutdowns, overrun turnarounds, and start-up issues of an asset or system.
b) Where an asset, or system, has a history of failures leading to reduced production output. This includes areas such as utilization, quality, and reduced availability. For example:
   a. Reduced turnaround times
   b. Increased yield (quality)
   c. Increased availability for full production rates

[Figure: planned capacity broken down into downtime (unplanned shutdowns, shutdown overruns, startup failures), under-performance (off-spec production, production slow down) and uptime at the best achievable rate.]

The RCM team can claim these savings only where they can prove they have isolated the cause of the lost, or reduced, production and have recommended a strategy that will mitigate it or prevent it in the future.

These savings are potential because it will take a reasonable amount of time, nominally one year, before effective measurement can prove reduced production losses. However, it is often the case that there are noticeable increases in available uptime after implementing RCM maintenance policies. Calculating benefits in this case requires estimating the value of additional uptime, throughput or yield, as well as the reduced costs of labor and materials.
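The benefit estimate described above can be sketched as a simple sum. This is an illustration only: the function name and all figures are hypothetical, with the hourly value of production borrowed from Exercise 6 ($1,500 over 5 hours) purely for convenience.

```python
def potential_annual_benefit(extra_uptime_hours, value_per_hour, labor_savings, material_savings):
    """Potential annual benefit: value of additional uptime plus reduced labor and material costs."""
    return extra_uptime_hours * value_per_hour + labor_savings + material_savings


# Hypothetical inputs for one year after implementation.
value_per_hour = 1500 / 5      # $300 per hour of lost production (assumed)
benefit = potential_annual_benefit(
    extra_uptime_hours=40,
    value_per_hour=value_per_hour,
    labor_savings=5_000,
    material_savings=2_500,
)
print(f"Potential annual benefit: ${benefit:,.0f}")
```

The result remains a potential benefit until measurement over the following year confirms the reduced production losses.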


As these are historic failures, issues such as quantification of lost production, direct maintenance costs, and the frequency of failure are relatively easy to establish. An alternative is to use forecasting techniques such as Crow-AMSAA, a time-proven method for forecasting failure rates that enables the team to calculate savings from the changes to asset maintenance. This is also a valid method for forecasting savings in direct costs.

Other Cashable Benefits

It is the experience of the author that CAPEX benefits, as opposed to OPEX benefits, often represent the largest cashable advantages of implementing RCM:

• A delayed use of capital, compared to the pre-RCM scenario, allowing deployment elsewhere in the enterprise. This occurs through life extension and through higher-confidence decision making.
• A reduction in operating losses, over the life of the asset base, attributable to correct timing of capital refurbishment and replacement tasks.
• A potential reduction in the cost of capital and the cost of insuring assets, due to the increased confidence in decision making.
• The incorporation of risk into the budgeting process. The benefits of this are literally incalculable, as they depend on how the organization uses this information in the marketplace.
• A calculable reduction in inventory holdings based on the RCM approach.

While there are other cashable benefits, the above listed items represent the most common and the least debated among the reliability communities.
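The Crow-AMSAA forecasting mentioned earlier in this section models the expected cumulative number of failures as N(t) = lambda x t^beta. The sketch below fits lambda and beta by least squares on a log-log plot, which is one simple fitting approach and not necessarily the one used by the author; the failure history is hypothetical and the function names are illustrative.

```python
import math


def fit_crow_amsaa(times, cumulative_failures):
    """Least-squares fit of ln N = ln(lambda) + beta * ln t. Returns (lambda, beta)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(n) for n in cumulative_failures]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    lam = math.exp(mean_y - beta * mean_x)
    return lam, beta


def expected_failures(lam, beta, t_start, t_end):
    """Expected number of failures between two cumulative times."""
    return lam * (t_end ** beta - t_start ** beta)


# Hypothetical history: cumulative months at which the 1st to 6th failures occurred.
failure_times = [4, 11, 19, 30, 44, 60]
failure_counts = list(range(1, len(failure_times) + 1))

lam, beta = fit_crow_amsaa(failure_times, failure_counts)
forecast = expected_failures(lam, beta, 60, 72)   # next 12 months
print(f"beta = {beta:.2f}, expected failures in next 12 months = {forecast:.2f}")
```

A fitted beta below 1 indicates that failures are arriving more slowly over time, while a beta above 1 indicates a worsening trend. The forecast failure count can then be multiplied by the cost per failure to estimate savings.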

The Non-cashable Results of RCM

RCM will increase the team’s awareness of the limitations and operational requirements of the physical assets they study, often substantially. This results in the following intangible benefits:

• A reduction in the risk of safety and environmental integrity related failure modes
• Increased knowledge of the assets, their functions and their failures
• Increased ability to troubleshoot failed assets
• Changes to P&IDs specifically, and at times to other process drawings
• Changes to operation procedures, training, purchasing, work practices and other related areas
• A tangible increase in the quality and integrity of asset data because of the focus of RCM

However, it is often difficult, if not impossible, to measure the extent of these impacts or to link them to changes in the profitability of the enterprise. At times, the effort to do this can actually distort or obscure the achievement itself. (Attempting to equate a reduction in the risk of loss of life to a monetary value is an example of this.)


However, it is possible to represent some non-cashable benefits in monetary terms. The most common of these is cost avoidance.

Risk Mitigation

When the mitigated risk is economic, it is often termed cost avoidance.

Where the team has implemented a policy for a reasonably likely[21] failure mode for which there was an inadequate existing strategy in place, the team is justified in claiming this as a potential benefit of RCM, even though the failure has not occurred previously. These benefits count as non-cashable for a number of reasons:

1. They will never appear as part of the profit and loss of any enterprise. Nor will they cause a change to maintenance budgets or revenues.
2. The team requires estimates to calculate the cost avoidance benefit. Some failure modes may have similar consequences, affect similar assets, and have overlapping impacts on production. For example, RCM teams can find themselves presenting benefits of several times the value of the entire installation. If not explained correctly, this is a false representation, which can erode the credibility of RCM and of the team attempting to implement it.

They are nevertheless valid and important benefits for the RCM team to claim.

Note the emphasis on “an inadequate existing strategy”. RCM did not invent maintenance, and often there are adequate existing failure management policies in place. As an output, the team will find that some maintenance regimes will disappear, some will remain, and some new, more sophisticated regimes will be added.

[Figure: pre-RCM and post-RCM maintenance routines, with labels: existing pre-RCM routines, redundancy, remaining pre-RCM routines, new, net maintenance tasks.]

This occurs because some of the maintenance policies in place are redundant, some are either inapplicable or ineffective, and yet others are adequate means of managing failure. Thus, there is no justification for claiming benefits where there is an adequate existing strategy to manage the failure mode. Nor is there any justification for claiming benefits where failure modes are not reasonably likely.

Other areas of risk mitigation are failure modes that would affect either safety or environmental integrity. In many cases, these will have direct economic consequences through regulatory penalties, or through secondary economic damages caused by the failure. Where this is the case, the team can calculate the value of the cost avoided using a method similar to that for economic-only consequences.[22] Where the failure mode will not have significant economic consequences, the delta between the discovered risk and the managed risk can represent the benefit of risk mitigation.

[21] What constitutes reasonably likely is specific to each company, and often to each RCM analysis. Methods for determining reasonableness are not included in this module.

The Principal Barrier to Value Realization

The benefits of RCM are obvious to anybody who has studied it, or to any maintenance practitioner who can relate to the concepts espoused in the method. All levels within the corporation generally see different advantages in RCM, and there is rarely a lack of motivation for improvement. Implementation problems arise from fundamental misunderstandings about maintenance and the functions of physical asset management[23]. This leads maintenance departments to see increased risk where it does not exist.

[Figure: benefits of RCM. Cashable: increased revenue, reduced costs. Non-cashable: risk mitigation, knowledge increases.]

For example, a maintenance manager could face any of the following recommendations (among others):

• Elimination of hard-time replacement policies where applicable and effective,
• Elimination of invasive inspection while we have the opportunity on planned turnarounds.

The reluctance to change comes from the perception that this is risky, and instead of the policy changes being implemented, things stay as they are. The result is more of the same:

• the risk of unplanned failure stays provably higher, and
• the effectiveness of maintenance stays provably lower.

Moreover, resources remain tight, consumed by performing maintenance that is not required or by repairing problems caused by the activities that are supposed to prevent them. It is clear that before we can successfully implement the strategy outcomes of RCM, we first need to make sure that there is a deep understanding within the company of modern reliability principles.

[22] Cost avoidance calculation methods are available in Handout RCM-DO-07a Calculating Costs Avoided, inspired by the work of Steve Soos on this subject.
[23] The Role of the Maintenance Manager, Daryl Mather, 2008:
• Design effective maintenance policy
• Execute them as efficiently as possible
• Collect relevant data for higher confidence decisions in the future.


The Role of the RCM Facilitator/Analyst

In a time of continual change, the ability to implement is one of the most prized and sought-after skill sets. In module RCM-DO-08 Implementation and Execution, we highlight the importance of momentum and the vital role of benefit awareness in creating momentum.

RCM often requires the cooperation of a range of departments, including purchasing/stores, human resources/training, operations, maintenance and the engineering department. In the experience of the author, initiatives are not successful over the medium to long term when companies try to order change. If you want to fundamentally change the way an organization works, then people have to want to change. For this to happen they need to understand the logic behind RCM, and they must understand what the benefits are to them in their present role.

One of the useful tools for engaging people is a solid, fact-based benefits case for every analysis that is completed. If it is to be effective, then this task should commence during the analysis period itself, and the case should be presented before implementation.
