Monday, December 1, 2014

Downtime Follies

The journal "Academy of Management Executive" published a classic article entitled "On the folly of rewarding A, while hoping for B" by Steven Kerr (1995, Vol. 9, No. 1). The article explains why we end up with a less desirable state when we really want something better. Kerr covers a wide array of sub-optimal situations in medicine, consulting, sports, government, and business. He concludes that the reward system is at the heart of the issue: organizational forces reward the less desirable behaviors and essentially starve the more desirable ones. Jim Collins opens his book "Good to Great" with a similar thought: "Good is the enemy of great." We want things to be better, but we continually accept and reward the mediocre. This is no less true on the plant floor than in other areas of life, and asset downtime provides a prime example.

Asset Performance

This posting is not specifically about machine downtime, but about how people respond to downtime and the consequences of their decision-making processes. Consider the following graphic:
[Image: Performance.jpg, showing the running and stopped states of Machine A and Machine B over time]
It shows two identical machines in running and stopped states over time (the horizontal axis). Note that OAE is short for "overall asset effectiveness", a metric that is essentially the same as OEE, or overall equipment effectiveness. In this example, total running time is the same, production is the same, and asset effectiveness is the same (i.e. quality and schedule are equivalent between machines). These metrics alone cannot tell us which machine performs better, but common sense says that Machine A is more desirable than Machine B. It has higher reliability (a greater mean time between failures, or MTBF) even though its individual downtime periods are longer. Since it has fewer downtime events, it is easier to address their causes. Machine A also has lower operating costs, because starting and stopping equipment is inefficient and costly. Frequent starts and stops also create more opportunities for quality issues, and of course they put more wear on the equipment, which increases maintenance requirements. How is it that we end up with assets performing more like Machine B, with more frequent downtime events but faster recovery (a shorter mean time to restore, or MTTR)?
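The comparison can be made concrete with a small sketch. The durations below are hypothetical (they are not read from the graphic), and the calculation covers only the availability part of asset effectiveness, treating quality and schedule as equal for both machines. MTBF is taken here as mean running time per stop, which is one common convention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MachineLog:
    """Run and stop durations for one machine, in minutes (hypothetical data)."""
    name: str
    run_minutes: List[float]
    stop_minutes: List[float]

    def availability(self) -> float:
        # Fraction of total time spent running.
        running = sum(self.run_minutes)
        stopped = sum(self.stop_minutes)
        return running / (running + stopped)

    def mtbf(self) -> float:
        # Mean running time per stop event.
        return sum(self.run_minutes) / len(self.stop_minutes)

    def mttr(self) -> float:
        # Mean duration of a stop event.
        return sum(self.stop_minutes) / len(self.stop_minutes)


# Assumed figures: both machines run 420 minutes and are stopped 60 minutes.
machine_a = MachineLog("A", run_minutes=[210, 210], stop_minutes=[30, 30])
machine_b = MachineLog("B", run_minutes=[35] * 12, stop_minutes=[5] * 12)

for m in (machine_a, machine_b):
    print(f"Machine {m.name}: availability={m.availability():.1%}, "
          f"MTBF={m.mtbf():.0f} min, MTTR={m.mttr():.0f} min, "
          f"stops={len(m.stop_minutes)}")
```

With these assumed numbers the availability is identical for both machines, yet Machine B stops six times as often. A summary metric that only totals running and stopped time cannot see the difference; MTBF and the stop count can.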

Unwritten Rules

Let's examine some of the unwritten rules which dominate the production environment:
  • Rule 1: Downtime is bad.
  • Rule 2: Long periods of downtime are worse than short periods of downtime.
  • Rule 3: People should not be idle in the production environment.
All three rules seem entirely logical; who would question them? Why would a manufacturer invest capital in equipment that isn't producing, and why pay people to stand around? But notice that these are exactly the rules that drive the behaviors which cause more frequent but shorter downtimes. Technicians are rewarded for reducing downtime duration, not for maintaining equipment reliability. As long as production schedules are being met and asset effectiveness metrics are maintained, equipment reliability and process predictability are of little concern. I'm familiar with one case where process MTBF was less than 2½ minutes, but because the machines could usually recover without operator intervention, the failures were not even noticed (MTTR was around 5 seconds). The distribution of restore times was highly skewed toward the lower end, with a median around 2 seconds, so most stops were even shorter than the mean suggests. In short, the machine was effectively unreliable: it stopped frequently but recovered quickly, and the operators were essentially unaware of the downtime. Since this type of performance became accepted practice, budgeting and planning were built around the sub-optimal operation and there was little incentive to change.
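A rough calculation shows why this case could go unnoticed. The sketch below plugs in the figures quoted above, treating 2½ minutes (150 seconds) as the MTBF even though the text gives it only as an upper bound, and assuming an 8-hour shift, which is not stated in the original. The function name and parameters are illustrative.

```python
def short_stop_profile(mtbf_s: float, mttr_s: float, shift_s: float = 8 * 3600):
    """Return (availability, stops per shift, downtime minutes per shift).

    Assumes MTBF is mean running time between stops, so one average
    run-then-stop cycle lasts mtbf_s + mttr_s seconds.
    """
    cycle_s = mtbf_s + mttr_s
    availability = mtbf_s / cycle_s
    stops = shift_s / cycle_s
    downtime_min = stops * mttr_s / 60
    return availability, stops, downtime_min


# Figures from the case above: MTBF under 150 s, MTTR around 5 s.
availability, stops, downtime_min = short_stop_profile(mtbf_s=150, mttr_s=5)
print(f"availability ~ {availability:.1%}")
print(f"stops per 8-hour shift ~ {stops:.0f}")
print(f"downtime per shift ~ {downtime_min:.1f} min")
```

Under these assumptions availability stays near 97 percent while the machine stops roughly 186 times in a shift. Because the 150-second figure is an upper bound on MTBF, the real stop count would have been higher still.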

The Point

An organization which embarks on an operational excellence initiative must be willing to challenge the unwritten rules which govern plant floor behavior, because they have unintended consequences. There are no business drivers which would cause anyone to consciously choose Machine B over Machine A, but that is exactly what ends up happening as a result of these cultural imperatives. The ability to measure downtime does not automatically translate into better asset effectiveness, but it should prompt the organizational introspection and culture changes which do. The folly of accepting B while expecting A lies in rewarding inefficient processes because they fit into a flawed set of beliefs.
