Understanding Baseball’s Batting Averages with Process Behavior Charts: Is the 2018 Season Truly Different?

47
9

As I mentioned in June, I'm in Knoxville this week to take four days of classes with Donald J. Wheeler, Ph.D. I've long admired Dr. Wheeler and his work and I was thrilled when he wrote the foreword to my latest book.

One lesson from Understanding Variation is not overreacting to a report about a single data point.

I recently saw this headline:

Baseball on pace for lowest batting average since 1972

Just because it's the lowest average in 48 years… it doesn't mean that this year's MLB-wide batting average is low in a way that's statistically meaningful.

What can we do? Go get the data. As Wheeler says, “without context, data have no meaning.” As I've done before, I go to baseball-reference.com to find the annual MLB-wide averages.

If we did what we see in many workplace reports, we'd report something like “the 2018 batting average of .249 is lower than last year's average of .255.” But, we have to, as always, be careful about comparing two data points. Is that lower batting average something that represents a meaningful “signal” or is it just “noise” in the data?

What can we do? Plot the dots. Even a simple run chart for recent years tells us more than a comparison of two data points:

If we create a Process Behavior Chart with an average and calculated Natural Process Limits for these seasons starting in 2010, the 2018 number is above the Lower Limit, so it's not a signal in and of itself:

But what if we get more data and more context? Here is the Process Behavior Chart that I created going back to 1950 (an admittedly arbitrary date). I calculated the average the limits for the first 25 years. We can see how averages have shifted over time. It's not a single predictable system over time. Things have changed in the game of baseball:

Below, I've shifted the average and limits over time as those changes appeared to occur. We know we have sustained shifts in performance when eight or more consecutive years are above or below the established average (and we look for points outside of the limits).

A Process Behavior Chart only tells us something has changed. It doesn't tell us WHAT changed. We need to understand our system. In a Lean workplace, we'd go to the gemba.

It seems that the shifts occurred:

  • 1963 – shift downward (why?)
  • 1973 – shift upward (maybe because the Designated Hitter was added to the American League?)
  • 1993 – shift upward again (due to the “steroid era?”)
  • 2010 – shift downward (due to PEDs being driven out of the game an increase in defensive shifts?)

It's not worth asking “why was the batting average higher or lower?” when those data points are in the realm of noise. It is worth asking why the system changed when we see signals in the metric.

2018's average is low for this century. It's not historically low if you go back more than 50 years.

Zooming in on the last years, the 2018 average of .249 is just above the lower limit of .248. It's not a signal.

But, as the game evolves (more defensive shifts, more emphasis on home runs and power pitchers), is 2018 the start of yet another shift downward in major league batting averages or will it fluctuate back up next year? Only time will tell.

Learn more about how to create and use “Process Behavior Charts” in this blog post or my book.

Here's an updated chart through the 2024 season… definitely a shift downward below the old Lower Limit.


What do you think? Please scroll down (or click) to post a comment. Or please share the post with your thoughts on LinkedIn – and follow me or connect with me there.

Did you like this post? Make sure you don't miss a post or podcast — Subscribe to get notified about posts via email daily or weekly.


Check out my latest book, The Mistakes That Make Us: Cultivating a Culture of Learning and Innovation:

Get New Posts Sent To You

Select list(s):
Previous articleTQM at Toyota and the Influence on Lean – Past and Present, Japan and U.S.
Next articleUnveiling Baseball’s Home Run Secrets Through Statistical Process Control: Insights from Dr. Wheeler’s Workshop
Mark Graban
Mark Graban is an internationally-recognized consultant, author, and professional speaker, and podcaster with experience in healthcare, manufacturing, and startups. Mark's new book is The Mistakes That Make Us: Cultivating a Culture of Learning and Innovation. He is also the author of Measures of Success: React Less, Lead Better, Improve More, the Shingo Award-winning books Lean Hospitals and Healthcare Kaizen, and the anthology Practicing Lean. Mark is also a Senior Advisor to the technology company KaiNexus.

9 COMMENTS

  1. Another interesting discussion that could be had is in deciding where to determine when a process change has occurred. For example 1969 could be used as the start of a new period in the chart since MLB lowered the pitching mound in response to the perception that pitchers had an unfair advantage. Or 2003 with the publication of Moneyball and the shift in evaluating players that de-emphasized traditional statistics such as batting average. Or maybe 2008 as it was the first season after the publication of the Mitchell report. And of course there are other points such as expansion years and others that I’m sure I’m missing.

    • Good question on determining where a process changed. When doing a retrospective chart, you can see signals or evidence of shifts. And then you can ask, “What happened?”

      Or you can say, “I know there were changes… did it appear to have an immediate effect?”

      You’re right about that change in 1969, but I don’t see a signal in the chart there. The MLB batting average was indeed higher in 1969 compared to 1968, but it’s in the range of common cause variation – it appears to be noise.

      “Moneyball” certainly started the trend of emphasizing On-Base Percentage over Batting Average. The chart suggests the last downward shift started in 2009. I doubt the drop in batting average was a result of the financial crisis or President Obama getting elected :-)

    • Here is a Run Chart that shows batting average and on-base percentage together. Visually, they look very correlated. The calculated correlation factor for 1950 to 2018 is 0.93 (where 1.0 is perfect correlation). The correlation from 2010 to 2018 is just 0.73, which maybe isn’t surprising with an increased emphasis on drawing walks (which increases OBP).

  2. Update — the MLB batting average ended at .248 for the 2018 season… so far for 2019, the MLB total average is just a bit higher at .249. We might have to go to four decimal places to see if last season was a signal… but there certainly seems to be a change to the game where the attempt to hit more home runs is leading to lower batting averages.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.