Dalelorenzo's GDI Blog

Peer Learning Among MLB Umpires

A growing group of social scientists is researching peer learning, looking to answer the question "does an individual learn from their network?" In this post, I'll present some evidence that MLB umpires "learn" from their peers in their assigned crews.

To quantify this, I calculate "call quality" for each umpire in each season from 2008 to 2019. Call quality is determined in a similar way to many umpire scorecard evaluations: I take PITCHf/x data for each game in which a given umpire was assigned to home plate, subset to all called strikes and called balls, and overlay the true strike zone to calculate the proportion of correct calls.
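To make the calculation concrete, here is a minimal sketch of a single-game call-quality computation. It is an illustration under stated assumptions, not the exact procedure behind the numbers in this post: the field names (`px`, `pz`, `sz_bot`, `sz_top`) are modeled on PITCHf/x-style columns, the `CalledPitch` type and both functions are hypothetical, and the zone is simplified to the width of the plate with no allowance for the radius of the ball.

```python
from dataclasses import dataclass


@dataclass
class CalledPitch:
    px: float      # horizontal location at the plate, feet from center (assumed field)
    pz: float      # height at the plate, feet (assumed field)
    sz_bot: float  # bottom of the batter's strike zone, feet (assumed field)
    sz_top: float  # top of the batter's strike zone, feet (assumed field)
    call: str      # umpire's call: "strike" or "ball"


# Half the width of home plate (17 inches) in feet.
# Simplification: ignores the radius of the ball.
HALF_PLATE_WIDTH = 17 / 2 / 12


def in_zone(p: CalledPitch) -> bool:
    """True if the pitch location falls inside the simplified strike zone."""
    return abs(p.px) <= HALF_PLATE_WIDTH and p.sz_bot <= p.pz <= p.sz_top


def call_quality(pitches: list[CalledPitch]) -> float:
    """Share of called pitches where the call matches the zone."""
    if not pitches:
        raise ValueError("need at least one called pitch")
    correct = sum((p.call == "strike") == in_zone(p) for p in pitches)
    return correct / len(pitches)


game = [
    CalledPitch(0.0, 2.5, 1.5, 3.5, "strike"),   # in zone, called strike: correct
    CalledPitch(1.5, 2.5, 1.5, 3.5, "ball"),     # off the plate, called ball: correct
    CalledPitch(0.0, 4.0, 1.5, 3.5, "strike"),   # above the zone, called strike: wrong
    CalledPitch(-0.5, 2.0, 1.5, 3.5, "strike"),  # in zone, called strike: correct
]
print(call_quality(game))  # 3 of 4 calls correct -> 0.75
```

Swinging strikes, fouls, and balls in play never enter the list, since only called pitches reflect the umpire's judgment.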

I'm specifically interested in whether an umpire's call quality is driven by the call quality of the umpires they have been assigned to work with in the past.

Crew Assignment and Potential for Learning

MLB umpires present an excellent testbed for investigating peer learning. To study peer learning, you need large amounts of data in order to be able to:

Track individuals over time and across many different teams/peer networks.
Measure individual-level error or quality.

Umpiring in the majors has both: umpires are assigned to crews at the start of the season and are frequently "shuffled" within seasons (due to vacations, injuries, etc.), and we can use pitch tracking to determine quality for every call made.

There are also a few different channels through which peer learning might work among umpires. Although the final strike or ball call is made by a single home-plate umpire, the umps in a crew travel together, review game footage and calls together, and are encouraged by the league to work as a team to officiate games. This creates a setting in which umpires (particularly umpires new to the majors) might pick up tricks of the trade from high-quality peers and generally improve their accuracy.

As a particular example, experienced and high-quality umpires might have a lot of knowledge about pitch-framing and how catchers can try to manipulate the call. A new umpire working with a good group of senior umpires might learn more about how to deal with pitch-framing and hence make better calls in the future.


As I mentioned before, my measure of umpire call quality is the fraction of "correct" calls (true strikes called strikes or true balls called balls) when a given umpire is behind the plate, based on PITCHf/x data. I calculate this game-by-game and then aggregate into an average call quality for a given umpire in a given season. Below is a histogram of this call quality, with 1,047 umpire-seasons in the data and an average season-level call quality of 0.885 (i.e. the average umpire gets 88.5% of strike/ball calls correct across the season).

I'm interested in whether there is a relationship between the call quality of a given umpire and the average call quality of the network of umpires that they worked games with in the previous season. There's one main issue to take care of: a general trend of improvement in umpires' calls over time. To account for this, I convert the umpire call quality measure into a z-score by season, so that in each season the average umpire has a call quality of 0 with a standard deviation of 1.
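The within-season standardization can be sketched as follows. This is a minimal illustration, not the code used for the post: the function name and the `(umpire, season, call_quality)` row layout are hypothetical, and I assume the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead would change the scale only slightly.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_season(rows):
    """Standardize call quality within each season.

    rows: iterable of (umpire, season, call_quality) tuples (assumed layout).
    Returns a dict mapping (umpire, season) to a z-score.
    """
    rows = list(rows)

    # Collect every umpire's call quality by season.
    by_season = defaultdict(list)
    for _, season, quality in rows:
        by_season[season].append(quality)

    # Season-level mean and (population) standard deviation.
    stats = {}
    for season, values in by_season.items():
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"no variation in call quality in season {season}")
        stats[season] = (mean(values), sd)

    # Subtract the season mean and divide by the season standard deviation.
    return {
        (umpire, season): (quality - stats[season][0]) / stats[season][1]
        for umpire, season, quality in rows
    }
```

Because each season is centered separately, a league-wide improvement in accuracy from one year to the next drops out, and a z-score only says how an umpire compares with his peers in that same season.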

I then ran a regression of an umpire's call quality on the average call quality of the umpires they worked with in the previous season. I'd expect the regression coefficient to be positive if there is peer learning, and that is indeed what I found. The coefficient on past-network quality is 0.1442 (with a standard error of 0.0549, for a 95% CI of [0.03649, 0.25198]). In practical terms, that means that improving the average quality of an umpire's network last season by one standard deviation raises call quality by 0.1442 standard deviations.
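For readers who want to see the mechanics, here is a minimal sketch of a one-regressor least-squares fit with a conventional standard error and a 95% confidence interval. The function names are hypothetical, and it assumes a plain OLS with homoskedastic errors and a normal critical value of 1.96; the regression reported above may include controls or a different standard-error choice, so treat this as an illustration only.

```python
from math import sqrt


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (slope, standard error of slope)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, se


def ci95(coef, se, crit=1.96):
    """Approximate 95% confidence interval (normal critical value assumed)."""
    return coef - crit * se, coef + crit * se


# Plugging in the reported coefficient and standard error:
low, high = ci95(0.1442, 0.0549)
print(round(low, 4), round(high, 4))  # 0.0366 0.2518
```

The interval from the rounded inputs agrees with the reported one to about three decimal places; the small gap is what you would expect from rounding or a slightly different critical value.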

Is It All Just Noise and Mean Reversion?

While I think the results are interesting, there are some other things to consider. For example, it could all just be noise or mean reversion. Consider a hypothetical setup in which there's no peer learning among umpires, but umpires sometimes have good or bad seasons, and the league likes to match bad umpires with good umpires so that the average quality of a crew is roughly equal. In such a situation, an umpire who has a bad year will be matched with umpires who had good years. If there is mean reversion, we would expect that in the following season the umpire who did poorly will improve, and hence we will see a (spurious) relationship between good umpires in the past and good umpires today. This is the main concern, and I can test for it in two ways.

Firstly, I can check whether there's any call quality relationship in assignment. Using data from Retrosheet and Steve O's Baseball Umpire Resources, I can see the start-of-season crew assignments and in-season crew assignments (after "shuffles" to crews for various reasons). I can test for call-quality-based assignments by running a regression of an umpire's last-season call quality on the average last-season call quality of umpires in the crew(s) that they are assigned to. If MLB assigns umpires based on call quality, there should be a relationship there.

But there isn't. The coefficient is 0.00267 (with a standard error of 0.00233, for a 95% CI of [-0.001896, 0.0072415]), which means that when an umpire is assigned to a new crew, the quality of that crew is virtually random.

Secondly, I can do a placebo test by running a regression of an umpire's call quality on the average call quality of the umpires they worked with in the following season (instead of the previous season). If there is season-to-season mean reversion and quality-based crew assignment, then an umpire who does well one year will tend to be assigned to a crew with umpires who did poorly. If there is mean reversion, then we'd expect those who did poorly to improve, and so there will be a positive relationship between doing well one year and having a better crew next year. Of course, if the result is driven by peer learning, then we'd expect there to be no relationship: you can't learn from people you haven't worked with yet!
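The only difference between the main regression and the placebo is which season's network goes on the right-hand side. A minimal sketch of building that regressor, with hypothetical names and data layout: `quality` maps `(umpire, season)` to a z-scored call quality, `peers` maps `(umpire, season)` to the set of umpires worked with that season, and I assume peer quality is measured in the same season the crew worked together.

```python
from statistics import mean


def network_quality(quality, peers, umpire, season, offset):
    """Average call quality of the umpires `umpire` worked with in season + offset.

    offset = -1 gives last season's network (the main regressor);
    offset = +1 gives next season's network (the placebo regressor).
    Returns None when no peer quality is observed for that season.
    """
    target = season + offset
    values = [
        quality[(peer, target)]
        for peer in peers.get((umpire, target), ())
        if (peer, target) in quality
    ]
    return mean(values) if values else None


# Toy data: umpire "a" worked with b and c in 2018, and with b and d in 2020.
quality = {
    ("b", 2018): 1.0, ("c", 2018): -1.0,
    ("b", 2020): 0.2, ("d", 2020): 0.6,
}
peers = {("a", 2018): {"b", "c"}, ("a", 2020): {"b", "d"}}

lagged = network_quality(quality, peers, "a", 2019, offset=-1)  # past network
lead = network_quality(quality, peers, "a", 2019, offset=+1)    # future network
```

Regressing this season's call quality on `lagged` is the peer-learning test; swapping in `lead` gives the placebo, where a coefficient far from zero would point to assignment and mean reversion rather than learning.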

And the regression shows there's no relationship. Running this regression produces a coefficient on next season's network quality of -0.00407 (with a standard error of 0.0594, for a 95% CI of [-0.12059, 0.11245]).


It looks as though umpires learn from their peers in their assigned crews, so an umpire assigned to a crew that makes better strike and ball calls will tend to make better quality calls in the future.

Jed Armstrong is currently working on a PhD in labor economics and writing up these findings into an academic paper. If you have any suggestions or comments, or would like to see the draft, feel free to get in touch in the comments or on Twitter.


Read more: community.fangraphs.com
