Michael Bodek '19
The Elo rating system is a ranking system that uses pairwise results to measure the relative ability of thousands of competitors. Though originally designed for chess, it is now used in settings as diverse as sports, education, and tweet ranking. The premise of the Elo rating system is that it is self-correcting: points flow from weaker competitors to stronger competitors after each comparison. However, I hypothesize that the Elo system cannot accurately measure relative ability in a sparse network, where many degrees of separation lie between the players. I test this hypothesis using data from national youth chess championships, where competitors hold Elo ratings derived primarily from play in their home region and are facing opponents from other parts of the country for one of the first times. A regression testing whether game results equal the Elo win probability rejects that equality for five of nine region dummies. The analysis is repeated for two separate age groups. Across the two age groups, all nine region dummies have the same sign, and there is heavy overlap among the significant regions. Pairwise comparison of regions confirms these results, showing that the level of miscalibration is greater than what would occur purely due to noise. Together, these results demonstrate a miscalibration of local Elo rating pools: holding ability constant, a player's Elo rating would converge to a different value depending on the local cluster in which he competes.
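The abstract does not state the formulas involved, so the following Python sketch is only illustrative. It assumes the standard logistic Elo expectation with a 400-point scale and an arbitrary K-factor of 32; neither value is taken from the paper. The helper `mean_residual` is a hypothetical simplification of the calibration idea (average result minus Elo win probability for one group of games), not the regression specification used in the study.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Win probability of player A under the standard logistic Elo model.

    The 400-point scale is the conventional chess value (an assumption here).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return both ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    The K-factor of 32 is illustrative only. Whatever A gains, B loses,
    so the total number of points in the pool stays constant.
    """
    delta = k * (score_a - elo_expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def mean_residual(games):
    """Average of (actual score - Elo win probability) over a set of games.

    games is an iterable of (rating_a, rating_b, score_a) tuples, viewed
    from the perspective of player A. In a well-calibrated pool this
    average should be near zero; a persistent positive value for players
    from one region would suggest that region is underrated.
    """
    residuals = [s - elo_expected_score(a, b) for a, b, s in games]
    return sum(residuals) / len(residuals)
```

For example, if players from one region are all rated 1500 but consistently beat 1500-rated opponents from elsewhere, `mean_residual` for that region would be positive, which is the kind of deviation the region dummies are meant to detect.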