Sunday, July 3, 2016

Analyzing the Annual Republicans vs. Democrats Congressional Baseball Game


Every year, members of the United States Congress take a break from blocking each other's bills to play a charity baseball game. Best of all, the teams are split along party lines: Republicans vs. Democrats. The tradition was started in 1909 by Representative John Tener of Pennsylvania, a former professional baseball player. This year's game was played last week, and the Republicans broke a seven-year winning streak by the Democrats.

The graph below shows net wins over the course of the series. Higher values on the y-axis mean more net Republican wins, and lower values mean more net Democratic wins. The graph makes it fairly obvious that each party has had long winning streaks. The gray dots mark years when the game was not held or when I could not find any information about it. In 1935, 1937, 1938, 1939, and 1941, the games were played between members of Congress and the press.
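A net-wins series like this can be built as a running total. The snippet below is only a minimal sketch of that idea, not the code behind the graph: the years and results are made up, and the names `games`, `STEP`, and `net_wins` are hypothetical. It assumes a Republican win counts as +1, a Democratic win as -1, and that ties and years with no known result leave the total unchanged.

```python
from itertools import accumulate

# Made-up sample, NOT the real series: (year, winner), where winner is
# "R", "D", "T" for a tie, or None when no game or no result is known.
games = [
    (2001, "R"),
    (2002, "R"),
    (2003, None),
    (2004, "D"),
    (2005, "T"),
    (2006, "D"),
    (2007, "D"),
]

# Assumed scoring: Republican win +1, Democratic win -1, otherwise 0.
STEP = {"R": 1, "D": -1, "T": 0, None: 0}


def net_wins(games):
    """Return (year, running net wins) pairs; positive favors Republicans."""
    totals = accumulate(STEP[winner] for _, winner in games)
    return [(year, total) for (year, _), total in zip(games, totals)]


series = net_wins(games)
for year, total in series:
    print(year, total)
```

Years with a `None` result still appear in the output, so they can be drawn in a different color, the way the gray dots are used in the graph.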


The following graph displays the points scored by each team over time. In the early years of the series, the games had much higher total scores than in more recent years.

Next, the point differentials were explored. The point differential is the difference between the scores of the two teams. Many of the closest games were played from the late 1970s through the 1990s. This period also saw few winning streaks because the two parties were fairly evenly matched.
A histogram was built to show the distribution of point differentials. The Democrats have some extremely large wins, including three by more than 20 points, while the Republicans have none. Another interesting finding is that only one game ended in a tie. This is surprising because the charity event does not have overtime, so one would expect more than one of the 81 games played to have ended in a tie.
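The differential and histogram steps can be sketched in a few lines. This is an illustration under stated assumptions rather than the original analysis: the scores are made up, the sign convention (Republican score minus Democratic score) and the bin width of 5 are my own choices, and the names `differential` and `bin_start` are hypothetical.

```python
from collections import Counter

# Made-up scores for illustration: (republican_score, democratic_score).
scores = [(5, 3), (2, 9), (7, 7), (4, 26), (12, 1), (6, 5)]

BIN_WIDTH = 5  # assumed; the bin width of the original histogram is not stated


def differential(rep, dem):
    """Positive values are Republican wins, negative values Democratic wins."""
    return rep - dem


def bin_start(diff, width=BIN_WIDTH):
    """Lower edge of the bin containing diff (floor division handles negatives)."""
    return (diff // width) * width


diffs = [differential(rep, dem) for rep, dem in scores]
histogram = Counter(bin_start(d) for d in diffs)
ties = sum(1 for d in diffs if d == 0)

# Text rendering of the histogram, one row per non-empty bin.
for start in sorted(histogram):
    end = start + BIN_WIDTH - 1
    print(f"{start:>4} to {end:>4}: {'#' * histogram[start]}")
print("ties:", ties)
```

Ties are counted separately here because a bin such as 0 to 4 would otherwise mix tied games with narrow Republican wins.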


Over the years, the annual game has been held at many different locations, and each party has had different rates of success at each field. The winning percentage at each field was calculated to see whether either party has a home-field advantage at any park. Langley High School is a bit of an outlier because it was selected as the location after two rain delays and hosted only one game. American League Park II and Georgetown Field were the first two stadiums to host the game, and each hosted only one game. Memorial Stadium hosted the fourth-fewest games, with only four; every other location hosted nine or more.

Ironically, RFK Stadium, named after the famous Democratic U.S. Senator, has given Republicans a strong home-field advantage. Republicans have won 13 of the 14 games played at the stadium. Democrats have seen similar success at Nationals Park, winning 7 of the 9 games played there.
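For readers who want to reproduce per-field numbers like these, here is a minimal sketch of the calculation. The records and field names are placeholders, the function name `win_pct_by_field` is hypothetical, and it assumes that a tie counts as a game played but as a win for neither party.

```python
from collections import defaultdict

# Placeholder records for illustration: (field, winner), winner is "R", "D", or "T".
games = [
    ("Field A", "R"), ("Field A", "R"), ("Field A", "D"), ("Field A", "T"),
    ("Field B", "D"), ("Field B", "D"),
]


def win_pct_by_field(games):
    """Return {field: {"games": n, "R": share, "D": share}} for each field."""
    counts = defaultdict(lambda: {"R": 0, "D": 0, "T": 0})
    for field, winner in games:
        counts[field][winner] += 1

    result = {}
    for field, c in counts.items():
        played = c["R"] + c["D"] + c["T"]  # assumption: ties stay in the denominator
        result[field] = {
            "games": played,
            "R": c["R"] / played,
            "D": c["D"] / played,
        }
    return result


for field, stats in win_pct_by_field(games).items():
    print(f"{field}: {stats['games']} games, R {stats['R']:.0%}, D {stats['D']:.0%}")
```

Keeping the game count next to each percentage makes it easier to spot fields with only one or a handful of games, where the percentage says very little.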

I plan to update these graphs each year after the annual game. Please feel free to suggest additional graphs or analyses in the comments section.


Notes:
  1. The data came from https://en.wikipedia.org/wiki/Congressional_Baseball_Game#Game_results
  2. Some of the stadiums were renamed over the years, and the original data set contained both names. For the analysis, entries for the same stadium were combined under its most recent name.
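The renaming step in note 2 amounts to a lookup table from old names to current ones. The sketch below is only an illustration: the `RENAMED` table and the `canonical_field` helper are hypothetical, and the single entry shown is an example that should be checked against the names that actually appear in the data.

```python
# Illustrative alias table (old name -> most recent name).
# Assumption: fill this in from the names found in the data set.
RENAMED = {
    "D.C. Stadium": "RFK Stadium",
}


def canonical_field(name, renamed=RENAMED):
    """Map a field name to its most recent name; unknown names pass through."""
    cleaned = name.strip()
    return renamed.get(cleaned, cleaned)


print(canonical_field("D.C. Stadium"))    # RFK Stadium
print(canonical_field("Nationals Park"))  # Nationals Park
```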
Hallway Mathlete Data Scientist

I am a PhD student in Industrial Engineering at Penn State University. I did my undergrad at Iowa State in Industrial Engineering and Economics.