Comparing XC Speed from Various State Championship Meets to Predict Results at Nike Team Nationals ... And Some Other XC Speed-Related Concepts

by Bill Meylan (July 2005)

This article was prompted by questions from viewers wanting more information about the methods I use to compare XC races from different states and, in particular, more specifics on how I generated the NTN race predictions for 2004 ... Those predictions are available at this link.

The NTN predictions were done fairly quickly as I had neither the desire nor the time to be more precise. With the exception of the New York and New Jersey teams (and a couple of races for York and the Woodlands), I did not follow the NTN teams during the XC season at all ... so nearly all predictions were based strictly on results from the respective State championship races.

Background Concepts

The comparison techniques are simply statistical in nature ... I do not have any secret insights or unique knowledge in this regard ... Any person loony enough to spend the time gathering results, placing the data in databases, and evaluating the data with common statistical methods can do it ... My results can be duplicated and reproduced by people other than myself ... One real-life statistician actually confirmed that I am loony (in regards to the amount of time it takes to do my speed ratings and rankings for NY State and Section 3) ... The NTN prediction process was actually simple compared to my normal process because I wasn't concerned about exact precision or individual runners.

One background concept can be explained by the following analogy ... Consider TV ratings ... TV ratings are generated by gathering individual viewer responses and compiling the responses into suitable categories with simple statistics ... The categories might be "All-Ages", "14-to-18 year olds", or "65 and older" ... If trying to predict the favorite TV show of "14-to-18 year olds", using past response data from "65 and older" is probably a bad idea - You want data from the exact group of people you are evaluating.

My sole goal has been to evaluate the speed of varsity high school XC runners ... Therefore, all the data I collect relates specifically to this group of runners ... And this group of runners has a number of sub-groups, which include (but are not limited to) the following:
... (1) Boys
... (2) Girls
... (3) All boy or girl varsity runners
... (4) Varsity runners competing in State championships

This is important - speed data for high school runners can NOT adequately evaluate the speed of college-level runners or "all-age" runners in 5K road races ... If you want to speed-rate college runners, you MUST collect data for that group of runners and produce a base-line for that group (my high school graphs will NOT work correctly).

Base-Line

A "base-line" is just a profile that describes the overall speed of a group or sub-group ... Making a base-line is fairly straightforward - For example, the most common base-line in high school XC applies to all varsity boys or all varsity girls ... To make this base-line, find an invitational (or similar race) that includes a large number of runners from many different schools (large, medium and small; good, average, not-so-good), all run on the same day, on the same course, under the same conditions (so merging results is possible).
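
For readers who want to reproduce the merge step, here is a minimal Python sketch. It assumes each race's results have already been parsed into records of name, school, and finish time in seconds; the `Result` type and `merge_races` function are hypothetical names, not part of any existing tool. Because the races were run on the same course on the same day, merging reduces to pooling the times and sorting them.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    school: str
    seconds: float  # finish time in seconds


def merge_races(*races: list[Result]) -> list[tuple[int, Result]]:
    """Pool several races run on the same course on the same day.

    ASSUMPTION: conditions were identical for every race, so raw times are
    directly comparable and a merge is just a sort by time.
    """
    pooled = [r for race in races for r in race]
    pooled.sort(key=lambda r: r.seconds)  # stable sort keeps tie order
    # Re-number the combined field 1..n in finish order.
    return list(enumerate(pooled, start=1))
```

Merging, for example, the separate class races from one invitational day yields a single finish order that can then be graphed as one base-line.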

The graph below illustrates two possible base-lines ...

The SUNY-Utica line above is the actual base-line I use for high school boys' results - It came from results at the 1999 Section 3 championship meet held at the SUNY-Utica XC course (which is a very slow course compared to most courses) ... the other line above came from results at the 2004 Shore Coaches meet at Holmdel Park ... I could easily use the Shore Coaches line as my base-line in NY (the only difference between the two is the difference in course speed ... the red lines are exactly parallel).

A base-line graph simply plots actual race times versus finish-position ... when the results of two different races are plotted on the same graph, the finish-positions MUST be scaled so both fields have exactly the same number of runners (for example, if one race has twice as many runners as another, just divide each finish-position of the bigger race by two).
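
The scaling rule generalizes beyond the factor-of-two case. In this minimal sketch (the `scale_positions` helper is a hypothetical name), place p in a field of n runners becomes p * reference_size / n, where reference_size is normally the size of the smaller field.

```python
def scale_positions(times: list[float], reference_size: int) -> list[tuple[float, float]]:
    """Pair each finish time with a position rescaled to a common field size.

    `times` must be sorted fastest first. Place p in a field of n runners
    becomes p * reference_size / n, so a 200-runner race scaled to 100
    runners has every place divided by two.
    """
    if not times:
        raise ValueError("no results to scale")
    factor = reference_size / len(times)
    return [(place * factor, t) for place, t in enumerate(times, start=1)]
```

Once both races are scaled to the same reference size, their (position, time) points share an x-axis and can be plotted or fitted together.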

NTN Application

The base-line graph above works well for determining the relative speed of races as long as they apply to the "all varsity boys" sub-group ... Since I'm considering State Championship Meet results (a different, higher quality sub-group of runners), the base-lines above can NOT be used to find the relative race speeds ... So the only solution is to make base-line graphs pertaining to State Meets ... Fortunately, New York has two State Meets, and I have a good statistical handle on how fast the runners are in those two meets (the State Class Meet and the Federation Meet).

The graph below has three base-lines illustrated ... the "Adjusted NY Federation" line is the actual results of the Boys Federation Meet with the runners from some bottom teams removed (to make it a higher quality result) ... The three base-lines demonstrate some graph concepts I've been asked to explain.

The red lines are drawn through the straight-line portions of the actual race results ... the straight-line portion primarily represents what I call "the average runners" in the race (the largest segment of the population) ... The important aspect of the red lines above is the slope of the line ... As the quality of the race increases (more good runners), the slope of the line decreases (this is a very common phenomenon with these types of graphs).

Why is the slope so important?? ... The slope of a graph (race time vs. finish position) is a direct measure of the rate at which runners are slowing down ... Predominantly, in higher quality races, the "average" runners slow down less than "average" runners in lower quality races ... Remember, we are looking at groups of runners (nameless) and not individual runners.

State Meets

Here comes a blanket statement (with no confirming data because I don't have time to post any) ... The quality of different State Championship Meets from around the country is NOT equal ... The quality at some meets is better than others ... How the meets are divided into classes (or not) also makes a difference ... So How Did I Make a Comparison? ... These are the steps:

(1) Get the complete results for the State Meet being evaluated (at times, this was the most time-consuming step because full results for some meets were hard to find or were in some awful electronic format that was hard to get into a database or spreadsheet).

(2) Graph the results in several ways (if possible), including merged results (which also took time because I had to do the merging myself in many cases).

(3) Match the Out-of-State Meet result graphs against the various NY State Meet base-lines I derived ... Find the best match (or matches) ... the slope was a key consideration.

(4) After finding the best match(es), use the known speed of the NY runners to determine the relative speed of the out-of-state runners (as speed ratings).
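
Step (3) can be reduced to a first-pass slope comparison. The sketch below assumes each NY base-line has already been fitted on the same scaled position axis and simply ranks the base-lines by how close their slope is to the out-of-state meet's slope; `best_matches` is a hypothetical name. The text treats slope as "a key consideration", not the only one, so this ranking covers only part of the matching judgment.

```python
def best_matches(target_slope: float, baselines: dict[str, float], count: int = 1) -> list[str]:
    """Rank named base-lines by how close their slope is to `target_slope`.

    `baselines` maps a label to a fitted slope (seconds per scaled finish
    position). Only slope is compared here; other features of the graphs
    are ignored.
    """
    ranked = sorted(baselines.items(), key=lambda item: abs(item[1] - target_slope))
    return [label for label, _ in ranked[:count]]
```

With made-up slopes of 2.0, 1.6 and 1.3 seconds per position for "NY State Class", "NY Federation" and "Adjusted NY Federation", an out-of-state slope of 1.5 matches "NY Federation" first and "Adjusted NY Federation" second.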

Here's an example using the Illinois Boys' State Meet, where York competed:

The graph above plots the actual (full) race results for the Boys NY Federation race and the Illinois State Meet (which included the York team) ... the solid red lines are the best fits for the "average" runner data (exactly the same as finding the base-lines in the previous graphs) ... the dotted red line is exactly parallel to the Illinois line and has been raised to intersect the solid NY Federation line - this shows the difference in slopes between the NY Federation race and the Illinois race ... This Illinois boys' State race was higher in quality (speed) than the NY Federation race - and, as a point of reference, it was the highest overall quality race of all the State championship races I examined.

The dotted blue line above is also parallel to the Illinois line ... It represents the best fit using "adjusted" Federation data (that's why I showed an "Adjusted Federation" example above) ... The time difference between the dotted blue line and the Illinois line is about 80 seconds, which means that the Illinois race was about 80 seconds faster on average than the Federation race - Therefore, the course conversion between the 3.1 mile Bowdoin Park (NY) course and the 3.0 mile Detweiller Park (IL) course was about 80 seconds for these two race days ... As a side note concerning course conversions - any course conversion method must account for daily variations due to weather and other factors that may slow or accelerate race times ... I use course conversion charts myself, but I demand some statistical proof of the daily variation; otherwise, some nasty inaccuracies can be introduced into a comparison.
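
The graphical step of sliding a parallel line can also be done numerically. Assuming both races are on the same scaled position axis and only their "average runner" portions are used, hold the slope fixed at the Illinois value and fit each race's intercept: with a fixed slope, the least-squares intercept is simply the mean of (time minus slope times position), and the difference between the two intercepts is the average time gap, i.e. that day's course conversion. The function names are hypothetical.

```python
from statistics import fmean


def intercept_with_fixed_slope(points: list[tuple[float, float]], slope: float) -> float:
    """Least-squares intercept for (position, seconds) points with the slope held fixed.

    Minimizing sum((t - slope*p - b) ** 2) over b gives b = mean(t - slope*p).
    """
    return fmean(t - slope * p for p, t in points)


def course_conversion(slower: list[tuple[float, float]],
                      faster: list[tuple[float, float]],
                      slope: float) -> float:
    """Average time gap in seconds between two races drawn as parallel lines.

    ASSUMPTION: both lists hold only the "average runner" portion of each race,
    on the same scaled position axis, and `slope` comes from one race's own fit
    (in the text, the Illinois line).
    """
    return intercept_with_fixed_slope(slower, slope) - intercept_with_fixed_slope(faster, slope)
```

A positive result means the first race was slower on average; that number of seconds is what gets added to the faster race's times to put them on the slower course's scale.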

Continuing with the NY Federation and Illinois State race comparison ... I had already determined that the daily course conversion between Bowdoin Park and my base-line course (SUNY-Utica) was 15 seconds ... so I added 15 seconds to all the race times at the NY Federation meet ... Since the conversion between Bowdoin and the Illinois course was 80 seconds, I added 95 seconds (15 + 80) to all the Illinois State race times to make them comparable to SUNY-Utica ... Those adjusted race times were then converted to speed ratings ... that's how Sean McNamara got a rating between 197 and 198 and the Dettmans got ratings just above 190 ... I prefer speed ratings to adjusted times because it's much easier to do math on a number than on a time (and I perform a fair amount of number-crunching).
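
The chained adjustment is plain addition. The two offsets below are the figures quoted above (80 and 15 seconds, valid only for those race days); the 15:00 Illinois time is a made-up example, and `to_baseline` and `mmss` are hypothetical helper names. The conversion from adjusted times to speed ratings is not described in this article, so the sketch stops at the adjusted time.

```python
# Offsets quoted in the text (seconds to ADD), valid only for those race days.
DETWEILLER_TO_BOWDOIN = 80  # Illinois State Meet -> NY Federation (Bowdoin Park)
BOWDOIN_TO_UTICA = 15       # Bowdoin Park -> SUNY-Utica base-line course


def to_baseline(seconds: float, offsets: list[float]) -> float:
    """Chain course conversions by adding each offset in turn."""
    return seconds + sum(offsets)


def mmss(seconds: float) -> str:
    """Format seconds as m:ss (rounded to the nearest second)."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical Illinois finish time of 15:00 (900 s), converted to SUNY-Utica.
adjusted = to_baseline(900, [DETWEILLER_TO_BOWDOIN, BOWDOIN_TO_UTICA])
print(adjusted, mmss(adjusted))  # 995 16:35
```

Keeping each conversion as a separate offset makes it easy to swap in a different day's figure without touching the others.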

That's how I did it ... I did it as quickly as possible simply to get it done! ... And I posted the ratings for every individual runner so viewers could see exactly where my team prediction numbers came from ... Not surprisingly, most complaints centered around ratings for individual runners (as opposed to the team scores) ... But this exercise centered on teams, not individuals ... I spent absolutely no time evaluating individual runners.

Also

Often, I spend more time on the girls' ratings than on the guys' (even though the guys complain a lot more about ratings) ... NTN was different ... It was a foregone conclusion that Saratoga would win easily and Smoky Hills would be second, so I didn't waste time on "high" precision or on re-evaluating the Texas girls, whose State Meet is a minuscule 3K.