Project With Tom Loughin
ARCADE: A Real Comparison Arena for Determining Effectiveness of Prediction Methods
No single regression prediction method works best on all data sets. We hypothesize that the relative performance of different statistical learning (SL) prediction methods may be influenced by measurable properties of the data sets to which they are applied. We have constructed a pilot arena, called ARCADE, in which SL methods can be compared across data sets and their performance related to measurable properties of those data sets. We now seek to expand the pilot by greatly increasing the number of data sets in the arena. The successful candidate will: learn how ARCADE operates; learn which data sets currently reside in it and how their properties are measured; identify existing online repositories of data; download, clean, and preprocess new data sets; measure their properties; add them to ARCADE; and produce new results on the relationship between the relative performance of SL methods and the properties of these data sets.
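To make the idea concrete, the kind of comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ARCADE's actual code: the synthetic data generator, the two toy prediction methods (a mean predictor and a one-variable least-squares line), the two data set properties (sample size and absolute correlation), and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random


def make_dataset(n, slope, noise_sd, seed):
    """Hypothetical synthetic data set: y = slope * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def mean(values):
    return sum(values) / len(values)


def measure_properties(xs, ys):
    """Two illustrative measurable properties: sample size and |correlation|."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = (sxx * syy) ** 0.5
    return {"n": len(xs), "abs_corr": abs(sxy) / denom if denom > 0 else 0.0}


def fit_mean(xs, ys):
    """Toy method 1: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy method 2: simple least-squares line in one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cv_mse(fit, xs, ys, k=5):
    """k-fold cross-validated mean squared prediction error."""
    sq_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - predict(xs[i])) ** 2 for i in test)
    return mean(sq_errors)


def compare(xs, ys):
    """One arena row: data set properties plus relative performance."""
    row = measure_properties(xs, ys)
    # Ratio below 1 means the line outperforms the mean predictor.
    row["mse_ratio"] = cv_mse(fit_line, xs, ys) / cv_mse(fit_mean, xs, ys)
    return row


if __name__ == "__main__":
    for slope in (0.0, 0.5, 3.0):
        print(slope, compare(*make_dataset(60, slope, 1.0, seed=1)))
```

Each call to `compare` yields one row pairing a data set's properties with the relative performance of the two methods. Collecting such rows over many data sets is what would allow relative performance to be related to those properties.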