I was looking for an intuitive way to show my students why parsimony matters in model building, and what overfitting looks like, when I remembered the humorous paper by James Wei showing that elephants are obviously created by Fourier sine series! I went a step further and implemented some popular variable selection methods along with interpolation. It is interesting to see how the different selection methods perform for different numbers of sine terms, and how they over- or underfit when asked to interpolate, in the spirit of predictive modelling.

This is very nice. I love my dada-ist interpolated overfitted elephant! (What does the red vertical line in the “Selected variables” bar at the bottom mean?)

Thanks! The red line is where the number of variables + constant equals the number of data points (p+1=n).
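
To make that concrete, here is a minimal sketch (not the code behind the plot; the sample points, the values and the names `sine_design` and `rss` are made up for illustration). It fits a constant plus p sine terms by ordinary least squares and shows that the residual sum of squares drops to numerical zero once p+1=n, i.e. the model simply passes through every data point.

```python
import numpy as np

def sine_design(x, p):
    """Design matrix with columns: constant, sin(1*x), ..., sin(p*x)."""
    cols = [np.ones_like(x)] + [np.sin(k * x) for k in range(1, p + 1)]
    return np.column_stack(cols)

# Hypothetical toy data: n = 5 irregularly spaced points in (0, pi).
x = np.array([0.3, 0.8, 1.3, 2.1, 2.9])
y = np.array([1.0, -0.5, 2.0, 0.3, -1.2])
n = len(x)

def rss(p):
    """Residual sum of squares of the least-squares fit with p sine terms."""
    X = sine_design(x, p)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# The fit can only improve as terms are added; at p = n - 1 it is exact.
for p in range(n):
    print(f"p = {p}, p + 1 = {p + 1}, RSS = {rss(p):.3e}")
```

Past the red line (p+1>n) the least-squares solution is no longer unique, so the fit says nothing about how the curve behaves between the data points.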