My approach (4.22 CV / 3.704 public LB)


I’d like to detail my approach and share the validation results.
All validation results are calculated as an average over moons 50 to 268.

I used four types of models:

  • LightGBM (CV 3.74) (100 features) (loss rmse)
  • ExtraTreesRegressor (CV 3.65) (100 features) (loss squared_error)
  • CatBoost (CV 3.68) (89 features) (loss rmse)
  • NN (CV 3.56) (89 features) (loss mse)

I selected the features with forward feature selection.
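The post doesn’t include the selection code, but greedy forward selection can be sketched roughly like this, assuming an `evaluate` callback that returns a CV score where higher is better (the function and variable names here are illustrative, not the author’s actual code):

```python
def forward_select(candidates, evaluate, max_features=100):
    """Greedy forward feature selection (illustrative sketch).

    Repeatedly adds the single candidate feature that most improves
    the CV score returned by `evaluate(features)`, stopping when no
    candidate improves the score or `max_features` is reached.
    """
    selected, best_score = [], float("-inf")
    improved = True
    while improved and len(selected) < max_features:
        improved, best_feat = False, None
        for feat in candidates:
            if feat in selected:
                continue
            score = evaluate(selected + [feat])
            if score > best_score:
                best_score, best_feat = score, feat
        if best_feat is not None:
            selected.append(best_feat)
            improved = True
    return selected, best_score
```

With 100 selected features per model, `evaluate` is called many times, so this is only practical when a single CV run is cheap.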

Validation and Results:
Stacking these models gave me CV 4.22 (LB 3.704).
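The post doesn’t say which meta-model was used for stacking; a minimal sketch with a least-squares blender fit on out-of-fold predictions (an illustrative choice, not necessarily the author’s) looks like this:

```python
import numpy as np

def stack(oof_preds, y, test_preds):
    """Blend model predictions with a linear meta-model.

    oof_preds:  (n_samples, n_models) out-of-fold predictions
    y:          (n_samples,) true targets
    test_preds: (m_samples, n_models) predictions to blend
    """
    # fit least-squares weights (with intercept) on the OOF predictions
    X = np.column_stack([oof_preds, np.ones(len(oof_preds))])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    # apply the learned blend to the test-time predictions
    Xt = np.column_stack([test_preds, np.ones(len(test_preds))])
    return Xt @ w
```

Fitting the blender only on out-of-fold predictions keeps the meta-model from overfitting to base-model training leakage.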

Target transformation

Extra features
Number of IDs in the previous moons
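The post only names this feature, so the exact definition is an assumption. One plausible reading, counting unique ids over a trailing window of moons, can be sketched with pandas (column names `moon` and `id` are illustrative):

```python
import pandas as pd

def add_id_count_feature(df, window=1):
    """Attach the number of unique ids seen in the previous `window`
    moons to every row (one possible reading of the feature)."""
    # unique ids per moon, in moon order
    counts = df.groupby("moon")["id"].nunique().sort_index()
    # sum over the trailing window, shifted so the current moon is excluded
    prev = counts.rolling(window).sum().shift(1)
    df = df.copy()
    df["n_ids_prev_moons"] = df["moon"].map(prev)
    return df
```

Such a feature carries cross-sectional information (how many ids are active) that per-id features cannot see.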

CV-LB consistency
I found that big improvements were reflected in the LB, but small improvements sometimes were not, due to a certain degree of randomness in the competition.

Historical data and retraining
I found that using just the last 10 moons for training gives the best results in my case, so I retrain every moon, each time on the last 10 moons of data.
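The retrain-every-moon loop can be sketched as follows; `fit` and `predict` stand in for any of the four models, and all names are illustrative rather than the author’s code:

```python
def rolling_retrain(data_by_moon, fit, predict, window=10, first_moon=50):
    """Rolling-origin evaluation: for each moon from `first_moon`
    onward, train on the trailing `window` moons and predict the
    current moon."""
    preds = {}
    moons = sorted(data_by_moon)
    for moon in moons:
        if moon < first_moon:
            continue
        # trailing window: the `window` moons strictly before `moon`
        train_moons = [m for m in moons if moon - window <= m < moon]
        train = [row for m in train_moons for row in data_by_moon[m]]
        model = fit(train)
        preds[moon] = predict(model, data_by_moon[moon])
    return preds
```

Averaging a score over the per-moon predictions from moon 50 to 268 then gives the CV numbers quoted above.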

Observation on Randomness:
During the competition, my best public submission (4.15 LB) was obtained inadvertently through a bug in my code. This highlights the degree of randomness inherent in the competition.


What was the train frequency set?

I train it every moon because my training time is short.

The CV scores you presented were calculated using a rolling-origin CV (retrain on each date using last 10 dates). Correct?

yes, exactly!

Can you explain the rationale behind this target transformation?

I am curious now. How are the features selected?