All submissions failed this week's OOS scoring — feature schema changed 1150 → 1157 with no advance notice?

Hi team,

Every one of my submissions failed this week’s out-of-sample scoring with the same error. After going through the run logs, the cause is a change to the dataset’s feature schema that — as far as I can tell — wasn’t communicated in advance.

What happened

All of my models errored during infer with an XGBoost feature-name mismatch:

ValueError: feature_names mismatch: [...]

training data did not have the following fields:

Feature_1151, Feature_1152, Feature_1153, Feature_1154, Feature_1155, Feature_1156, Feature_1157

This week’s data release (data-releases/240) ships 1157 raw Feature_* columns instead of 1150 — seven new features (Feature_1151Feature_1157) were added. The original 1150 features are all still present, so it’s a purely additive change.

Why it breaks

Following the documented convention, my code selects features with column.startswith("Feature_"). That now sweeps in the seven new columns and feeds them to models that were trained before those columns existed, so XGBoost rejects them at predict time. This hit every model I’ve submitted — all of which scored cleanly in previous weeks. Nothing on my side changed; the input schema changed underneath the existing submissions.

Impact

All my submissions for this scoring cycle failed and (as far as I can tell) produced no score for the week.

Questions

  1. Was this feature addition announced anywhere ahead of time? I couldn’t find any notice — if one exists, please point me to it.

  2. Going forward, where will feature-set / schema changes be communicated, and how much advance notice will participants get?

  3. Will additive feature changes come with a backward-compatibility or transition window so that already-submitted models don’t break silently?

  4. Given the lack of notice, will affected submissions for this cycle be re-scored or exempted from any ranking/reward penalty?

Happy to share full run logs and submission IDs (e.g. 56135, 49572, plus the rest) if that helps support reproduce it. Thanks.

Hi b-ionut-r,

This is a mistake on our end, we are already aware of the situation.

Those new features were added by accident by our data provider.

Our fix will be to remove all runs from this week, and re-run them with the correct dataset.

1 Like

I see. Thanks for the quick response!

Another question, if i may. Does train_frequency being set to 0 (default) mean the model is only trained once and then only inference is run each week? Is this the core of the issue here?

Having a train frequency of 0 is indeed that, your train function is only run once at the very beginning, and only the inference is run each week.

No, the code issue is on our platform, it is not related to user models themselves.

1 Like

The models are currently being rerun. Both your extra-augmented model and your xgboost-augmented model have already finished with an error (others are still running).

We have also removed the old run to avoid cluttering up your dashboard.