Question about updated data and possible train/test distribution shift

@enzo

Following the email we received about the “29 time series with a break at step zero,” I wanted to confirm that the downloadable data in the Resources section has been fully updated accordingly.

Could you please confirm that all relevant files were updated consistently, including any train/test files, index files, and labels used by the evaluation pipeline?

I also wanted to ask whether participants should expect a meaningful domain or regime shift between the train and test sets. In my experiments so far, even very simple online detectors seem to show a noticeable difference between train/CV behavior and public leaderboard behavior. I understand that some distribution shift may be intentional, but I would like to make sure this is expected and not related to a partial data refresh or mismatch between files.

Thanks a lot for checking.

Hello @meware,

I wanted to confirm that the downloadable data in the Resources section has been fully updated accordingly.

Yes. Are you using it to manually download the data? If so, why do you prefer this method over using the CLI?

Could you please confirm that all relevant files were updated consistently, including any train/test files, index files, labels, used by the evaluation pipeline?

Only the file y_train.index.parquet needed updating.

I also wanted to ask whether participants should expect a meaningful domain or regime shift between the train and test sets.

The update affects only 29 time series out of 10,000, or 0.29%. During our internal testing, we did not observe a significant difference between the old and new datasets.

I have no preference between manual download and the CLI. I have just compared all the files from the manual download with those from the CLI, and they are identical.
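
For anyone who wants to repeat that kind of check, here is a minimal sketch of one way to do it: hash every file in both folders with SHA-256 and report anything that differs. The function names and the folder names in the usage comment are hypothetical; this is not part of the competition tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_dirs(dir_a: Path, dir_b: Path) -> dict[str, str]:
    """Return {relative_path: status} for files that are missing or differ.

    An empty dict means both folders hold byte-identical files.
    """
    hashes_a = hash_tree(dir_a)
    hashes_b = hash_tree(dir_b)
    report = {}
    for name in sorted(hashes_a.keys() | hashes_b.keys()):
        if name not in hashes_b:
            report[name] = "only in first"
        elif name not in hashes_a:
            report[name] = "only in second"
        elif hashes_a[name] != hashes_b[name]:
            report[name] = "content differs"
    return report


# Hypothetical usage (folder names are assumptions, adjust to your setup):
# print(compare_dirs(Path("data_manual"), Path("data_cli")))
```

An empty result means the two downloads match byte for byte. Note that this compares raw bytes, so two parquet files with the same rows but different write metadata would still be reported as different.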

Thank you for your answer.