Is there an ‘out of sample’ phase for the structural breaks competition, or is the leaderboard data always out of sample? Does the data used for the leaderboard ever change?
Yes, there will be an out-of-sample phase, in which your code will predict on another 10’000 datasets that it has never seen.
Thanks, I’m still a bit confused.
My understanding is that when you submit to the leaderboard now, your code is run on data you’ve never seen. Does this public leaderboard data change? Or is it static throughout the competition?
Your code is indeed run on 10’000 unseen datasets for the public leaderboard, but since that leaderboard could be overfitted through repeated attempts, there will also be an out-of-sample leaderboard when the competition ends.
We will just re-run your model with this new data.
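To illustrate why repeated attempts can overfit a fixed public leaderboard, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the "models" are pure coin flips with no skill, the score is plain accuracy rather than the competition's actual metric, and `ATTEMPTS`, `labels`, `predict` and `accuracy` are hypothetical names, not part of the competition tooling.

```python
import random

N = 10_000       # datasets per leaderboard, as mentioned in this thread
ATTEMPTS = 200   # hypothetical number of submissions by one participant


def labels(name):
    """Random binary ground truth for a leaderboard (illustrative only)."""
    rng = random.Random(name)
    return [rng.getrandbits(1) for _ in range(N)]


def predict(model_id, name):
    """A 'model' with no skill: reproducible coin flips per model and dataset."""
    rng = random.Random(f"{model_id}:{name}")
    return [rng.getrandbits(1) for _ in range(N)]


def accuracy(model_id, name, truth):
    """Fraction of predictions that match the ground truth."""
    preds = predict(model_id, name)
    return sum(p == t for p, t in zip(preds, truth)) / N


public_truth = labels("public")
oos_truth = labels("out_of_sample")

# Score every attempt on the static public data and keep the best one.
public_scores = {m: accuracy(m, "public", public_truth) for m in range(ATTEMPTS)}
best = max(public_scores, key=public_scores.get)
best_public = public_scores[best]

# Re-run the selected attempt on the fresh out-of-sample data.
best_oos = accuracy(best, "out_of_sample", oos_truth)

print(f"best public accuracy: {best_public:.4f}")
print(f"same model, out of sample: {best_oos:.4f}")
```

The attempt selected on the public data tends to score above 0.5 there purely by chance, while its out-of-sample score stays close to 0.5, which is why a re-run on new data is a fairer final ranking.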
Is the local test data of 100 samples part of the 10,000 samples used in the cloud run? Also, should we expect the further 10,000 samples used for the private leaderboard evaluation to differ much in distribution?
Yes, that is right.
The first 100 datasets given for local testing are indeed part of the 10’000 datasets used in the cloud.
And there will be another 10’000 datasets for the out-of-sample phase.