Will the public leaderboard data be used in the final scoring?

So, there are 32 time_ids (cross-sections) in the “public” out-of-sample data that are used to compute the leaderboard score.

I was wondering whether these data will be used to compute the final OOS score. Or will we be evaluated on completely new data?

Thank you!

In the out-of-sample phase, the data will be a continuation of the public set, so there is no unknown data.

Thanks for your quick reply.

Maybe I was a bit unclear in my question. What I meant was: will the public test set be used in the final scoring?

Thanks

Sorry, I didn’t see your answer.

No, the public leaderboard is only indicative; only the private leaderboard will be taken into account for the out-of-sample phase.

Don’t overfit :stuck_out_tongue:

Awesome, thanks for your reply!


Another point, @enzo, somewhat related to the above, so I am not starting a new thread.

Time ID 268 is in both the training set and the public test set. However, it is not used in the computation of the score (I tried both my model’s predictions and the true values for it and got an identical score).

Just making sure that this is intended behaviour and not breaking anything subsequently :slight_smile:

Cheers!

Even though we run dates 268 to 272 in the cloud, they are excluded from the final score.

They only serve as indicative values to make sure your code runs and that you are able to read the logs.


So can I assume that the moon used in the loop at the start of the out-of-sample period, which is 2023-08-18, will be 300? Also, will the public leaderboard data be included in the final training data? @enzo


Due to a one-date embargo, the out-of-sample period will start on date 301.

Yes. The training data will span dates 0 to 299 and will be extended iteratively with the OOS data if you set your train_frequency higher than 0.
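A minimal sketch of the split described above, as I understand it: training on dates 0 to 299, date 300 held out as the one-date embargo, and the OOS period starting at 301. The function name `training_dates` and the retraining rule (with train_frequency > 0, retrain every train_frequency OOS dates on everything up to the embargo before that date) are my own assumptions for illustration, not the platform's confirmed behaviour.

```python
# Hypothetical illustration of an expanding training window with a one-date embargo.
# Assumed rule, NOT confirmed by the organisers.

LAST_TRAIN_DATE = 299
EMBARGO = 1
FIRST_OOS_DATE = LAST_TRAIN_DATE + EMBARGO + 1  # 301


def training_dates(oos_date: int, train_frequency: int) -> range:
    """Return the dates assumed to be available to the model used on `oos_date`.

    Assumptions:
    - train_frequency == 0: train once on dates 0..299 and never retrain.
    - train_frequency > 0: retrain every `train_frequency` OOS dates, each time
      on all dates strictly before the embargoed date preceding the retrain date.
    """
    if oos_date < FIRST_OOS_DATE:
        raise ValueError("not an out-of-sample date")
    if train_frequency < 0:
        raise ValueError("train_frequency must be >= 0")
    if train_frequency == 0:
        return range(0, LAST_TRAIN_DATE + 1)
    # Most recent retraining date at or before oos_date.
    steps = (oos_date - FIRST_OOS_DATE) // train_frequency
    retrain_date = FIRST_OOS_DATE + steps * train_frequency
    # Exclude the single embargoed date right before the retraining date.
    return range(0, retrain_date - EMBARGO)


print(training_dates(301, 0)[-1])  # 299
print(training_dates(306, 2)[-1])  # 303 (retrained at 305, date 304 embargoed)
```

Under this reading, both settings agree on the first OOS date (training ends at 299), and they only diverge once retraining kicks in.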

Does that answer your questions?

Thanks for your reply!


Hi @enzo

I find it confusing that it is a continuation… So basically, it will not be live data that has not occurred yet? Why would we have only 3 dates per week if you guys already have all the data available?

Thanks

The frequency at which the competition score is updated (three times a week) is one thing; it may or may not be related to the actual frequency of the competition data. ADIA Lab prefers not to disclose this information. The private test set is designed to allow a meaningful, unbiased, and accurate assessment of the scores.