Will the public leaderboard data be used in the final scoring?

vivacious-thomas · July 25, 2023, 8:22am

So, there are 32 time_id values (cross sections) in the “public” out-of-sample data that are used to compute the leaderboard score.

I was wondering whether these data will be used to compute the final OOS score. Or will we be evaluated on completely new data?

Thank you!

enzo · July 25, 2023, 9:22am

In the out-of-sample phase, the data will be a continuation of the public set.
So no unknown data.

vivacious-thomas · July 25, 2023, 9:25am

Thanks for your quick reply.

Maybe I was a bit unclear in my question. What I meant was: will the public test be used in the final scoring?

thanks

enzo · July 25, 2023, 2:38pm

Sorry, I didn’t see your reply.

No, the public leaderboard is only indicative; only the private leaderboard will be taken into account for the out-of-sample phase.

Don’t overfit

vivacious-thomas · July 25, 2023, 4:17pm

awesome, thanks for your reply!

vivacious-thomas · July 25, 2023, 10:16pm

Another point @enzo, somewhat related to the above point so I am not starting a new thread.

Time id 268 is in both the training set and the public test set. However, it is not used in the computation of the score (I submitted both my model’s predictions and the truth for it and got an identical score).

Just making sure that this is intended behaviour and not breaking anything subsequently

Cheers!

enzo · July 25, 2023, 10:28pm

Even though we are running dates 268 to 272 in the cloud, they are excluded from the final score.

They just serve as indicative values to make sure your code is running and you are able to read the logs.
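A minimal sketch of what this exclusion means in practice, and why swapping in the truth for date 268 leaves the score unchanged. The metric here (mean per-date Pearson correlation) and the names `EXCLUDED_DATES` and `mean_score` are assumptions for illustration only; the actual scoring code is not shown in this thread.

```python
import numpy as np

# Dates 268 to 272 are run but not scored (per the reply above).
EXCLUDED_DATES = set(range(268, 273))


def mean_score(dates, y_true, y_pred, excluded=EXCLUDED_DATES):
    """Average a per-date correlation over the dates that are not excluded.

    The per-date Pearson correlation is a stand-in metric, not the
    competition's confirmed one.
    """
    dates = np.asarray(dates)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_date = []
    for d in np.unique(dates):
        if int(d) in excluded:
            continue  # indicative only, never enters the score
        mask = dates == d
        per_date.append(np.corrcoef(y_true[mask], y_pred[mask])[0, 1])
    return float(np.mean(per_date))


# Synthetic data: 10 dates (266..275), 50 rows each.
rng = np.random.default_rng(0)
dates = np.repeat(np.arange(266, 276), 50)
y_true = rng.normal(size=dates.size)
y_pred = 0.1 * y_true + rng.normal(size=dates.size)

# Replace the predictions with the truth on the excluded dates only.
leaky = y_pred.copy()
excluded_rows = np.isin(dates, list(EXCLUDED_DATES))
leaky[excluded_rows] = y_true[excluded_rows]

score_a = mean_score(dates, y_true, y_pred)
score_b = mean_score(dates, y_true, leaky)
```

Under these assumptions `score_a` and `score_b` are identical, which matches the observation reported above for date 268.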

newbee · August 2, 2023, 2:32am

So can I assume the looping moon at the start of the out-of-sample phase, which is 2023-08-18, will be 300? Also, will the public leaderboard data be included in the final training data? @enzo

xgilbert · August 3, 2023, 9:53am

Due to an embargo of 1 date, the out-of-sample phase will start on date 301.

Yes. The training data will be from date 0 to date 299 and extend to the OOS data iteratively if you set your train_frequency higher than 0.
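The date arithmetic can be sketched roughly as follows. Only the numbers (training on dates 0 to 299, an embargo of 1 date, OOS starting on date 301) come from this thread. The function names and the retraining rule (retrain every `train_frequency` OOS dates, always leaving the embargo gap before the date being predicted) are assumptions, not the platform's documented behaviour.

```python
TRAIN_FIRST_DATE = 0
TRAIN_LAST_DATE = 299
EMBARGO = 1  # number of dates skipped between training data and OOS


def first_oos_date(train_last_date=TRAIN_LAST_DATE, embargo=EMBARGO):
    """First out-of-sample date: last training date, plus the embargo, plus one."""
    return train_last_date + embargo + 1


def training_window(predict_date, train_frequency):
    """Return (first, last) training dates in use when predicting predict_date.

    Hypothetical rule: with train_frequency == 0 the model is never
    retrained; otherwise it is retrained every train_frequency OOS dates.
    """
    oos_start = first_oos_date()
    if predict_date < oos_start:
        raise ValueError(f"predict_date must be >= {oos_start}")
    if train_frequency <= 0:
        return (TRAIN_FIRST_DATE, TRAIN_LAST_DATE)
    steps = predict_date - oos_start
    # Most recent date on which a retrain would have been triggered.
    retrain_date = oos_start + (steps // train_frequency) * train_frequency
    return (TRAIN_FIRST_DATE, retrain_date - EMBARGO - 1)


print(first_oos_date())         # 301
print(training_window(301, 0))  # (0, 299)
print(training_window(305, 2))  # (0, 303)
```

With `train_frequency` set to 0 the window stays at dates 0 to 299 for the whole OOS phase; with a positive value it grows as OOS dates become available.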

Does that answer your questions?

newbee · August 3, 2023, 1:34pm

Thanks for your reply!

vivacious-thomas · August 3, 2023, 5:26pm

Hi @enzo

I find it confusing that it is a continuation… So basically, it will not be live data that has not occurred yet? Why would we have only 3 dates per week if you guys already have all the data available?

thanks

cruncher-jean · August 4, 2023, 4:44pm

One thing is the frequency at which the score of the competition is updated - which is 3 times a week. This may or may not be related to the actual frequency of the data of the competition. ADIA Lab prefers not to disclose this information. The private test set is designed to allow a meaningful, unbiased, and accurate assessment of the scores.
