Leakage across Dates?

casual-andy · June 17, 2023, 7:51pm

Within the training dataset there are 250+ dates. If we split the data into several train/test sets by date, should there be a gap of N dates between each train set and its test set to avoid leakage? If so, what is the recommended N?

xgilbert · June 21, 2023, 10:50am

Hi casual-andy!

The embargo between the train and test sets is 1 date.
So if you want to build multiple train/test sets by date in a walk-forward fashion, I would consider dropping one date between each train set and its test set.
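A minimal sketch of the walk-forward scheme described above, assuming dates are sortable integer indices. The function name `walk_forward_splits` and the parameters `n_splits`, `test_size`, and `embargo` are hypothetical, chosen for illustration; they are not part of any CrunchDAO API.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    dates: Sequence[int],
    n_splits: int = 3,
    test_size: int = 2,
    embargo: int = 1,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_dates, test_dates) with `embargo` dates dropped in between.

    Hypothetical helper: the training window expands over time and each
    test window sits right after the embargoed dates.
    """
    unique_dates = sorted(set(dates))
    n = len(unique_dates)
    for k in range(n_splits):
        # Test windows are laid out back to back at the end of the timeline.
        test_start = n - (n_splits - k) * test_size
        # Exclusive end of the training window; the dates between
        # train_end and test_start are the embargo and are never used.
        train_end = test_start - embargo
        if train_end <= 0:
            raise ValueError("not enough dates for the requested splits")
        test_dates = unique_dates[test_start:test_start + test_size]
        yield unique_dates[:train_end], test_dates


# Ten toy dates, three folds, two test dates per fold, one embargoed date.
splits = list(walk_forward_splits(range(10), n_splits=3, test_size=2, embargo=1))
for train_dates, test_dates in splits:
    print("train:", train_dates, "test:", test_dates)
```

With `embargo=1`, the first test date of every fold is two positions after the last train date, so exactly one date is left unused between them.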

multiple-masahiro · June 23, 2023, 4:37am

Mr xgilbert, I have a question.

"The embargo between the train/test set is of 1 date."

Does this mean that, when this code is executed on the competition server, there is a one-day gap between the last date of the X_train passed to the train method and the date of the X_test passed to the infer method?

xgilbert · June 23, 2023, 8:19am

A date is a time period. It can be one week, one month, or some other period. CrunchDAO doesn't have this information.
Hope this helps.

multiple-masahiro · June 23, 2023, 12:49pm

I apologize for asking in a way that could be misunderstood.
I would like to know how many time indices separate training and inference.
Specifically, if the last "date" of train is t, then the "date" of infer is t+1, not t+2. Is that correct?
I am a little confused by the word “embargo”.

xgilbert · June 23, 2023, 1:53pm

If the last train date is t and there is a one-date embargo, then the first infer date is t+2.
The embargoed date is t+1: it contains information about the t+2 target, so it can't be used.

Embargo is just the gap between the train and test sets to avoid leakage.
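The arithmetic in this reply can be sketched as follows, assuming dates are consecutive integer indices. Both helper names (`embargoed_dates`, `first_infer_date`) are hypothetical and only illustrate the t, t+1, t+2 relationship.

```python
from typing import List


def embargoed_dates(last_train_date: int, embargo: int = 1) -> List[int]:
    """Dates dropped between train and infer (hypothetical helper)."""
    return list(range(last_train_date + 1, last_train_date + embargo + 1))


def first_infer_date(last_train_date: int, embargo: int = 1) -> int:
    """First date that may be used for inference after the embargo."""
    return last_train_date + embargo + 1


t = 100  # last train date, an arbitrary example value
gap = embargoed_dates(t)      # the single embargoed date, t+1
first = first_infer_date(t)   # the first infer date, t+2
print(gap, first)
```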

multiple-masahiro · June 23, 2023, 2:46pm

Thank you for the clear explanation. I understand now!
