In the DataCrunch competition, when I used crunch.load_data() to load X_train and y_train, the moon column has values from 0 to 468. But when running crunch.test(), X_train only has moons 0 to 455. Is this a bug?
The overview page for the competition states that the last 13 moons are used as test data in X_test and y_test. That would mean moons 456 to 468 are the test data, but this is not the case: the X_test moons are 469 to 480.
I don’t understand why moons 456-468 are in neither the train data nor the test data. Some clarification and help would be appreciated, thanks.
When loading the data, you are given a fixed number of moons to train on.
However, when running the test, your code can retrain periodically based on the moon (for example, every 2 moons), using the data that was used to predict previous moons. Otherwise, retraining on the same moons would be useless.
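To make the "every 2 moons" idea concrete, here is a minimal sketch of a retraining schedule. It is plain Python and not the crunch API: the function name `retrain_schedule` and the parameter `train_frequency` are hypothetical names chosen for illustration, and the test moons 469 to 480 are taken from the question above.

```python
def retrain_schedule(test_moons, train_frequency=2):
    """Return the test moons at which a model would be retrained.

    Hypothetical helper: retrains on the first test moon, then on
    every `train_frequency`-th moon after it.
    """
    first = test_moons[0]
    return [m for m in test_moons if (m - first) % train_frequency == 0]


# Test moons as reported in the question (469 to 480 inclusive).
test_moons = list(range(469, 481))
print(retrain_schedule(test_moons, train_frequency=2))
```

On the moons in between, the previously trained model would simply be reused for prediction.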
To clarify the question a bit…
When running the notebook, when my model is making a prediction on moon 469, it is given training data (through X_train, y_train in the train function) for moons 0 to 455 only, which doesn’t include the latest 13 moons. My assumption was that the model would be trained on moons 0 to 468 and used to predict moon 469, but the most recent data (456-468) is missing from the training data.
Next, when it is predicting moon 470, it is given training data for moons 0 to 456, so the latest 13 moons are still missing.
Is this intended? Is the model not supposed to have access to the latest 13 moons?
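The numbers above are consistent with a fixed 13-moon gap between the last training moon and the moon being predicted. The sketch below only reproduces that arithmetic. It assumes the gap is exactly 13 moons and that training always starts at moon 0, both inferred from the observations in this thread rather than from any documentation, and `training_moons` is a hypothetical helper, not part of crunch.

```python
def training_moons(current_moon, gap=13, first_moon=0):
    """Return (first, last) training moon for a given prediction moon.

    Assumption: the `gap` moons immediately before `current_moon`
    are withheld from training.
    """
    last_train_moon = current_moon - gap - 1
    return first_moon, last_train_moon


for moon in (469, 470):
    first, last = training_moons(moon)
    print(f"predicting moon {moon}: train on moons {first} to {last}")
```

With gap=13 this gives moons 0 to 455 for moon 469 and 0 to 456 for moon 470, matching what the train function receives.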
The part where I wanted clarification was:
When moon=469, X_train has data up to moon 455. Shouldn’t this be 468, so the model can use the latest data to make its prediction? Or is this gap intentional?
Loading X_train through crunch.load_data, I see moons from 0 to 468. But X_train in the train function has moons 0 to 455. Why are the last 13 moons removed?