Confused about prediction file size and test set

technical-sara · July 5, 2025, 5:12pm

When I run my local pipeline with X_test.reduced.parquet and y_test.reduced.parquet, my prediction.parquet output always contains 101 rows, matching the reduced test set. I was expecting to generate predictions for the full test set (10,000+ time series), but I can’t find a full version of X_test.parquet or y_test.parquet in the data files I have. Will I only get an accurate ROC AUC score on the full X_test and y_test if I upload my code and model?

enzo · July 5, 2025, 7:25pm

That is correct. Locally you only have access to a (very) small subset of the dataset, so you must run in the cloud to make sure your code actually runs.
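
If it helps, here is a rough sketch of a local sanity check on the reduced set. It is not the platform's scoring code. The function name `local_auc` and the plain-dict inputs (series id to score, series id to 0/1 label) are assumptions for illustration, since the actual column layout of prediction.parquet and y_test.reduced.parquet is not shown in this thread. It checks that every labelled series has exactly one prediction, then computes ROC AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half.

```python
def local_auc(predictions, labels):
    """Sanity-check predictions against labels and return ROC AUC.

    predictions: dict mapping series id -> predicted score (hypothetical layout)
    labels:      dict mapping series id -> 0 or 1 (hypothetical layout)
    """
    # Every labelled series should have a prediction, and nothing else.
    missing = set(labels) - set(predictions)
    extra = set(predictions) - set(labels)
    if missing or extra:
        raise ValueError(
            f"id mismatch: {len(missing)} missing, {len(extra)} unexpected"
        )

    pos = [predictions[i] for i, y in labels.items() if y == 1]
    neg = [predictions[i] for i, y in labels.items() if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")

    # AUC = P(score of a random positive > score of a random negative),
    # counting ties as 0.5. Quadratic, but fine for ~100 rows.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up ids and scores
labels = {"a": 1, "b": 0, "c": 1, "d": 0}
predictions = {"a": 0.9, "b": 0.2, "c": 0.4, "d": 0.6}
print(local_auc(predictions, labels))  # 3 of 4 pairs ranked correctly -> 0.75
```

With only about 100 series, a score computed this way is noisy and should be treated as a smoke test. The ROC AUC on the full test set still comes only from the cloud run.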