Crunch 1 deliverables - CSV or Notebook with training function

Hello,

Thank you for organizing the challenge!

  1. Could you please clarify the submission format for the Crunch 1 challenge? In some places, it mentions submitting a CSV file, while in others, it states that only the submission notebook with the train and infer functions is required. Additionally, there are two threads in the forum discussing this, and the information seems a bit unclear. Where could we find the most up-to-date information?

  2. If the expected submission is a notebook with train and infer functions, do you expect to retrain the model on your end? That would mean all required files and the data preprocessing code must also be uploaded to your infrastructure. Or is the training function only an artifact? What are your system resource limits? The Resources Limit page (CrunchDAO Docs V3) only says that the time limits differ per competition and must be read from the competition page.

  3. Or should we just upload the predictions as a CSV file and load them in the infer function?

Thank you for clarifying this!

  1. The train and infer functions are mandatory in all CrunchDAO competitions: your code is called through them, and the infer function must return a DataFrame. The “output format as CSV” in the documentation is just a general way of saying that your code should produce a CSV (i.e., a DataFrame). The code interface is always the one described in Code Interface | CrunchDAO Docs V3.
  2. Whether the train function is actually used to train on our side is up to you; it is not mandatory. Some people prefer to train locally and upload their model as a file.
    1. There is currently no detailed quota because we don’t yet know what participants will submit; we will increase it depending on the situation. For now, the time quota is 10 hours.
  3. Although it is possible, we discourage it, because we might provide more data during the out-of-sample phase. If you only submit a prediction file, you will not be able to benefit from that data. (Again, we might; we also might not.)
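For anyone new to this setup, here is a minimal sketch of the split described in the answers above: train writes a model file into a directory, and infer only loads that file and returns predictions, so training can just as well happen locally before upload. The parameter names (`X_train`, `y_train`, `X_test`, `model_directory_path`) and the mean-value "model" are assumptions for illustration only; the authoritative signatures are on the Code Interface page. To run without dependencies, a list of row dicts stands in for the pandas DataFrame the real infer must return.

```python
import json
import os
import statistics
import tempfile


def train(X_train, y_train, model_directory_path):
    """Fit a placeholder mean-value model and save it to model_directory_path.

    Hypothetical signature; check the Code Interface docs for the real one.
    X_train is ignored by this trivial model.
    """
    model = {"mean": statistics.fmean(y_train)}
    with open(os.path.join(model_directory_path, "model.json"), "w") as f:
        json.dump(model, f)


def infer(X_test, model_directory_path):
    """Load the saved model and return one prediction row per test sample.

    The list of row dicts is a stand-in for the DataFrame the real
    interface expects.
    """
    with open(os.path.join(model_directory_path, "model.json")) as f:
        model = json.load(f)
    return [{"id": row["id"], "prediction": model["mean"]} for row in X_test]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as model_dir:
        # Train locally (or let the platform call train), then ship model_dir.
        train([{"id": 1}, {"id": 2}], [1.0, 3.0], model_dir)
        # Only infer needs to run if the model file is already uploaded.
        print(infer([{"id": 10}, {"id": 11}], model_dir))
```

Keeping all state in the model directory is what makes "train locally, upload the file" and "train on the platform" interchangeable: infer behaves the same either way.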

Hello @enzo,

Thank you for your input. Do you already have information about whether there will be additional out-of-distribution samples? Would it be possible to inform us in advance if the validation and test sets differ from the provided data?

Thank you!

No, this time the data is the same, and you have to predict the full set at once.
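Since the whole set is predicted in a single pass, a quick local sanity check that one infer call returns exactly one row per test id can catch batching or filtering mistakes before submitting. The helper below, `check_full_prediction`, is hypothetical and not part of any CrunchDAO tooling; as above, a list of `{"id": ..., "prediction": ...}` dicts stands in for the returned DataFrame.

```python
def check_full_prediction(test_ids, predictions):
    """Raise ValueError unless predictions cover every test id exactly once.

    Hypothetical helper: test_ids lists the ids of the full test set, and
    predictions are the rows returned by a single infer call over that set.
    """
    predicted_ids = [row["id"] for row in predictions]
    if len(predicted_ids) != len(set(predicted_ids)):
        raise ValueError("duplicate ids in predictions")
    missing = set(test_ids) - set(predicted_ids)
    extra = set(predicted_ids) - set(test_ids)
    if missing or extra:
        raise ValueError(
            f"missing ids: {sorted(missing)}, unexpected ids: {sorted(extra)}"
        )


if __name__ == "__main__":
    test_ids = [10, 11, 12]
    predictions = [{"id": i, "prediction": 0.5} for i in test_ids]
    check_full_prediction(test_ids, predictions)  # passes silently
    try:
        # Simulate an output that dropped the last sample.
        check_full_prediction(test_ids, predictions[:2])
    except ValueError as err:
        print(err)  # missing ids: [12], unexpected ids: []
```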