Random_submission.ipynb: train() function very confusing

In the random_submission.ipynb notebook there is a train() function, with this comment:

In the training function, users build and train the model to make inferences on the test data.

Your model must be stored in the resources/ directory.

That would mean that training should be done on the CrunchDAO server.

I cannot imagine that this is what the organizers want and expect. For one thing, if we want to use external gene expression data, how would that work?

But how can I upload the weights? Can I upload any files? Do I need the crunch-cli for that?

The resources/ directory is persisted across runs and is used to store your model so that state is preserved.

If you are submitting a notebook, you can submit “Model Files (optional)” on the submission page:
https://hub.crunchdao.com/competitions/broad-1/submit/via/notebook

If you are submitting via crunch-cli, it is sufficient to have files in the resources/ directory for the crunch push command to detect them.
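For readers who want to train offline and only ship weights, here is a minimal sketch of how that could look. It is an illustration under assumptions, not the official template: the parameter name model_directory_path (standing in for the resources/ directory), the file name weights.json, and the helpers save_weights and load_weights are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

WEIGHTS_FILE_NAME = "weights.json"  # hypothetical file name


def save_weights(weights: dict, model_directory_path: str) -> Path:
    """Write the weights into the (assumed) resources/ directory."""
    directory = Path(model_directory_path)
    directory.mkdir(parents=True, exist_ok=True)
    weights_path = directory / WEIGHTS_FILE_NAME
    weights_path.write_text(json.dumps(weights))
    return weights_path


def load_weights(model_directory_path: str) -> dict:
    """Read the weights back, e.g. from inside infer()."""
    weights_path = Path(model_directory_path) / WEIGHTS_FILE_NAME
    return json.loads(weights_path.read_text())


def train(data_directory_path: str, model_directory_path: str) -> None:
    """Skip training when pre-trained weights were uploaded with the submission."""
    weights_path = Path(model_directory_path) / WEIGHTS_FILE_NAME
    if weights_path.exists():
        return  # weights already present, nothing to do
    # Placeholder for a real training loop over data_directory_path.
    save_weights({"bias": 0.0}, model_directory_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as resources:
        save_weights({"bias": 1.5}, resources)  # done offline, then uploaded
        train("data", resources)                # finds the file and returns early
        print(load_weights(resources))
```

With this layout, train() only does real work when no weights file was uploaded, so the same code runs both locally and after submission.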

Thank you, that explains that I can upload the weights and access them from the resources/ directory during infer().

But what you have not addressed is the train() function. Do we get different training data during a run? If not, is it not better to do the training beforehand and just submit the weights and the infer() logic?

The train function is run only once, while the infer function is run for each file (data_file_path).

In the train function, you are supposed to read the data however you want (located in data_directory_path) to train your model. The function’s return value is ignored.

In the infer function, the returned value is a prediction for the current file and is used for scoring.
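The call pattern described above can be sketched with a small stand-in runner. This is only an illustration of the contract (train once, return value ignored, infer once per file); run_once and the call counters are hypothetical and are not part of the crunch package.

```python
from typing import Callable, Dict, List

calls = {"train": 0, "infer": 0}


def train(data_directory_path: str) -> str:
    calls["train"] += 1
    return "this value is never used"


def infer(data_file_path: str) -> str:
    calls["infer"] += 1
    return f"prediction for {data_file_path}"


def run_once(
    train_fn: Callable[[str], object],
    infer_fn: Callable[[str], str],
    data_directory_path: str,
    data_file_paths: List[str],
) -> Dict[str, str]:
    """Hypothetical runner: calls train once, then infer for each file."""
    train_fn(data_directory_path)  # return value is discarded
    predictions = {}
    for data_file_path in data_file_paths:
        # Each return value is the prediction for that file.
        predictions[data_file_path] = infer_fn(data_file_path)
    return predictions


predictions = run_once(train, infer, "data", ["a.csv", "b.csv", "c.csv"])
print(calls)
```

Anything infer() needs from train() therefore has to be passed through stored files rather than through the return value.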

Thank you. That makes things a lot clearer now. One question remains: is the training data in data_directory_path the same as what we have already downloaded via the notebook?

The parameters are only available when running your code with crunch.test(); see all parameters here.

But yes.