Random_submission.ipynb: train() function very confusing

soviet-manfred · November 6, 2024, 10:35am

In the random_submission.ipynb notebook there is a train() function, with this comment:

In the training function, users build and train the model to make inferences on the test data.

Your model must be stored in the resources/ directory.

That would mean that training should be done on the CrunchDAO server.

I cannot imagine that this is what the organizers want and expect. At least if we want to use external gene expression data, how would that work?

But how can I upload the weights? Can I upload any files? Do I need the crunch-cli for that?

enzo · November 6, 2024, 11:07am

The resources/ directory is persisted across runs and is used to store your model so that state is preserved.

If you are submitting a notebook, you can attach your files under “Model Files (optional)” on the submission page:
https://hub.crunchdao.com/competitions/broad-1/submit/via/notebook.

If you are submitting via crunch-cli, it is sufficient to have files in the resources/ directory for the crunch push command to detect them.

soviet-manfred · November 6, 2024, 5:41pm

Thank you, that explains how I can upload the weights and access them from the resources/ directory during infer().

But you have not yet answered my question about the train() function. Do we get different training data during a run? If not, is it not better to do the training beforehand and just submit the weights and the infer() logic?

enzo · November 6, 2024, 5:55pm

The train function is run only once, while the infer function is run for each file (data_file_path).

In the train function, you are supposed to read the data however you want (located in data_directory_path) to train your model. The function’s return value is ignored.

In the infer function, the returned value is a prediction for the current file and is used for scoring.
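As a rough illustration of this lifecycle (not the actual crunch runner), the sketch below assumes train() and infer() receive the path parameters named in this thread. The run_once driver, the model.json file name, the .txt input files, and the mean-value "model" are all hypothetical stand-ins. Because the return value of train() is ignored, the only way to hand state to infer() is to write it to disk.

```python
import json
import os


def train(data_directory_path: str, model_directory_path: str) -> None:
    # Hypothetical "training": average the numbers found in every .txt file.
    values = []
    for name in sorted(os.listdir(data_directory_path)):
        if name.endswith(".txt"):
            with open(os.path.join(data_directory_path, name)) as handle:
                values.extend(float(line) for line in handle if line.strip())
    model = {"mean": sum(values) / len(values)}
    # The return value is ignored, so the state is persisted to disk instead.
    with open(os.path.join(model_directory_path, "model.json"), "w") as handle:
        json.dump(model, handle)


def infer(data_file_path: str, model_directory_path: str) -> float:
    # Reload the state written by train(); a real infer() would also read
    # data_file_path, which this toy model does not need.
    with open(os.path.join(model_directory_path, "model.json")) as handle:
        model = json.load(handle)
    return model["mean"]


def run_once(data_directory_path, model_directory_path, data_file_paths):
    # Hypothetical driver mimicking the order described above:
    # train() exactly once, then infer() once per file.
    train(data_directory_path, model_directory_path)
    return {path: infer(path, model_directory_path) for path in data_file_paths}
```

Calling run_once with one data directory and several file paths yields one prediction per file, all derived from the single model written by train().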

soviet-manfred · November 6, 2024, 8:35pm

Thank you. That makes things a lot clearer now. One question is still left: is the training data in data_directory_path the same as what we have already downloaded via the notebook?

enzo · November 6, 2024, 9:36pm

The parameters are only available when running your code with crunch.test(), see all parameters here.

But yes.

soviet-manfred · November 7, 2024, 9:56am

Thank you again. One last question, please: does that mean I can do the training on my PC, then upload the code for infer() and the model weights, and leave the train() function empty?

enzo · November 7, 2024, 10:26am

Indeed.

The train function is still mandatory, but you can write:

def train():
    pass
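
With an empty train(), infer() has to load the weights you uploaded yourself. Here is a minimal sketch, assuming infer() receives data_file_path and model_directory_path as described in the parameter list. The weights.json file name, the linear "model", and the JSON feature file are hypothetical stand-ins for your real weights and data format.

```python
import json
import os


def train():
    # Nothing to do: the weights were trained locally and uploaded.
    pass


def load_weights(model_directory_path: str) -> dict:
    # "weights.json" is a hypothetical file name; use whatever you uploaded.
    path = os.path.join(model_directory_path, "weights.json")
    with open(path) as handle:
        return json.load(handle)


def infer(data_file_path: str, model_directory_path: str) -> float:
    weights = load_weights(model_directory_path)
    # Stand-in for reading the real data file: a JSON list of features.
    with open(data_file_path) as handle:
        features = json.load(handle)
    # Toy linear model: dot product of coefficients and features, plus bias.
    return sum(w * x for w, x in zip(weights["coef"], features)) + weights["bias"]
```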

soviet-manfred · November 7, 2024, 5:42pm

Thank you very much. Now everything is clear!

energetic-jiachen · November 27, 2024, 11:37pm

Hi, I have follow-up questions on the submission. I am planning to submit a notebook as the final product, and my questions are:

(1) I should use random-submission.ipynb as a template, right? Does this mean there will be train(), infer(), and crunch.test() functions in my submitted .ipynb file?

(2) Currently I am training the model locally. In the future, should I copy and paste this code into the train() function for submission? In addition, I would expect the output of this function to be a trained model, but in random_submission.ipynb it seems that nothing is returned. Could you clarify this?

(3) Finally, for the infer() function, based on its internal code, I don’t understand how it uses the trained model from train() above, as it does not call the train() function. Could you comment more on this as well?

Thank you very much!

spicy-questo · November 28, 2024, 8:04am

You can use train() and infer() for the cloud run, and crunch.test() for local testing. It simulates the train() and infer() calls on your local machine, so testing both functions locally shows you how they will behave in the cloud.
You may put any logic you want in train() and infer(). For example, save the model trained in train() to the /tmp/ folder and then load it for inference in infer().
I do not use train(); mine is a dummy, and it can be skipped if you select “No”.

be-unique · November 30, 2024, 3:11am

@enzo This is the format of the data downloaded in my folder.

I just want to know: is this the training data, and do we need to use it to train the model? After training, do the models need to be saved in the resources folder for each DC or UC? I still have a lot of confusion about how to use this for local testing. Can you please clarify how the inference happens, and on which data?

be-unique · November 30, 2024, 3:17am

Without using train(), how will your model make predictions at inference time?

be-unique · December 2, 2024, 10:54am

@enzo I have now trained locally and stored all my model files in the resources directory. When I submit a notebook (.ipynb) with the inference code, how does the model know about the resources folder, given that the models exist only on my local machine? What should I do? Option 1: do I need to submit the resources folder in zipped format under “Model Files (optional)”? Option 2: if I run this code locally:

# Test the implementation
crunch.test(
    no_determinism_check=True,
)

will the resources folder automatically be replicated or pulled to your server? Can you please help me understand? @enzo thanks

enzo · December 2, 2024, 1:03pm

Your infer function will be called on each .zarr file that you need to predict.

Your model directory will always be available via the model_directory_path (list of parameters). Even when submitting a notebook.

When submitting files via “Model Files”, you need to select a directory that will be sent along with your code. This way, you can include multiple files. There is no need to compress it.

Yes, your model will always be downloaded to the runner before your code is run.
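
Before selecting the directory, it can help to check what it actually contains. The helper below is only a local convenience sketch: list_model_files is a hypothetical name and is not part of crunch-cli or the platform.

```python
from pathlib import Path


def list_model_files(model_directory_path: str) -> list:
    # Hypothetical helper: collect every file under the directory,
    # as sorted paths relative to the directory root.
    root = Path(model_directory_path)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

Running list_model_files("resources") locally gives a quick view of the files that infer() should later expect to find under model_directory_path.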
