Submission format-- CSV or notebook?

dizzy-rachel · November 2, 2024, 3:05am

For the Broad Autoimmune competition, are submissions expected as a CSV or as a model in a notebook? Some of the documentation says CSV (such as this page: Broad Institute Autoimmune Disease | CrunchDAO Docs V3), but the dummy submission is a notebook with train and infer methods (Submit a Notebook - Broad Institute Autoimmune Disease Competition - CrunchDAO).

combative-lusheng · November 2, 2024, 5:22pm

I would also like to ask how to use random-submission.ipynb. It would be very helpful if the host could show a demo case.

cruncher-abde · November 5, 2024, 4:42pm

Hello,

We have corrected the discrepancy between the documentation and the notebook; the correct expected format is the one provided by the notebook. Sorry for the confusion.

spicy-questo · November 24, 2024, 5:56pm

I have a pipeline that crops and featurizes images. Altogether, from the 8 standard zarr datasets, it produces roughly 60 GB or more of output. How much capacity do we have in the cloud, and can it handle that size?

spicy-questo · November 25, 2024, 5:52am

Alright, if we only need to predict on validation and test, the train data could be cut out. That way we can avoid allocating excessive space in the cloud.

enzo · November 25, 2024, 10:36am

The large dataset is not available in the Runner.

You can read about resource limitations in the documentation: Resources Limit | CrunchDAO Docs V3

spicy-questo · November 25, 2024, 11:22am

I currently use the Standard Dataset only, because the large dataset is excessive for my model.

I checked the folder sizes:
1.1 GB is the size of the initial DC1.zarr
13 GB is the size of the postprocessed DC1.zarr images, where the largest share belongs to the centroid-cropped images and additional features.
I'm going to try to reduce it by discarding the train data and keeping only validation and test.
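For anyone wanting to reproduce this kind of folder-size check from Python rather than the shell, here is a minimal sketch. The helper name `dir_size_bytes` is hypothetical, and the `/tmp/DC1.zarr` path in the usage note is only the location mentioned later in this thread, not a confirmed Runner path.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            # Skip symlinks so linked data is not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

Calling `dir_size_bytes("/tmp/DC1.zarr") / 1e9` would then give the size in GB, assuming the store lives at that path.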

enzo · November 25, 2024, 11:40am

The GPU Runner has a disk size of 100GB.
If you encounter disk size problems, contact me as soon as possible and I will increase it.
But be careful, as your model cannot be larger than 10 GB.
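A quick way to see how close a pipeline is to the disk limit is to query free space before writing large intermediate files. This is only a sketch: the helper names `free_gb` and `has_room` are hypothetical, and `/tmp` as the working directory is an assumption, not something the Runner documentation is quoted as saying here.

```python
import shutil


def free_gb(path="/tmp"):
    """Return the free disk space at `path` in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(required_gb, path="/tmp"):
    """True if at least `required_gb` GB are free at `path`."""
    return free_gb(path) >= required_gb
```

For example, `has_room(13)` could be checked before writing the postprocessed images described above.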

spicy-questo · November 28, 2024, 11:18am

Will the rm command work inside the infer() function?
Or does it need sudo, ! or % as a prefix?
rm -r '/tmp/DC1.zarr'
!rm -r '/tmp/DC1.zarr'
%rm -r '/tmp/DC1.zarr'

enzo · November 28, 2024, 11:35am

import shutil

shutil.rmtree("/tmp/DC1.zarr")

or

import os

os.system("rm -r '/tmp/DC1.zarr'")
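As a small extension of the `shutil.rmtree` suggestion, the sketch below wraps it so that a second call (or a missing directory) does not raise. The helper name `remove_tree` is hypothetical. Note that the `!` and `%` prefixes are IPython/Jupyter syntax rather than Python; whether they survive in the Runner depends on how the notebook is executed, so a plain Python call is the safer assumption.

```python
import shutil
from pathlib import Path


def remove_tree(path):
    """Delete the directory tree at `path`; return True if something was removed."""
    target = Path(path)
    if not target.exists():
        # Nothing to delete, e.g. infer() was called more than once.
        return False
    shutil.rmtree(target)
    return True
```

Inside `infer()`, this could be called as `remove_tree("/tmp/DC1.zarr")` once the intermediate data is no longer needed.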
