For the Broad Autoimmune competition, are the submissions expected to be in the format of a CSV or as a model in a notebook? Some of the documentation says CSV (such as on this page: Broad Institute Autoimmune Disease | CrunchDAO Docs V3), but the dummy submission is a notebook with train and infer methods (Submit a Notebook - Broad Institute Autoimmune Disease Competition - CrunchDAO).
I also want to ask how to use random-submission.ipynb. It would be very helpful if the host could show a demo case.
Hello,
We have corrected the discrepancy between the document and the notebook; the correct expected format is the one provided by the notebook. Sorry for the confusion.
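For reference, here is a minimal sketch of what that notebook interface looks like. The train and infer names come from the dummy submission, but the signatures, arguments, and return format shown here are assumptions, so treat random-submission.ipynb as the source of truth.

```python
import os
import pandas as pd

def train(data_directory_path: str, model_directory_path: str) -> None:
    # Fit the model on the training data and persist anything infer() needs.
    # A real submission would save model weights here; this placeholder only
    # illustrates the flow (the file name is hypothetical).
    os.makedirs(model_directory_path, exist_ok=True)
    with open(os.path.join(model_directory_path, "model.txt"), "w") as f:
        f.write("trained")

def infer(data_directory_path: str, model_directory_path: str) -> pd.DataFrame:
    # Load whatever train() saved and return predictions in the schema shown
    # by the dummy submission (an empty DataFrame is just a stand-in).
    with open(os.path.join(model_directory_path, "model.txt")) as f:
        _ = f.read()
    return pd.DataFrame()
```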
I have a pipeline that crops and featurizes images; in total, from the 8 zarr standard datasets it produces ~60 GB+. How much capacity do we have in the cloud? Is it going to handle that size?
Alright, if we only need to predict on validation and test, the train data might be cut out; that way we can avoid allocating excessive space in the cloud.
The large dataset is not available in the Runner.
You can read about resource limitations in the documentation: Resources Limit | CrunchDAO Docs V3
I currently use the Standard Dataset only, because the large dataset is excessive for my model.
I checked the folder sizes:
1.1 GB is the size of the initial DC1.zarr;
13 GB is the size of the post-processed DC1.zarr, where most of the share belongs to the centroid-cropped images and additional features.
I'm going to try to reduce it by throwing out the train data and keeping only the validation and test.
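In case it helps, here is a minimal sketch of that kind of cleanup. The store path and the assumption that train crops live under their own sub-directories are hypothetical; adapt the names to your actual layout.

```python
import os
import shutil

# Hypothetical path to the post-processed store; adjust to your pipeline.
processed_store = "/tmp/DC1_processed.zarr"

def drop_train_groups(store_path: str) -> None:
    # Delete only the train-related groups, keeping validation and test crops.
    for name in os.listdir(store_path):
        if name.startswith("train"):
            shutil.rmtree(os.path.join(store_path, name), ignore_errors=True)

drop_train_groups(processed_store)
```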
The GPU Runner has a disk size of 100GB.
If you encounter disk size problems, contact me as soon as possible and I will increase it.
But be careful, as your model cannot be larger than 10 GB.
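If you want to check how close you are to those limits from inside the notebook, here is a quick sketch using the standard library (assuming "/" is the relevant mount point in the Runner):

```python
import shutil

# Report total/used/free space on the Runner's filesystem.
usage = shutil.disk_usage("/")
print(f"total={usage.total / 1e9:.1f} GB, "
      f"used={usage.used / 1e9:.1f} GB, "
      f"free={usage.free / 1e9:.1f} GB")
```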
Will the rm command work in the infer() function?
Or does it need sudo, !, or % as a prefix?
rm -r '/tmp/DC1.zarr'
!rm -r '/tmp/DC1.zarr'
%rm -r '/tmp/DC1.zarr'
import shutil
shutil.rmtree("/tmp/DC1.zarr")
or
import os
os.system("rm -r '/tmp/DC1.zarr'")
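For what it's worth, the ! and % forms are IPython/notebook syntax rather than plain Python, so the pure-Python variant is the most portable inside a function. A minimal sketch with some guarding (whether /tmp/DC1.zarr actually exists in the Runner is an assumption):

```python
import os
import shutil

path = "/tmp/DC1.zarr"

# Remove the directory only if it is actually there; ignore_errors avoids
# failing the run on files that were already removed.
if os.path.exists(path):
    shutil.rmtree(path, ignore_errors=True)
```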