Regarding the way runs are scored

So, this is my second time participating in a Crunch hackathon, and I have a question about the way runs are scored. I submitted 4 models on Saturday (meaning 4 completely different models). However, the leaderboard only shows the metrics for the best and the most recent run. I want to know the following:

First, which submission corresponds to the best score on the metric?

Second, is it possible to get metrics for all the models I submitted, or do I need to submit each model on Monday to get the results?

You can find the exact score for each run on its detail page.
Unfortunately, we don’t have a “summary” view that displays all the scores at once.

The system doesn’t keep track of which run produced the best score internally; you will have to find it yourself.

All of the runs are scored on Monday, so you should be able to look at them all.


Hello!

This is also my first time participating in a Crunch hackathon. I’m wondering why my run finished successfully but the score is still unknown.

I started the scoring mechanism, but got distracted.

I just published them.

Thank you for your help! I would also like to confirm: are submission scores only announced every Monday?

Yes, that’s correct.

Scores are released at what we call checkpoints, which happen every Monday at 6 p.m. UTC.
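For reference, here is a quick illustrative sketch (my own helper, not an official platform tool) that computes the next checkpoint time from that schedule in Python:

```python
from datetime import datetime, timedelta, timezone

def next_checkpoint(now: datetime | None = None) -> datetime:
    """Return the next checkpoint: Mondays at 18:00 UTC, per the answer above."""
    now = now or datetime.now(timezone.utc)
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=18, minute=0, second=0, microsecond=0
    )
    if candidate <= now:  # this week's checkpoint has already passed
        candidate += timedelta(days=7)
    return candidate

print(next_checkpoint())
```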

Thank you for the information. I am currently trying to push my submission using the Crunch CLI, but I encountered the following error:

Crunch not found.

The competition may be over or the server is not correctly configured.

If you think that is an error, please contact an administrator.

I would like to ask whether this message appears because submissions are only scored at checkpoints, or whether there might be an issue on my side (e.g., configuration or competition access).

Could you please advise on how to resolve this issue?

Thank you very much for your help.

Hi ding-yang-wang,

The system was stuck and our monitoring did not report the problem.

Apologies for the delay, it is now fixed!

Thank you very much for your help! I would like to ask whether we will receive the final checkpoint results before selecting our final submission. I understand that the deadline is tomorrow, and the description states that predictions will also be scored at the beginning of the selection period. I would just like to confirm whether we are allowed to update or change our selected submission at the start of the selection period.

Additionally, I would like to clarify the definition of the validation and test sets mentioned in the description.

Are the validation and test perturbations both included within the list of 62 perturbations that we are asked to generate predictions for? Or should we also prepare code that can generate predictions for perturbations beyond those 62, so that you can evaluate additional unseen perturbations during testing?

I just want to make sure our inference pipeline is sufficiently general and compatible with the evaluation procedure.

Hi ding-yang-wang,

The final checkpoint leaderboard will be released tomorrow, and you will then have up to the 17th to make your final selection. This allows people to wait until their runs have finished before making their final choice.

Your models will then be rerun on a new dataset, but only after the selection period has ended. Sorry about the typo! I will correct it right away!

Unfortunately, I cannot tell you which perturbations will be required for the test set. Your code must be able to generalize based on the inputs provided via the predict_perturbations and predict_genes parameters.
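To illustrate, here is a minimal sketch of what "generalize based on the inputs" means in practice. The entry-point name infer, the DataFrame output, and the model_predict placeholder are assumptions for this example; only the predict_perturbations and predict_genes parameter names come from the platform:

```python
import numpy as np
import pandas as pd

def model_predict(perturbation: str, genes: list[str]) -> np.ndarray:
    # Placeholder for your trained model; returns one value per requested gene.
    return np.zeros(len(genes))

def infer(predict_perturbations: list[str], predict_genes: list[str]) -> pd.DataFrame:
    # Build the output from the parameters rather than from a hard-coded list,
    # so the same code still works if different perturbations or genes are
    # requested at test time.
    rows = [model_predict(p, predict_genes) for p in predict_perturbations]
    return pd.DataFrame(rows, index=predict_perturbations, columns=predict_genes)
```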

Thanks for the clarification!

I would like to confirm one detail: will the test perturbations only involve the same 19 base genes (including NC) that appear in the training set and perturbation list?

Currently, both the training data and the provided perturbation list are restricted to combinations of these 19 base genes. If the test set introduces genes outside this set, it would require precomputing conditional embeddings for all possible genes in advance. This would significantly complicate the pipeline design, as the conditioning embeddings would need to be prepared for the full gene space rather than only the observed subset.
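To make the concern concrete, here is a hypothetical sketch of what I mean. The "+" separator, the averaging rule, and the random embeddings are all assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
base_genes = ["CEBPA", "FOXO1", "NC", "PPARG"]  # a subset of the 19, for brevity
embeddings = {g: rng.normal(size=16) for g in base_genes}  # precomputed per gene

def perturbation_embedding(name: str) -> np.ndarray:
    # Compose a combination's embedding from its base genes' embeddings;
    # any gene outside the precomputed set raises a KeyError.
    return np.mean([embeddings[g] for g in name.split("+")], axis=0)

print(perturbation_embedding("CEBPA+FOXO1").shape)  # works: both genes are known
# perturbation_embedding("CEBPA+MYC") would fail with KeyError: 'MYC'
```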

Therefore, I would like to confirm whether we can safely assume that all perturbations in the test phase will still be composed only of the same 19 base genes (including NC), even though the combinations themselves may differ.

Also, I would like to ask whether any submission will be required tomorrow. At the moment, my pipeline is designed specifically around the current perturbation list.

I just want to make sure I still have a chance to align my pipeline with the expected generalization setting.

Thanks again for your help!

Disclaimer: I’m afraid of making a mistake, so I asked a colleague, but he hasn’t answered yet. Since there are only 24 hours left, I’ll still try to help.

The local set (program_proportion_local_gtruth.csv) contains 6 base genes (NC is not in it), for a total of 6 perturbations.
The validation set contains 17 base genes (including NC), for a total of 21 perturbations.
The test set contains 19 base genes (including NC), for a total of 41 perturbations.

I don’t know where you got the 19 base genes from (that’s because I am not very good with this data, sorry), so if you could tell me, I could try to refine my answer.

I think the confusion comes from how I interpreted the perturbation space. Based on the file predict_perturbations_2.txt, there are 62 perturbations in total. These 62 perturbations are all combinations constructed from the same set of 19 single-gene perturbations (['CEBPA', 'CEBPB', 'CEBPD', 'CREB1', 'FOXO1', 'KIF11', 'KLF15', 'MLXIPL', 'NC', 'NR3C1', 'POLR2D', 'PPARG', 'PPARG2', 'SF3B1', 'SREBF1', 'STAT5A', 'STAT5B', 'TCF7L2', 'ZBED3']). From your description, it seems that predict_perturbations_2.txt already contains both the validation set and the test set. In that case, I don’t think we need an additional generalization mechanism for extra test perturbations (please correct me if I’m wrong).

In obesity_challenge_2.h5ad, the column obs['gene'] contains 237 perturbations. These 237 perturbations also appear to be combinations formed from the same 19 single-gene perturbations listed in predict_perturbations_2.txt. This is what I was referring to when mentioning the 19 base genes.

So my understanding is that the conditional embeddings for the 19 single genes already cover the full perturbation space needed for inference, and the pipeline does not need to be modified to support unseen genes.
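For what it’s worth, this assumption can be sanity-checked directly from the two files (the "+" separator for combination names is again my own assumption):

```python
import anndata as ad

SEP = "+"  # assumed separator for combination perturbations, e.g. "CEBPA+FOXO1"

def base_genes(perturbations) -> set[str]:
    # Split each perturbation name into its base genes and take the union.
    return {g for p in perturbations for g in p.split(SEP)}

with open("predict_perturbations_2.txt") as f:
    requested = [line.strip() for line in f if line.strip()]

observed = set(ad.read_h5ad("obesity_challenge_2.h5ad").obs["gene"].astype(str))

print(sorted(base_genes(requested)))                   # should list the 19 base genes
print(base_genes(observed) == base_genes(requested))   # True if both use the same set
```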

Please let me know if this interpretation is correct.

Apologies, I did not understand properly. Thanks for the details!

You are correct; neither predict_perturbations_2.txt nor obesity_challenge_2.h5ad changes in this challenge. This means that there will be no change in the 19 base genes.

Dear enzo, I noticed the leaderboard has been updated, but my latest score is still not visible. Could you help me check?

My mistake, I published the leaderboard without publishing the scores to your dashboard.

I just fixed the issue and recomputed the leaderboard, just in case.