Hi, I was reviewing the code base for the challenge and noticed a function that enforces log1p normalization on the predictions in the scoring scheme: crunchdao/competitions@3d72103 ("feat(broad-1/scoring): re-apply normalization if necessary").
I would argue that this step may be neither necessary nor desirable. Normalizing by the sum of all predictions introduces a dependence between the predicted variables, which is problematic if the gene expressions for different genes are meant to be treated as independent predictions. For example, if I predict some genes (X) well and others (Y) poorly, normalizing everything by the combined sum of X and Y can drag down performance on X: the mispredictions in Y change the shared normalization factor and therefore distort the X values, even though the X predictions themselves were strong.
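Here is a minimal numpy sketch of the coupling effect I mean, assuming the scoring step rescales each prediction vector to a fixed total count before applying log1p. The helper name `sum_normalize_log1p`, the toy values, and the 1e4 target sum are my own placeholders for illustration, not the actual scoring code:

```python
import numpy as np

# Toy example: one cell, two well-predicted genes (X) and two badly
# over-predicted genes (Y).
true_expr = np.array([10.0, 20.0,  5.0,  5.0])   # X1, X2, Y1, Y2
pred_expr = np.array([10.0, 20.0, 50.0, 50.0])   # X exact, Y over-predicted

def sum_normalize_log1p(x, target_sum=1e4):
    """Rescale values to a fixed total, then apply log1p
    (my understanding of the re-applied normalization; details assumed)."""
    return np.log1p(x / x.sum() * target_sum)

# Before normalization, the X genes carry zero error.
print(np.abs(pred_expr - true_expr))
# [ 0.  0. 45. 45.]

# After sum normalization, the inflated Y predictions shrink the shared
# normalization factor, so the X genes now carry error as well.
print(np.abs(sum_normalize_log1p(pred_expr) - sum_normalize_log1p(true_expr)))
# X entries are no longer ~0
```

In other words, the error on Y leaks into X purely through the shared denominator.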
Therefore, it may be better to leave the predictions untransformed. What are your thoughts on this?