Re-apply normalization

Hi, while reviewing the codebase for the challenge, I noticed that the scoring code includes a function that enforces log1p normalization on the predictions: feat(broad-1/scoring): re-apply normalization if necessary · crunchdao/competitions@3d72103 · GitHub.

I would argue that this step may not be necessary or desirable. Normalizing by the sum of all predictions makes every normalized value depend on all the other predicted values. That is a problem if the expression levels of different genes are treated as independent predictions. For example, if I predict some genes (X) well and others (Y) poorly, dividing everything by the combined sum of X and Y also distorts the X values, so the errors in Y lower the score on X even though the X predictions were accurate.
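To make the coupling concrete, here is a minimal sketch. It assumes the normalization scales each cell's values to a fixed total and then applies log1p; the function name `log1p_normalize`, the `target_sum=100` value, and the toy numbers are my own assumptions for illustration and are not taken from the linked commit.

```python
import numpy as np


def log1p_normalize(values, target_sum=100.0):
    """Scale each row to `target_sum`, then apply log1p (assumed scheme)."""
    values = np.asarray(values, dtype=float)
    row_totals = values.sum(axis=1, keepdims=True)
    return np.log1p(values / row_totals * target_sum)


# One cell, four genes. The first three genes play the role of X, the last one Y.
truth = np.array([[10.0, 20.0, 30.0, 40.0]])

# Prediction: X is exactly right, Y is badly overestimated.
pred = truth.copy()
pred[0, 3] = 400.0

# Before normalization, the error on the X genes is zero.
raw_err_x = np.abs(pred[0, :3] - truth[0, :3])

# After normalization, the bad Y value inflates the row total,
# which shrinks every X value and introduces an error on X as well.
norm_err_x = np.abs(log1p_normalize(pred)[0, :3] - log1p_normalize(truth)[0, :3])

print("raw error on X:       ", raw_err_x)
print("normalized error on X:", norm_err_x)
```

In this toy case the X genes are predicted perfectly, yet they pick up a nonzero error after normalization purely because of the misprediction in Y.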

Therefore, it may be better to score the predictions as submitted, without transforming them. What are your thoughts on this?

Following your feedback and after discussing with the Broad team, we have decided not to reapply the normalization.

Just to clarify: are we now supposed to predict raw gene expression counts instead of log1p-transformed values?

No, you still need to normalize the gene expression values you are predicting.

We have only modified the scoring function to no longer apply the normalization. Initially, we intended to include a check to ensure the normalization was done, and if not, apply it automatically. Now, we’ve decided to remove that step.
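Since the scoring function no longer checks this, participants may want a quick sanity check on their side before submitting. The sketch below is hypothetical and is not the check that was removed from the scoring code: it assumes the normalization scales each cell to a total of 100 and then applies log1p, and the helper name `looks_log1p_normalized` is made up for illustration. It relies on the fact that undoing log1p with `expm1` should give rows that sum back to the target total, which only holds if each row contains the full set of genes used for normalization.

```python
import numpy as np


def looks_log1p_normalized(values, target_sum=100.0, rtol=1e-3):
    """Return True if expm1(values) sums to `target_sum` in every row.

    Hypothetical helper: `target_sum` and `rtol` are assumptions,
    not values confirmed by the competition.
    """
    values = np.asarray(values, dtype=float)
    row_totals = np.expm1(values).sum(axis=1)
    return bool(np.allclose(row_totals, target_sum, rtol=rtol))


# Toy data: two cells, three genes.
raw_counts = np.array([[3.0, 0.0, 7.0],
                       [1.0, 4.0, 5.0]])

# Normalize under the assumed scheme: scale each row to 100, then log1p.
normalized = np.log1p(raw_counts / raw_counts.sum(axis=1, keepdims=True) * 100.0)

print(looks_log1p_normalized(normalized))  # normalized values pass the check
print(looks_log1p_normalized(raw_counts))  # raw counts do not
```

Model outputs will rarely sum to the target exactly, so a looser `rtol` may be more practical when applying this to real predictions.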

Thank you for taking the feedback into account @cruncher-abde