Is MSE the right metric for the benchmark?

Hi @separate-orr, it looks like the correlation score is being computed cell-wise instead of gene-wise. In the state-of-the-art literature, the correlation score is computed gene-wise (e.g., HEST1k, presented at NeurIPS 2024). Here is their code base for reference: HEST/src/hest/bench/trainer.py at 5a0cbba61550ed21c66bc81c36fa1780e853245d · mahmoodlab/HEST · GitHub

The gene-wise correlation is then averaged across regions and samples.

It is also more meaningful, because we ask how well our prediction of gene A correlates with the ground truth of gene A, and so on for the remaining genes.

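Cell-wise, the correlation is computed for each cell across its genes, so we are only looking at how well the expression is predicted within that specific cell (a less meaningful and easier task). Gene-wise, the correlation is computed for each gene across ALL cells in the given region/sample, so we are looking at how well that gene is predicted everywhere.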
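
To make the difference concrete, here is a minimal sketch in NumPy. It assumes predictions and ground truth are arrays of shape (n_cells, n_genes) for one region/sample. The function names (`pearson_along_axis`, `gene_wise_correlation`, `cell_wise_correlation`, `benchmark_score`) are made up for illustration; this is not the benchmark's scoring code and not the HEST implementation, and averaging genes first and then samples is my assumption.

```python
import numpy as np


def pearson_along_axis(pred, truth, axis):
    """Pearson correlation between pred and truth, computed along `axis`."""
    pred_c = pred - pred.mean(axis=axis, keepdims=True)
    truth_c = truth - truth.mean(axis=axis, keepdims=True)
    num = (pred_c * truth_c).sum(axis=axis)
    den = np.sqrt((pred_c ** 2).sum(axis=axis) * (truth_c ** 2).sum(axis=axis))
    # Zero-variance vectors have no defined correlation: return NaN for them.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)


def gene_wise_correlation(pred, truth):
    """One value per gene: correlation across all cells (rows)."""
    return pearson_along_axis(pred, truth, axis=0)


def cell_wise_correlation(pred, truth):
    """One value per cell: correlation across all genes (columns)."""
    return pearson_along_axis(pred, truth, axis=1)


def benchmark_score(samples):
    """Average the gene-wise correlations within each (pred, truth) sample,
    then average across samples (assumed aggregation order)."""
    per_sample = [np.nanmean(gene_wise_correlation(p, t)) for p, t in samples]
    return float(np.mean(per_sample))


# Toy data (synthetic, for illustration only): each gene has its own baseline
# level, and the "model" only knows the baseline plus unrelated noise.
rng = np.random.default_rng(0)
n_cells, n_genes = 500, 50
baseline = np.linspace(0.0, 10.0, n_genes)
truth = baseline + rng.normal(0.0, 0.5, size=(n_cells, n_genes))
pred = baseline + rng.normal(0.0, 0.5, size=(n_cells, n_genes))

cell_scores = cell_wise_correlation(pred, truth)  # shape (n_cells,)
gene_scores = gene_wise_correlation(pred, truth)  # shape (n_genes,)
print("mean cell-wise:", cell_scores.mean())
print("mean gene-wise:", gene_scores.mean())
```

In this toy setup the cell-wise score comes out high even though the predictions carry no information about cell-to-cell variation, because it mostly rewards knowing each gene's average level. The gene-wise score stays near zero, which is the behaviour I would expect from a metric that asks whether gene A is predicted well across cells.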

Happy to provide more context if needed.

Could you please look into this?
Thanks!