Unknown evaluation metric

Hi,

Throughout Crunch 1 and Crunch 2, there have been various discussions and changes regarding the evaluation metrics (MSE vs Pearson vs Spearman). However, it remains unclear which metrics will be used. There are also concerns that the current metrics may not align with state-of-the-art metrics referenced in recent literature (cell-wise vs gene-wise correlation).

It is challenging to optimize models effectively without a clear understanding of the challenge objectives. Could you kindly provide more details on the evaluation metrics soon?

Thank you!

  1. Is the MSE the right metric for benchmark? - #9 by many-kalin

  2. Re-apply normalization - #5 by many-kalin

  3. Is Spearman's rank correlation the right metric for benchmarking? - #3 by soviet-manfred


@enzo @cruncher-abde Would it be possible to add gene-wise Spearman to the leaderboard as a side column, just so we can look at it while we wait for a decision? It requires only a small tweak in the code, and the metrics are not finalized yet anyway. It might even help inform the final decision.

Sorry, but this is quite a big decision. We need to ask the Broad team whether they are okay with it.

Could you also provide us with a gene-wise implementation similar to the cell-wise one?
I am not sure how to implement what you described in your previous comment.

Hi, here is a gene-wise implementation analogous to your cell-wise one. I am also including a second, shorter implementation with fewer transformations that relies on pandas alone.

import pandas
import numpy
import scipy.stats

def _spearman_cell_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Uniform weight for every cell (row)
    cell_count = len(y_test.index)
    weight_on_cells = numpy.ones(cell_count) / cell_count

    A = y_test.to_numpy()
    B = prediction.to_numpy()

    # Rank each cell's expression values across genes (row-wise)
    rank_A = scipy.stats.rankdata(A, axis=1)
    rank_B = scipy.stats.rankdata(B, axis=1)

    # Pearson correlation of the ranks, row by row. The global mean of the
    # ranks equals the per-row mean here, since every row's ranks average
    # to (gene_count + 1) / 2.
    corrs_cell = (
        numpy.multiply(rank_A - numpy.mean(rank_A), rank_B - numpy.mean(rank_B)).mean(axis=1)
        / (numpy.std(rank_A, axis=1) * numpy.std(rank_B, axis=1))
    )

    # A constant row (zero variance) yields NaN; count it as zero correlation
    corrs_cell[numpy.isnan(corrs_cell)] = 0

    return numpy.sum(weight_on_cells * corrs_cell)

def _spearman_gene_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Ensure that y_test and prediction have the same number of columns
    assert prediction.shape[1] == y_test.shape[1], "prediction and y_test must have the same number of genes (columns)"

    # Uniform weight for every gene (column)
    feature_count = prediction.shape[1]
    weight_on_features = numpy.ones(feature_count) / feature_count

    # Convert DataFrames to numpy arrays for easier manipulation
    A = y_test.to_numpy()
    B = prediction.to_numpy()

    # Rank each gene's values across cells (column-wise)
    rank_A = scipy.stats.rankdata(A, axis=0)
    rank_B = scipy.stats.rankdata(B, axis=0)

    # Pearson correlation of the ranks, column by column
    corrs_feature = (
        numpy.multiply(rank_A - numpy.mean(rank_A, axis=0), rank_B - numpy.mean(rank_B, axis=0)).mean(axis=0)
        / (numpy.std(rank_A, axis=0) * numpy.std(rank_B, axis=0))
    )

    # A constant column (zero variance) yields NaN; count it as zero correlation
    corrs_feature[numpy.isnan(corrs_feature)] = 0

    # Return the weighted sum of gene-wise correlations
    return numpy.sum(weight_on_features * corrs_feature)
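
For anyone who wants to double-check the vectorized math, a quick sanity check along these lines should confirm that both functions match a plain loop over scipy.stats.spearmanr (the 5x4 random data and the seed are arbitrary placeholders of my own, not challenge data):

rng = numpy.random.default_rng(0)
truth = pandas.DataFrame(rng.normal(size=(5, 4)))
pred = pandas.DataFrame(rng.normal(size=(5, 4)))

# Loop over rows (cells) and columns (genes) with scipy's reference routine
loop_cell = numpy.mean([
    scipy.stats.spearmanr(truth.iloc[i], pred.iloc[i])[0]
    for i in range(len(truth))
])
loop_gene = numpy.mean([
    scipy.stats.spearmanr(truth.iloc[:, j], pred.iloc[:, j])[0]
    for j in range(truth.shape[1])
])

# Both vectorized scores should agree with the loops up to float precision
assert numpy.isclose(_spearman_cell_wise(pred, truth), loop_cell)
assert numpy.isclose(_spearman_gene_wise(pred, truth), loop_gene)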


def _spearman_cell_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Row-wise (cell-wise) Spearman correlation, averaged over cells;
    # NaN values from zero-variance rows are counted as zero correlation
    return y_test.corrwith(prediction, axis=1, method="spearman").fillna(0).mean()


def _spearman_gene_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Column-wise (gene-wise) Spearman correlation, averaged over genes
    return y_test.corrwith(prediction, axis=0, method="spearman").fillna(0).mean()

I encourage you to use the second, simpler implementation, which relies on pandas alone. You could also easily switch to the Pearson correlation by passing method="pearson" to corrwith.
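
For instance, a Pearson variant would be a one-argument change (a minimal sketch; the name _pearson_gene_wise is just my suggestion, not anything official):

def _pearson_gene_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Same as _spearman_gene_wise, with only the correlation method switched
    return y_test.corrwith(prediction, axis=0, method="pearson").fillna(0).mean()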

Thank you very much.

I originally just implemented what the Broad team gave me.
Making a mistake here would be horrible, and copy-pasting their code keeps my responsibility lighter.