Unknown evaluation metric

Hi,

Throughout Crunch 1 and Crunch 2, there have been various discussions and changes regarding the evaluation metrics (MSE vs Pearson vs Spearman). However, it remains unclear which metrics will be used. There are also concerns that the current metrics may not align with state-of-the-art metrics referenced in recent literature (cell-wise vs gene-wise correlation).

It is challenging to optimize models effectively without a clear understanding of the challenge objectives. Could you kindly provide more details on the evaluation metrics soon?

Thank you!

  1. Is the MSE the right metric for benchmark? - #9 by many-kalin

  2. Re-apply normalization - #5 by many-kalin

  3. Is Spearman's rank correlation the right metric for benchmarking? - #3 by soviet-manfred


@enzo @cruncher-abde Would it be possible to add gene-wise Spearman to the leaderboard as a side column, just so we can look at it while we wait for a decision? It requires only a small tweak in the code, and the metrics are not finalized yet anyway. It might even help inform the final decision.

Sorry, but this is quite a big decision. We need to ask the Broad team whether they are okay with it.

Could you also provide us with a gene-wise implementation similar to the cell-wise one?
I am not sure how to implement what you described in your previous comment.

Hi, here is a gene-wise implementation analogous to your cell-wise one. I am also including a second, shorter implementation with fewer transformations that relies on pandas alone.

import pandas
import numpy
import scipy.stats

def _spearman_cell_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Uniform weight for every cell (row)
    cell_count = len(y_test.index)
    weight_on_cells = numpy.ones(cell_count) / cell_count

    A = y_test.to_numpy()
    B = prediction.to_numpy()

    # Rank each cell's expression values across genes (row-wise)
    rank_A = scipy.stats.rankdata(A, axis=1)
    rank_B = scipy.stats.rankdata(B, axis=1)

    # Pearson correlation of the ranks, row by row. The global mean of the
    # ranks equals the per-row mean here, since every row's ranks average
    # to (gene_count + 1) / 2.
    corrs_cell = (
        numpy.multiply(rank_A - numpy.mean(rank_A), rank_B - numpy.mean(rank_B)).mean(axis=1)
        / (numpy.std(rank_A, axis=1) * numpy.std(rank_B, axis=1))
    )

    # A constant row (zero variance) yields NaN; count it as zero correlation
    corrs_cell[numpy.isnan(corrs_cell)] = 0

    return numpy.sum(weight_on_cells * corrs_cell)

def _spearman_gene_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Ensure that y_test and prediction have the same number of columns
    assert prediction.shape[1] == y_test.shape[1], "prediction and y_test must have the same number of genes (columns)"

    # Uniform weight for every gene (column)
    feature_count = prediction.shape[1]
    weight_on_features = numpy.ones(feature_count) / feature_count

    # Convert DataFrames to numpy arrays for easier manipulation
    A = y_test.to_numpy()
    B = prediction.to_numpy()

    # Rank each gene's values across cells (column-wise)
    rank_A = scipy.stats.rankdata(A, axis=0)
    rank_B = scipy.stats.rankdata(B, axis=0)

    # Pearson correlation of the ranks, column by column
    corrs_feature = (
        numpy.multiply(rank_A - numpy.mean(rank_A, axis=0), rank_B - numpy.mean(rank_B, axis=0)).mean(axis=0)
        / (numpy.std(rank_A, axis=0) * numpy.std(rank_B, axis=0))
    )

    # A constant column (zero variance) yields NaN; count it as zero correlation
    corrs_feature[numpy.isnan(corrs_feature)] = 0

    # Return the weighted sum of gene-wise correlations
    return numpy.sum(weight_on_features * corrs_feature)
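
For anyone who wants to double-check the vectorized math, a quick sanity check along these lines should confirm that both functions match a plain loop over scipy.stats.spearmanr (the 5x4 random data and the seed are arbitrary placeholders of my own, not challenge data):

rng = numpy.random.default_rng(0)
truth = pandas.DataFrame(rng.normal(size=(5, 4)))
pred = pandas.DataFrame(rng.normal(size=(5, 4)))

# Loop over rows (cells) and columns (genes) with scipy's reference routine
loop_cell = numpy.mean([
    scipy.stats.spearmanr(truth.iloc[i], pred.iloc[i])[0]
    for i in range(len(truth))
])
loop_gene = numpy.mean([
    scipy.stats.spearmanr(truth.iloc[:, j], pred.iloc[:, j])[0]
    for j in range(truth.shape[1])
])

# Both vectorized scores should agree with the loops up to float precision
assert numpy.isclose(_spearman_cell_wise(pred, truth), loop_cell)
assert numpy.isclose(_spearman_gene_wise(pred, truth), loop_gene)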


def _spearman_cell_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Row-wise (cell-wise) Spearman correlation, averaged over cells;
    # NaN values from zero-variance rows are counted as zero correlation
    return y_test.corrwith(prediction, axis=1, method="spearman").fillna(0).mean()


def _spearman_gene_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Column-wise (gene-wise) Spearman correlation, averaged over genes
    return y_test.corrwith(prediction, axis=0, method="spearman").fillna(0).mean()

I encourage you to use the second, simpler implementation, which relies on pandas alone. You could also easily switch to the Pearson correlation by passing method="pearson" to corrwith.
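
For instance, a Pearson variant would be a one-argument change (a minimal sketch; the name _pearson_gene_wise is just my suggestion, not anything official):

def _pearson_gene_wise(
    prediction: pandas.DataFrame,
    y_test: pandas.DataFrame,
):
    # Same as _spearman_gene_wise, with only the correlation method switched
    return y_test.corrwith(prediction, axis=0, method="pearson").fillna(0).mean()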

Thank you very much.

I originally just implemented what the Broad team gave me.
Making a mistake here would be horrible, and copy-pasting their code keeps my responsibility lighter.