Hi @separate-orr, thank you for your response. In single-cell analysis, the focus is usually on populations or clusters rather than on individual cells. Differential gene expression (DGE) analysis compares the expression of each gene across populations or clusters, so it is crucial that each gene's expression is measured accurately across cells.
To illustrate this, here is a toy example showing that perfect cell-wise correlation does not guarantee a good gene-wise score. Such discrepancies can lead to flawed downstream analyses (e.g., incorrect clustering or DGE results).
import pandas as pd

# True values (Y): rows are samples/cells, columns are features/genes
Y = pd.DataFrame({
    "F1": [10, 15],  # Feature 1
    "F2": [20, 25],  # Feature 2
    "F3": [30, 35],  # Feature 3
}, index=["S1", "S2"])  # Samples (S1, S2)

# Predicted values (Y_hat)
Y_hat = pd.DataFrame({
    "F1": [5, 10],
    "F2": [30, 25],
    "F3": [45, 40],
}, index=["S1", "S2"])

# axis=1 correlates each row (cell) of Y with the same row of Y_hat
print("Cell-wise:", Y.corrwith(Y_hat, axis=1, method="spearman").mean())
# axis=0 correlates each column (gene) of Y with the same column of Y_hat
print("Gene-wise:", Y.corrwith(Y_hat, axis=0, method="spearman").mean())
Cell-wise: 1.0
Gene-wise: -0.3333333333333333
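To see where the negative gene-wise mean comes from, it may help to look at each gene separately instead of the average. The sketch below reuses the same toy data. The names `per_gene`, `per_cell`, `true_change`, `pred_change`, and `summary` are illustrative only, and Spearman is computed here as Pearson correlation on ranks (an equivalent formulation), which is an assumption made to keep the snippet dependent on pandas alone.

```python
import pandas as pd

# Same toy data as above: rows are cells (S1, S2), columns are genes (F1-F3)
Y = pd.DataFrame(
    {"F1": [10, 15], "F2": [20, 25], "F3": [30, 35]}, index=["S1", "S2"]
)
Y_hat = pd.DataFrame(
    {"F1": [5, 10], "F2": [30, 25], "F3": [45, 40]}, index=["S1", "S2"]
)

# Spearman correlation written as Pearson correlation of ranks.
# Gene-wise: rank each gene across cells, then correlate column by column.
per_gene = Y.rank(axis=0).corrwith(Y_hat.rank(axis=0), axis=0)
# Cell-wise: rank genes within each cell, then correlate row by row.
per_cell = Y.rank(axis=1).corrwith(Y_hat.rank(axis=1), axis=1)

# Direction of change from S1 to S2 for each gene, true vs. predicted.
true_change = Y.loc["S2"] - Y.loc["S1"]
pred_change = Y_hat.loc["S2"] - Y_hat.loc["S1"]

summary = pd.DataFrame(
    {
        "true_change": true_change,
        "pred_change": pred_change,
        "spearman": per_gene,
    }
)
print(summary)
print("Per-cell Spearman:", per_cell.to_dict())
```

In this toy case every gene truly increases from S1 to S2, but the prediction only reproduces that increase for F1 and reverses it for F2 and F3. A comparison between the two samples based on `Y_hat` would therefore report the wrong direction for two of the three genes, even though the within-cell ranking of genes is reproduced perfectly.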