I am trying to reproduce how the normalized data (sdata['anucleus'].X) is calculated. I am applying the log1p_normalization function defined at the end of the basic-EDA notebook to the raw count data:
import numpy as np
import pandas as pd

def log1p_normalization(arr):
    # Scale each row (cell) to a total of 100, then apply log1p
    return np.log1p((arr / np.sum(arr, axis=1)) * 100)

gene_name_list = sdata['anucleus'].var['gene_symbols'].values
x_count = pd.DataFrame(sdata['anucleus'].layers['counts'],
                       columns=gene_name_list)  # raw counts
x_count_norm_using_def = log1p_normalization(x_count)
The above code returns NaN results. Could anyone help me with this? In addition, what is the rationale behind this specific normalization method? Is it better to analyze the raw count data or the normalized data?
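In case it helps narrow things down, here is a minimal variant that operates on the underlying NumPy array instead of the DataFrame. log1p_normalization_np is my own name and keepdims=True is my addition; note that dividing a DataFrame by a Series makes pandas align on column labels, which can yield all-NaN output, so that may be where the NaNs come from.

def log1p_normalization_np(arr):
    # Per-row sums, kept 2D so the division broadcasts across columns
    row_sums = np.sum(arr, axis=1, keepdims=True)
    return np.log1p((arr / row_sums) * 100)

x_count_norm_np = log1p_normalization_np(x_count.values)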
Thank you!
I checked the log1p-normalized target; it indeed gives the same output as sdata['anucleus'].X:
gene_name_list = sdata['anucleus'].var['gene_symbols'].values

# Cell IDs in the train group that also appear in the anucleus table
cell_id_train = sdata['cell_id-group'].obs[sdata['cell_id-group'].obs['group'] == 'train']['cell_id'].to_numpy()
cell_id_train = list(set(cell_id_train).intersection(set(sdata['anucleus'].obs['cell_id'].unique())))

ground_truth_example = sdata['anucleus'].layers['counts'][sdata['anucleus'].obs['cell_id'].isin(cell_id_train), :]
y = pd.DataFrame(ground_truth_example, columns=gene_name_list, index=cell_id_train)
def log1p_normalization1(arr):
    arr_sum = np.sum(arr)  # sum over all elements (arr is a single cell's counts)
    if arr_sum == 0:  # avoid division by zero
        return np.zeros_like(arr)  # an all-zero row normalizes to zeros
    return np.log1p((arr / arr_sum) * 100)
print(sdata['anucleus'].X[0], log1p_normalization1(y.iloc[0, :].values))
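To check the match over the whole matrix rather than one row at a time, here is a vectorized sketch, assuming the counts layer and X are dense arrays (if they are sparse, convert with .toarray() first):

counts = np.asarray(sdata['anucleus'].layers['counts'], dtype=float)
row_sums = counts.sum(axis=1, keepdims=True)

# Guard all-zero rows so they come out as zeros instead of NaN
scaled = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
normalized = np.log1p(scaled * 100)

print(np.allclose(normalized, sdata['anucleus'].X))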