In the Advanced Exploratory Data Analysis notebook, I am having difficulty understanding how the feature ranking in the function below works.
def compute_features_of_interest_local(data):  # This is per date, will be called once per existing date
    n, d = data.shape
    feats = list(data.columns)[1:]
    centroid = []
    sds = []
    data.loc[:, 'sum_rank'] = 0
    data.loc[:, 'sum_vals'] = 0
    for feat in feats:
        df = data[feat]
        dfs = np.array(sorted(enumerate(df), key=lambda x: x[1], reverse=True))[:, 0]  # rankings of each feature w.r.t the others (low rank higher score)
        data.loc[:, 'sum_rank'] = dfs + data.loc[:, 'sum_rank']
        centroid.append(df.mean())
    data.loc[:, 'centroid_l2'] = data.loc[:, feats].apply(lambda x: calc_dist(2, centroid, x), axis=1)
    data.loc[:, 'centroid_l1'] = data.loc[:, feats].apply(lambda x: calc_dist(1, centroid, x), axis=1)
    data.loc[:, 'centroid_linf'] = data.loc[:, feats].apply(lambda x: calc_dist(0, centroid, x), axis=1)
    data.loc[:, 'sum_vals'] = data.apply(lambda x: sum(x[1:max_feats]), axis=1)  # We quickly can add another feature to summarize the overall ranking
    return data
Let's zoom in on the following line of code.
data.loc[:,'sum_rank'] = dfs + data.loc[:,'sum_rank']
As far as I can tell, dfs holds the original row positions ordered by descending feature value, not the rank of each row. By adding dfs to the ‘sum_rank’ column, aren't we attributing these values to the wrong assets?
For example, the asset in row 0 has its own feature values, yet after dfs is added it receives a number that does not reflect where its feature value ranks, correct?
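This is the value of dfs after one iteration (printed before taking the [:,0] slice, so the first column is the original row position and the second is the feature value).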
[[ 3. 4.1196394 ]
[522. 4.06719589]
[403. 2.95877194]
...
[246. -2.28429198]
[294. -2.54271579]
[251. -2.62656593]]
After assigning dfs to the ‘sum_rank’ column, row 0 now has a rank of 3. Is that right?
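To make my concern concrete, here is a minimal sketch on a made-up four-row frame. The column name feat and its values are hypothetical, and the comparison against Series.rank is only my guess at what the intended per-row rank would be, not something taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one feature column with made-up values.
data = pd.DataFrame({"feat": [1.0, 4.0, 2.0, 3.0]})
data["sum_rank"] = 0.0

df = data["feat"]

# Same expression as in the notebook: original row positions,
# listed in order of descending feature value.
dfs = np.array(sorted(enumerate(df), key=lambda x: x[1], reverse=True))[:, 0]

# What I assume was intended: the descending rank of each row (0 = largest value).
ranks = df.rank(ascending=False, method="first").to_numpy() - 1

# Same update as in the notebook: the array is added position by position.
data.loc[:, "sum_rank"] = dfs + data.loc[:, "sum_rank"]

print("dfs:  ", dfs)
print("ranks:", ranks)
print(data)
```

In this toy case row 0 holds the smallest value, so I would expect it to get the worst rank (3), but sum_rank for row 0 becomes 1, which is simply the position of the row with the largest value. The two arrays only coincide when the sorted positions happen to equal their own ranks.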
Thank you.