python - List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?


How do you find the top correlations in a correlation matrix with pandas? There are many answers on how to do this with R ("Show correlations as an ordered list, not as a large matrix" or "Efficient way to get highly correlated pairs from large data set in Python or R"), but I am wondering how to do it with pandas. In my case the matrix is 4460x4460, so I can't do it visually.

You can use DataFrame.values to get the NumPy array of the data, and then use NumPy functions such as argsort() to get the most correlated pairs.
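The NumPy route mentioned above could look roughly like the sketch below. The helper name top_abs_correlations and the smaller 50x200 demo data are assumptions made for illustration, not part of the original answer; the sketch also assumes the correlation matrix contains no NaN values (for example, no constant columns).

```python
import numpy as np
import pandas as pd


def top_abs_correlations(df, n=5):
    """Return the n largest absolute off-diagonal correlations as (row, col, value)."""
    corr = df.corr().abs().to_numpy()
    # Strict upper triangle: every unordered pair once, diagonal excluded.
    rows, cols = np.triu_indices_from(corr, k=1)
    values = corr[rows, cols]
    # argsort sorts ascending, so take the last n indices and reverse them.
    order = np.argsort(values)[-n:][::-1]
    labels = df.columns
    return [(labels[rows[i]], labels[cols[i]], values[i]) for i in order]


# Hypothetical demo data: 50 samples, 200 columns, one injected correlation.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 200))
data[:, 10] += data[:, 20]
demo = pd.DataFrame(data)

for row, col, value in top_abs_correlations(demo, n=3):
    print(row, col, round(float(value), 4))
```

Working on the upper triangle only means the diagonal and the mirrored duplicates never enter the sort, which roughly halves the number of values to order.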

But if you want to do this in pandas, you can unstack and sort the DataFrame:

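import pandas as pd
import numpy as np

shape = (50, 4460)
data = np.random.normal(size=shape)

# Make column 1000 correlate with column 2000.
data[:, 1000] += data[:, 2000]

df = pd.DataFrame(data)

# Absolute correlations, flattened to a Series indexed by (row, column).
c = df.corr().abs()
s = c.unstack()

# Series.order() was removed from pandas; sort_values() replaces it.
so = s.sort_values(kind="quicksort")

# The last 4460 entries are the diagonal (1.0), so print the 10 before them.
print(so[-4470:-4460])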

Here is the output:

2192  1522    0.636198
1522  2192    0.636198
3677  2027    0.641817
2027  3677    0.641817
242   130     0.646760
130   242     0.646760
1171  2733    0.670048
2733  1171    0.670048
1000  2000    0.742340
2000  1000    0.742340
dtype: float64
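The output above lists every pair twice, once as (i, j) and once as (j, i). One possible way to keep each pair only once, and to drop the diagonal at the same time, is to mask everything except the strict upper triangle before flattening. This is a sketch rather than part of the original answer; the function name unique_sorted_pairs and the smaller 50x200 demo data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def unique_sorted_pairs(df):
    """Absolute correlations for each unordered column pair, largest first."""
    c = df.corr().abs()
    # True only above the diagonal, so (i, j) is kept and (j, i) is dropped.
    mask = np.triu(np.ones(c.shape, dtype=bool), k=1)
    # Masked-out cells become NaN; dropna() removes any that survive stack().
    return c.where(mask).stack().dropna().sort_values(ascending=False)


# Hypothetical demo data: 50 samples, 200 columns, one injected correlation.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 200))
data[:, 10] += data[:, 20]
df = pd.DataFrame(data)

pairs = unique_sorted_pairs(df)
print(pairs.head(5))
```

Because the diagonal is already gone, the top pairs can be read with head() instead of a slice that skips the trailing 1.0 entries.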
