python - List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?
How do you find the top correlations in a correlation matrix with pandas? There are many answers on how to do this with R (Show correlations as an ordered list, not as a large matrix, or Efficient way to get highly correlated pairs from large data set in Python or R), but I am wondering how to do it with pandas. In my case the matrix is 4460x4460, so I can't do it visually.
You can use DataFrame.values to get a NumPy array of the data, and then use NumPy functions such as argsort() to get the most correlated pairs.
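A minimal sketch of that NumPy route, for illustration only. It assumes a smaller random dataset than the one in the question (200 columns instead of 4460), and the names rng, corr, rows, cols, vals and top_pairs are made up for this example. Restricting the search to the strict upper triangle is an added choice here: it skips the diagonal and lists each pair once.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): 50 samples, 200 columns.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 200))
data[:, 10] += data[:, 20]  # make columns 10 and 20 correlated
df = pd.DataFrame(data)

# Absolute correlations as a plain NumPy array.
corr = df.corr().abs().values

# Positions of the strict upper triangle: no diagonal, each pair once.
rows, cols = np.triu_indices_from(corr, k=1)
vals = corr[rows, cols]

# argsort() sorts ascending, so reverse it and keep the first ten.
top = np.argsort(vals)[::-1][:10]
top_pairs = [(int(rows[i]), int(cols[i]), float(vals[i])) for i in top]

for r, c, v in top_pairs:
    print(r, c, round(v, 6))
```

Because the columns here are labelled 0 to 199, array positions and column labels coincide; with named columns you would map rows and cols back through df.columns.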
But if you want to do this in pandas, you can unstack and sort the DataFrame:
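import pandas as pd
import numpy as np

shape = (50, 4460)
data = np.random.normal(size=shape)

# Make columns 1000 and 2000 correlated so there is a known strong pair.
data[:, 1000] += data[:, 2000]

df = pd.DataFrame(data)

# Absolute correlations; unstack() turns the matrix into a Series
# indexed by pairs of column labels.
c = df.corr().abs()
s = c.unstack()

# Series.order() has been removed from pandas; sort_values() replaces it.
so = s.sort_values(kind="quicksort")

# The last 4460 entries are the diagonal (each column with itself),
# so look at the ten entries just before them.
print(so[-4470:-4460])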
Here is the output:
2192  1522    0.636198
1522  2192    0.636198
3677  2027    0.641817
2027  3677    0.641817
242   130     0.646760
130   242     0.646760
1171  2733    0.670048
2733  1171    0.670048
1000  2000    0.742340
2000  1000    0.742340
dtype: float64
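Every pair shows up twice in that listing, once as (a, b) and once as (b, a), and the diagonal has to be skipped by slicing. One possible way around both, sketched here as a variation rather than as part of the original answer, is to blank out everything except the strict upper triangle before unstacking. The smaller dataset and the names rng, mask and pairs are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): 50 samples, 200 columns.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 200))
data[:, 10] += data[:, 20]  # make columns 10 and 20 correlated
df = pd.DataFrame(data)

c = df.corr().abs()

# Boolean mask that is True only strictly above the diagonal.
mask = np.triu(np.ones(c.shape, dtype=bool), k=1)

# where() replaces the masked-out cells with NaN, unstack() flattens the
# matrix into a Series, and dropna() removes the blanked cells.
pairs = c.where(mask).unstack().dropna().sort_values(ascending=False)

print(pairs.head(10))
```

With this layout the strongest pairs are simply at the head of the Series, so no negative slice offsets are needed.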