python - New column based on conditional selection from the values of 2 other columns in a Pandas DataFrame


I've got a DataFrame which contains stock values.

It looks like this:

>>> data
             open   high    low  close   volume  adj close
date
2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04

When I try to make a conditional new column with the following if statement:

data['test'] =data['close'] if data['close'] > data['open'] else data['open'] 

I get the following error:

Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    data[1]['test'] =data[1]['close'] if data[1]['close'] > data[1]['open'] else data[1]['open']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So I used a.all():

data[1]['test'] =data[1]['close'] if all(data[1]['close'] > data[1]['open']) else data[1]['open'] 

The result was that the entire ['open'] column was selected. I didn't get the condition I wanted, which is to select, for every row, the bigger value between the ['open'] and ['close'] columns.

Any help is appreciated.

Thanks.

From a DataFrame like:

>>> df
         date   open   high    low  close   volume  adj close
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04

The simplest thing I can think of would be:

>>> df["test"] = df[["open", "close"]].max(axis=1)
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

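df.ix[:,["open", "close"]].max(axis=1) might be a little faster, but I don't think it's as nice to look at. (Note that .ix was removed in pandas 1.0; on current versions the equivalent spelling is df.loc[:, ["open", "close"]].max(axis=1).)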

Alternatively, you could use .apply on the rows:

>>> df["test"] = df.apply(lambda row: max(row["open"], row["close"]), axis=1)
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

Or fall back to numpy:

>>> df["test"] = np.maximum(df["open"], df["close"])
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23
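If you want to try the three approaches outside the interpreter, here is a minimal self-contained sketch. It assumes only the open and close values from the sample frame above; the variable names (via_max, via_apply, via_numpy, same_result) are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the frame above (only the two relevant columns).
df = pd.DataFrame({
    "open": [76.91, 77.04, 72.87],
    "close": [77.04, 72.87, 93.23],
})

# 1. Row-wise max over a two-column sub-frame.
via_max = df[["open", "close"]].max(axis=1)

# 2. Python's built-in max applied to each row (usually the slowest option).
via_apply = df.apply(lambda row: max(row["open"], row["close"]), axis=1)

# 3. Element-wise maximum of the two columns via numpy.
via_numpy = np.maximum(df["open"], df["close"])

# Check that all three produce the same values on the same index.
same_result = via_max.equals(via_apply) and via_max.equals(via_numpy)
print(same_result)
```

On a frame this small the choice is mostly about readability; on large frames the vectorised versions (1 and 3) should generally be preferred over .apply.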

The basic problem is that if/else doesn't play nicely with arrays, because if (something) always coerces the something into a single bool. It's not equivalent to "for every element in the array something, if the condition holds" or anything like that.
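To make that concrete, here is a small sketch, again assuming just the open and close values from the sample frame. It shows the comparison producing one boolean per row, a plain if refusing to collapse it, all() collapsing it to a single False (which is why the whole open column got picked in the question), and np.where as one way to write the element-wise "if close > open then close else open". The names mask, raised and all_rows are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "open": [76.91, 77.04, 72.87],
    "close": [77.04, 72.87, 93.23],
})

# The comparison is itself a Series of booleans, one per row.
mask = df["close"] > df["open"]

# A plain `if` needs a single bool, which pandas refuses to guess.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# all() does give a single bool, but it is False as soon as one row fails,
# so the one-line if/else falls through to the whole "open" column.
all_rows = bool(mask.all())

# np.where chooses row by row: close where mask is True, otherwise open.
df["test"] = np.where(mask, df["close"], df["open"])
print(df)
```

For this particular question np.where gives the same result as the max-based versions; it becomes more useful when the two branches are not simply "the larger of the two".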

