python - New column based on conditional selection from the values of 2 other columns in a Pandas DataFrame


I've got a DataFrame that contains stock values.

It looks like this:

>>> data
             open   high    low  close   volume  adj close
date
2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04

When I try to create a conditional new column with the following if statement:

data['test'] =data['close'] if data['close'] > data['open'] else data['open'] 

I get the following error:

Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    data[1]['test'] =data[1]['close'] if data[1]['close'] > data[1]['open'] else data[1]['open']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So I tried using all():

data[1]['test'] =data[1]['close'] if all(data[1]['close'] > data[1]['open']) else data[1]['open'] 

The result was that the entire ['open'] column was selected. I didn't get the condition I wanted, which is to select, for every row, the larger of the values in the ['open'] and ['close'] columns.

Any help is appreciated.

Thanks.

Starting from a DataFrame like:

>>> df
         date   open   high    low  close   volume  adj close
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04

The simplest thing I can think of would be:

>>> df["test"] = df[["open", "close"]].max(axis=1)
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

df.ix[:,["open", "close"]].max(axis=1) might be a little faster, but I don't think it's as nice to look at.
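For anyone on a recent pandas release: .ix was deprecated in pandas 0.20 and removed in 1.0, so the .ix line above fails there. A minimal sketch of the label-based equivalent using .loc follows. The small frame is built here purely for illustration and only carries the open and close values from the example.

```python
import pandas as pd

# Illustrative frame holding only the open/close values from the example.
df = pd.DataFrame({
    "open": [76.91, 77.04, 72.87],
    "close": [77.04, 72.87, 93.23],
})

# .loc picks all rows and the two columns by label;
# max(axis=1) then takes the larger value in each row.
df["test"] = df.loc[:, ["open", "close"]].max(axis=1)
print(df)
```

Whether this is any faster than plain df[["open", "close"]] is something I have not measured, so treat it as a spelling change only.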

Alternatively, you could use .apply on the rows:

>>> df["test"] = df.apply(lambda row: max(row["open"], row["close"]), axis=1)
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

Or fall back to numpy:

>>> df["test"] = np.maximum(df["open"], df["close"])
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23
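As a quick sanity check, the three approaches can be run side by side on the sample values. This is only a sketch: the frame is rebuilt here with just the open and close columns, and the variable names via_max, via_apply and via_numpy are made up for the comparison.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the open/close values from the example.
df = pd.DataFrame({
    "open": [76.91, 77.04, 72.87],
    "close": [77.04, 72.87, 93.23],
})

# Row-wise maximum, computed three different ways.
via_max = df[["open", "close"]].max(axis=1)
via_apply = df.apply(lambda row: max(row["open"], row["close"]), axis=1)
via_numpy = np.maximum(df["open"], df["close"])

# All three should produce the same values for this data.
print(via_max.tolist())
print(via_apply.tolist() == via_max.tolist())
print(via_numpy.tolist() == via_max.tolist())
```

One thing to keep in mind if the real data has missing values: DataFrame.max skips NaN by default, while np.maximum propagates it, so the results can differ on rows that contain NaN.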

The basic problem is that if/else doesn't play nicely with arrays, because if (something) always coerces the something into a single bool. It's not equivalent to "for every element in the array something, if the condition holds" or anything like that.
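To make that concrete, here is a small sketch of what happens to the condition and how an element-wise version could look. np.where is my addition here and is not used in the approaches above. The frame again holds only the sample open and close values.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the open/close values from the example.
df = pd.DataFrame({
    "open": [76.91, 77.04, 72.87],
    "close": [77.04, 72.87, 93.23],
})

# The comparison yields one boolean per row, not a single bool.
mask = df["close"] > df["open"]
print(mask.tolist())

# all() collapses the whole mask into one bool. It is False here because
# one row has close <= open, so the else branch (the entire 'open' column)
# is what the if/else expression in the question ends up selecting.
print(all(mask))

# np.where evaluates the condition element by element:
# take 'close' where the mask is True, 'open' otherwise.
df["test"] = np.where(mask, df["close"], df["open"])
print(df)
```

For this particular problem the result is the same as taking the row-wise maximum, but np.where also covers conditions that are not a simple max or min.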

