python - New column based on conditional selection from the values of 2 other columns in a Pandas DataFrame
I've got a DataFrame that contains stock values.
It looks like this:
>>> data
             open   high    low  close   volume  adj close
date
2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04

When I try to make a conditional new column with the following if statement:
data['test'] = data['close'] if data['close'] > data['open'] else data['open']

I get the following error:
Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    data[1]['test'] = data[1]['close'] if data[1]['close'] > data[1]['open'] else data[1]['open']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I then used a.all():
data[1]['test'] = data[1]['close'] if all(data[1]['close'] > data[1]['open']) else data[1]['open']

The result was that the entire ['open'] column was selected. I didn't get the condition I wanted, which is to select, for every row, the bigger value between the ['open'] and ['close'] columns.
Any help is appreciated.
Thanks.
From a DataFrame like:

>>> df
         date   open   high    low  close   volume  adj close
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04

the simplest thing I can think of would be:
>>> df["test"] = df[["open", "close"]].max(axis=1)
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

df.ix[:,["open", "close"]].max(axis=1) might be a little faster, but I don't think it's as nice to look at. (Note that .ix was deprecated and later removed from pandas; on current versions the equivalent spelling is df.loc[:, ["open", "close"]].max(axis=1).)
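For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example frame from the table above and applies the row-wise maximum. The values are copied from the table; keeping the date column as plain strings is an assumption made here, since "2013-07-00" would not parse as a real date, and `via_loc` is just an illustrative name.

```python
import pandas as pd

# Example frame rebuilt from the table above; dates are kept as plain
# strings because "2013-07-00" would not parse as a real date.
df = pd.DataFrame({
    "date": ["2013-07-08", "2013-07-00", "2013-07-10"],
    "open": [76.91, 77.04, 72.87],
    "high": [77.81, 79.81, 99.81],
    "low": [76.85, 71.81, 64.23],
    "close": [77.04, 72.87, 93.23],
    "volume": [5106200, 1920834, 2934843],
    "adj close": [77.04, 77.04, 77.04],
})

# Row-wise maximum over the two columns of interest.
df["test"] = df[["open", "close"]].max(axis=1)

# The same selection written with label-based .loc indexing.
via_loc = df.loc[:, ["open", "close"]].max(axis=1)

print(df[["open", "close", "test"]])
```

Both spellings select the same two columns before taking the maximum, so they should produce the same values.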
Alternatively, you could use .apply on the rows:
>>> df["test"] = df.apply(lambda row: max(row["open"], row["close"]), axis=1)
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

Or fall back to numpy:
>>> df["test"] = np.maximum(df["open"], df["close"])
>>> df
         date   open   high    low  close   volume  adj close   test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

The basic problem is that if/else doesn't play nicely with arrays, because if (something) always coerces the something into a single bool. It's not equivalent to "for every element in the array something, if the condition holds" or anything like that.
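To make that last point concrete, here is a small sketch showing why the plain `if` fails, why wrapping the comparison in `all()` picks the whole open column, and how an element-wise choice can be written instead. It uses `np.where`, which is not one of the approaches shown above but is the closest vectorized analogue of the original if/else; the names `mask`, `raised`, and `picks_open_everywhere` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "open": [76.91, 77.04, 72.87],
    "close": [77.04, 72.87, 93.23],
})

# The comparison is element-wise and yields one bool per row.
mask = df["close"] > df["open"]

# Using the whole mask in a plain `if` raises ValueError because a
# multi-element Series has no single truth value.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# all(mask) is False as soon as one row fails the test, so the original
# if/else expression would pick the entire "open" column.
picks_open_everywhere = not all(mask)

# np.where applies the choice row by row: close where mask is True,
# open otherwise.
df["test"] = np.where(mask, df["close"], df["open"])
print(df)
```

For a simple "bigger of two columns" the `max(axis=1)` and `np.maximum` versions are shorter; `np.where` becomes more useful when the two branches are not just the operands of the comparison.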