Python 2.7 - Round pandas datetime index?
I'm reading multiple spreadsheets of timeseries into a pandas DataFrame and concatenating them on a common pandas datetime index. The datalogger that logged the timeseries is not 100% accurate, which makes resampling very annoying: depending on whether a timestamp is slightly higher or lower than the interval being sampled, resampling creates NaNs and my series starts to become a broken line. Here's my code:
    import time

    import pandas as pd

    def loaddata(filepaths):
        t1 = time.clock()  # Python 2.7; removed in Python 3.8 (use time.perf_counter() there)
        for i in range(len(filepaths)):
            xl = pd.ExcelFile(filepaths[i])
            df = xl.parse(xl.sheet_names[0], header=0, index_col=2,
                          skiprows=[0, 2, 3, 4], parse_dates=True)
            df = df.dropna(axis=1, how='all')  # drop columns that are entirely empty
            df = df.drop(['decimal year day', 'decimal year day.1', 'record'], axis=1)
            if i == 0:
                dfs = df
            else:
                # add the new columns side by side, aligned on the datetime index
                dfs = pd.concat([dfs, df], axis=1)
        t2 = time.clock()
        print("files loaded into dataframe in %s seconds" % (t2 - t1))
        return dfs

    files = ["london lysimeters corrected 5min.xlsx", "london water balance 5min.xlsx"]
    data = loaddata(files)
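Concatenating inside the loop copies the growing frame on every iteration; a common pandas idiom is to collect the per-file frames and call `pd.concat` once. A minimal sketch, assuming pandas 0.21 or later (for `sheet_name=` and `drop(columns=...)`) and an installed Excel engine such as openpyxl; `load_sheet`, `combine` and `DROP_COLS` are hypothetical names, and the read options are copied from the question.

```python
import pandas as pd

# Columns the question drops from every sheet (names as they appear in the question).
DROP_COLS = ["decimal year day", "decimal year day.1", "record"]


def load_sheet(path):
    """Read the first sheet of one workbook with the question's read options."""
    df = pd.read_excel(path, sheet_name=0, header=0, index_col=2,
                       skiprows=[0, 2, 3, 4], parse_dates=True)
    df = df.dropna(axis=1, how="all")  # drop columns that are entirely empty
    return df.drop(columns=DROP_COLS)


def combine(frames):
    """Align all frames side by side on their datetime index with one concat call."""
    return pd.concat(list(frames), axis=1)


# Usage (requires the workbooks to exist):
# data = combine(load_sheet(p) for p in files)
```

Building the list first means each frame is copied once, instead of the accumulated result being rebuilt for every file.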
Here's an idea of what the index looks like:
    data.index
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2012-08-27 12:05:00.000002, ..., 2013-07-12 15:10:00.000004]
    Length: 91910, Freq: None, Timezone: None
What's the fastest and most general way to round this index to the nearest minute?
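The NaN gaps described above can be reproduced with a few made-up readings. A minimal sketch, assuming pandas 0.18 or later for the `.resample(...).mean()` syntax; the timestamps and values are hypothetical, chosen to mimic a 5-minute logger whose clock is off by a few microseconds. One reading falls just before the 12:10 boundary, so it lands in the 12:05 bin and the 12:10 bin is left empty.

```python
import pandas as pd

# Hypothetical logger readings: nominally every 5 minutes, each off by a few microseconds.
nominal = pd.date_range("2012-08-27 12:05", periods=4, freq="5min")
jitter = pd.to_timedelta([2, -3, 4, 0], unit="us")
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=nominal + jitter)

# 12:09:59.999997 falls into the [12:05, 12:10) bin, leaving [12:10, 12:15) empty.
binned = s.resample("5min").mean()
print(binned)
```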
Here's a little trick. Datetimes are stored as nanoseconds (when viewed as np.int64), so you can round to minutes expressed in nanoseconds.
    In [75]: index = pd.DatetimeIndex([pd.Timestamp('20120827 12:05:00.002'),
                                       pd.Timestamp('20130101 12:05:01'),
                                       pd.Timestamp('20130712 15:10:00'),
                                       pd.Timestamp('20130712 15:10:00.000004')])

    In [79]: index.values
    Out[79]:
    array(['2012-08-27T08:05:00.002000000-0400',
           '2013-01-01T07:05:01.000000000-0500',
           '2013-07-12T11:10:00.000000000-0400',
           '2013-07-12T11:10:00.000004000-0400'], dtype='datetime64[ns]')

    In [78]: pd.DatetimeIndex(((index.asi8/(1e9*60)).round()*1e9*60).astype(np.int64)).values
    Out[78]:
    array(['2012-08-27T08:05:00.000000000-0400',
           '2013-01-01T07:05:00.000000000-0500',
           '2013-07-12T11:10:00.000000000-0400',
           '2013-07-12T11:10:00.000000000-0400'], dtype='datetime64[ns]')

(The -0400/-0500 offsets come from older NumPy versions printing datetime64 values in the local timezone; the stored values are the times passed to Timestamp.)
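The session above passes the nanosecond counts through float64; the same idea can also be kept entirely in integer arithmetic. A minimal sketch, assuming a tz-naive index with no NaT entries; `round_to_minute` is a hypothetical helper, not a pandas function. pandas 0.18 and later also provide `DatetimeIndex.round` with a frequency string such as `"min"`, which is the simpler option where available; recent versions round exact half-minute ties to even, whereas this sketch always rounds them up.

```python
import pandas as pd

NS_PER_MIN = 60 * 10**9  # nanoseconds in one minute


def round_to_minute(index):
    """Round a tz-naive DatetimeIndex (no NaT) to the nearest minute.

    Same nanosecond trick as the session above, but kept in int64 so nothing
    passes through float64. Exact half-minute ties round up (to the later minute).
    """
    # Normalise to nanosecond resolution, then view the values as int64 counts.
    ns = index.values.astype("datetime64[ns]").view("int64")
    rounded = (ns + NS_PER_MIN // 2) // NS_PER_MIN * NS_PER_MIN
    return pd.DatetimeIndex(rounded.view("datetime64[ns]"))


index = pd.DatetimeIndex([pd.Timestamp("20120827 12:05:00.002"),
                          pd.Timestamp("20130101 12:05:01"),
                          pd.Timestamp("20130712 15:10:00"),
                          pd.Timestamp("20130712 15:10:00.000004")])

print(round_to_minute(index))
# Built-in alternative in pandas 0.18+ (tie handling may differ, see above):
print(index.round("min"))
```

Rounding the index like this before resampling or concatenating puts each reading back on the exact grid, so a timestamp a few microseconds early no longer lands in the previous bin.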