PyTables time performance
I'm working on a project related to text detection in natural images. I have to train a classifier, and I'm using PyTables to store the information. I have:
- 62 classes (a-z, A-Z, 0-9)
- each class has between 100 and 600 tables
- each table has a single column storing 32-bit floats
- each column has between 2^2 and 2^8 rows (depending on parameters)
My problem is that after training the classifier, it takes a very long time to read the information back during testing. For example: one database has 27,900 tables (62 classes * 450 tables per class) with 4 rows per table, and it took approximately 4 hours to read and retrieve the information I need. The test program reads each table 390 times for the classes a-z and A-Z, and 150 times for the classes 0-9. Is this normal? I tried using the index option on the table's only column, but I don't see any performance gain. I work on a virtual machine with 2 GB of RAM on an HP Pavilion dv6 (4 GB DDR3 RAM, Core 2 Duo).
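For reference, here is a minimal sketch of the layout described above (the file name `train.h5`, the description class `Feature`, and the column name `value` are hypothetical, and the counts are scaled down):

```python
import numpy as np
import tables as tb

class Feature(tb.IsDescription):
    value = tb.Float32Col()  # the single 32-bit float column per table

with tb.open_file("train.h5", mode="w") as h5:
    for cls in "abc":                # stand-in for the 62 classes
        group = h5.create_group("/", cls)
        for i in range(5):           # 100-600 tables per class in reality
            table = h5.create_table(group, f"t{i}", Feature)
            table.append([(float(x),) for x in np.random.rand(4)])  # 2^2-2^8 rows

# The test phase then re-reads every tiny table many times:
with tb.open_file("train.h5", mode="r") as h5:
    for table in h5.walk_nodes("/", classname="Table"):
        vals = table.col("value")    # one column lookup per tiny table
```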
This is because a column lookup on tables is one of the slower operations that can be done, and it is where most of your information lives. You have two basic options to increase performance for tables with many columns and few rows:
- Pivot the structure so that you have tables with many rows and few columns (see the sketch after this list).
- Move to a more efficient data structure, a CArray or EArray, for every row/column.
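A minimal sketch of both options under assumed names (the files, the `Sample` description, and the `cls`/`table_id`/`value` fields are all hypothetical; `create_table`, `create_index`, `read_where`, and `create_earray` are the actual PyTables calls):

```python
import numpy as np
import tables as tb

# Option 1: pivot into one long table so a single in-kernel query
# replaces thousands of tiny-table reads.
class Sample(tb.IsDescription):
    cls      = tb.StringCol(1)   # class label, e.g. b"a"
    table_id = tb.Int32Col()     # which per-class table a row came from
    value    = tb.Float32Col()

with tb.open_file("pivot.h5", mode="w") as h5:
    big = h5.create_table("/", "samples", Sample)
    big.append([(b"a", 0, 0.5), (b"a", 1, 0.7), (b"b", 0, 0.1)])
    big.cols.cls.create_index()  # index the column you filter on
    # One in-kernel query instead of hundreds of table opens:
    vals = big.read_where("cls == c", condvars={"c": b"a"}, field="value")

# Option 2: one EArray per class -- homogeneous float32 storage that is
# far cheaper to read than Table column lookups.
with tb.open_file("arrays.h5", mode="w") as h5:
    ea = h5.create_earray("/", "a", atom=tb.Float32Atom(), shape=(0,))
    ea.append(np.random.rand(450 * 4).astype(np.float32))
    data = ea.read()             # one contiguous read for the whole class
```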
Additionally, you can try using compression to speed things up. This is sort of generic advice, though, because you haven't included any code.
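Compression is enabled per node through a `Filters` instance, for example (zlib is built in; Blosc, where available, is usually faster for numeric data):

```python
import tables as tb

filters = tb.Filters(complevel=5, complib="zlib")  # or complib="blosc"

with tb.open_file("compressed.h5", mode="w") as h5:
    h5.create_earray("/", "a", atom=tb.Float32Atom(), shape=(0,),
                     filters=filters)
```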