matlab - Fast algorithms for finding pairwise Euclidean distance -
i know matlab has built in pdist function calculate pairwise distances. however, matrix large 60000 300 , matlab runs out of memory.
this question follow on matlab euclidean pairwise square distance function.
is there workaround computational inefficiency. tried manually coding pairwise distance calculations , takes full day run (sometimes 6 7 hours).
any appreciated!
well, couldn't resist playing around. created matlab mex c file called pdistc
implements pairwise euclidean distance single , double precision. on machine using matlab r2012b , r2015a it's 20–25% faster pdist
(and underlying pdistmex
helper function) large inputs (e.g., 60,000-by-300).
as has been pointed out, problem fundamentally bounded memory , you're asking lot of it. mex c code uses minimal memory beyond needed output. in comparing memory usage of pdist
, looks 2 virtually same. in other words, pdist
not using lots of memory. memory problem in memory used before calling pdist
(can use clear
remove large arrays?) or because you're trying solve big computational problem on tiny hardware.
so, pdistc
function won't able save memory overall, may able use feature built in. can calculate chunks of overall pairwise distance vector. this:
m = 6e3; n = 3e2; x = rand(m,n); sz = m*(m-1)/2; = 1:m:sz-m d = pdistc(x', i, i+m); % mex c function, x transposed relative pdist ... % process chunk of pairwise distances end
this considerably slower (10 times or so) , part of c code not optimized well, allow less memory use – assuming don't need entire array @ 1 time. note same thing more efficiently pdist
(or pdistc
) creating loop passed in subsets of x
directly, rather of it.
if have 64-bit intel mac, won't need compile i've included .mexmaci64
binary, otherwise you'll need figure out how compile code machine. can't that. it's possible may not able compile or there compatibility issues you'll need solve editing code yourself. it's possible there bugs , code crash matlab. also, note may different outputs relative pdist
differences between 2 in range of machine epsilon (eps
). pdist
may or may not fancy things avoid overflows large inputs , other numeric issues, aware code not.
additionally, created simple pure matlab implementation. massively slower mex code, still faster naïve implementation or code found in pdist
.
all of files can found here. zip archive includes of files. it's bsd licensed. feel free optimize (i tried blas calls , openmp in c code no avail – maybe pointer magic or gpu/opencl further speed up). hope can helpful or else.
Comments
Post a Comment