matlab - Fast algorithms for finding pairwise Euclidean distance -


i know matlab has built in pdist function calculate pairwise distances. however, matrix large 60000 300 , matlab runs out of memory.

this question follow on matlab euclidean pairwise square distance function.

is there workaround computational inefficiency. tried manually coding pairwise distance calculations , takes full day run (sometimes 6 7 hours).

any appreciated!

well, couldn't resist playing around. created matlab mex c file called pdistc implements pairwise euclidean distance single , double precision. on machine using matlab r2012b , r2015a it's 20–25% faster pdist(and underlying pdistmex helper function) large inputs (e.g., 60,000-by-300).

as has been pointed out, problem fundamentally bounded memory , you're asking lot of it. mex c code uses minimal memory beyond needed output. in comparing memory usage of pdist, looks 2 virtually same. in other words, pdist not using lots of memory. memory problem in memory used before calling pdist (can use clear remove large arrays?) or because you're trying solve big computational problem on tiny hardware.

so, pdistc function won't able save memory overall, may able use feature built in. can calculate chunks of overall pairwise distance vector. this:

m = 6e3; n = 3e2; x = rand(m,n); sz = m*(m-1)/2;  = 1:m:sz-m     d = pdistc(x', i, i+m); % mex c function, x transposed relative pdist     ...                     % process chunk of pairwise distances end 

this considerably slower (10 times or so) , part of c code not optimized well, allow less memory use – assuming don't need entire array @ 1 time. note same thing more efficiently pdist (or pdistc) creating loop passed in subsets of x directly, rather of it.

if have 64-bit intel mac, won't need compile i've included .mexmaci64 binary, otherwise you'll need figure out how compile code machine. can't that. it's possible may not able compile or there compatibility issues you'll need solve editing code yourself. it's possible there bugs , code crash matlab. also, note may different outputs relative pdist differences between 2 in range of machine epsilon (eps). pdist may or may not fancy things avoid overflows large inputs , other numeric issues, aware code not.

additionally, created simple pure matlab implementation. massively slower mex code, still faster naïve implementation or code found in pdist.

all of files can found here. zip archive includes of files. it's bsd licensed. feel free optimize (i tried blas calls , openmp in c code no avail – maybe pointer magic or gpu/opencl further speed up). hope can helpful or else.


Comments

Popular posts from this blog

html5 - What is breaking my page when printing? -

c# - must be a non-abstract type with a public parameterless constructor in redis -

ajax - PHP/JSON Login script (Twitter style) not setting sessions -