Tuesday, May 05, 2009

Matlab: Vectorization.

OP here.

Q:

I have


c = cell(n,1); % Each cell is an array of integers.


How do you vectorize this loop? it takes very long.



for i=1:n-1
for j=i+1:n
lic(i,j)=length(intersect(c{i},c{j}));
end
end


A1:

Generate some data for testing:



the data:

n=200; % number of sets
m=100; % element value are from 0 to m-1
meannumel=50; % average number of elements in lists
s = 20; % standard deviation of number of elements
c=cell(1,n);

for k=1:n
nelk = meannumel+round(s*randn);
nelk = min(max(nelk,1),m);
c{k} = floor(m*rand(1,nelk));
end


The engine:


n = length(c);
c = cellfun(@(x) unique(x(:)), c, 'uni', false);

s = cumsum([1 cellfun(@length,c)]).'; % boundary of cell, in vector x;
x = cat(1,c{:}); % all elements, in one long vector;
[x,p] = sort(x); % x is sorted now;

u = cumsum([1;diff(x) ~= 0]);
% before cumsum: boundary of unique (different) elements in x; cumsum: to get a vector u, so that u = 11111223333, # of index equal to count of value;

t = cumsum(accumarray(s,1));
% accumarray: here equal to tmp = zeros(s(end),1); tmp(s) = 1; cumsum: to get a vector t, so that t = 1111222223333 # of index equal to count of that value. here s is boundary of cell, so this t is actually which cell each element belongs;

A = sparse(u,t(p),1,u(end),n);
% sparse: only store non-zero elements, save memory space; syntax: p = sparce(i,j,s,m,n) will create a m x n sparse matrix, with elements p(i(k),j(k)) = s(k); here the intention is to add 1 at, the unique element index, vs. the cell index that each elements belongs.
% A is in the format of: unique item index by which cell, here A is essentially a logical matrix (valued 0, 1 only)

A = triu(full(A'*A),1);
% A'*A is the trick to count intersected values; colums and rows in A are only 1 or 0. matrix muptiplication sums the ones.




A2:
A2 is slightly improved over A1 in that the 'unique' is only called once.


% c must be a list of *row* vectors
[dummy dummy cmap] = unique([c{:}]);
% length(dummy) is the number of unique values in c. cmap is in the form of [1 2 3 1 1 2 2 2 4 ...], which specifies unique values indeces in [c{:}]. length(cmap) = numel([c{:}];
nele = cellfun(@length,c(:));
setid = cumsum(accumarray(cumsum([1; nele]),1));
A = sparse(setid(1:end-1),cmap(:),1,length(c),length(dummy));
A = spones(A); % SPONES Replace nonzero sparse matrix elements with ones.
A = triu(full(A*A.'),1);

Speed comparison:

The for loop: 3.55 sec.
A1: 0.022 sec.
A2: 0.015 sec.

No comments: