# scipy: pdist indexing

Yao Yao on January 31, 2018

scipy.spatial.distance.pdist(X) gives the pair-wise distances of X, $\operatorname{dist}(X[i], X[i])$ not included. The distances are stored in a dense matrix D.

Question: how to get $\operatorname{dist}(X[i], X[j])$ without expanding this dense matrix by scipy.spatial.distance.squareform(D)?

A good answer from How does condensed distance matrix work? (pdist):

def square_to_condensed(i, j, n):
assert i != j, "no diagonal elements in condensed matrix"
if i < j:
i, j = j, i
return n*j - j*(j+1)/2 + i - 1 - j


Take $n$ == X.shape == $4$ as an example. In the code, $i$ and $j$ are swapped if $i < j$, so only consider the bottom triangular region, i.e. $i > j$ always holds in the discussion below.

i\j 0 1 2 3
0 - D D D
1 D - D D
2 D D - D
3 D D D -

Suppose the indexing function is $\operatorname{f}$: $\operatorname{f}(i, j) = k$ when $\operatorname{dist}(X[i], X[j]) = D[k]$

len(D) = ${n \choose 2} = (n-1) + (n-2) + \dots + 1$.

• When $j = 0$, $\operatorname{f}(i, j) = i - j - 1$
• When $j = 1$, $\operatorname{f}(i, j) = (n-1) + (i - j - 1)$
• When $j = 2$, $\operatorname{f}(i, j) = (n-1) + (n-2) + (i - j - 1)$

So the pattern here is:

Further more: