Spatial is Not Special – Nearest Neighbor Index

It is nice to get back to the book, and start talking about Statistical Problem Solving in Geography again.  Today we are going to look at the Nearest Neighbor Index.  You can refer to chapter 14 where we illustrate the computation of the nearest neighbor index using a set of 7 points:

Then, for each point we determine the nearest neighbor, its distance, and ultimately the average nearest neighbor distance over all the points:

To develop the SQL, we will take it one step at a time.  First, we want to compute the distance from every point to every other point:

SELECT a.pt, b.pt,distance(a.[Geom (I)],b.[Geom (I)]) AS dist FROM points AS a, points AS b WHERE a.pt <> b.pt ORDER BY dist

This query gives us a table of the distance from every point to every other point.  We also play that game again where we rename the table “points” as “a” and “b” so that SQL thinks we have two different tables.  We also have to put a WHERE clause in to make sure we aren’t measuring from one point to itself – because the distance will be 0.

This only gets us part of the way there.  What we really want is the nearest neighbor for each point, not every neighbor.  So, to do that, we will wrap the previous query into another query where we get the minimum (min) distance for the points in layer “a”.  So, to do that, we use the min function which is an aggregate clause, and we also get the first point from “b” which we call “NN” because it is the nearest neighbor to the point in layer “a”.  Since we have an aggregate clause, we close it off with the GROUP BY statement.  That gives us the identical table you see in the book.

SELECT min(dist) AS dist, a.pt AS PT, first(b.pt) AS NN FROM (    SELECT a.pt, b.pt,distance(a.[Geom (I)],b.[Geom (I)]) AS dist    FROM points AS a, points AS b    WHERE a.pt <> b.pt    ORDER BY dist ) GROUP BY

But again, we are only 99% of the way there.  We don’t really want the distance from every point to its nearest neighbor (which the table above gives us), we want the average nearest neighbor distance.  So, we just wrap the previous query in an AVG function.

SELECT AVG(dist) AS NNI FROM (   SELECT min(dist) AS dist, a.pt AS PT, first(b.pt) AS NN   FROM (     SELECT a.pt, b.pt,distance(a.[Geom (I)],b.[Geom (I)]) AS dist     FROM points AS a, points AS b     WHERE a.pt <> b.pt     ORDER BY dist ) GROUP BY a.pt )

© 2020, Arthur J. Lembo, Jr.