k-nearest neighbor with SQL in Radian Studio

I wanted to give you another look at some features that Radian Studio will offer. I’ve shown how we can use SQL to replicate the ARC/INFO NEAR function, and how to perform Nearest Neighbor Analysis. But, another useful took is the ability to identify k-nearest neighbors. That is, rather than just identifying the nearest neighbor, you might want to identify the two, three, or k nearest neighbors.

Radian will allow that functionality by using the COLLECT aggregate clause. The COLLECT aggregate collects values from a subgroup into a table, returning a table with one or more fields.

it is like a SELECT which runs on a group. COLLECT takes a table and returns a table without requiring us to write a FROM section as we would with a SELECT. This is stuff that the real grown up databases like Oracle use, and Manifold is going to give it to us as part of Radian Studio.

SELECT park_no1,
SPLIT(COLLECT park_no2, dist
ORDER BY dist ASC FETCH 3
)
FROM
(
SELECT a.name AS park_no1, b.name AS park_no2,
GeomDistance(a.[geom (i)], b.[geom (i)], 0) AS dist
FROM [parks Table] AS a , [parks Table] AS b
WHERE a.name <> b.name
)
GROUP BY park_no1


So what’s happening here? Let’s walk through it:

Selecting distances from each park to every other park

First, the inner query (lines 7-10) is getting us the distances from every point to every other point. In this case, I’m using the same layer (parks) and therefore playing the little trick I’ve shown you before – renaming the parks table as A and B so that the SQL engine thinks they are two different tables.  So, the result would look something like this:

Table 1. Result of a query to find the distance from every park to every other park


Collecting the Results

Remember that the COLLECT aggregate collects values from a subgroup into a table – we can’t actually see the table from the COLLECT aggregate – we have to split the table using the SPLIT function.  But you’ll notice that in the COLLECT portion (lines 1-3) we fetch 3 of the nearest neighbors for each park (notice that we are sorting the data in ASCENDING order so that the three closest parks are selected).  Notice the GROUP BY in the last line -that will quantify the results by each park number so that the resulting table looks like this:

Table 2.  Resulting query that selects the three nearest parks to every other park.


This is pretty powerful stuff. There are all kinds of ways to use it – finding the three closest ATMs to every pub in the city, or finding the 5 closest  fire stations to every factory. Try and return the 3 nearest neighbors for each feature in another GIS, and you’ll be scratching your head for awhile (there is a reason why I’ve demonstrated nearest neighbor in the past, but not demonstrated k-neighbors!).

And of course, the best part of SQL is you can simply wrap the entire query in parentheses, and add another line like:

SELECT avg(dist) FROM

and you have yourself a nearest neighbor calculation for a k-nearest neighbor.

© 2020, Arthur J. Lembo, Jr.