# k-Nearest Neighbor with PostGRES

Well, even though you didn’t ask for it, I decided to create a *k*-nearest neighbor analysis in PostGRES/PostGIS. This has lots of potential: find the 5 nearest ATMs to every fast food restuarant; find the 3 closest medical clinics to each school; or find the 10 closest bars to each college.

Now, the **LIMIT** clause does allow us to find the *k* closest features to a particular feature. But in this case, I’m not interested in a particular feature, I want the results for **all **features.

To do this in Postgres, I’m going to show you two things that I haven’t used until today:** rank()** and **partition.**

This entire query falls under a heading of a **Window** function in Postgres.

Modified from Postgres help:

* A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. *

Here is the code we’ll talk about:

SELECT * FROM (SELECT park_no1, park_no2, dist, rank() over (partition by park_no1 order by dist asc) as rank FROM ( SELECTa.nameAS park_no1,b.nameAS park_no2, ST_Distance(a.geometry, b.geometry) AS dist FROM parks AS a , parks AS b WHEREa.name&lt;&gt;b.name) AS T1 ) AS T2 WHERE rank &lt; 4

And our results are here:

Table 1. Each park is selected with it’s three closest parks along with the distance to the park.

Now let’s talk about it…

__Getting the distances__

Lines 6-10 is your basic SELECT statement where I get the distances from each park to every other park, and sort it in ASCENDING order (that way, we get the closest parks first). I add to that two little tricks I’ve shown you countless times:

renaming the parks table

*A*and*B*to trick the SQL engine into thinking there are two tablesnot performing the distance calculations when the names are the same (to avoid 0 distances).

__Partitioning the data__

This one was new to me. Partitioning is really cool and refers to splitting the resulting table returned by the query into smaller physical pieces. This is part of the Window capabilities in Postgres. In this case, I am partitioning the data into groups by the park_no1 value (the park name).

__Rank the data__

So, we’ve grouped out the data based on the park_no, so now we take those interesting grouped pieces and put a rank on them. The rank() function does just that. But Postgres wants to know what we want to rank. In this case, we need to use the OVER clause. The OVER clause determines exactly how the rows of the query are split up for processing by the Window function we are using. So, for each group of parks, we now have a rank from 1..N for the group like the following:

Table 2. Result of ranked query, showing each ranking for Auburn Park and their *n* nearest neighbors.

Table 3. Scrolling further down the table, you can see that Auburn Park is ranked from 1..N. When we get to Baker Park, we start again at 1.

So now we have this really big (nxn) X 4 table. But remember, we want the closest three parks. So, the thing to do is wrap the entire query in parenthesis, and then select all records with a rank column less than 4. The resulting table is shown below.

Table 4. Here we see each park, and their three closest parks.

Don’t worry if you don’t understand this. Like they say at the end of the TV show Dragnet: *the names have been changed to protect the innocent*. So, just copy this code in, and start changing names. If you are working with burglaries and not parks, then simply change the name of the parks table to burglaries. You can also change my unique name for the park with whatever your unique name is. The, when you have it working for your data, you’ll probably have a better shot at understanding it.

Want to learn more about writing SQL? Use a coupon code to take my course on Spatial SQL for under $30 right ** here**.

Also, tell your friends. If I can get 100 people to take the course in the next month I’ll ramp up more posts like this 🙂