I wrote a paper in 2006 about using SQL for the Modifiable Area Unit Problem (MAUP). The example was an agricultural experiment field, and I showed how SQL could be used to calculate the descriptive statistics rather easily, and we also used SQL to develp a trend surface map to better understand the underlying trends of the data and determine the best areal units to use for grouping the data. So, we thought we would use that example in the book. MAUP is a conundrum that geographers have been thinking about for a long time, and quite frankly we will think about in the future. I won’t go into what MAUP is here, for that you can buy the book! But, one of our figures showed the data for 3 different blocking patterns and the corresponding descriptive statistics:
The key is to develop descriptive statistics for each point that is located within each block. This is easily accomplished through the CONTAINS and the GROUP BY clauses as follows:
SELECT avg(AVG_P) AS AVG_P, avg(AVG_K) AS AVG_K, var(AVG_K) AS VAR_K, var(AVG_P) AS VAR_P FROM ( SELECT grid.id, avg(p) AVG_P, avg(k) AVG_K FROM grid, samples WHERE contains(grid.id,samples.id) GROUP BY grid.id )
The subquery within the parenthesis computes the average Phosphorous (P) and Potassium (K) values for the soils, and groups it by the grid the points are contained in. Based on our figure above, that means we will have 6 values for K and P. If you wanted to leave it at that, you could put those values into a scatter plot to view the correlation, or wrap or copy the data to Excel and compute the correlation coefficient. But, for us, we will take the 6 results from the sub query and compute the averages and variances of P and K for the overall grouped data.