
Join count statistic

Statistics of spatial association


Join count statistics are a method of spatial analysis used to assess the degree of association, in particular the spatial autocorrelation, of categorical variables distributed over a spatial map. They were originally introduced by the Australian statistician P. A. P. Moran. Join count statistics have found widespread use in econometrics, remote sensing and ecology, and can be computed in a number of software packages, including PASSaGE, GeoDa, PySAL and spdep.

Binary data

Figure: Join counts for binary data on a $10\times 10$ grid using 'rook' (north, south, east, west) neighbors. Left: black is never next to black, nor white to white, resulting in zero values of $J_{BB}$ and $J_{WW}$. Centre: a random pattern shows no bias for pairing colours, resulting in approximately equal values for all join count statistics. Right: a solid patch of black on a white background results in high values of $J_{BB}$ and $J_{WW}$ and low values of $J_{BW}$, since black is only next to white along the patch boundary.

Given binary data $x_{i}\in \{0,1\}$ distributed over $N$ spatial sites, where the neighbour relations between sites $i$ and $j$ are encoded in the spatial weight matrix

$$w_{ij}={\begin{cases}1&i{\text{ neighbor of }}j\\0&{\text{otherwise}}\end{cases}}$$

the join count statistics are defined as

$$J=J_{BB}+J_{BW}+J_{WW}$$

where

$$J_{BB}={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}x_{i}x_{j}$$
$$J_{BW}={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}(x_{i}-x_{j})^{2}$$
$$J_{WW}={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}(1-x_{i})(1-x_{j})$$
$$J={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}$$

The subscripts $B$ and $W$ refer to 'black' ($x_{i}=1$) and 'white' ($x_{i}=0$) sites. The relation $J=J_{BB}+J_{BW}+J_{WW}$ implies that only three of the four numbers are independent. Generally speaking, large values of $J_{BB}$ and $J_{WW}$ relative to $J_{BW}$ indicate positive spatial autocorrelation, while relatively large values of $J_{BW}$ indicate negative spatial autocorrelation (anti-correlation).
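The definitions above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used by any of the packages mentioned earlier; the helper names `rook_neighbours` and `join_counts` are made up for this example, and the weights are assumed to be binary and symmetric.

```python
def rook_neighbours(n_rows, n_cols):
    """Map each site index of an n_rows x n_cols grid to its rook neighbours."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours


def join_counts(x, neighbours):
    """Return (J_BB, J_BW, J_WW, J) for binary values x (1 = black, 0 = white)."""
    bb = bw = ww = total = 0
    for i, adjacent in neighbours.items():
        for j in adjacent:  # w_ij = 1 exactly for these ordered pairs
            bb += x[i] * x[j]
            bw += (x[i] - x[j]) ** 2
            ww += (1 - x[i]) * (1 - x[j])
            total += 1
    # Every join was visited twice, as (i, j) and as (j, i): the factor 1/2.
    return bb // 2, bw // 2, ww // 2, total // 2


# 4 x 4 checkerboard: like colours are never rook neighbours.
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
# 4 x 4 grid whose two left columns are black and two right columns are white.
halves = [1 if c < 2 else 0 for r in range(4) for c in range(4)]

w = rook_neighbours(4, 4)
print(join_counts(checkerboard, w))  # (0, 24, 0, 24)
print(join_counts(halves, w))        # (10, 4, 10, 24)
```

In both patterns the three counts add up to the total number of joins, 24, which illustrates why only three of the four numbers are independent.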

To assess the statistical significance of these statistics, the expectation under various null models has been computed. For example, if the null hypothesis is that each sample is chosen at random according to a Bernoulli process with probability

$$p={\frac {\text{number of black cells}}{N}}={\frac {N_{1}}{N}}$$

then Cliff and Ord show that

$$E(J_{BB})={\frac {1}{2}}S_{0}p^{2}$$
$$\operatorname {var} (J_{BB})={\frac {p^{2}(1-p)}{4}}\left[S_{1}(1-p)+S_{2}p\right]$$
$$E(J_{BW})=S_{0}p(1-p)$$
$$\operatorname {var} (J_{BW})={\frac {p(1-p)}{4}}\left[4S_{1}p(1-p)+S_{2}\left(1-4p(1-p)\right)\right]$$

where

$$S_{0}=\sum _{ij}w_{ij}$$
$$S_{1}={\frac {1}{2}}\sum _{ij}(w_{ji}+w_{ij})^{2}$$
$$S_{2}=\sum _{i}\left(\sum _{j}w_{ji}+\sum _{j}w_{ij}\right)^{2}$$
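As a rough numerical check, the expressions for $E(J_{BB})$ and $\operatorname{var}(J_{BB})$ can be compared with an exhaustive enumeration over every colouring of a very small map. The sketch below assumes that $p$ is a fixed Bernoulli probability and uses a three-site path as a toy example; the function names are illustrative only.

```python
from itertools import product


def weight_sums(w):
    """Return (S0, S1, S2) for a square weight matrix given as a list of lists."""
    n = len(w)
    s0 = sum(w[i][j] for i in range(n) for j in range(n))
    s1 = 0.5 * sum((w[j][i] + w[i][j]) ** 2 for i in range(n) for j in range(n))
    s2 = sum(
        (sum(w[j][i] for j in range(n)) + sum(w[i][j] for j in range(n))) ** 2
        for i in range(n)
    )
    return s0, s1, s2


def bb_moments(w, p):
    """Mean and variance of J_BB from the closed-form expressions."""
    s0, s1, s2 = weight_sums(w)
    mean = 0.5 * s0 * p ** 2
    var = p ** 2 * (1 - p) / 4 * (s1 * (1 - p) + s2 * p)
    return mean, var


def bb_moments_brute_force(w, p):
    """Mean and variance of J_BB by enumerating all 2^N colourings."""
    n = len(w)
    mean = second_moment = 0.0
    for x in product((0, 1), repeat=n):
        prob = 1.0
        for xi in x:
            prob *= p if xi else 1 - p
        j_bb = 0.5 * sum(
            w[i][j] * x[i] * x[j] for i in range(n) for j in range(n) if i != j
        )
        mean += prob * j_bb
        second_moment += prob * j_bb ** 2
    return mean, second_moment - mean ** 2


# Toy map: a path of three sites, 0 - 1 - 2.
w = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

print(weight_sums(w))  # (4, 8.0, 24)
print(bb_moments(w, 0.3))
print(bb_moments_brute_force(w, 0.3))
```

The two moment calculations should agree up to floating-point rounding. Enumeration is only feasible for a handful of sites, which is one reason closed-form or permutation-based results are used in practice.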

In practice, however, an approach based on random permutations is preferred, since it requires fewer assumptions.
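A permutation approach can be sketched as follows: the observed values are repeatedly shuffled over the sites, the statistic is recomputed for each shuffle, and the observed value is compared with the resulting reference distribution. The code below is a minimal one-sided test for an excess of black-black joins, assuming binary symmetric rook weights; the function names, the default of 999 permutations and the `(count + 1) / (n_perm + 1)` pseudo p-value convention are choices made for this illustration rather than something specified in the text.

```python
import random


def rook_neighbours(n_rows, n_cols):
    """Map each site index of an n_rows x n_cols grid to its rook neighbours."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours


def j_bb(x, neighbours):
    """Black-black join count; each join is visited twice, hence the // 2."""
    return sum(x[i] * x[j] for i, adj in neighbours.items() for j in adj) // 2


def permutation_test(x, neighbours, n_perm=999, seed=0):
    """Return (observed J_BB, one-sided pseudo p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = j_bb(x, neighbours)
    shuffled = list(x)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the number of black cells fixed
        if j_bb(shuffled, neighbours) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# 6 x 6 grid whose three left columns are black: a strongly clustered pattern.
clustered = [1 if c < 3 else 0 for r in range(6) for c in range(6)]
observed, p_value = permutation_test(clustered, rook_neighbours(6, 6))
print(observed, p_value)  # observed J_BB is 27
```

Because shuffling preserves the number of black cells, the test conditions on the observed composition of the map instead of relying on a distributional assumption for the statistic.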

Local join count statistic

Anselin and Li introduced the idea of the local join count statistic, following Anselin's general idea of a Local Indicator of Spatial Association (LISA). For example, the local black-black join count at site $i$ is defined by

$$J_{BBi}=x_{i}\sum _{j}w_{ij}x_{j}$$

with similar definitions for $BW$ and $WW$. This is equivalent to the Getis–Ord statistics computed with binary data. Some analytic results for the expectation of the local statistics are available, based on the hypergeometric distribution, but because of the multiple comparisons problem a permutation-based approach is again preferred in practice.
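The local black-black count can be illustrated with a short sketch, again assuming binary symmetric rook weights on a small grid. The helper names are hypothetical and the code is not taken from any particular package.

```python
def rook_neighbours(n_rows, n_cols):
    """Map each site index of an n_rows x n_cols grid to its rook neighbours."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours


def local_bb(x, neighbours):
    """Local join count J_BBi = x_i * sum_j w_ij x_j for every site i."""
    return [x[i] * sum(x[j] for j in neighbours[i]) for i in range(len(x))]


# 4 x 4 grid whose two left columns are black and two right columns are white.
n_rows, n_cols = 4, 4
halves = [1 if c < 2 else 0 for r in range(n_rows) for c in range(n_cols)]
local = local_bb(halves, rook_neighbours(n_rows, n_cols))

for r in range(n_rows):
    print(local[r * n_cols:(r + 1) * n_cols])
# [2, 2, 0, 0]
# [3, 3, 0, 0]
# [3, 3, 0, 0]
# [2, 2, 0, 0]
```

White sites always score zero because of the leading factor $x_i$. Summing the local values gives 20, twice the global black-black count of 10 for this pattern, since each join contributes to both of its end sites.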

Extension to multiple categories

Figure: Join counts for 3-category data on a $10\times 10$ grid using 'rook' (north, south, east, west) neighbors. Left: each category never has a neighbour of its own type, resulting in zeros on the diagonal. Centre: a random pattern shows no bias for pairing colours, resulting in approximately equal values for all join count statistics. Right: since different types are only adjacent on the edges of the patches, this results in small values of $J_{rs}$ for $r\neq s$.

When there are $k\geq 2$ categories, join count statistics have been generalised to

$$J_{rs}={\frac {1}{2}}\sum _{ij}w_{ij}I_{r}(x_{i})I_{s}(x_{j})$$

where $I_{r}(x_{i})=\delta _{r,x_{i}}$ is an indicator function for the variable $x_{i}$ belonging to the category $r$. As in the binary case, significance can be tested either with analytic results or with a permutation approach.
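A small sketch of the multi-category case is given below. It assumes that the sum is weighted by the same binary symmetric weights $w_{ij}$ used in the binary case and that categories are coded as integers $0,\dots,k-1$; the function names are illustrative only.

```python
def rook_neighbours(n_rows, n_cols):
    """Map each site index of an n_rows x n_cols grid to its rook neighbours."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours


def join_count_matrix(x, neighbours, k):
    """k x k matrix with entries J_rs = 1/2 * sum_ij w_ij I_r(x_i) I_s(x_j)."""
    counts = [[0.0] * k for _ in range(k)]
    for i, adjacent in neighbours.items():
        for j in adjacent:  # w_ij = 1 exactly for these ordered pairs
            counts[x[i]][x[j]] += 0.5
    return counts


# 3 x 3 grid in which every cell of column c belongs to category c.
stripes = [c for r in range(3) for c in range(3)]
matrix = join_count_matrix(stripes, rook_neighbours(3, 3), k=3)
for row in matrix:
    print(row)
# [2.0, 1.5, 0.0]
# [1.5, 2.0, 1.5]
# [0.0, 1.5, 2.0]
```

With this convention the diagonal entries count joins between like categories directly, while the joins between two different categories $r$ and $s$ are split evenly between $J_{rs}$ and $J_{sr}$. The entries of the matrix sum to 12, the total number of rook joins on a $3\times 3$ grid.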

References

  1. Moran PA. The interpretation of statistical maps. Journal of the Royal Statistical Society. Series B (Methodological). 1948 Jan 1;10(2):243-51.
  2. Anselin L. Spatial econometrics. Handbook of spatial analysis in the social sciences. 2022 Nov 15:101-22.
  3. Congalton RG, Green K. Assessing the accuracy of remotely sensed data: principles and practices. CRC press; 2019 Aug 8.
  4. Dale MR, Fortin MJ. Spatial analysis: a guide for ecologists. Cambridge University Press; 2014 Sep 11.
  5. https://www.passagesoftware.net/
  6. "Esda.Join_Counts — esda v0.1.dev1+ga296c39 Manual".
  7. "Spdep: Spatial Dependence: Weighting Schemes, Statistics and Models version 0.6-15 from R-Forge".
  8. Cliff, A.D. and Ord, J.K. (1981). Spatial Processes: Models & Applications. Pion. ISBN 9780850860818.
  9. Sokal RR, Oden NL. Spatial autocorrelation in biology: 1. Methodology. Biological journal of the Linnean Society. 1978 Jun 1;10(2):199-228.
  10. "Local Spatial Autocorrelation (4)".
  11. Anselin L, Li X. Operational local join count statistics for cluster detection. Journal of geographical systems. 2019 Jun 1;21:189-210.
  12. "Local Spatial Autocorrelation (4)".
  13. Anselin, Luc. 1995. “Local Indicators of Spatial Association — LISA.” Geographical Analysis 27: 93–115.
  14. Epperson, B.K., 2003. Covariances among join-count spatial autocorrelation measures. Theoretical Population Biology, 64(1), pp.81-87.