1. Spatial Analysis
L: Introduction to (Spatial) Regression Models
Spatial analysis aims to:
- Evaluate how entities are spatially distributed
- Determine the underlying spatial processes
- Analyse the relationships between patterns
The focus here is on spatial entities represented as polygons (e.g. municipalities) or points (e.g. houses) » qualitative and quantitative attributes are attached to these entities.
Spatial Autocorrelation (SAC)
Grounded in the First Law of Geography (Tobler): everything is related to everything else, but near things are more related than distant things.
➢ Values observed at one location depend on the values of neighbouring observations.
Positive SAC
Similar values are spatially close-by
Negative SAC
Dissimilar values are spatially close-by
No SAC
Spatially random distribution (geography does not matter)
Spatial Heterogeneity (SH)
Characteristics of a population / sample depend on the absolute location.
Patterns vary over space; there are no ‘average places’.
Why does space matter?
➢ Why is it important to know how a pattern is distributed?
- The data is not independent
- SAC has serious consequences for non-spatial statistical analysis » it might result in wrong
conclusions.
Exploratory spatial data analysis
Aims to discover spatial patterns
3 kinds of approaches:
Mapping
- E.g. choropleths
Global methods
Local methods
Global statistics
The spatial characteristics of a pattern are summarized globally
➢ One single number represents the pattern » it approximates an ‘average’ value.
➢ Spatial variations cannot be detected.
Methods:
- Join Count statistic: for nominal data
- Moran’s I: interval / ratio data
- Geary’s C: interval / ratio data
- Autoregressive models
Step 1: definition of the spatial system
Contiguity (adjacency):
Rook’s contiguity (touches only the line-sharing polygons, i.e. 4 on a regular grid) & Queen’s contiguity (touches all point-sharing polygons)
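As an illustration (not from the source material), a minimal numpy sketch of binary rook and queen contiguity for a hypothetical 3 x 3 regular grid; the grid size and indexing are made-up assumptions.

```python
# Sketch: rook vs. queen contiguity on a hypothetical 3 x 3 regular grid.
import numpy as np

rows, cols = 3, 3
n = rows * cols
rook = np.zeros((n, n), dtype=int)    # edge-sharing neighbours only
queen = np.zeros((n, n), dtype=int)   # edge- and corner-sharing neighbours

for i in range(n):
    ri, ci = divmod(i, cols)          # grid position of cell i
    for j in range(n):
        if i == j:
            continue
        rj, cj = divmod(j, cols)
        dr, dc = abs(ri - rj), abs(ci - cj)
        if dr + dc == 1:              # shares an edge: rook and queen neighbour
            rook[i, j] = queen[i, j] = 1
        elif max(dr, dc) == 1:        # shares only a corner: queen neighbour only
            queen[i, j] = 1
```

On this grid the interior cell has 4 rook neighbours and 8 queen neighbours.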
K-nearest neighbours:
The k closest entities are defined as neighbours; this avoids island effects.
Threshold distance:
Entities within a particular distance (circle area) are defined as neighbours.
➢ E.g. points within 100 meters are defined as neighbours.
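A minimal numpy sketch (not from the source; the coordinates, k and the 100 m threshold are made-up assumptions) of these two neighbour definitions:

```python
# Sketch: k-nearest-neighbour and threshold-distance neighbour definitions.
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(20, 2))   # 20 hypothetical point locations (m)

# pairwise Euclidean distances
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

# k-nearest neighbours: every point gets exactly k neighbours, so no islands
k = 4
knn = np.argsort(d, axis=1)[:, 1:k + 1]       # column 0 is the point itself

# threshold distance: all points within 100 m are neighbours
within_100m = (d > 0) & (d <= 100.0)          # boolean n x n neighbour matrix
```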
Interaction:
Spatial closeness results in similarity, i.e. closer entities have greater influence than more distant ones.
Weights range, for example, from 1 (full interaction) to 0 (no interaction).
Common functions are:
- Inverse distance weighting: w_ij = 1 / d_ij
- Squared distance weighting: w_ij = 1 / d_ij²
o Relative influence drops off more rapidly with distance.
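A small sketch (the distances are made up) contrasting the two weighting functions:

```python
# Sketch: inverse vs. squared inverse distance weights for a made-up distance matrix.
import numpy as np

d = np.array([[  0.,  50., 200.],
              [ 50.,   0., 120.],
              [200., 120.,   0.]])   # distances in metres, 3 entities

w_inv = np.zeros_like(d)
w_inv[d > 0] = 1.0 / d[d > 0]        # w_ij = 1 / d_ij
w_inv2 = w_inv ** 2                  # w_ij = 1 / d_ij²: influence drops off more rapidly
```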
The W matrix (n x n)
Row standardisation:
Spatial weights are rarely used in their binary form; W is often standardised.
In row standardisation, each weight is divided by the sum of its row, so each row sums to 1.
This allows comparison between parameters.
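A minimal sketch of row standardisation, assuming a hypothetical 4 x 4 binary contiguity matrix:

```python
# Sketch: row standardisation of a binary weights matrix W (each row sums to 1).
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # hypothetical binary contiguity

row_sums = W.sum(axis=1, keepdims=True)
W_std = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

print(W_std.sum(axis=1))   # every row with at least one neighbour sums to 1
```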
Step 2: select a statistic
Moran’s I
Moran’s I tests for global spatial autocorrelation:
“Are (dis)similar values in close proximity to each other or are they randomly distributed?”
The range is from +1 (positive SAC) to -1 (negative SAC); values around 0 indicate no correlation (spatial randomness).
Permutation approach (to check significance):
Calculate Moran’s I for a large number of randomly permuted maps (e.g. 999 runs).
If the observed Moran’s I lies in a tail of the resulting distribution, this is evidence for a significant value.
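A minimal numpy sketch of global Moran’s I with a permutation test; the row-standardised weights W and the attribute values x are made-up assumptions:

```python
# Sketch: global Moran's I and a permutation-based pseudo p-value.
import numpy as np

def morans_i(x, W):
    # I = (n / S0) * (z' W z) / (z' z), with z the mean-centred attribute values
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)

# hypothetical row-standardised weights and attribute values (4 units)
W = np.array([[0, 1/2, 1/2, 0],
              [1/2, 0, 1/2, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 1, 0]])
x = np.array([10., 12., 11., 30.])

I_obs = morans_i(x, W)

# permutation approach: recompute I for many randomly reshuffled maps
rng = np.random.default_rng(0)
I_perm = np.array([morans_i(rng.permutation(x), W) for _ in range(999)])

# pseudo p-value for positive SAC (use the lower tail for negative SAC)
p = (np.sum(I_perm >= I_obs) + 1) / (len(I_perm) + 1)
```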
Local statistics
Using the global SAC provides evidence concerning spatial associations, but no statements about the ‘where’ are possible.
Local statistics have the following advantages:
- Detection of clusters
- Output of many parameters
- Visualisation capabilities
- Explore heterogeneity.
The following methods can be used:
- Local Moran’s I
- G*-statistic
- GWR
Local Moran’s I
This is a local disaggregation of the global coefficient.
It determines attribute similarity for each unit in comparison to its neighbourhood.
This enhances the detection of:
- Hot spots: High values surrounded by high values
- Cold spots: Low values surrounded by low values
- Outliers: High values surrounded by low values, or low values surrounded by high values
Moran scatterplot
The Moran scatterplot describes the linear relationship between each unit’s attribute value and the values of its neighbours (the spatial lag).
High – high: hotspots » Positive SAC
Low – low: cold spots » Positive SAC
High – low: outliers » Negative SAC
Low – high: outliers » Negative SAC.
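A minimal sketch of local Moran’s I and the scatterplot quadrants, reusing the same kind of made-up W and x as in the global example above:

```python
# Sketch: local Moran's I per unit and its Moran-scatterplot quadrant.
import numpy as np

W = np.array([[0, 1/2, 1/2, 0],
              [1/2, 0, 1/2, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 1, 0]])            # hypothetical row-standardised weights
x = np.array([10., 12., 11., 30.])      # hypothetical attribute values

z = x - x.mean()
lag = W @ z                             # weighted average of the neighbours (spatial lag)
m2 = (z @ z) / len(z)
I_local = z * lag / m2                  # local Moran's I for each unit

# quadrant of the Moran scatterplot: own value vs. neighbourhood value
quadrant = np.where(z >= 0,
                    np.where(lag >= 0, "High-High", "High-Low"),
                    np.where(lag >= 0, "Low-High", "Low-Low"))
```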
Covariance and correlation
Covariance
Measures the association between 2 continuous variables
Pearson product-moment correlation coefficient
Standardised measure of the linear association between 2 variables.
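A short numpy illustration (the data are made up) of the difference between the two measures:

```python
# Sketch: covariance (scale-dependent) vs. Pearson's r (standardised, in [-1, 1]).
import numpy as np

area  = np.array([50., 70., 85., 100., 120.])    # floor area in m² (made up)
price = np.array([150., 200., 230., 260., 320.]) # house price in 1000s (made up)

cov = np.cov(area, price)[0, 1]       # covariance between the two variables
r = np.corrcoef(area, price)[0, 1]    # Pearson product-moment correlation
```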
Regression
Regression informs about the form and the nature of a relationship
➢ E.g. how is distance to the core city related to housing prices?
Simple regression:
1 response variable (dependent, metric scale), 1 independent variable (predictor)
➢ E.g. house price = f(floor area)
Intercept = point at which the line crosses the vertical axis.
Ordinary least squares (OLS) approach
This is a statistical approach to determine the ‘best’ fitting line in a scatterplot.
- Minimizes the sum of the squared residuals.
o The fitted line describes the data as closely as possible.
In the regression equation (y = β0 + β1·x + ε):
The β’s give insights into the nature of the association:
β0 gives the estimated value of y when x = 0
β1 says how much y changes when x is increased by 1 unit.
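A minimal OLS sketch for the house-price example; the data values and variable names are made up:

```python
# Sketch: simple OLS regression price = b0 + b1 * area via least squares.
import numpy as np

area  = np.array([50., 70., 85., 100., 120.])    # predictor x (made up)
price = np.array([150., 200., 230., 260., 320.]) # response y (made up)

X = np.column_stack([np.ones_like(area), area])  # design matrix [1, x]
beta, *_ = np.linalg.lstsq(X, price, rcond=None) # minimizes the squared residuals

b0, b1 = beta   # b0: estimated price at area = 0; b1: price change per extra m²
residuals = price - X @ beta
```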
Model validation:
After estimating a regression, the following needs to be done:
- Validation of the model quality
- Statistical significance of the estimated parameters
- Whether the fundamental model assumptions are met
Essential are:
- (adjusted) coefficient of determination (R²)
- T-statistic
- F-statistic
- Akaike Information Criterion (AIC)
- Moran’s I of the regression residuals.
Coefficient of determination (R²)
R² tests how well a model explains the data.
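Continuing the OLS sketch above (same made-up data), R² can be computed from the residuals:

```python
# Sketch: R² = 1 - SS_res / SS_tot, the share of variance explained by the model.
import numpy as np

area  = np.array([50., 70., 85., 100., 120.])
price = np.array([150., 200., 230., 260., 320.])
X = np.column_stack([np.ones_like(area), area])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

ss_res = np.sum((price - X @ beta) ** 2)          # unexplained (residual) variation
ss_tot = np.sum((price - price.mean()) ** 2)      # total variation around the mean
r2 = 1 - ss_res / ss_tot
```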