Advanced examples
This page demonstrates advanced examples of using SPICE.
Controlling for parcels in SPICE-NS
Several arguments in spice.effective_sample_size_estimation() and spice.covariance_estimation() can be used to control the behavior of parcellations used in SPICE-NS, including: x_parc (y_parc), max_clusters, min_clusters, and min_cluster_size.
x_parc (and y_parc) determines whether SPICE or SPICE-NS is used, and also how data x (and y) are pacellated in SPICE-NS. They can be one of the following 3 scenarios:
\(None\): SPICE will be used.
String \('auto'\): SPICE-NS will be used, with parcellation in map is determined by data-driven spatial clustering, conditioned on parameters including: max_clusters, min_clusters, and min_cluster_size
int ndarray \((N,)\) with \(k\) unique integers ranging from \(0\) to \(k-1\), each indicate a parcellation: SPICE-NS will be used, and parcellation is determined as use-specified. A warning will be raised if the number of user-specified parcellation is larger than the number of data-driven determined number conditioned on parameters including: max_clusters, min_clusters, and min_cluster_size.
max_clusters (default 10) and min_cluster_size (default 500) jointly determines the largest number of parcels can be used in data-driven parcellation with spatial clustering. The maximum number is the smaller of max_clusters or \(floor(N / {min\_cluster\_size})\). The min_cluster_size is included such that the average size (i.e., number of observations) of parcels is not too small to get accurate and reliable variogram estimation.
min_clusters (default 1) determines the smallest number of parcels can be used, which should be always set to 1 in practice. This argument is used in our study to investigate the impact of forced parcellation.
Note
Use x_parc and y_parc to manually set the choice of parcellations in SPICE-NS
Warning
May need to change the input of max_clusters and min_cluster_size when apply SPICE-NS with data-driven spatial clustering for small data patches, depending on your data.
Run with distance matrix (python only)
SPICE can be run when the spatial coordinates of observations are missing but the pairwise distances \(D\) is known.
SPICE-NS with data-driven spatial clustering can be run when the spatial coordinates of observations are missing but the pairwise distances \(D\) and the spatial dimension \(dim\) is known, where clusters are determined using K-Medoids clustering implemented with Scikit-learn-extra.
Examples below:
import spice
# SPICE
pef, rX, nef, run_status, n_parc, p_naive, fc_para1, fc_para2 = spice.effective_sample_size_estimation(x, y, D=D)
# SPICE-NS
pef, rX, nef, run_status, n_parc, p_naive, fc_para1, fc_para2 = spice.effective_sample_size_estimation(x, y, D=D, dim=dim, xparc='auto', yparc='auto')
Large-scale pairwise evaluation
The spice.effective_sample_size_estimation() funciton internally estiamtes the covariance matrices of spatial maps, which can increase computational cost when evaluating pairwise associations between a large number of spatial maps because of repeatitive covariance estimation. spice.covariance_estimation() can be used in this case to improve computational efficiency, by decoupling covariance estimation and significance inference. In brief, the covariance matrices estimated can be saved and reloaded for later inference of statistical significance.
Below is an example of running this decoupled pipeline of SPICE, to infer significance of associations between 3 different spatial maps.
Step 1: covariance estimation
spatial_maps = {x, y, z};
mapnames = {'x', 'y', 'z'};
cov_path = '.' % save covariance matrices in the current path
for i = 1:length(spatial_maps)
mapi = spatial_maps{i};
% computed covariance matrix may contain NaN if the spatial data contains NaN or Inf
covmat = covariance_estimation(x,coord) % change function parameters for SPICE-NS and other settings
save(fullfile(cov_path, sprintf('%s_cov.mat', mapnames{i})), 'covmat')
import spice
import numpy as np
import os
spatial_maps = [x, y, z]
mapnames = ['x', 'y', 'z']
cov_path = '.' # save covariance matrix in the current dir
for _, (mapi, namei) in enumerate(zip(spatial_maps, mapnames)):
covmat = spice.covariance_estimation(mapi, coord) # change function parameters for SPICE-NS and other settings
np.save(os.path.join(cov_path, f'{namei}_cov.npy'), covmat)
Step 2: infer significance
spatial_maps = {x, y, z};
mapnames = {'x', 'y', 'z'};
cov_path = '.'
ps = zeros(length(spatial_maps)) % initiate for saving
rXs = zeros(length(spatial_maps)) % initiate for saving
for i = 1:length(spatial_maps)-1
fpr j = i+1:length(spatial_maps)
mapi = spatial_maps{i};
mapj = spatial_maps{j};
covi = load(fullfile(cov_path, sprintf('%s_cov.mat', mapnames{i})));
covj = load(fullfile(cov_path, sprintf('%s_cov.mat', mapnames{j})));
% because data mapi and mapj, and covariance matrices covi and covj may contain NaN and Inf, remove these data points
valid = isfinite(mapi) & isfinite(mapj);
covi = covi(valid, valid);
covj = covj(valid, valid);
[rXs(i, j), ~] = corr(mapi(valid), mapj(valid));
[nef, run_status] = covs2nef(covi,covj)
ps(i,j) = nef2p(rXs(i,j), nef)
import spice
import numpy as np
import os
from scipy.stats import pearsonr
spatial_maps = [x, y, z]
mapnames = ['x', 'y', 'z']
savepath = '.'
n_maps = len(spatial_maps)
rXs = np.zeros((n_maps, n_maps)) # initiate for saving
ps = np.zeros((n_maps, n_maps)) # initiate for saving
for i in np.arange(n_maps-1):
for j in np.arange(i+1, n_maps):
mapi = spatial_maps[i]
namei = mapnames[i]
mapj = spatial_maps[j]
namej = spatial_maps[j]
covi = np.load(os.path.join(cov_path, f'{namei}_cov.npy'))
covj = np.load(os.path.join(cov_path, f'{namej}_cov.npy'))
valid = np.logical_and(np.isfinite(mapi), np.isfinite(mapj))
rX, p_naive = pearsonr(x[valid], y[valid])
nef = cov2nef(covi[np.ix_(valid,valid)],covj[np.ix_(valid,valid)])
pef = nef2p(rX, nef) if nef>2 else np.nan
rXs[i,j], ps[i,j] = rX, pef
Impact of spatial trends
Coming soon.