Abstract
Introduction: The population and spatial characteristics of COVID-19 infections are poorly understood.Testing data were obtained from the New York City Department of Health and Mental Hygeine (NYC DOHMH) releasing COVID19 tests results posted on GitHub. A detailed description of data creation can be found below. Covariate are drawn primarily from 2010 US Census results.
Prep the data by ordering to match the spatial dataframe, and add a sequential area ID variable for the model statements.
library(maptools, quietly = T)
library(sp)
library(spdep, quietly = T)
library(INLA, quietly = T)
library(ggplot2)
library(kableExtra)
library(DescTools)
## Registered S3 method overwritten by 'DescTools':
## method from
## reorder.factor gdata
options(scipen = 10) # don't use scientific notation
options(fmt.num=structure(list(digits=2, big.mark=","), class="fmt")) # force use commas as thousands separator and 2 decimal places
testDat<-readRDS("~/testDat.RDS")
nyc <- readRDS("~/nycMap4COVID.RDS")
nycDat <- attr(nyc, "data")
order <- match(nycDat$ZCTA5CE10,testDat$zcta)
testDat<- testDat[order,]
testDat$ID.area<-seq(1,nrow(testDat))
nyc.adj <- paste("~/nyc.graph")
There were 177 ZCTA’s in the data set. The mean COVID-19 test rate per 10,000 ZCTA population was 166.2 (95% CI 156.7, 175.7). The distribution appeared multimodal. The mean COVID-19 test rate per 10,000 tests was 5,176.0 (95% CI 5,045.9, 5,306.1) and appeared skewed. The The 5 ZCTAs with highest positive COVID-19 test numbers per 10,000 population were the same as those with the highest proportion per 10,000 tests (10464, 10470, 10455, 10473, 11234, and 11210). The 5 lowest ZCTAs were also the same for both measures 1(1103, 11102, 11693, 11369, 11363, and 10308). Table one presents comparative statistics for the ZCTA’s with the highest and lowest quantiles for population-based positive test rates.
## -------------------------------------------------------------------------
## Positive COVID-19 Tests per 10,000 ZCTA Population
##
## length n NAs unique 0s mean
## 177 177 0 = n 0 166.18312
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 71.10536 84.17626 113.19039 162.97451 215.08084 248.76561
##
## range sd vcoef mad IQR skew
## 281.17539 64.25910 0.38668 75.00825 101.89045 0.18618
##
## meanCI
## 156.65093
## 175.71531
##
## .95
## 267.41561
##
## kurt
## -0.81984
##
## lowest : 42.02216, 49.81734, 50.22831, 60.10303, 61.66783
## highest: 282.27373, 297.51712, 316.63043, 322.37247, 323.19755
## -------------------------------------------------------------------------
## Positive COVID-19 Tests per 10,000 ZCTA Tests
##
## length n NAs unique 0s mean
## 177 177 0 = n 0 5'175.967
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 3'608.459 3'907.126 4'519.016 5'380.117 5'852.102 6'039.269
##
## range sd vcoef mad IQR skew
## 4'747.624 876.960 0.169 780.456 1'333.086 -0.589
##
## meanCI
## 5'045.879
## 5'306.055
##
## .95
## 6'198.431
##
## kurt
## -0.128
##
## lowest : 2'586.207, 2'937.063, 2'972.973, 3'142.857, 3'178.295
## highest: 6'481.481, 6'614.493, 6'818.530, 6'982.703, 7'333.831
## [1] 10464 10470 10455 10473 11234 11210
## [1] 11103 11102 11693 11369 11363 10308
## [1] 10464 10470 10455 10473 11234 11210
## [1] 11103 11102 11693 11369 11363 10308
## var total High
## 1 n 72 36 (50.0%)
## 2 MHI 57'758.653 (24'986.680) 55'314.528 (19'700.606)
## 3 School Density 5.167 (4.605) 2.658 (1.977)
## 4 Population Density 16'584.935 (11'770.940) 9'486.726 (7'238.221)
## 5 Housing Density 18'165.214 (19'748.001) 8'784.774 (6'788.215)
## 6 Congdon Index -0.089 (1.996) -1.118 (1.978)
## 7 Black 0.234 (0.263) 0.355 (0.308)
## 8 Hispanic 0.124 (0.045) 0.134 (0.045)
## 9 Heart Disease 0.111 (0.207) 0.174 (0.268)
## 10 COPD 2.011 (1.932) 2.276 (2.482)
## Low V5
## 1 36 (50.0%)
## 2 82'917.333 (27'557.029) *** '
## 3 7.289 (5.379) *** '
## 4 26'000.104 (13'418.550) *** '
## 5 37'361.654 (33'665.045) *** '
## 6 1.603 (1.942) *** '
## 7 0.069 (0.125) *** '
## 8 0.120 (0.051) '
## 9 0.072 (0.160) '
## 10 1.554 (1.420) '
We start with a simple model of postive COVID-19 tests per 10,000 ZCTA population.
cut.rate<-quantile(testDat$rate,probs = seq(0, 1, by = 0.20))
testDat$rate.cut<-cut(testDat$rate,breaks=cut.rate,include.lowest=TRUE)
# summary(testDat$rate.cut)
nyc.df <- fortify(nyc, region='ZCTA5CE10')
nyc.df <- merge(nyc.df, testDat, by.x="id", by.y="zcta", all=FALSE)
nyc.df <- nyc.df[order(nyc.df$group, nyc.df$order), ]
p1<-ggplot(nyc.df, aes(x=long, y=lat))
p2<-p1+geom_polygon(aes(group=group, fill=rate.cut))
p3<- p2 + scale_fill_brewer(palette="Blues", name="Fitted Values")
map3<- p3+ theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank()) +
theme(panel.background = element_rect(colour = NA)) +
xlab("")+ylab("")+
ggtitle("Positive COVID19 Tests per 10,000 Population,
New York City ZIP Code Tabulation Areas April 3-22, 2019")
print(map3)
Next, we map the proportion of positive COVID-19 tests per 10,000 positive tests.
cut.prop<-quantile(testDat$prop,probs = seq(0, 1, by = 0.20))
testDat$prop.cut<-cut(testDat$prop,breaks=cut.prop,include.lowest=TRUE)
# summary(testDat$prop.cut)
nyc.df <- fortify(nyc, region='ZCTA5CE10')
nyc.df <- merge(nyc.df, testDat, by.x="id", by.y="zcta", all=FALSE)
nyc.df <- nyc.df[order(nyc.df$group, nyc.df$order), ]
p1<-ggplot(nyc.df, aes(x=long, y=lat))
p2<-p1+geom_polygon(aes(group=group, fill=prop.cut))
p3<- p2 + scale_fill_brewer(palette="Blues", name="Fitted Values")
map4<- p3+ theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank()) +
theme(panel.background = element_rect(colour = NA)) +
xlab("")+ylab("")+
ggtitle("Positive COVID19 Tests per 10,000 Tests,
New York City ZIP Code Tabulation Areas April 3-22, 2019")
print(map4)
As a baseline against which to measure more informative models, run a simple random effects (spatially unstructured heterogeneity) model of the total number of positive tests in a ZIP Code Tabulation Area with the total number of tests performed as an offset variable 1.
\(y_i = \alpha + \upsilon_i\)
The following code sets up, runs the model and reviews the results.
formulaUH <- Positive~ f(ID.area, model = "iid", offset(Total)) # specify model
resultUH <- inla(formulaUH,family="poisson", data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultUH)
##
## Call:
## c("inla(formula = formulaUH, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.7359 0.2593 0.2731 1.2682
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 4.1734 0.0786 4.0167 4.1742 4.3256 4.1758 0
##
## Random effects:
## Name Model
## ID.area IID model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant
## Precision for ID.area 214039.80 25751.73 167309.23 212751.27 268376.81
## mode
## Precision for ID.area 210420.59
##
## Expected number of effective parameters(std dev): 174.43(0.1474)
## Number of equivalent replicates : 1.015
##
## Deviance Information Criterion: 1831.58
## Effective number of parameters: 174.50
##
## Marginal Likelihood: -1547.62
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
exp(resultUH$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 64.93475 1.081825 55.51482 64.98622 75.61373 65.09194 1
mean(testDat$Positive, na.rm=T) # model results differ because adjusted for population
## [1] 772.9209
This first model illustrates the random effect term, with no explicit spatial component. The deviance information criteria for this model is 1831.58, with 174.5 effective parameters. The fixed effects for this model consists only of the intercept, which exponentiates to a mean 2 of 64.9 positive tests per ZCTA (sd= 1.1, 95% Credible Interval 55.5 , 75.6) and a mode 3 of 65.1
The variance results are stored in the “region” part of “summary.random” in the named list object created from the INLA run. We extract and examine the mean random effect term for each county. Recall, this is the random variation, on the log scale, around the mean or intercept value of the number of positive test in a NYC ZCTA with an offset term for the total ZCTA population.
The line graph plots the density of the random effect term around the mean value restricted on some of the negative values to better illustrated the (approximately) normal distribution of the random effects.
re1<-resultUH$summary.random$ID.area[,2]
# plot(density(re1)) # plot density random effects
plot(density(re1[re1>-2e-04]), main="Density Plot Random Effects Frailty Model") # restrict
We next map the random effects term to illustrate the distribution of the unstructured heterogeneity or variance. The results (as expected) appear generally as a spatially random distribution of variance above and below the mean value.
# add random effects results to dataframe
testDat$UH<-re1
# create cuts
cuts <-quantile(testDat$UH,probs = seq(0, 1, by = 0.20))
testDat$UH.cut<-cut(testDat$UH,breaks=cuts,include.lowest=TRUE)
#merge dataframes, save to data slot of map
testDat$ZCTA5CE10<-testDat$zcta
attr(nyc,"data") <- merge(nycDat,testDat,by="ZCTA5CE10")
# map
spplot(nyc, "UH.cut", col.regions= rev(grey.colors(5)),main="Spatial Plot Random Effects Frailty Model")
The following image recreates the plot with ggplot
.
nyc.df <- fortify(nyc, region='ZCTA5CE10')
nyc.df <- merge(nyc.df, testDat, by.x="id", by.y="zcta", all=FALSE)
nyc.df <- nyc.df[order(nyc.df$group, nyc.df$order), ]
library(RColorBrewer)
p1<-ggplot(nyc.df, aes(x=long, y=lat))
p2<-p1+geom_polygon(aes(group=group, fill=UH.cut))
p3<- p2 + scale_fill_brewer(palette="Blues", name="Random\nEffects")
map1<- p3+ theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank()) +
theme(panel.background = element_rect(colour = NA)) +
xlab("")+ylab("")+
ggtitle("Unstructured Heterogeneity (Above and Below Mean), Positive COVID19 Tests,
New York City ZIP Code Tabulation Areas April 3-8, 2019")
print(map1)
The convolution models add a spatially-structured conditional autoregression term (\(\nu\)) to the spatially-unstructured heterogeneity random effect term (\(\upsilon\)) of the frailty model.
\[ y_i = \alpha + \upsilon_i + \nu_i \]
The model statement now specifies a Besag-York-Mollie model, and references a spatial neighbor adjacency matrix which allows for connected components 4.
The DIC of 1807.60 (with 175.98 effective parameters) is an improvment over the baseline unstructured heterogeneity frailty model, indicating the spatial component is adding information to the simple unstructured model.
the mean count of positive tests within a ZCTA has increased to 76.5 (95% Cr I 56.5, 103.0), indicating that some of the information in the model has shifted from unstructured or random variance to structured spatial variance. The incidence density of the unstructured heterogeneity appears similar to the frailty model. The density plot of the spatial heterogeneity is similarly normally bunced around the mean.
formulaCAR <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)
resultCAR <- inla(formulaCAR,family="poisson", data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCAR)
##
## Call:
## c("inla(formula = formulaCAR, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 1.4306 0.6584 0.9391 3.0281
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 4.3379 0.1529 4.0338 4.3392 4.6348 4.3418 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 78563.94 10034.25 60460.59
## Precision for ID.area (spatial component) 49807.85 8164.09 35555.22
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 78027.34 99834.11 77051.76
## Precision for ID.area (spatial component) 49206.19 67569.48 48068.37
##
## Expected number of effective parameters(std dev): 175.98(0.0778)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1807.60
## Effective number of parameters: 175.98
##
## Marginal Likelihood: -1606.26
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
exp(resultCAR$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 76.54614 1.165231 56.47445 76.644 103.0036 76.84784 1
re<-resultCAR$summary.random$ID.area[1:length(testDat$zcta),2]
plot(density(re[re>-0.002]), main="Density Plot Random Effects Convolution Model") # plot density random effects
car<-resultCAR$summary.random$ID.area[(length(testDat$zcta)+1):(2*length(testDat$ZCTA)),2]
plot(density(car[car>-0.0005]),main="Density Plot Spatial Effects Convolution Model") # plot density spatial effects
We next map the results of the spatial heterogeneity. Again, these are variance values above and below zero, only now they represent spatial variance. To the eye, there appears to be some more structure to the data than in the simple frailty model, with some ZCTAs demonstrating higher variance estimates than others.
# add car results to dataframe
testDat$car<-car
# create cuts
cuts <- quantile(testDat$car,probs = seq(0, 1, by = 0.20))
summary(cuts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8.357e-03 4.581e-05 2.802e-04 -1.036e-03 5.189e-04 1.016e-03
testDat$car.cut<-cut(testDat$car,breaks=cuts, include.lowest=TRUE, include.highest=TRUE)
nyc.df <- fortify(nyc, region='ZCTA5CE10')
nyc.df <- merge(nyc.df, testDat, by.x="id", by.y="zcta", all=FALSE)
nyc.df <- nyc.df[order(nyc.df$group, nyc.df$order), ]
# library(RColorBrewer)
p1<-ggplot(nyc.df, aes(x=long, y=lat))
p2<-p1+geom_polygon(aes(group=group, fill=car.cut))
p3<- p2 + scale_fill_brewer(palette="Blues", name="Spatial\nEffects")
map2<- p3+ theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank()) +
theme(panel.background = element_rect(colour = NA)) +
xlab("")+ylab("")+
ggtitle("Spatially Structured (CAR) Heterogeneity,Positive COVID19 Tests,
New York City ZIP Code Tabulation Areas April 3-22, 2019")
print(map2)
Using calcuations 5 from the fitted convolution model, we first map the so-called fitted variance values, which are the sum of the mean value, plus the unstructured and spatially structured variance components (\(\theta = \alpha + \upsilon + \nu\)).
#risk (theta = alpha + upsilon + nu)
CARfit <- resultCAR$summary.fitted.values[,1]
testDat$fit<-CARfit
cut.fit<-quantile(testDat$fit,probs = seq(0, 1, by = 0.20))
testDat$fit.cut<-cut(testDat$fit,breaks=cut.fit,include.lowest=TRUE)
# summary(testDat$fit.cut)
nyc.df <- fortify(nyc, region='ZCTA5CE10')
nyc.df <- merge(nyc.df, testDat, by.x="id", by.y="zcta", all=FALSE)
nyc.df <- nyc.df[order(nyc.df$group, nyc.df$order), ]
p1<-ggplot(nyc.df, aes(x=long, y=lat))
p2<-p1+geom_polygon(aes(group=group, fill=fit.cut))
p3<- p2 + scale_fill_brewer(palette="Blues", name="Fitted Values")
map5<- p3+ theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank()) +
theme(panel.background = element_rect(colour = NA)) +
xlab("")+ylab("")+
ggtitle("Fitted Model Values, Heterogeneity,Positive COVID19 Tests,
New York City ZIP Code Tabulation Areas April 3-22, 2019")
print(map5)
The spatial risk estimate is calculated as the sum of the unstructured and spatially structured variance components ((\(\zeta = \upsilon + \nu\)))
#spatial risk (zeta = upsilon + nu)
CARmarginals <- resultCAR$marginals.random$ID.area[1:177]
CARzeta <- lapply(CARmarginals,function(x)inla.emarginal(exp,x)) # exponentiate
testDat$risk<-unlist(CARzeta)
cut.risk<-quantile(testDat$risk, probs = seq(0, 1, by = 0.20))
testDat$risk.cut<-cut(testDat$risk,breaks=cut.risk,include.lowest=TRUE)
# summary(testDat$risk.cut)
nyc.df <- fortify(nyc, region='ZCTA5CE10')
nyc.df <- merge(nyc.df, testDat, by.x="id", by.y="zcta", all=FALSE)
nyc.df <- nyc.df[order(nyc.df$group, nyc.df$order), ]
p1<-ggplot(nyc.df, aes(x=long, y=lat))
p2<-p1+geom_polygon(aes(group=group, fill=risk.cut))
p3<- p2 + scale_fill_brewer(palette="Blues", name="Spatial\nRisk")
map6<- p3+ theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank()) +
theme(panel.background = element_rect(colour = NA)) +
xlab("")+ylab("")+
ggtitle("Spatially Structured Risk Estimates, Heterogeneity,Positive COVID19 Tests,
New York City ZIP Code Tabulation Areas April 3-22, 2019")
print(map6)
Finally, we calculate and map the probability of relative risk greater than 1, is a type of “hot spot” map.
# exceedence probability
a=0
CARexceed<-lapply(resultCAR$marginals.random$ID.area[1:177], function(X){
1-inla.pmarginal(a, X)
})
testDat$exceed<-unlist(CARexceed)
cut.exceed<-quantile(testDat$exceed,probs = seq(0, 1, by = 0.30))
testDat$exceed.cut<-cut(testDat$exceed, breaks=cut.exceed, include.lowest=TRUE)
# summary(testDat$exceed.cut)
# testDat$exceed
nyc.df <- fortify(nyc, region='ZCTA5CE10')
nyc.df <- merge(nyc.df, testDat, by.x="id", by.y="zcta", all=FALSE)
nyc.df <- nyc.df[order(nyc.df$group, nyc.df$order), ]
p1<-ggplot(nyc.df, aes(x=long, y=lat))
p2<-p1+geom_polygon(aes(group=group, fill=exceed.cut))
p3<- p2 + scale_fill_brewer(palette="Blues", name="Exceedence")
map7<- p3+ theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank()) +
theme(panel.background = element_rect(colour = NA)) +
xlab("")+ylab("")+
ggtitle("Probability Exceedence RR > 1, Heterogeneity,Positive COVID19 Tests,
New York City ZIP Code Tabulation Areas April 3-22, 2019")
print(map7)
Finally, we estimate the proportion of the variance explained 6 by geographic variation or place, which for this model is approximately 32%
mat.marg <- matrix(NA, nrow=177, ncol=1000)
m<-resultCAR$marginals.random$ID.area
for (i in 1:177){
u<-m[[177+i]]
s<-inla.rmarginal(1000, u)
mat.marg[i,]<-s}
var.RRspatial<-mean(apply(mat.marg, 2, sd))^2
var.RRhet<-inla.emarginal(function(x) 1/x,
resultCAR$marginals.hyper$"Precision for ID.area (iid component)")
var.RRspatial/(var.RRspatial+var.RRhet) # 0.3200823
## [1] 0.3204903
While important, place alone does not account for the majority of the variation in the data. We explore to see if there are other factors that can more fully explain the variation. In the following sections we look at population density, median household income, school density, housing density, and a measure of social disorder called the Congdon Index.
This first model adds a scaled estimate of population density to the convolution model above. The DIC for this model is 1805.89 on 176.08 effective parameters, similar to the convolution model. The relative risk estimate for population density is 1.5 (Cr I 1.2, 2.2), indicating that for every 1 unit increase in a standardized measure of population density there is an approximately 50% increase in the risk of a positive COVID-19 test in a ZCTA, adjusted for the number of tests performed.
formulaCOV1 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(POPDEN2010)
resultCOV1 <- inla(formulaCOV1,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV1)
##
## Call:
## c("inla(formula = formulaCOV1, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1644 0.7708 0.0327 0.9680
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 4.4556 0.1592 4.1394 4.4567 4.7652 4.4592 0
## scale(POPDEN2010) 0.4164 0.1789 0.0661 0.4159 0.7687 0.4150 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 81441.14 10410.38 62456.43
## Precision for ID.area (spatial component) 48642.69 7979.13 34706.10
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 80985.37 103232.31 80259.74
## Precision for ID.area (spatial component) 48057.71 65992.86 46954.25
##
## Expected number of effective parameters(std dev): 176.09(0.0698)
## Number of equivalent replicates : 1.005
##
## Deviance Information Criterion: 1805.89
## Effective number of parameters: 176.08
##
## Marginal Likelihood: -1608.73
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV1$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 86.105746 1.172605 62.764857 86.205575 117.350329
## scale(POPDEN2010) 1.516417 1.195876 1.068316 1.515743 2.156968
## mode kld
## (Intercept) 86.414637 1
## scale(POPDEN2010) 1.514399 1
By contrast, for each unit increase in a standardized measure of median household income in a ZIP Code Tabulation Area, there is a an approximately 46% decrease in the number of positive COVID19 tests. (Incidence Density Ratio = 0.54, 95% CrI 0.43, 0.69). This model had a slightly improved DIC, at 1803.51.
formulaCOV2 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(MHI)
resultCOV2 <- inla(formulaCOV2,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV2)
##
## Call:
## c("inla(formula = formulaCOV2, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1160 0.6455 0.0420 0.8035
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 5.0191 0.1917 4.6399 5.0199 5.3936 5.0215 0
## scale(MHI) -0.6119 0.1194 -0.8473 -0.6117 -0.3781 -0.6112 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 89559.50 11265.11 69221.85
## Precision for ID.area (spatial component) 53825.38 8606.83 38843.93
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 88959.01 113435.35 87860.53
## Precision for ID.area (spatial component) 53170.83 72595.67 51904.56
##
## Expected number of effective parameters(std dev): 175.93(0.0805)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1803.51
## Effective number of parameters: 175.90
##
## Marginal Likelihood: -1599.16
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV2$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 151.2791564 1.211339 103.5375077 151.397811 219.9958200
## scale(MHI) 0.5422973 1.126800 0.4285869 0.542424 0.6851286
## mode kld
## (Intercept) 151.6386961 1
## scale(MHI) 0.5426838 1
The density of schools in a ZIP Code Tabulation Area does not appear to be associated with risk of positive COVID19 tests.
formulaCOV3 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(schDen)
resultCOV3 <- inla(formulaCOV3,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV3)
##
## Call:
## c("inla(formula = formulaCOV3, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1115 0.7999 0.0318 0.9432
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 4.3285 0.1536 4.023 4.3298 4.6266 4.3325 0
## scale(schDen) -0.1731 0.1713 -0.511 -0.1727 0.1621 -0.1718 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 78425.13 10012.90 60088.02
## Precision for ID.area (spatial component) 49497.02 8134.05 35268.88
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 78013.35 99340.81 77391.14
## Precision for ID.area (spatial component) 48910.27 67154.77 47812.06
##
## Expected number of effective parameters(std dev): 176.09(0.0703)
## Number of equivalent replicates : 1.005
##
## Deviance Information Criterion: 1806.91
## Effective number of parameters: 176.09
##
## Marginal Likelihood: -1610.97
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV3$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 75.8290784 1.166004 55.8665076 75.928513 102.163694
## scale(schDen) 0.8410627 1.186810 0.5999054 0.841428 1.175933
## mode kld
## (Intercept) 76.1351561 1
## scale(schDen) 0.8421715 1
The proportion of a population older than 65 nearly doubles the risk of higher positive COVID19 testing counts. (IDR = 1.9, 95% CrI 1.6, 2.4), The DIC for this model (1800.70) was better than that for population density or median household income.
formulaCOV4 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(propOld)
resultCOV4 <- inla(formulaCOV4,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV4)
##
## Call:
## c("inla(formula = formulaCOV4, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1119 0.6099 0.0328 0.7545
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 4.9971 0.1742 4.6525 4.9978 5.3373 4.9993 0
## scale(propOld) 0.6556 0.1098 0.4410 0.6552 0.8723 0.6544 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 95113.30 11926.66 73574.50
## Precision for ID.area (spatial component) 53282.13 8415.75 38629.79
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 94479.78 120384.80 93321.20
## Precision for ID.area (spatial component) 52642.56 71629.27 51399.96
##
## Expected number of effective parameters(std dev): 175.89(0.0846)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1800.70
## Effective number of parameters: 175.84
##
## Marginal Likelihood: -1594.69
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV4$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 147.983479 1.190321 104.847646 148.091430 207.95868
## scale(propOld) 1.926341 1.116080 1.554211 1.925564 2.39237
## mode kld
## (Intercept) 148.312848 1
## scale(propOld) 1.924017 1
A larger proportion of persons in a ZCTA who identify as Asian was associated with a lower risk of positive ZCTA tests (IDR=0.43, 95 % CrI 0.23, 0.77), though the DIC of 1806.07 was not as good as previous models.
formulaCOV5 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(propAsian)
resultCOV5 <- inla(formulaCOV5,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV5)
##
## Call:
## c("inla(formula = formulaCOV5, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1134 0.6491 0.0434 0.8059
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 4.6164 0.1782 4.2630 4.6174 4.9637 4.6195 0
## scale(propAsian) -0.8473 0.3006 -1.4399 -0.8467 -0.2586 -0.8454 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 81808.20 10377.00 62934.41
## Precision for ID.area (spatial component) 50496.98 8201.52 36111.30
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 81338.78 103544.68 80568.21
## Precision for ID.area (spatial component) 49920.89 68257.31 48849.87
##
## Expected number of effective parameters(std dev): 175.98(0.0778)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1806.07
## Effective number of parameters: 175.96
##
## Marginal Likelihood: -1606.97
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV5$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 101.1252844 1.195115 71.0259479 101.2294592 143.1201598
## scale(propAsian) 0.4285804 1.350657 0.2369453 0.4288386 0.7721541
## mode kld
## (Intercept) 101.4435898 1
## scale(propAsian) 0.4293655 1
Every one unit increase in a measure of housing density, was associated with a doubling of risk of positive COVID19 testing (IDR=2.0, 95% CrI 1.2, 3.2)
formulaCOV6 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(houseDensity1)
resultCOV6 <- inla(formulaCOV6,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV6)
##
## Call:
## c("inla(formula = formulaCOV6, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1124 0.6280 0.0323 0.7727
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode
## (Intercept) 4.5503 0.1693 4.2139 4.5515 4.8796 4.5541
## scale(houseDensity1) 0.6703 0.2552 0.1692 0.6702 1.1719 0.6698
## kld
## (Intercept) 0
## scale(houseDensity1) 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 81855.74 10468.71 62953.53
## Precision for ID.area (spatial component) 49927.30 8144.44 35663.53
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 81303.88 104022.04 80309.16
## Precision for ID.area (spatial component) 49346.23 67591.52 48260.46
##
## Expected number of effective parameters(std dev): 176.05(0.0733)
## Number of equivalent replicates : 1.005
##
## Deviance Information Criterion: 1806.69
## Effective number of parameters: 176.03
##
## Marginal Likelihood: -1607.65
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV6$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 94.661422 1.184524 67.622736 94.779031 131.573926
## scale(houseDensity1) 1.954897 1.290739 1.184397 1.954539 3.228231
## mode kld
## (Intercept) 95.022405 1
## scale(houseDensity1) 1.953856 1
The Congdon Index was a associated with a slightly lower 7 risk of positive COVID19 tests in a ZCTA. (IDR=0.8, 95% CrI 0.8, 0.9)
formulaCOV7 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ congdonIndex
resultCOV7 <- inla(formulaCOV7,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV7)
##
## Call:
## c("inla(formula = formulaCOV7, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1262 0.7949 0.0330 0.9541
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 4.7698 0.1734 4.4263 4.7707 5.1080 4.7726 0
## congdonIndex -0.1944 0.0440 -0.2812 -0.1942 -0.1082 -0.1940 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 87309.25 11040.87 67231.24
## Precision for ID.area (spatial component) 50767.25 8199.27 36405.21
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 86807.63 110432.49 85979.96
## Precision for ID.area (spatial component) 50180.97 68556.06 49079.52
##
## Expected number of effective parameters(std dev): 176.03(0.0735)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1802.85
## Effective number of parameters: 176.00
##
## Marginal Likelihood: -1603.24
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV7$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 117.8948121 1.189400 83.6173881 118.0020326 165.3421251
## congdonIndex 0.8233678 1.045012 0.7548613 0.8234657 0.8974261
## mode kld
## (Intercept) 118.2231074 1
## congdonIndex 0.8236664 1
The proportion of non-English speaking person in a ZCTA was not associated in a meaningful way with risk to positive COVID-19 tests. (IDR=1.3, 95% CrI 0.9, 1.8)
formulaCOV8 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(nonEnglish)
resultCOV8 <- inla(formulaCOV8,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV7)
##
## Call:
## c("inla(formula = formulaCOV7, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1262 0.7949 0.0330 0.9541
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 4.7698 0.1734 4.4263 4.7707 5.1080 4.7726 0
## congdonIndex -0.1944 0.0440 -0.2812 -0.1942 -0.1082 -0.1940 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 87309.25 11040.87 67231.24
## Precision for ID.area (spatial component) 50767.25 8199.27 36405.21
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 86807.63 110432.49 85979.96
## Precision for ID.area (spatial component) 50180.97 68556.06 49079.52
##
## Expected number of effective parameters(std dev): 176.03(0.0735)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1802.85
## Effective number of parameters: 176.00
##
## Marginal Likelihood: -1603.24
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV8$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 83.495958 1.175825 60.5306784 83.594114 114.404302
## scale(nonEnglish) 1.305086 1.186461 0.9333936 1.304686 1.827248
## mode kld
## (Intercept) 83.798237 1
## scale(nonEnglish) 1.303904 1
The proportion of Black residents in a ZCTA was strongly associated with the risk of positive COVID-19 tests. For every one unit increase in a scaled standardized measure of the proportion of Black residents, there was a nearly five-fold in increase in the risk of a positive COVID19 test. (IDR = 4.8, 95% Cr I 2.4, 9.7)
formulaCOV9 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(propBlack)
resultCOV9 <- inla(formulaCOV9,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV9)
##
## Call:
## c("inla(formula = formulaCOV9, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.9618 0.6605 0.4468 2.0691
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 5.1477 0.2293 4.6943 5.1486 5.5958 5.1504 0
## scale(propBlack) 1.5752 0.3555 0.8792 1.5743 2.2758 1.5724 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 88487.14 11247.53 68209.30
## Precision for ID.area (spatial component) 51798.83 8301.79 37223.87
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 87877.18 112354.34 86757.12
## Precision for ID.area (spatial component) 51218.45 69770.98 50135.27
##
## Expected number of effective parameters(std dev): 175.90(0.0832)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1805.57
## Effective number of parameters: 175.88
##
## Marginal Likelihood: -1601.16
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV9$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 172.035496 1.257761 109.325940 172.186336 269.291294
## scale(propBlack) 4.831777 1.426879 2.408876 4.827191 9.735518
## mode kld
## (Intercept) 172.496211 1
## scale(propBlack) 4.818144 1
A higher proportion of Hispanic residents in a ZCTA was associated with a lower risk of positive COVID19 tests (IDR=0.5, 95% CrI 0.4, 0.8)
formulaCOV10 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(propHisp)
resultCOV10 <- inla(formulaCOV10,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV10)
# resultCOV1$dic
exp(resultCOV10$summary.fixed)
Proportion of persons on public assistance was not associated with higher risk of positive COVID19 tests. (IDR=1.2, 95% CrI 0.9, 1.6)
formulaCOV10 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(propPublicAssist2)
resultCOV10 <- inla(formulaCOV10,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV10)
##
## Call:
## c("inla(formula = formulaCOV10, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1357 0.9525 0.0331 1.1213
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 4.3881 0.1563 4.0775 4.3893 4.6919
## scale(propPublicAssist2) 0.2066 0.1396 -0.0672 0.2064 0.4813
## mode kld
## (Intercept) 4.3918 0
## scale(propPublicAssist2) 0.2059 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 79224.57 10112.20 60708.68
## Precision for ID.area (spatial component) 49261.95 8078.45 35125.14
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 78807.61 100349.52 78175.92
## Precision for ID.area (spatial component) 48681.40 66792.73 47595.62
##
## Expected number of effective parameters(std dev): 176.05(0.072)
## Number of equivalent replicates : 1.005
##
## Deviance Information Criterion: 1806.57
## Effective number of parameters: 176.04
##
## Marginal Likelihood: -1610.59
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV10$summary.fixed)
## mean sd 0.025quant 0.5quant
## (Intercept) 80.486492 1.169214 58.9963210 80.582815
## scale(propPublicAssist2) 1.229447 1.149823 0.9349739 1.229186
## 0.975quant mode kld
## (Intercept) 109.059964 80.783336 1
## scale(propPublicAssist2) 1.618104 1.228677 1
The presence of persons with heart disease in a ZCTA was associated with a doubling of the risk of positive COVID-19 tests (IDR=2.1, 95% CrI 1.5, 3.0)
formulaCOV11 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(heartDisease)
resultCOV11 <- inla(formulaCOV11,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV11)
##
## Call:
## c("inla(formula = formulaCOV11, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1146 0.6858 0.0431 0.8434
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode
## (Intercept) 4.7881 0.1738 4.4435 4.7890 5.1268 4.7910
## scale(heartDisease) 0.7511 0.1671 0.4234 0.7507 1.0800 0.7502
## kld
## (Intercept) 0
## scale(heartDisease) 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 87763.73 11110.87 67686.88
## Precision for ID.area (spatial component) 51890.02 8313.74 37284.95
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 87182.02 111279.29 86131.85
## Precision for ID.area (spatial component) 51312.78 69874.29 50238.67
##
## Expected number of effective parameters(std dev): 175.93(0.081)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1804.32
## Effective number of parameters: 175.90
##
## Marginal Likelihood: -1601.70
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV11$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 120.067431 1.189841 85.076146 120.185425 168.482499
## scale(heartDisease) 2.119234 1.181920 1.527147 2.118588 2.944825
## mode kld
## (Intercept) 120.427484 1
## scale(heartDisease) 2.117323 1
The proportion of persons in a ZCTA with chronic obstructive pulmonary disease was strongly associated with the risk of positive COVID-19 tests in a ZCTA (IDR=8.2, 95% CrI 3.7, 18.3)
formulaCOV12 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(COPD)
resultCOV12 <- inla(formulaCOV12,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultCOV12)
##
## Call:
## c("inla(formula = formulaCOV12, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1136 0.7670 0.0337 0.9144
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 5.7326 0.2994 5.1423 5.7331 6.3192 5.7342 0
## scale(COPD) 2.1037 0.4068 1.3063 2.1030 2.9046 2.1015 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 90984.76 11480.62 70173.13
## Precision for ID.area (spatial component) 53786.77 8576.01 38752.05
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 90417.77 115170.56 89426.87
## Precision for ID.area (spatial component) 53176.32 72384.51 52025.08
##
## Expected number of effective parameters(std dev): 175.87(0.0853)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1804.79
## Effective number of parameters: 175.84
##
## Marginal Likelihood: -1597.85
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultCOV12$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 308.763068 1.349036 171.105811 308.934098 555.11232
## scale(COPD) 8.196671 1.502002 3.692394 8.190391 18.25753
## mode kld
## (Intercept) 309.280907 1
## scale(COPD) 8.178088 1
summaryUnivariate<-data.frame(rbind(
round(exp(resultCOV1$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV2$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV3$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV4$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV5$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV6$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV7$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV8$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV9$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV10$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV11$summary.fixed[2,c(1,3,5)]),1),
round(exp(resultCOV12$summary.fixed[2,c(1,3,5)]),1)
))
names(summaryUnivariate)<-c("IDR", "2.5%", "97.5%")
summaryUnivariate$Model<-c("Population Density", "Median Household Income", "School Density", "Older than 65", "Asian", "Housing Density", "Congdon Index", "Language", "Black", "Hispanic", "Heart Disease", "COPD")
summaryUnivariate<-summaryUnivariate[,c(4,1,2,3)]
kable_styling(kable(summaryUnivariate))
Model | IDR | 2.5% | 97.5% |
---|---|---|---|
Population Density | 1.5 | 1.1 | 2.2 |
Median Household Income | 0.5 | 0.4 | 0.7 |
School Density | 0.8 | 0.6 | 1.2 |
Older than 65 | 1.9 | 1.6 | 2.4 |
Asian | 0.4 | 0.2 | 0.8 |
Housing Density | 2.0 | 1.2 | 3.2 |
Congdon Index | 0.8 | 0.8 | 0.9 |
Language | 1.3 | 0.9 | 1.8 |
Black | 4.8 | 2.4 | 9.7 |
Hispanic | 1.2 | 0.9 | 1.6 |
Heart Disease | 2.1 | 1.5 | 2.9 |
COPD | 8.2 | 3.7 | 18.3 |
In a multivariable model including COPD, Heart Disease, proportion of Black residents, housing density, and age greater than 65, the only 2 variables that remained associated with positive COVID-19 testing with a probability greater than chance were the proportion of black residents and older persons.
formulaMultivar1 <- Positive~ f(ID.area, model="bym", offset(Total), graph=nyc.adj, adjust.for.con.comp = FALSE)+ scale(COPD) + scale(heartDisease) +scale(propBlack)+scale(propOld)+scale(houseDensity1)
resultMultivar1 <- inla(formulaMultivar1,family="poisson",data=testDat, control.compute=list(dic=TRUE,cpo=TRUE))
summary(resultMultivar1)
##
## Call:
## c("inla(formula = formulaMultivar1, family = \"poisson\", data = testDat, ", " control.compute = list(dic = TRUE, cpo = TRUE))")
##
## Time used:
## Pre-processing Running inla Post-processing Total
## 0.1315 0.9892 0.0396 1.1603
##
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode
## (Intercept) 5.8688 0.2958 5.2865 5.8690 6.4493 5.8694
## scale(COPD) 0.8426 0.4703 -0.0818 0.8426 1.7661 0.8426
## scale(heartDisease) 0.2405 0.1859 -0.1249 0.2404 0.6057 0.2403
## scale(propBlack) 0.8301 0.3624 0.1182 0.8300 1.5423 0.8296
## scale(propOld) 0.4039 0.1274 0.1546 0.4035 0.6549 0.4028
## scale(houseDensity1) 0.0748 0.2544 -0.4253 0.0749 0.5741 0.0750
## kld
## (Intercept) 0
## scale(COPD) 0
## scale(heartDisease) 0
## scale(propBlack) 0
## scale(propOld) 0
## scale(houseDensity1) 0
##
## Random effects:
## Name Model
## ID.area BYM model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for ID.area (iid component) 101663.36 12673.41 78760.41
## Precision for ID.area (spatial component) 57580.71 8955.54 41796.18
## 0.5quant 0.975quant mode
## Precision for ID.area (iid component) 100995.58 128501.47 99775.95
## Precision for ID.area (spatial component) 56975.08 76911.03 55846.59
##
## Expected number of effective parameters(std dev): 175.97(0.08)
## Number of equivalent replicates : 1.006
##
## Deviance Information Criterion: 1800.35
## Effective number of parameters: 175.92
##
## Marginal Likelihood: -1605.11
## CPO and PIT are computed
##
## Posterior marginals for linear predictor and fitted values computed
# resultCOV1$dic
exp(resultMultivar1$summary.fixed)
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) 353.818947 1.344230 197.6575788 353.892978 632.231395
## scale(COPD) 2.322435 1.600398 0.9214812 2.322389 5.848275
## scale(heartDisease) 1.271831 1.204321 0.8826129 1.271775 1.832446
## scale(propBlack) 2.293637 1.436792 1.1254481 2.293222 4.675561
## scale(propOld) 1.497614 1.135838 1.1671696 1.497069 1.924994
## scale(houseDensity1) 1.077690 1.289640 0.6536017 1.077745 1.775537
## mode kld
## (Intercept) 354.048520 1
## scale(COPD) 2.322388 1
## scale(heartDisease) 1.271683 1
## scale(propBlack) 2.292468 1
## scale(propOld) 1.495997 1
## scale(houseDensity1) 1.077880 1
multiVarMod<-data.frame(round(exp(resultMultivar1$summary.fixed)[,c(1,3,5)],2))
names(multiVarMod)<-c("IDR", "2.5%", "97.5%")
multiVarMod$Variable<-c("Intercept","COPD", "Heart Disease", "Black", "Older than 65", "Housing Density" )
multiVarMod<-multiVarMod[,c(4,1,2,3)]
rownames(multiVarMod)<-1:nrow(multiVarMod)
kable_styling(kable(multiVarMod))
Variable | IDR | 2.5% | 97.5% |
---|---|---|---|
Intercept | 353.82 | 197.66 | 632.23 |
COPD | 2.32 | 0.92 | 5.85 |
Heart Disease | 1.27 | 0.88 | 1.83 |
Black | 2.29 | 1.13 | 4.68 |
Older than 65 | 1.50 | 1.17 | 1.92 |
Housing Density | 1.08 | 0.65 | 1.78 |
Zev Ross file has covariate data at the ZCTA level. Can find data dictionary at “data/data_dictionary.xls”
Median Household Income at ZCTA downloaded from Institute for Social Research which also has link for census tract to ZCTA crosswalk file. Also comes with a population estimated that differs from the one in Zev Ross’s file.
covariates<-read.csv("data/zcta_varsFIN.csv",header=T, stringsAsFactors=F)
str(covariates)
# select variables
# schDen (School Density (schools per square kilometer))
# POP2010 2010 Population: Total
# AG65UP 2010 Population: 65 years and older
# BLK 2010 Population: Black or African American alone
# ASN 2010 Population: Asian alone
# POPDEN2010 2010 Population Density (persons per square kilometer)
# HOUS2000 2000 Public Assistance Income in 1999 for Households - Total
# POPLNGOTH population older than 5 speaking other than English (to be used with POPTOT5 (population older than 5) to calculate proportion )
covariates2<-covariates[,c("zcta10","schDen","POP2010","AG65UP","BLK","ASN","POPDEN2010","HOUS2000", "POPTOT5", "POPLNGOTH")]
covariates2$zcta<-substr(covariates2$zcta10,3,7) # fix ZCTA variable
covariates2$nonEnglish<-covariates2$POPLNGOTH/covariates2$POPTOT5
covariates2$propBlack<-covariates2$BLK/covariates2$POP2010
covariates2$propOld<-covariates2$AG65UP/covariates2$POP2010
covariates2$propAsian<-covariates2$ASN/covariates2$POP2010
covariates2$nonEnglish<-covariates2$POPLNGOTH/covariates2$POPTOT5
covariates2$propAssistance<-covariates2$HOUS2000/covariates2$POP2010
MHI<-read.csv("taMHI.csv",header=T, stringsAsFactors=F)
names(MHI)[1]<-"zcta"
MHI$zcta<-as.character(MHI$zcta)
MHI<-MHI[,-3] # remove POP variable
covariates3<-merge(MHI, covariates2, by="zcta", all.x=F, all.y=T)
Housing density and area in square miles at ZCTA level can be created from the below code taken from “spatialEpiModels.Rnw”. Created two congdonZCTA densiity measures houseDensity1
is the simple rate of number of houses per square mile. houseDensity2
is this proportion divided by the total number of people in a ZCTA
congdonZCTA<-read.table(file="TIONS/SPATIAL EPI BOOK/spatialEpiHowTo/spatialEpiModels/censusPopHousing.txt", header=F, sep=",", skip=2, col.names=c("x1", "zcta", "x2", "x3","x4", "house.tot", "pop.tot", "house.rent", "house.occ"), colClasses=c("character", "character", "character", "character","character", "numeric", "numeric","numeric", "numeric"))
congdonZCTA<-congdonZCTA[,-c(1,3:5)]
congdonZCTA$zcta<-as.numeric(congdonZCTA$zcta)
complete<-complete.cases(congdonZCTA)
congdonZCTA<-congdonZCTA[complete,]
area<-read.table(file="TIONS/SPATIAL EPI BOOK/spatialEpiHowTo/spatialEpiModels/censusSqMiles.txt", header=T)
area<-area[,-c(2,3,5:7)]
str(area)
# merge
congdonZCTA2<-merge(congdonZCTA, area, by="zcta", all.x=T, all.y=F)
congdonZCTA2<-congdonZCTA2[,-3] # remove tot.pop variable
congdonZCTA2$houseDensity1<-congdonZCTA2$house.tot/congdonZCTA2$sq.miles
# merge to covariate data
covariates4<-merge(covariates3, congdonZCTA2, by="zcta", all.x=T, all.y=F)
covariates4$houseDensity2<-covariates4$houseDensity1/covariates4$POP2010
Create a variable based on Peter Congdon’s social fragmentation index 8 It combines 4 variables extracted from US census variables: the proportion of total congdonZCTA units that are not owner occupied, the proportion of vacant congdonZCTA units, the proportion of individuals living alone, and the proportion of congdonZCTA units into which someone recently moved. Based on Census definitions, a “recent” move is defined as anytime in the previous 9 years (since the last decennial census). As per the Congdon paper, the variables are standardized, and added with equal weight.
The resulting variable is normally distributed with mean zero and 95% quantiles -2.463311 and 2.205669.
Code can be found in srtsINLA.Snw starting at line 856. Will need to use crosswalk file obtained from Institute for Social Research (See note above for url)
# get variables for Congdon Index at Census Tract level
library(UScensus2010tract)
data(new_york.tract10)
# extract congdonZCTA variables for social fragmentation index from
#UScensus2010tract spatial polygon file
congdonZCTA<-attr(new_york.tract10, "data")
congdonZCTA<-subset(congdonZCTA, select=c(fips,H0010001,H0180002,H0180019,H0210001))
names(congdonZCTA)<-c("tract", "totHousing", "ownerHousing",
"aloneHousing", "vacantHousing")
# load pedDat from SRTS files
load(file="TS/srtsAnalysis/srtsNotes/srtsINLA/srtsINLA.RData")
congdonZCTA<-congdonZCTA[congdonZCTA$tract %in% pedDat$tract,]
# extract recent movers from downloaded US census file
recent<-read.csv("housingNYStracts/housingNYStracts.csv",header=T, stringsAsFactors=F)
recent<-subset(recent, select=c(GEO.id2, HC01_VC72))
names(recent)<-c("tract", "recentMove")
# merge congdonZCTA and recent movers files
congdonZCTA<-merge(congdonZCTA, recent, by="tract", all.x=T)
congdonZCTA$recentMove<-as.numeric(congdonZCTA$recentMove)
# set missing congdonZCTA data to zero
congdonZCTA$recentMove[is.na(congdonZCTA$recentMove)]<-0
# load crosswalk file of NYC tract to ZCTA level
xWalk<-read.csv("vert2ZCTA/nycTract2ZCTA.csv",header=T, stringsAsFactors=F)
# format the tract variables for merging
# congdonZCTA variable is 11 character string first 2 digits state, next 3 are county, final 6 are census tract, e.g. "36005000100" "36005000200"
congdonZCTA$tract2<-substr(congdonZCTA$tract,6,11)
# xWalk variable is 7 characters with a "." separating the last 2 digits, which needs to be removed
xWalk$tract2<-gsub("[.]","",xWalk$tract)
# merge congdonZCTA and xWalk files
congdonZCTA1<-merge(congdonZCTA, xWalk, by="tract2", all.x=T, all.y=F)
# sum up the Congdon elements by ZCTA
congdonZCTA2<-aggregate(. ~ congdonZCTA1$zcta5, data=congdonZCTA1[,3:7], FUN=sum)
names(congdonZCTA2)[1]<-"zcta"
# CONGDON CALCULATIONS
# calculate proportions based on total congdonZCTA units
# avoid division by zero
congdonZCTA2$owner<-congdonZCTA2$ownerHousing/(congdonZCTA2$totHousing+.1)
congdonZCTA2$alone<-congdonZCTA2$aloneHousing/(congdonZCTA2$totHousing+.1)
congdonZCTA2$vacant<-congdonZCTA2$vacantHousing/(congdonZCTA2$totHousing+.1)
congdonZCTA2$notOwner<-1-congdonZCTA2$owner
congdonZCTA2$recent<-congdonZCTA2$recentMove/(congdonZCTA2$totHousing+.1)
# standardize proportions
congdonZCTA2$aloneST<-
(congdonZCTA2$alone-mean(congdonZCTA2$alone, na.rm=T))/sd(congdonZCTA2$alone, na.rm=T)
summary(congdonZCTA2$aloneST)
congdonZCTA2$vacantST<-
(congdonZCTA2$vacant-mean(congdonZCTA2$vacant, na.rm=T))/sd(congdonZCTA2$vacant, na.rm=T)
summary(congdonZCTA2$vacantST)
congdonZCTA2$notOwnerST<-
(congdonZCTA2$notOwner-mean(congdonZCTA2$notOwner, na.rm=T))/sd(congdonZCTA2$notOwner, na.rm=T)
summary(congdonZCTA2$notOwnerST)
congdonZCTA2$recentST<-
(congdonZCTA2$recent-mean(congdonZCTA2$recent, na.rm=T))/sd(congdonZCTA2$recent, na.rm=T)
summary(congdonZCTA2$recentST)
# add standardized measures into single index
congdonZCTA2$index<-congdonZCTA2$aloneST+congdonZCTA2$vacantST+congdonZCTA2$notOwnerST+congdonZCTA2$recentST
summary(congdonZCTA2$index)
plot(density(congdonZCTA2$index))
quantile(congdonZCTA2$index, probs=c(0.05,0.95))
# 5% 95%
# -3.056998 3.944542
# save the ZCTA-level Congdon Index file
saveRDS(congdonZCTA2, "~/nycZCTACongdon.RDS")
congdonZCTA2<-readRDS("~/nycZCTACongdon.RDS")
# merge covariates4 and congdon
# restrict congdon file to just index and zcta
congdon<-congdonZCTA2[,c("zcta","index")]
covariates5<-merge(covariates4, congdon, by="zcta", all.x=T, all.y=F)
str(covariates5)
names(covariates5)[24]<-"congdonIndex"
# NB: can get county and zip names from xWalk file and merge to the covariate file if needed (only helpful for queens), but difficult to merge to the covariate file, need to remove duplicate zcta's
# zipNames<-xWalk[-1,c("zcta5", "cntyname", "zipname")]
# names(zipNames)[1]<-"zcta"
# merge to covariates file
# covariates6<-merge(covariates5, zipNames, by="zcta", all.x=T, all.y=F)
# save the covariates file
Downloaded updated data from New York City OpenData portal.
newDat1<-read.csv("C/Demographic_Statistics_By_Zip_Code.csv",header=T, stringsAsFactors=F)
newDat1$JURISDICTION.NAME<-as.character(newDat1$JURISDICTION.NAME)
newDat2<-newDat1[,c("JURISDICTION.NAME","PERCENT.HISPANIC.LATINO","PERCENT.US.CITIZEN","PERCENT.RECEIVES.PUBLIC.ASSISTANCE" )]
names(newDat2)<-c("zcta", "propHisp", "propCitizen", "propPublicAssist2")
covariates6<-merge(covariates5, newDat2, by="zcta", all.x=T, all.Y=F)
Downloaded shapefile data on ZCTA-level cardiovascular and diabetes from Simply Analytics App via NYU Libraries
library(foreign)
# heart disease
shapefile1<-read.dbf("~/simplyAnalyticsData/SimplyAnalytics_Shapefiles(1)/SimplyAnalytics_Shapefiles_2020-04-19_15_07_55_9a48bb6f434ea2cdb7c907ec71313405.dbf")
str(shapefile1)
# variable_names.txt
# VALUE0 # Population, 2019
# VALUE1 % MEDICINE/DRUGS/AILMENTS | AILMENTS | AILMENTS HAVE/HAD IN LAST 12 MONTHS? | HEART DISEASE/CONGESTIVE HEART FAILURE, 2019
names(shapefile1)[1]<-"zcta"
names(shapefile1)[4]<-"heartDisease"
heartDisease<-shapefile1[,c(1,4)]
head(heartDisease)
heartDisease$heartDisease[is.na(heartDisease$heartDisease)]<-0.001
covariates7<-merge(covariates6,heartDisease, by="zcta", all.x=T, all.Y=F)
# Other health metrics
# HTN, CAD, Emphysema
shapefile2<-read.dbf("~/simplyAnalyticsData/SimplyAnalytics_Shapefiles(2)/SimplyAnalytics_Shapefiles_2020-04-19_15_12_56_576cfa55f705a769f6bfe050f89469ea.dbf")
str(shapefile2)
# VALUE0 # MEDICINE/DRUGS/AILMENTS | AILMENTS | AILMENTS HAVE/HAD IN LAST 12 MONTHS? | HYPERTENSION/HIGH BLOOD PRESSURE, 2019
# VALUE1 % MEDICINE/DRUGS/AILMENTS | AILMENTS | AILMENTS HAVE/HAD IN LAST 12 MONTHS? | HEART DISEASE/CONGESTIVE HEART FAILURE, 2019
# VALUE2 % MEDICINE/DRUGS/AILMENTS | AILMENTS | AILMENTS HAVE/HAD IN LAST 12 MONTHS? | EMPHYSEMA, 2019
# VALUE3 # MEDICINE/DRUGS/AILMENTS | AILMENTS | AILMENTS HAVE/HAD IN LAST 12 MONTHS? | HEART ATTACK/STROKE, 2019
# DM, COPD
shapefile3<-read.dbf("~/simplyAnalyticsData/SimplyAnalytics_Shapefiles(3)/SimplyAnalytics_Shapefiles_2020-04-19_15_20_02_0fac73e47dd32eb23d46fa3821630f62.dbf")
str(shapefile3)
# VALUE0 # MEDICINE/DRUGS/AILMENTS | AILMENTS | AILMENTS HAVE/HAD IN LAST 12 MONTHS? | DIABETES TYPE 1, 2019
# VALUE1 # MEDICINE/DRUGS/AILMENTS | AILMENTS | AILMENTS HAVE/HAD IN LAST 12 MONTHS? | DIABETES TYPE 2, 2019
# VALUE2 # MEDICINE/DRUGS/AILMENTS | AILMENTS | AILMENTS HAVE/HAD IN LAST 12 MONTHS? | COPD (CHRONIC OBSTRUCTIVE PULMONARY DIS), 2019
names(shapefile3)[1]<-"zcta"
names(shapefile3)[5]<-"COPD"
COPD<-shapefile3[,c(1,5)]
COPD$COPD[is.na(COPD$COPD)]<-0.001
covariates8<-merge(covariates7,COPD, by="zcta", all.x=T, all.Y=F)
saveRDS(covariates8, "~/covidNYCZCTA_covariates.RDS")
NYC DOHMH releasing COVID19 positive tests data on GitHub Files are updated every 2 days or so. Have to track down previous files by clicking on “History”, selecting the date, then clicking the three-dot elipsis to access the link to the csv for that date.
dat422<-read.csv("https://github.com/nychealth/coronavirus-data/raw/master/tests-by-zcta.csv",header=T, stringsAsFactors=F) # 4/22 data
# recentDat<-read.csv("https://github.com/nychealth/coronavirus-data/raw/master/tests-by-zcta.csv",header=T, stringsAsFactors=F)
saveRDS(dat422, "~/covidNYCZCTA_422.RDS")
Merge the covariates data to the most recent total testing data.
covariates<-readRDS("~/covidNYCZCTA_covariates.RDS")
dat1<-readRDS("~/covidNYCZCTA_422.RDS")
names(dat1)[1]<-"zcta"
testDat<-merge(dat1, covariates, by="zcta", all.x=T, all.y=F )
# calculate rate positive per 10,000 population
testDat$rate<-testDat$Positive/testDat$POP2010*10000
# calcuate proportion positive per 10,000 tests
testDat$prop<-testDat$Positive/testDat$Total*10000
saveRDS(testDat, "~/testDat.RDS")
Read in spatial file of NYC ZCTAs. Index the map file to the test data file to restrict to ZCTAs with valid data entries.
nyc <- readShapePoly("ZCTA/tl_2010_36_zcta510NYC.shp")
valid<-nyc$ZCTA5CE10 %in% testDat$ZCTA
nyc<-nyc[valid,]
plot(nyc)
# save as an RDS object
saveRDS(nyc, "~/nycMap4COVID.RDS")
Create an adjacency matrix “nb” object from the map file using . Manually edit the matrix to create adjacencies between boros. First, create the adjacency matrix “nb” object from the map file using . Then use utility to manually edit the nb object. Interactively click on map to select two areas to connect, enter “y” to connect them, enter “c” to continue. Save the edited adjacency object. Plot the default and the edited adjacency matrices.
zzz <- poly2nb(nyc)
nnb1<-edit.nb(zzz, polys=nyc)
# added contiguities between points
# 67 and 74
# 66 and 142
# 25 and 161
# 34 and 137
# 44 and 122
# 13 and 131
# 52 and 83
# 12 and 93
# 4 and 148
# 157 and 175
# 51 and 55
# extract coordinates from shape file to plot adjacency matrix
nyc2<-st_read("ZCTA/tl_2010_36_zcta510NYC.shp")
nyc2<-nyc2[valid,]
coords <- st_coordinates(st_centroid(st_geometry(nyc2)))
# plot default ajdacency matrix
jpeg('~/adjacencyMatrixPlot1.jpg')
plot(nyc)
plot.nb(zzz, coords, add=T)
dev.off()
# plot edited matrix
# plot
jpeg('~/adjacencyMatrixPlot2.jpg')
plot(nyc)
plot.nb(nnb1, coords, add=T)
dev.off()
# save the nnb object
saveRDS(nnb1, "~/nnb1.RDS")
After correcting the nb object, use to get the nb object into the correct format for INLA, saving the resulting object for later use in models in the covidINLA directory. Set up a path to it saved as an object named “nyc.adj”
nb2INLA("~/nyc.graph", nnb1)
nyc.adj <- paste("~/nyc.graph")
file.show(nyc.adj)
# Read in previous data, 4/1, 4/3, 4/4
data4_1<-read.csv("https://github.com/nychealth/coronavirus-data/raw/097cbd70aa00eb635b17b177bc4546b2fce21895/tests-by-zcta.csv",header=T, stringsAsFactors=F)
data4_3<-read.csv("https://github.com/nychealth/coronavirus-data/raw/0074809280d3f9ae0bd09ca62629fb21243ffc72/tests-by-zcta.csv",header=T, stringsAsFactors=F)
data4_4<-read.csv("https://github.com/nychealth/coronavirus-data/raw/0ae531d56696b7dfa01d1d1ad6286d7ae03350c7/tests-by-zcta.csv",header=T, stringsAsFactors=F)
# read in 4/5 (on 4/5)
data4_5<-read.csv("https://github.com/nychealth/coronavirus-data/raw/master/tests-by-zcta.csv",header=T, stringsAsFactors=F)
# update read in 4/7 (on 4/7)
data4_7<-read.csv("https://github.com/nychealth/coronavirus-data/raw/master/tests-by-zcta.csv",header=T, stringsAsFactors=F)
# update 4/8 data (on 4/8)
data4_8<-read.csv("https://github.com/nychealth/coronavirus-data/raw/master/tests-by-zcta.csv",header=T, stringsAsFactors=F)
# load existing data
load("~/covidNYCZCTA_days.RData")
# save updated separate files as rdata object
save(data4_1,data4_3,data4_4,data4_5,data4_7, data4_8, file="~/covidNYCZCTA_days.RData")
#Combine into single data file, difference columns
# NB: 4/8 data has new column "99999" for NA, need to account for it to merge data sets
data4_8<-data4_8[-2,] # 3 entries for "99999" removed (for now)
dat1<-data.frame(ZCTA=data4_1$MODZCTA, x1=data4_1$Positive, x3=data4_3$Positive, x4=data4_4$Positive,x5=data4_5$Positive,x7=data4_7$Positive, x8=data4_8$Positive)
# difference the columns to get daily counts of positive tests
dat1$d1<-dat1$x1
dat1$d2<-dat1$x3-dat1$x1
dat1$d3<-dat1$x4-dat1$x3
dat1$d4<-dat1$x5-dat1$x4
dat1$d5<-dat1$x7-dat1$x5
dat1$d6<-dat1$x8-dat1$x7
# save file of positive test counts by zip code and day
saveRDS(dat1, "~/covidNYCZCTA.RDS")
dat1<-readRDS("~/covidNYCZCTA.RDS")
Load New York City ZCTA map object nycZIPS2
created on page 37 of “Spatial Epidemiology Notes” book. Save nycZIPS2
as nyc
.
load("TIONS/SPATIAL EPI BOOK/spatialEpiBook/spatialEpiModels/SpatialEpiModels.RData")
nyc<-nycZIPS2
Traditionally, one would use the total population as an offset and intepret exponentiated results as incidensity ratios. This is complicated in the setting of COVID19↩︎
This differs from the simple mean from the raw data in part because we are excluding the NA entries, and also because much of the information is being “shifted” to the random effects term.↩︎
Note that the mean and the mode are nearly identical.↩︎
Else, the model returns the warning “The graph for the model besag has 3 connected components!!! Model is revised accordingly.” Some useful information about the issue in the help file inla.doc("besag")
, and Havard Rue posted a helpful reply to a question In summary, “connected components” refers to disconnected subgraphs. The default setting for INLA is to “interpret this as a sum-to-zero constraint for each subgraph, instead of a sum-to-zero constraint for the union of the graphs”. You can over-ride that behavior with the option (..., adjust.for.con.comp = FALSE)
Because NYC geography is complex, it is difficult to completely avoid the issue without drastically altering the map.↩︎
The calculation involves creating an empty matrix, with rows equal to number of regions, with 1000 columns into which we extract 1000 observations from the marginal distribution of the CAR term (\(\nu\)), and from which we calculate the empirical variance. Then extract values for random effects (UH) term and calculate the proportion of total heterogeneity accounted for by the spatially structured heterogeneity. We see in the result that spatial clustering accounts for for about 26.3% of the total heterogeneity.↩︎
NB. In conversation, Andrew Lawson cautioned against measures like this because CAR and UH are incompletely identified. Even Blangiardo, from whom this code is adapted, says the UH and CAR variances are not directly comparable. But, the result can give an indication of the relative contribution of each component of the total spatial heterogeneity.↩︎
This is not surprising, as two components of the Congdon Index measure housing vacancies and single-occupant housing.↩︎
Peter Congdon. Suicide and Parasuicide in London: A Small-area Study. Urban Studies, Vol. 33, No. 1, 137-158, 1996 and Roman Pabayo, Beth E. Molnar, Nancy Street, and Ichiro Kawachi. The relationship between social fragmentation and sleep among adolescents living in Boston, Massachusetts. Journal of Public Health pp 1-12 J Public Health (2014) doi: 10.1093/pubmed/fdu001↩︎