The pcFactorStan package for R provides convenience functions and pre-programmed Stan models related to analysis of pairwise comparison data. Its purpose is to make fitting models using Stan easy. pcFactorStan relies on the rstan package, which should be installed first. See here for instructions on installing rstan.
One situation where a factor model might be useful is when there are people that play in tournaments of more than one game. For example, the computer player AlphaZero (Silver et al. 2018) has trained to play chess, shogi, and Go. We can take the tournament match outcome data for each of these games and find rankings among the players. We may also suspect that there is a latent board game skill that accounts for some proportion of the variance in the per-board game rankings. This proportion can be recovered by the factor model.
Our goal may be to fit a factor model, but it is necessary to build up the model step-by-step. There are essentially three models: 'unidim', 'covariance', and 'factor'. 'unidim' analyzes a single item. 'covariance' is suitable for two or more items. Once you have vetted your items with the 'unidim' and 'covariance' models, then you can try the 'factor' model. There is also a special model 'unidim_adapt'. Except for this model, the other models require a scaling constant. To find an appropriate scaling constant, we will fit 'unidim_adapt' to each item separately and then take the median of median point estimates to set the scale.
The R code below first loads rstan and pcFactorStan.
library(rstan)
library(pcFactorStan)
Next we take a peek at the data.
head(phyActFlowPropensity)
pa1 | pa2 | skill | predict | novelty | creative | complex | goal1 | feedback1 | chatter | waiting | body | control | present | spont | stakes | evaluated | reward |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mountain biking | tennis | 1 | -1 | -2 | 1 | 1 | 1 | 1 | -2 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 0 |
mountain biking | tennis | 1 | 2 | -1 | -1 | -1 | 0 | 2 | 1 | 2 | 0 | 1 | 0 | 0 | 1 | 2 | -1 |
ice skating | running | -2 | 1 | -1 | -2 | -1 | 1 | 1 | -2 | -2 | -1 | 0 | 0 | -1 | -1 | -1 | 0 |
climbing | rowing | -2 | 2 | -2 | -2 | -2 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 0 | 0 | 0 |
card game | gardening | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | -2 | 2 | 1 | 0 | 0 | 2 | -2 | 2 |
dance | table tennis | 0 | -2 | -1 | -1 | 0 | -1 | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
These data consist of paired comparisons of 87 physical activities on 16 flow-related facets. The procedure was that participants submitted two activities using free-form input. These activities were substitute into item templates. For example, the 'predict' item asked, “How predictable is the action?” with response options:
A “somewhat more” response is scored 1 or -1 and “much more” scored 2 or -2. A tie (i.e. “roughly equal”) is scored as zero. We will need to analyze each item separately before we analyze them together. Therefore, we will start with 'skill'.
Data must be fed into Stan in a partially digested form. The next block of code demonstrates how a suitable data object may be constructed using the prepData()
function. This function automatically determines the
number of threshold parameters based on the
range observed in your data.
One thing it does not do is pick a varCorrection
factor. The varCorrection
determines the degree of adaption in the model. Usually a setting of between 2.0 to 4.0 will obtain optimal results.
dl <- prepData(phyActFlowPropensity[,c(paste0('pa',1:2), 'skill')])
dl$varCorrection <- 2.0
Next we fit the model using the pcStan()
function, which is a wrapper for stan()
from rstan. We also choose the number of chains.
As is customary Stan procedure, the first half of each chain is discarded as warm up.
fit1 <- pcStan("unidim_adapt", data=dl)
A variety of diagnostics are available to check whether the sampler ran into trouble.
check_hmc_diagnostics(fit1)
#>
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#>
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#>
#> Energy:
#> E-BFMI indicated no pathological behavior.
Everything looks good, but there are a few more things to check. We want \(\widehat R < 1.015\) and effective sample size greater than 100 times the number of chains (Vehtari et al., 2019).
allPars <- summary(fit1, probs=c())$summary
print(min(allPars[,'n_eff']))
#> [1] 785.6807
print(max(allPars[,'Rhat']))
#> [1] 1.004
Again, everything looks good. If the targets values were not reached then we would sample the model again with more iterations. Time for a plot,
library(ggplot2)
theta <- summary(fit1, pars=c("theta"), probs=c())$summary[,'mean']
palist <- levels(filterGraph(phyActFlowPropensity)$pa1)
ggplot(data.frame(x=theta, activity=palist, y=0.47)) +
geom_point(aes(x=x),y=0) +
geom_text(aes(label=activity, x=x, y=y),
angle=85, hjust=0, size=2,
position = position_jitter(width = 0, height = 0.4)) + ylim(0,1) +
theme(legend.position="none",
axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
Intuitively, this seems like a fairly reasonable ranking for skill. As pretty as the plot is, the main reason that we fit this model was to find a scaling factor that would produce a standard deviation close to 1.0,
s50 <- summary(fit1, pars=c("scale"), probs=c(.5))$summary[,'50%']
print(s50)
#> [1] 0.6643013
We use the median instead of the mean because scale
is not likely to have a symmetric marginal posterior distribution.
We obtained 0.66, but that value is just for one item.
We have to perform the same procedure for every item.
Wow, that would be really tedious … if we did not have a function to do it for us!
Fortunately, calibrateItems
takes care of it and produces a table
of the pertinent data,
result <- calibrateItems(phyActFlowPropensity, iter=1000L)
print(result)
item | iter | divergent | treedepth | low_bfmi | n_eff | Rhat | scale | thetaVar |
---|---|---|---|---|---|---|---|---|
skill | 1500 | 0 | 0 | 0 | 680.37146 | 1.005625 | 0.6435277 | 0.8561198 |
predict | 1500 | 0 | 0 | 0 | 761.15886 | 1.005753 | 0.6155655 | 0.8388006 |
novelty | 1000 | 0 | 0 | 0 | 617.34970 | 1.005868 | 0.5040971 | 0.7808596 |
creative | 1500 | 0 | 0 | 0 | 782.21246 | 1.007882 | 0.4910805 | 0.7828963 |
complex | 1500 | 0 | 0 | 0 | 618.77199 | 1.001815 | 0.5786848 | 0.8213287 |
goal1 | 1000 | 234 | 0 | 3 | 38.70932 | 1.080710 | 0.0233462 | 0.2735013 |
feedback1 | 1000 | 0 | 0 | 1 | 24.75716 | 1.218889 | 0.1297325 | 0.4817001 |
chatter | 1000 | 0 | 0 | 0 | 631.36598 | 1.003779 | 0.2368478 | 0.6050697 |
waiting | 1000 | 0 | 0 | 0 | 622.06265 | 1.001788 | 0.5291597 | 0.7956639 |
body | 1000 | 0 | 0 | 0 | 630.34945 | 1.003235 | 0.3661471 | 0.7085937 |
control | 1000 | 0 | 0 | 0 | 581.48594 | 1.003033 | 0.3168158 | 0.6632210 |
present | 1000 | 0 | 0 | 0 | 630.55837 | 1.003091 | 0.2225332 | 0.5921971 |
spont | 1000 | 0 | 0 | 0 | 656.14892 | 1.012010 | 0.2546944 | 0.6225483 |
stakes | 1000 | 0 | 0 | 0 | 624.31770 | 1.007750 | 0.2755874 | 0.6391814 |
evaluated | 1000 | 0 | 0 | 0 | 565.64655 | 1.001916 | 0.4680233 | 0.7632993 |
reward | 1000 | 0 | 0 | 0 | 607.72124 | 1.002324 | 0.2017555 | 0.5738516 |
The items goal1 and feedback1 ran into trouble.
The nonzero count of divergent transitions and lowbfmi means that these items contained too little signal to estimate. We could try again with varCorrection=1.0
, but we are going to exclude them instead.
The model succeeded on the rest of the items.
I requested iter=1000L
to demonstrate how calibrateItems
will resample the model until the n_eff
is large enough and the Rhat
is small enough. Items _skill and predict (among others) needed 1500 samples instead of the default 1000.
The median scale across all included items is now readily available,
excl <- match(c('goal1','feedback1'), result$item)
median(result[-excl,'scale'])
#> [1] 0.4170852
Next we will fit the covariance model. We exclude the Cholesky
factor of the correlation matrix rawThetaCorChol
because the
regular correlation matrix is also available.
pafp <- phyActFlowPropensity
excl <- match(c('goal1','feedback1'), colnames(pafp))
pafp <- pafp[,-excl]
dl <- prepData(pafp)
dl$scale <- median(result[-excl,'scale'])
fit2 <- pcStan("covariance", data=dl, include=FALSE, pars=c('rawTheta', 'rawThetaCorChol'))
check_hmc_diagnostics(fit2)
#>
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#>
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#>
#> Energy:
#> E-BFMI indicated no pathological behavior.
allPars <- summary(fit2, probs=0.5)$summary
print(min(allPars[,'n_eff']))
#> [1] NaN
print(max(allPars[,'Rhat']))
#> [1] NaN
The HMC diagnostics looks good, but … oh dear!
Something is wrong with the n_eff
and \(\widehat R\).
Let us look more carefully,
head(allPars[order(allPars[,'sd']),])
#> mean se_mean sd 50% n_eff Rhat
#> thetaCor[1,1] 1 NaN 0.000000e+00 1 NaN NaN
#> thetaCor[2,2] 1 9.730366e-19 6.012894e-17 1 3818.6400 0.9989995
#> thetaCor[3,3] 1 1.390984e-18 6.349484e-17 1 2083.6861 0.9989995
#> thetaCor[4,4] 1 2.194746e-18 6.694478e-17 1 930.3894 0.9989995
#> thetaCor[5,5] 1 1.160020e-18 6.896321e-17 1 3534.3020 0.9989995
#> thetaCor[7,7] 1 2.714412e-18 7.522650e-17 1 768.0511 0.9989995
Ah ha! It looks like all the entries of the correlation matrix are reported, including the entries that are not stochastic but are fixed to constant values. We need to filter those out to get sensible results.
excl <- match(paste0('thetaCor[',1:dl$NITEMS,',', 1:dl$NITEMS,']'), rownames(allPars))
allPars <- allPars[-excl,]
print(min(allPars[,'n_eff']))
#> [1] 824.8427
print(max(allPars[,'Rhat']))
#> [1] 1.011792
Ah, much better. Now we can inspect the correlation matrix. There are many ways to visualize a correlation matrix. One of my favorite ways is to plot it using the qgraph package,
itemNames <- colnames(pafp)[-(1:2)]
tc <- summary(fit2, pars=c("thetaCor"), probs=c(.5))$summary[,'50%']
tcor <- matrix(tc, length(itemNames), length(itemNames))
dimnames(tcor) <- list(itemNames, itemNames)
library(qgraph)
qgraph(tcor, layout = "spring", graph = "cor", labels=colnames(tcor),
legend.cex = 0.3,
cut = 0.3, maximum = 1, minimum = 0, esize = 20,
vsize = 7, repulsion = 0.8)
Based on this plot and theoretical considerations, I decided to exclude spont, control, evaluated, and waiting from the factor model. A detailed rationale for why these items are excluded and not others will be presented in a forthcoming article. For now, let us focus on the mechanics of the next step in our analyses.
itemNames <- setdiff(itemNames, c('spont','control','evaluated','waiting'))
pafp <- pafp[,c(paste0('pa',1:2), itemNames)]
dl <- prepData(pafp)
dl$scale <- median(result[-excl,'scale']) # close enough
fit3 <- pcStan("factor", data=dl, include=FALSE,
pars=c('rawUnique', 'rawUniqueTheta', 'rawFactor', 'rawLoadings'))
check_hmc_diagnostics(fit3)
#>
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#>
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#>
#> Energy:
#> E-BFMI indicated no pathological behavior.
allPars <- summary(fit3, probs=0.5)$summary
print(min(allPars[,'n_eff']))
#> [1] 130.9063
print(max(allPars[,'Rhat']))
#> [1] 1.030636
The maximum \(\widehat R\) is too high. We need to try again with more iterations,
fit3 <- pcStan("factor", data=dl, include=FALSE, iter=4000L,
pars=c('rawUnique', 'rawUniqueTheta', 'rawFactor', 'rawLoadings'))
check_hmc_diagnostics(fit3)
#>
#> Divergences:
#> 0 of 8000 iterations ended with a divergence.
#>
#> Tree depth:
#> 0 of 8000 iterations saturated the maximum tree depth of 10.
#>
#> Energy:
#> E-BFMI indicated no pathological behavior.
allPars <- summary(fit3, probs=0.5)$summary
print(min(allPars[,'n_eff']))
#> [1] 1058.425
print(max(allPars[,'Rhat']))
#> [1] 1.011202
Looks good! Let us plot the marginal posterior distributions of the factor proportions,
library(reshape2)
prop <- summary(fit3, pars='factorProp', probs=c(0.1, 0.5, 0.9))$summary[,c('10%','50%','90%')]
colnames(prop) <- paste0('q',c(10,50,90))
prop <- as.data.frame(prop)
prop$item <- factor(itemNames)
rawProp <- extract(fit3, pars=c("factorProp"))[[1]]
colnames(rawProp) <- itemNames
rawProp <- rawProp[sample.int(nrow(rawProp), 500),]
rawPropTall <- melt(rawProp, variable.name='item')
colnames(rawPropTall)[2] <- c('item')
ggplot() + geom_vline(xintercept=0, color="green") +
geom_jitter(data=rawPropTall, aes(value, item), height = 0.35, alpha=.05) +
geom_segment(data=prop, aes(y=item, yend=item, x=q10, xend=q90),
color="yellow", alpha=.5) +
geom_point(data=prop, aes(x=q50, y=item), color="red", size=1) +
theme(axis.title.y=element_blank())
We can also look at item response curves,
thresholdVec <- summary(fit3, pars='threshold', probs=c())$summary[,'mean']
thresholds <- matrix(thresholdVec, nrow=2, dimnames=list(NULL, itemNames))
softmax <- function(y) exp(y) / sum(exp(y))
draw1 <- function(scale, th, item) {
tdiff <- seq(-2.5/scale, 2.5/scale, .05/scale)
gr <- expand.grid(tdiff=tdiff, category=c("much more","somewhat more", 'equal',
"somewhat less", "much less"), p=NA, item=item)
gg <- matrix(c(0, rev(cumsum(th)), -cumsum(th)), ncol=5, nrow=length(tdiff), byrow=TRUE)
gg[,2:5] <- (gg[,2:5] + scale * tdiff)
gg <- t(apply(gg, 1, cumsum))
gg <- t(apply(gg, 1, softmax))
for (lev in 1:length(levels(gr$category))) {
gr[gr$category == levels(gr$category)[lev],'p'] <- gg[,lev]
}
geom_line(data=gr, aes(x=tdiff,y=p,color=category,linetype=category), size=.1, alpha=.2)
}
rawThreshold <- extract(fit3, pars=c("threshold"))[[1]]
rawThreshold <- rawThreshold[sample.int(nrow(rawThreshold), 50),]
pl <- ggplot() + xlab("difference in latent score (logits)") + ylab("probability") +
ylim(0,1) + facet_wrap(~item)
for (cx in 1:nrow(rawThreshold)) {
for (ix in 1:length(itemNames)) {
pl <- pl + draw1(dl$scale, rawThreshold[cx,c(ix*2-1,ix*2)], itemNames[ix])
}
}
print(pl)
print(thresholds)
skill | predict | novelty | creative | complex | chatter | body | present | stakes | reward |
---|---|---|---|---|---|---|---|---|---|
0.7138205 | 0.7363434 | 0.7107519 | 0.7621464 | 0.3375177 | 0.8247705 | 0.9906680 | 1.1986245 | 1.0022205 | 1.1953601 |
-0.3423715 | -0.5000462 | -0.2886812 | -0.2643945 | 0.2272349 | -0.5026549 | -0.5181785 | -0.6487027 | -0.4432016 | -0.6175744 |
Finally, we can plot the factor scores. Activities with small sample
sizes are retained by filterGraph
if they connect other activities
because they contribute information to the model. However, when we
look at the per-activity factor scores, we can limit ourselves to
activities with a sample size of at least 11.
orig <- filterGraph(pafp)$pa1
pa11 <- filterGraph(pafp, minDifferent=11L)$pa1
fs <- summary(fit3, pars='factor', probs=c(0.1, 0.5, 0.9))$summary[,c('10%','50%','90%')]
colnames(fs) <- paste0('q',c(10,50,90))
fs <- fs[match(levels(pa11), levels(orig)),]
fs <- as.data.frame(fs)
fs$activity <- levels(pa11)
fs$activity <- factor(fs$activity, levels=levels(pa11)[order(fs$q50)])
rawFs <- extract(fit3, pars=c("factor"))[[1]]
rawFs <- rawFs[,match(levels(pa11), levels(orig))]
colnames(rawFs) <- levels(pa11)
rawFs <- rawFs[sample.int(nrow(rawFs), 500),]
rawFsTall <- melt(rawFs, variable.name='activity')
colnames(rawFsTall)[2] <- c('activity')
rawFsTall$activity <- factor(as.character(rawFsTall$activity),
levels=levels(pa11)[order(fs$q50)])
ggplot() + geom_vline(xintercept=0, color="green") +
geom_jitter(data=rawFsTall, aes(value, activity), height = 0.35, alpha=.05) +
geom_segment(data=fs, aes(y=activity, yend=activity, x=q10, xend=q90),
color="yellow", alpha=.5) +
geom_point(data=fs, aes(x=q50, y=activity), color="red", size=1) +
theme(axis.title.y=element_blank())
Given that my background is more in software than math, I am not a fan of the greek letters used with such enthusiasm by mathematicians. When I name variables, I favor the expressive over the succinct.
If you read through the Stan models, you will find some
variables prefixed with raw
. These are special variables
internal to the model. In particular, you should not try
to evaluate the \(\widehat R\) or effective sample size
of raw
parameters. These parameters are best excluded
from the sampling output.
parameter | prior | purpose |
---|---|---|
threshold | normal(0,2) | item response thresholds |
theta | normal(0,1) | latent score |
The 'unidim_adapt' model has a varCorrection
constant
that is used to calibrate the scale
. For multivariate models,
scale
should be set to the median of the item-wise scales.
parameter | prior | purpose |
---|---|---|
threshold | normal(0,2) | item response thresholds |
thetaCor | lkj(2) | correlations between items |
sigma | lognormal(1,1) | relative item standard deviations |
theta | see below | latent score |
Thresholds for all
items are combined into a single vector.
The prior for theta
is multivariate normal with correlations
thetaCor
and standard deviations sigma
.
Exclude rawTheta
and rawThetaCorChol
from sampling.
parameter | prior | purpose |
---|---|---|
threshold | normal(0,2) | item response thresholds |
unique | normal(1,1) T[0,] | standard deviation of unique scores |
uniqueTheta | normal(0,1) | unique scores |
factorLoadings | normal(0,1) | signed standard deviation of factor scores |
factor | normal(0,1) | factor scores |
factorProp | N/A | signed factor variance proportion |
Thresholds for all items are combined into a single vector.
factorProp
is computed using Equation 3 of Gelman et al. (in press)
and has no prior of its own.
factorLoadings
is in standard deviation units but can be negative.
Similarly, factorProp
is a signed proportion bounded between -1 and 1.
Exclude rawUnique
, rawUniqueTheta
, rawFactor
, and rawLoadings
from sampling.
Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (in press). R-squared for Bayesian regression models. The American Statistician. DOI: 10.1080/00031305.2018.1549100
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., … & Lillicrap, T. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140-1144.
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2019). Rank-normalization, folding, and localization: An improved \(\widehat R\) for assessing convergence of MCMC. arXiv preprint arXiv:1903.08008.
sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 19.04
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] qgraph_1.6.2 pcFactorStan_0.11 Rcpp_1.0.1
#> [4] rstan_2.18.2 StanHeaders_2.18.1 reshape2_1.4.3
#> [7] ggplot2_3.1.1 knitr_1.20
#>
#> loaded via a namespace (and not attached):
#> [1] ssgraph_1.0 splines_3.5.2 gtools_3.5.0
#> [4] Formula_1.2-2 assertthat_0.2.1 BDgraph_2.47
#> [7] highr_0.6 stats4_3.5.2 latticeExtra_0.6-28
#> [10] d3Network_0.5.2.1 pbivnorm_0.6.0 pillar_1.4.0
#> [13] backports_1.1.2 lattice_0.20-35 glue_1.3.1
#> [16] digest_0.6.18 RColorBrewer_1.1-2 checkmate_1.8.5
#> [19] ggm_2.3 colorspace_1.3-2 htmltools_0.3.6
#> [22] Matrix_1.2-14 plyr_1.8.4 psych_1.8.3.3
#> [25] pkgconfig_2.0.2 purrr_0.2.4 corpcor_1.6.9
#> [28] mvtnorm_1.0-7 scales_0.5.0 whisker_0.3-2
#> [31] glasso_1.8 jpeg_0.1-8 fdrtool_1.2.15
#> [34] huge_1.2.7 htmlTable_1.11.2 tibble_2.1.1
#> [37] withr_2.1.2 pbapply_1.3-4 nnet_7.3-12
#> [40] lazyeval_0.2.1 cli_1.1.0 mnormt_1.5-5
#> [43] survival_2.42-3 magrittr_1.5 crayon_1.3.4
#> [46] evaluate_0.10.1 nlme_3.1-137 MASS_7.3-50
#> [49] foreign_0.8-70 pkgbuild_1.0.3 tools_3.5.2
#> [52] loo_2.0.0 data.table_1.10.4-3 prettyunits_1.0.2
#> [55] matrixStats_0.53.1 stringr_1.3.1 munsell_0.4.3
#> [58] cluster_2.0.7-1 callr_2.0.3 compiler_3.5.2
#> [61] rlang_0.3.4 debugme_1.1.0 grid_3.5.2
#> [64] rstudioapi_0.7 rjson_0.2.15 htmlwidgets_1.2
#> [67] igraph_1.2.4.1 lavaan_0.6-2.1250 base64enc_0.1-3
#> [70] labeling_0.3 gtable_0.2.0 codetools_0.2-15
#> [73] abind_1.4-5 inline_0.3.14 R6_2.4.0
#> [76] gridExtra_2.3 dplyr_0.8.1 Hmisc_4.1-1
#> [79] stringi_1.4.3 parallel_3.5.2 rpart_4.1-13
#> [82] acepack_1.4.1 png_0.1-7 tidyselect_0.2.5