This vignette is for researchers whose main workflow is in Stata. It covers:
- Reading .dta files with nabs_read_dta(), and the labelled-variable /
  extended-missing pitfalls it handles for you.
- Translating did_multiplegt_dyn calls to nabs_event_study().
- Stata-style argument names: group, effects, placebo, and df are
  accepted directly.
- Writing results back to .dta with nabs_write_dta() so you (or a
  coauthor) can finish in Stata.

Of the heterogeneity-robust estimators that nonabsdid
harmonizes, only one has an official Stata implementation:
| Estimator | Stata | R |
|---|---|---|
| DCDH (de Chaisemartin & D’Haultfoeuille) | did_multiplegt_dyn (SSC) | DIDmultiplegtDYN |
| PanelMatch (Imai, Kim, & Wang) | — | PanelMatch |
| fect: IFE / FE-imputation / MC (Liu, Wang, & Xu) | — | fect |
If your treatment is non-absorbing (it can switch on
and off) and you want to compare DCDH against matching-based and
imputation/factor-model-based estimators on the same axis, R is
currently the only place where all of them live. nonabsdid
exists to make that comparison a few lines of code; this vignette exists
to make those lines feel familiar if you arrive from Stata.
Because the same DCDH estimator is implemented in both languages by
the same authors, the DCDH series is also your bridge for
trust: run did_multiplegt_dyn on the same data in
Stata and through nonabsdid, check that the point estimates
agree, and then read the R-only estimators with the same confidence.
(Pin the version of DIDmultiplegtDYN you used; see
“Reproducibility” at the end.)
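The agreement check described above can be sketched in base R. The numbers below are placeholders for illustration only, not results from any estimator, and the column names (`effect`, `time`, `estimate`) and the tolerance are assumptions; substitute the estimates you exported from Stata and the DCDH rows of your R output.

```r
# Placeholder estimates for illustration only; replace with your own.
stata_est <- data.frame(effect = c(1, 2, 3), estimate = c(0.48, 0.52, 0.55))
r_est     <- data.frame(time   = c(0, 1, 2), estimate = c(0.48, 0.52, 0.55))

# Align the axes: Stata's effect k sits at relative time k - 1 in nonabsdid.
stata_est$time <- stata_est$effect - 1

# Put both sets of estimates side by side, one row per relative time.
both <- merge(stata_est, r_est, by = "time", suffixes = c("_stata", "_r"))

# Flag agreement up to a small numerical tolerance (1e-6 is an arbitrary choice).
both$agree <- abs(both$estimate_stata - both$estimate_r) < 1e-6
all(both$agree)
```

If any row fails the check, compare options and package versions on both sides before reading anything into the R-only estimators.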
nabs_read_dta()

The two classic stumbling blocks when moving a .dta file
into R are:
- Variables with value labels arrive in R as haven_labelled
  vectors, which most estimation packages (including the ones wrapped
  here) do not understand.
- Stata's extended missing values .a–.z arrive as tagged
  NAs, which look like ordinary NA when printed
  but are a distinct thing internally.

nabs_read_dta() handles both with sensible defaults:
labelled columns become factors, and all extended missings collapse to
regular NA.
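To see the two pitfalls in isolation, the sketch below builds them by hand with haven. It does not call nabs_read_dta() and is not a description of its internals; it only shows what a labelled vector and a tagged NA look like, and the two conversions a reader might want.

```r
# Sketch only: reproduces the two pitfalls directly with haven.
library(haven)

# A 0/1 dummy carrying Stata-style value labels.
d <- labelled(c(0, 1, 1), labels = c(untreated = 0, treated = 1))
inherits(d, "haven_labelled")

# Two ways out: labels as factor levels, or the bare numeric codes.
d_factor  <- as_factor(d)
d_numeric <- as.numeric(zap_labels(d))

# A Stata extended missing (.a) next to an ordinary missing value.
x <- c(1.5, tagged_na("a"), NA)
is.na(x)    # both missing values are NA as far as is.na() is concerned
na_tag(x)   # but only the extended missing carries a tag
```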
# For this vignette we fabricate a .dta file; in real life you already
# have one.
tmp <- tempfile(fileext = ".dta")
panel <- expand.grid(id = 1:60, t = 1:10)
panel$d <- with(panel, as.integer(
  (id %% 4 == 1 & t %in% 4:7) |
  (id %% 4 == 2 & t %in% 5:8) |
  (id %% 4 == 3 & t %in% 6:9)
))
panel$y <- 0.2 * panel$t + 0.5 * panel$d + rnorm(nrow(panel))
haven::write_dta(panel, tmp)
mydata <- nabs_read_dta(tmp)
#> Read 'C:\Users\81809\AppData\Local\Temp\Rtmp4859ca\file2a1c4f063919.dta': 600
#> rows, 4 columns.
head(mydata)
#> # A tibble: 6 × 4
#> id t d y
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0 -1.26
#> 2 2 1 0 1.15
#> 3 3 1 0 1.57
#> 4 4 1 0 -1.17
#> 5 5 1 0 -1.50
#> 6 6 1 0 0.206

If a labelled variable is really numeric (a 0/1 treatment dummy that
happens to carry “treated”/“untreated” labels is the common case), use
labelled = "numeric" to keep the underlying codes, for example
nabs_read_dta(tmp, labelled = "numeric").
You can also skip the explicit read entirely:
nabs_event_study() and
nabs_event_study_simple() accept a path to a
.dta file as their data argument.
did_multiplegt_dyn → nabs_event_study()

A typical Stata call:

did_multiplegt_dyn y, group(id) time(t) treatment(d) ///
    effects(8) placebo(6) cluster(state) controls(x1 x2)

The equivalent through nonabsdid:
res <- nabs_event_study(
  mydata,
  outcome   = "y",
  treatment = "d",
  unit      = "id",      # Stata: group()
  time      = "t",
  method    = "DCDH",
  leads     = 7,         # Stata: effects(8) -> leads = 8 - 1
  lags      = 6,         # Stata: placebo(6)
  cluster   = "state",
  controls  = c("x1", "x2")
)

Option by option:
| Stata (did_multiplegt_dyn) | nabs_event_study() | Note |
|---|---|---|
| varlist first variable (Y) | outcome = "y" | |
| group(id) | unit = "id" | |
| time(t) | time = "t" | |
| treatment(d) | treatment = "d" | |
| effects(k) | leads = k - 1 | see below |
| placebo(k) | lags = k | same count of placebos |
| cluster(v) | cluster = "v" | defaults to unit |
| controls(x1 x2) | controls = c("x1", "x2") | |
| any other option | pass through ... | forwarded to DIDmultiplegtDYN::did_multiplegt_dyn() |
Why leads = effects - 1? Pure axis
convention, not a difference in the estimator.
did_multiplegt_dyn counts effects(k)
post-treatment estimates labelled 1 through k;
nonabsdid places treatment onset at relative time 0, so a
window of leads produces estimates at 0, 1, …,
leads — that is, leads + 1 post-period
estimates. effects(8) in Stata and leads = 7
here produce the identical underlying call and the same number
of estimated effects; only the x-axis labels shift by one. The
pre-period side has no shift: placebo(6) and
lags = 6 both give six placebo estimates.
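The off-by-one is easy to check by listing the post-period labels under each convention. The helper below is hypothetical and not part of nonabsdid; it only does the arithmetic described above and assumes effects is at least 1.

```r
# Hypothetical helper, not part of nonabsdid: lists the post-period labels
# implied by a Stata effects()/placebo() pair under each convention.
stata_to_nabs_axis <- function(effects, placebo) {
  leads <- effects - 1L
  list(
    leads      = leads,
    lags       = placebo,            # no shift on the pre-period side
    stata_post = seq_len(effects),   # Stata labels: 1, ..., effects
    nabs_post  = 0:leads             # nonabsdid labels: 0, ..., leads
  )
}

ax <- stata_to_nabs_axis(effects = 8L, placebo = 6L)

# Same number of post-period estimates; only the labels differ by one.
length(ax$stata_post) == length(ax$nabs_post)
```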
For options the unified wrapper doesn’t name explicitly
(e.g. normalized, switchers,
trends_nonparam), pass them through ... using
the R package’s argument names — they generally match the Stata option
names — or call DIDmultiplegtDYN::did_multiplegt_dyn()
directly and tidy the result with
as_nabs_event_study().
csdid (Callaway–Sant’Anna), did_imputation
(Borusyak–Jaravel–Spiess), and eventstudyinteract
(Sun–Abraham) are built for absorbing treatment
(staggered adoption with no reversals). If your treatment switches off,
those designs don’t apply directly — that is exactly the gap
nonabsdid’s estimator set targets. There is no option-level
translation to give, because the estimators are different; conceptually,
your csdid-style event-study plot maps onto
nabs_event_study_simple()’s overlay figure.
If you paste arguments from a Stata script, the wrappers understand the Stata names directly and tell you how they were translated:
# These two calls are identical:
nabs_event_study(mydata, outcome = "y", treatment = "d", time = "t",
                 method = "DCDH",
                 group = "id", effects = 8, placebo = 6)
#> Translated Stata-style arguments:
#> * `group` -> `unit`
#> * `placebo` = 6 -> `lags` = 6
#> * `effects` = 8 -> `leads` = 7
#> i nonabsdid puts treatment onset at relative time 0, so `effects`
#> post-period estimates correspond to `leads = effects - 1`. ...
nabs_event_study(mydata, outcome = "y", treatment = "d", time = "t",
                 method = "DCDH",
                 unit = "id", leads = 7, lags = 6)

The alias df is likewise accepted in place of data. Supplying
both a canonical name and its alias (e.g. unit and
group) is an error rather than a silent choice.
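For intuition, alias handling of this kind can be sketched in a few lines of base R. The function below is a hypothetical illustration, not the package's actual implementation: the name resolve_stata_aliases and the error wording are made up, and only the alias pairs mentioned in this vignette are covered.

```r
# Hypothetical sketch of Stata-style alias resolution (not nonabsdid's code).
resolve_stata_aliases <- function(args) {
  aliases <- c(group = "unit", placebo = "lags", effects = "leads", df = "data")
  for (alias in intersect(names(args), names(aliases))) {
    canonical <- aliases[[alias]]
    # Refuse to guess when both spellings are supplied.
    if (canonical %in% names(args)) {
      stop(sprintf("Supply either `%s` or `%s`, not both.", canonical, alias),
           call. = FALSE)
    }
    value <- args[[alias]]
    # effects counts estimates from 1; leads counts from relative time 0.
    if (alias == "effects") value <- value - 1
    args[[canonical]] <- value
    args[[alias]] <- NULL
  }
  args
}

out <- resolve_stata_aliases(list(group = "id", effects = 8, placebo = 6))
str(out)
```

Failing loudly on a duplicate, rather than letting one name win, keeps a pasted Stata option from silently overriding an argument you set on purpose.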
nabs_write_dta()

Every estimator’s output lands in one tidy schema (time,
estimate, std.error, conf.low,
conf.high, window, method,
outcome), so exporting all of it for a Stata-using coauthor
is one line:
res <- nabs_event_study_simple(mydata, outcome = "y", treatment = "d",
                               unit = "id", time = "t")
nabs_write_dta(res$tidy, "event_study_results.dta")

Dots are not legal in Stata variable names, so
std.error, conf.low, and
conf.high are renamed to std_error,
conf_low, and conf_high on the way out (you’ll
see a message listing the renames).
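The renaming rule amounts to replacing each dot with an underscore. A base-R sketch, applied to the schema's column names as listed above (this illustrates the rule, not how nabs_write_dta() is implemented):

```r
# Column names of the tidy schema, as listed in this vignette.
tidy_names <- c("time", "estimate", "std.error", "conf.low", "conf.high",
                "window", "method", "outcome")

# fixed = TRUE treats "." as a literal dot rather than a regex wildcard.
stata_names <- gsub(".", "_", tidy_names, fixed = TRUE)

# Show only the columns whose names actually change.
renamed <- tidy_names != stata_names
data.frame(r = tidy_names[renamed], stata = stata_names[renamed])
```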
Back in Stata, rebuilding the figure for one method is the usual
twoway:
use event_study_results.dta, clear
keep if method == "DCDH"
twoway (rcap conf_low conf_high time) ///
       (scatter estimate time), ///
       yline(0, lpattern(dash)) xline(-0.5, lpattern(dot)) ///
       xtitle("Periods since treatment") ytitle("Effect on outcome") ///
       legend(off)

Or compare methods side by side:
use event_study_results.dta, clear
* encode numbers the methods alphabetically; run "label list m" to confirm
* that codes 1-3 match the legend order below.
encode method, gen(m)
twoway (scatter estimate time if m == 1) ///
       (scatter estimate time if m == 2) ///
       (scatter estimate time if m == 3), ///
       yline(0) legend(order(1 "DCDH" 2 "IFE" 3 "PanelMatch"))

nabs_write_dta() also accepts the result objects
themselves (nabs_event_study_result /
nabs_event_study_simple) and routes them through
as_nabs_event_study() for you.
Reproducibility

- Run did_multiplegt_dyn on the same data in both Stata and R
  once, and confirm the estimates match before relying on the R-only
  estimators.
- Record packageVersion("DIDmultiplegtDYN") (and the SSC version on
  the Stata side); the authors occasionally change defaults between
  releases.