Background: In the first stage of a two-stage study, the researcher uses a statistical model to impute the unobserved exposures. In the second stage, imputed exposures serve as covariates in epidemiological models. Imputation error in the first stage operate as measurement errors in the second stage, and thus bias exposure effect estimates. Objective: This study aims to improve the estimation of exposure effects by sharing information between the first and second stages. Methods: At the heart of our estimator is the observation that not all second-stage observations are equally important to impute. We thus borrow ideas from the optimal-experimental-design theory, to identify individuals of higher importance. We then improve the imputation of these individuals using ideas from the machine-learning literature of domain adaptation. Results: Our simulations confirm that the exposure effect estimates are more accurate than the current best practice. An empirical demonstration yields smaller estimates of PM effect on hyperglycemia risk, with tighter confidence bands. Significance: Sharing information between environmental scientist and epidemiologist improves health effect estimates. Our estimator is a principled approach for harnessing this information exchange, and may be applied to any two stage study.
|Journal||Journal of Exposure Science and Environmental Epidemiology|
|State||Accepted/In press - 2022|
- Environmental epidemiology
- Two-stage studies