Abstract: We consider nonparametric mean regression for clustered samples, where observations are independent across clusters but may exhibit within-cluster dependence and conditional heteroskedasticity. The clusters may differ in size, and their average size may be small, moderately large, or very large. We focus on the Nadaraya-Watson (NW) mean regression estimator and derive its asymptotic distribution under the corresponding scenarios of a fixed or slowly growing average cluster size and of a rapidly growing one. The growth rate of the average cluster size interacts with the rate of bandwidth shrinkage as the total sample size increases, and yields different rates of convergence. We also discuss optimal bandwidth selection; the rule turns out to be uniform across the cases. The form of the asymptotic variance of the NW estimator depends on the growth rate of the average cluster size through the dominance of, or balance between, within-cluster error variances and error covariances. We propose several asymptotic variance estimators, suitable in different situations, and prove their consistency. One of these estimators is robust to the growth rate of the average cluster size and leads to robust confidence intervals. Finally, we illustrate the developed inferential tools in a nonparametric regression of log wage on age using a one-year slice of the CPS dataset.
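
To make the objects in the abstract concrete, the following is a minimal Python sketch of an NW point estimate together with a generic cluster-robust sandwich variance estimate at a point. It is an illustration under stated assumptions, not the estimators proposed in the paper: the Gaussian kernel, the function names `nw_estimate` and `nw_cluster_variance`, and the use of full-sample NW residuals are all choices made here for the example.

```python
import numpy as np

def nw_estimate(x0, x, y, h):
    """Nadaraya-Watson estimate of E[y | x = x0] (Gaussian kernel assumed)."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)  # unnormalized kernel weights
    return np.sum(k * y) / np.sum(k)

def nw_cluster_variance(x0, x, y, cluster, h):
    """Generic cluster-robust sandwich variance of the NW estimate at x0.

    Assumption for illustration: kernel-weighted residuals are summed
    within each cluster before squaring, so within-cluster error
    covariances enter the estimate alongside the error variances.
    """
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)
    # Residuals from the NW fit evaluated at every sample point.
    fitted = np.array([nw_estimate(xi, x, y, h) for xi in x])
    resid = y - fitted
    # Map cluster labels to 0..G-1 and sum weighted residuals per cluster.
    _, idx = np.unique(cluster, return_inverse=True)
    scores = np.bincount(idx, weights=k * resid)
    return np.sum(scores ** 2) / np.sum(k) ** 2
```

Summing within clusters before squaring is what distinguishes this form from a heteroskedasticity-only variance estimate, which would square each weighted residual individually.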