Clustered SEs in R and Stata

Clustered standard errors are popular and very easy to compute in some popular packages such as Stata, but how to compute them in R?

With panel data it's generally wise to cluster on the dimension of the individual effect as both heteroskedasticity and autocorrellation are almost certain to exist in the residuals at the individual level. In fact, Stock and Watson (2008) have shown that the White robust errors are inconsistent in the case of the panel fixed-effects regression model. Interestingly, the problem is due to the incidental parameters and does not occur if T=2. Stata has since changed its default setting to always compute clustered error in panel FE with the robust option. 

There have been several posts about computing cluster-robust standard errors in R equivalently to how Stata does it, for example (herehere and here). However, the bloggers make the issue a bit more complicated than it really is. 

We can very easily get the clustered VCE with the plm package and only need to make the same degrees of freedom adjustment that Stata does. In Stata, the t-tests and F-tests use G-1 degrees of freedom (where G is the number of groups/clusters in the data). The plm package does not make this adjustment automatically.

I'll set up an example using data from Petersen (2006) so that you can compare to the tables on his website:

 # load packages
require(plm)
require(lmtest)  # for waldtest()
 # get data and load as pdata.frame
url <- "http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt"
p.df <- read.table(url)
names(p.df) <- c("firmid", "year", "x", "y")
p.df <- pdata.frame(p.df, index = c("firmid", "year"), drop.index = F, row.names = T)
head(p.df)
 ##     firmid year         x       y
## 1-1      1    1 -1.113973  2.2515
## 1-2      1    2 -0.080854  1.2423
## 1-3      1    3 -0.237607 -1.4264
## 1-4      1    4 -0.152486 -1.1094
## 1-5      1    5 -0.001426  0.9147
## 1-6      1    6 -1.212737 -1.4247
 options(digits = 8)  # for more exact comparison with Stata's output

For completeness, I'll reproduce all tables apart from the last one.

 # fit pooled OLS
m1 <- lm(y ~ x, data = p.df)
# fit same model with plm (needed for clustering)
pm1 <- plm(y ~ x, data = p.df, model = "pooling")

Petersen's Table 1: OLS coefficients and regular standard errors

 coeftest(m1)
 ## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 0.0296797  0.0283593  1.04656  0.29535    
## x           1.0348334  0.0285833 36.20414  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 waldtest(m1)
 ## Wald test
## 
## Model 1: y ~ x
## Model 2: y ~ 1
##   Res.Df Df       F     Pr(>F)    
## 1   4998                          
## 2   4999 -1 1310.74 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Petersen's Table 2: OLS coefficients and white standard errors

 coeftest(m1, vcov = vcovHC(m1, type = "HC1"))
 ## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 0.0296797  0.0283607  1.04651  0.29538    
## x           1.0348334  0.0283952 36.44401  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 waldtest(m1, vcov = vcovHC(m1, type = "HC1"))
 ## Wald test
## 
## Model 1: y ~ x
## Model 2: y ~ 1
##   Res.Df Df       F     Pr(>F)    
## 1   4998                          
## 2   4999 -1 1328.17 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that Stata uses HC1 not HC3 corrected SEs.

Petersen's Table 3: OLS coefficients and standard errors clustered by firmid

 # compute Stata like df-adjustment
G <- length(unique(p.df$firmid))
N <- length(p.df$firmid)
dfa <- (G/(G - 1)) * (N - 1)/pm1$df.residual

# display with cluster VCE and df-adjustment
firm_c_vcov <- dfa * vcovHC(pm1, type = "HC0", cluster = "group", adjust = T)
coeftest(pm1, vcov = firm_c_vcov)
 ## 
## t test of coefficients:
## 
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.0296797  0.0670127  0.4429  0.65786    
## x           1.0348334  0.0505957 20.4530  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 waldtest(pm1, vcov = firm_c_vcov, test = "F")
 ## Wald test
## 
## Model 1: y ~ x
## Model 2: y ~ 1
##   Res.Df Df       F     Pr(>F)    
## 1   4998                          
## 2   4999 -1 418.324 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The additional adjust=T just makes sure we also retain the usual N/(N-k) small sample adjustment.

Petersen's Table 4: OLS coefficients and standard errors clustered by year

 # compute Stata like df-adjustment
G <- length(unique(p.df$year))
N <- length(p.df$year)
dfa <- (G/(G - 1)) * (N - 1)/pm1$df.residual

# display with cluster VCE and df-adjustment
time_c_vcov <- dfa * vcovHC(pm1, type = "HC0", cluster = "time", adjust = T)
coeftest(pm1, vcov = time_c_vcov)
 ## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 0.0296797  0.0233867  1.26908  0.20447    
## x           1.0348334  0.0333889 30.99332  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 waldtest(pm1, vcov = time_c_vcov, test = "F")
 ## Wald test
## 
## Model 1: y ~ x
## Model 2: y ~ 1
##   Res.Df Df       F     Pr(>F)    
## 1   4998                          
## 2   4999 -1 960.586 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here's the corresponding Stata code (the results are exactly the same):

 webuse set http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se
webuse test_data, clear
xtset firmid year
reg y x
reg y x, robust
reg y x, cluster(firmid)
reg y x, cluster(year)

The advantage is that only standard packages are required provided we calculate the correct DF manually . One could easily wrap the DF computation into a convenience function. We probably should also check for missing values on the cluster variable. Extending this example to two-dimensional clustering is easy and will be the next post.

References:

Stock, J. H. and Watson, M. W. (2008), Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression. Econometrica, 76: 155–174.