Clustered standard errors are popular and very easy to compute in packages such as Stata. But how do you compute them in R?

With panel data it's generally wise to cluster on the dimension of the individual effect, as both heteroskedasticity and autocorrelation are almost certain to exist in the residuals at the individual level. In fact, Stock and Watson (2008) have shown that the White robust errors are inconsistent in the case of the panel fixed-effects regression model. Interestingly, the problem is due to the incidental parameters and does not occur if T = 2. Stata has since changed its default setting to always compute clustered errors in panel FE with the robust option.

There have been several posts about computing cluster-robust standard errors in R equivalently to how Stata does it, for example (here, here and here). However, the bloggers make the issue a bit more complicated than it really is.

We can very easily get the clustered VCE with the `plm` package and only need to make the same degrees-of-freedom adjustment that Stata does. In Stata, the t-tests and F-tests use G - 1 degrees of freedom (where G is the number of groups/clusters in the data). The `plm` package does not make this adjustment automatically.

I'll set up an example using data from Petersen (2006) so that you can compare to the tables on his website:

```r
# load packages
require(plm)
require(lmtest)  # for waldtest()
```

```r
# get data and load as pdata.frame
url <- "http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt"
p.df <- read.table(url)
names(p.df) <- c("firmid", "year", "x", "y")
p.df <- pdata.frame(p.df, index = c("firmid", "year"), drop.index = F, row.names = T)
head(p.df)
```

```
##     firmid year         x       y
## 1-1      1    1 -1.113973  2.2515
## 1-2      1    2 -0.080854  1.2423
## 1-3      1    3 -0.237607 -1.4264
## 1-4      1    4 -0.152486 -1.1094
## 1-5      1    5 -0.001426  0.9147
## 1-6      1    6 -1.212737 -1.4247
```

```r
options(digits = 8)  # for more exact comparison with Stata's output
```

For completeness, I'll reproduce all tables apart from the last one.

```r
# fit pooled OLS
m1 <- lm(y ~ x, data = p.df)
# fit same model with plm (needed for clustering)
pm1 <- plm(y ~ x, data = p.df, model = "pooling")
```

Petersen's Table 1: OLS coefficients and regular standard errors

```r
coeftest(m1)
```

```
## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 0.0296797  0.0283593  1.04656  0.29535    
## x           1.0348334  0.0285833 36.20414  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
waldtest(m1)
```

```
## Wald test
## 
## Model 1: y ~ x
## Model 2: y ~ 1
##   Res.Df Df       F     Pr(>F)    
## 1   4998                          
## 2   4999 -1 1310.74 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Petersen's Table 2: OLS coefficients and White standard errors

```r
coeftest(m1, vcov = vcovHC(m1, type = "HC1"))
```

```
## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 0.0296797  0.0283607  1.04651  0.29538    
## x           1.0348334  0.0283952 36.44401  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
waldtest(m1, vcov = vcovHC(m1, type = "HC1"))
```

```
## Wald test
## 
## Model 1: y ~ x
## Model 2: y ~ 1
##   Res.Df Df       F     Pr(>F)    
## 1   4998                          
## 2   4999 -1 1328.17 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Note that Stata uses HC1-corrected SEs, not HC3.

Petersen's Table 3: OLS coefficients and standard errors clustered by firmid

```r
# compute Stata-like df-adjustment
G <- length(unique(p.df$firmid))
N <- length(p.df$firmid)
dfa <- (G/(G - 1)) * (N - 1)/pm1$df.residual

# display with cluster VCE and df-adjustment
firm_c_vcov <- dfa * vcovHC(pm1, type = "HC0", cluster = "group", adjust = T)
coeftest(pm1, vcov = firm_c_vcov)
```

```
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.0296797  0.0670127  0.4429  0.65786    
## x           1.0348334  0.0505957 20.4530  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
waldtest(pm1, vcov = firm_c_vcov, test = "F")
```

```
## Wald test
## 
## Model 1: y ~ x
## Model 2: y ~ 1
##   Res.Df Df       F     Pr(>F)    
## 1   4998                          
## 2   4999 -1 418.324 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The additional `adjust = T` just makes sure we also retain the usual N/(N - k) small-sample adjustment.

Petersen's Table 4: OLS coefficients and standard errors clustered by year

```r
# compute Stata-like df-adjustment
G <- length(unique(p.df$year))
N <- length(p.df$year)
dfa <- (G/(G - 1)) * (N - 1)/pm1$df.residual

# display with cluster VCE and df-adjustment
time_c_vcov <- dfa * vcovHC(pm1, type = "HC0", cluster = "time", adjust = T)
coeftest(pm1, vcov = time_c_vcov)
```

```
## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 0.0296797  0.0233867  1.26908  0.20447    
## x           1.0348334  0.0333889 30.99332  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
waldtest(pm1, vcov = time_c_vcov, test = "F")
```

```
## Wald test
## 
## Model 1: y ~ x
## Model 2: y ~ 1
##   Res.Df Df       F     Pr(>F)    
## 1   4998                          
## 2   4999 -1 960.586 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Here's the corresponding Stata code (the results are exactly the same):

```
webuse set http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se
webuse test_data, clear
xtset firmid year
reg y x
reg y x, robust
reg y x, cluster(firmid)
reg y x, cluster(year)
```

The advantage is that only standard packages are required, provided we calculate the correct DF manually. One could easily wrap the DF computation into a convenience function. We should probably also check for missing values on the cluster variable. Extending this example to two-dimensional clustering is easy and will be the topic of the next post.
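A sketch of such a convenience function (the name `stata_cl_vcov` is hypothetical and not from any package; it assumes a pooled `plm` fit and reads G and N off the model's own index, so the df adjustment matches the estimation sample):

```r
# Hypothetical helper (a sketch, not a package function): Stata-style
# clustered VCE for a plm model, with the G/(G-1) * (N-1)/(N-k) df adjustment
# computed from the index of the fitted model itself.
stata_cl_vcov <- function(pmod, cluster = c("group", "time")) {
  cluster <- match.arg(cluster)
  idx <- plm::index(pmod)                     # (group, time) index of the fit
  g <- if (cluster == "group") idx[[1]] else idx[[2]]
  G <- length(unique(g))
  N <- length(g)
  dfa <- (G / (G - 1)) * (N - 1) / stats::df.residual(pmod)
  dfa * plm::vcovHC(pmod, type = "HC0", cluster = cluster, adjust = TRUE)
}
```

With this in place, Table 3 above becomes `coeftest(pm1, vcov = stata_cl_vcov(pm1, "group"))`.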

*References*:

Stock, J. H. and Watson, M. W. (2008), Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression. Econometrica, 76: 155–174.

Very useful blog. Was a great help for my analysis. However, I am pretty new on R and also on empirical analysis. Hence, I would have two questions:

(i) after having received the output for clustered SE by entity, one has simply to replace the significance values which firstly are received by “summary(pm1)”, right?

(ii) what exactly does the waldtest() check?

Hope you can clarify my doubts. Thanks in advance.

Phil

Phil, I’m glad this post is useful. Regarding your questions:

1) Yes, if you adjust the variance-covariance matrix for clustering then the standard errors and test statistics (t-stat and p-values) reported by summary will not be correct (but the point estimates are the same).

2) You may notice that summary() typically produces an F-test at the bottom. That’s the model F-test, testing that all coefficients on the variables (not the constant) are zero. The waldtest() function produces the same test when you have clustering or other adjustments.
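A tiny sketch of point 2 (simulated data, assuming the lmtest package is loaded): with the default VCE and a single regressor, waldtest() reproduces exactly the model F-statistic that summary() prints at the bottom.

```r
# Sketch: waldtest() with the default VCE equals summary()'s model F-test.
library(lmtest)
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- 1 + 2 * d$x + rnorm(100)
m <- lm(y ~ x, data = d)
f_summary <- unname(summary(m)$fstatistic["value"])  # overall F from summary()
f_wald <- waldtest(m)$F[2]                           # F from the Wald test
isTRUE(all.equal(f_summary, f_wald))                 # TRUE
```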

HTH,

Rich

Hey Rich, thanks a lot for your reply! I would have another question: in this paper http://cameron.econ.ucdavis.edu/research/Cameron_Miller_Cluster_Robust_October152013.pdf on page 4 the authors state that "Failure to control for within-cluster error correlation can lead to very misleadingly small standard errors, and consequent misleadingly narrow confidence intervals, large t-statistics and low p-values". However, as far as I can see, the initial standard error for x displayed by coeftest(m1) is, though slightly, larger than the cluster-robust standard error. According to the cited paper it should be the other way round: the cluster-robust standard error should be larger than the default one.

How does that come about? Do you have an explanation? I am asking since my results also display ambiguous movements of the cluster-robust standard errors. As far as I know, cluster-robust standard errors are also heteroskedasticity-robust. Do these two issues outweigh one another?

Hope you have an answer.

All the best,

Phil

Not sure if this is the case in the data used in this example, but you can get smaller SEs by clustering if there is a negative correlation between the observations within a cluster.
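A minimal base-R simulation of this point (illustrative only, with clusters of size two and a within-cluster correlation of -0.8): the variance of the mean computed from cluster totals, which picks up the negative covariance, comes out smaller than the naive i.i.d. variance.

```r
# Sketch: negative correlation within clusters shrinks the clustered variance
# of the mean below the naive one, since Var(sum) picks up 2*Cov < 0.
set.seed(42)
G <- 1000                                      # clusters of size 2
e1 <- rnorm(G)
e2 <- -0.8 * e1 + sqrt(1 - 0.8^2) * rnorm(G)   # corr(e1, e2) = -0.8
x <- c(e1, e2)
v_naive <- var(x) / length(x)                  # ignores the covariance term
cl_means <- (e1 + e2) / 2                      # cluster means capture it
v_cluster <- var(cl_means) / G
v_cluster < v_naive                            # TRUE: clustering shrinks the SE
```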

-Brad

One other possible issue in your manual-correction method: if you have any listwise deletion in your dataset due to missing data, your calculated sample size and degrees of freedom will be too high. I don’t know if that’s an issue here, but it’s a common one in most applications in R.
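A quick guard against this (a sketch with toy data, not code from the post): compute N and G only over the rows that survive listwise deletion, e.g. via complete.cases().

```r
# Sketch: count N and G over the rows actually usable in the fit, so that
# listwise deletion of missing values does not inflate the df adjustment.
d <- data.frame(id = c(1, 1, 2, 2, 3),
                x  = c(0.5, 1.2, -0.3, 0.8, 1.1),
                y  = c(1, NA, 3, 4, 5))          # one missing outcome
used <- complete.cases(d[, c("y", "x")])
N <- sum(used)                      # 4, not nrow(d) = 5
G <- length(unique(d$id[used]))     # clusters still represented: 3
```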

Hello Rich, thank you for your explanations. I am a totally new R user and I would be grateful if you could advise how to run a panel data regression (fixed effects) when standard errors are already clustered? I mean, how could I use clustered standard errors in my further analysis?

Best regards,

Inna

I would like to correct myself and ask more precisely: is there any difference in the wald test syntax when it's applied to a "within" model compared to "pooling"? In my analysis the wald test shows results if I choose "pooling", but if I choose "within" I get an error (Error in uniqval[as.character(effect), , drop = F] : incorrect number of dimensions). Do I need extra packages for the wald test in the "within" model?

Hi,

In the above you calculate the df adjustment as

`dfa <- (G/(G - 1)) * (N - 1)/pm1$df.residual`

but then retain adjust=T as "the usual N/(N-k) small sample adjustment."

But I thought (N – 1)/pm1$df.residual was that small sample adjustment already…

In the Stata User's Guide (p. 333) they note:

For linear regression, the finite-sample adjustment is N/(N-k) without vce(cluster clustvar)—where k is the number of regressors—and {M/(M-1)}(N-1)/(N-k) with vce(cluster clustvar).

Aren't you adjusting for sample size twice?

Thanks, Bill

Actually adjust=T or adjust=F makes no difference here… adjust is only an option in vcovHAC?

Bill

Hi Richard,

Thanks for this insightful post. You mention that plm() (as opposed to lm()) is required for clustering. However, a properly specified lm() model will lead to the same result both for coefficients and clustered standard errors.

Cheers,

Vincent