# High-dimensional fixed effects in GLMs

16 March 2019   econometrics

In many empirical applications unobservable characteristics may influence estimates. A popular way of dealing with this is to include fixed effects. However, as this usually amounts to including large numbers of dummies in the regression, the computation quickly becomes infeasible. The problem is particularly hard when the assumed error term distribution is not normal, i.e. in GLMs. Together with Joschka Wanner and Alexander Hudlet I have been working on a new way to address this problem. Enter the new R package glmhdfe that allows for the estimation of generalized linear models with high dimensional fixed effects. The package makes use of a convenient property of some combinations of error term distributions and link functions, where the fixed effects have — conditional on all other estimated parameters — an explicit solution.

Consider the following equation that we want to estimate:

\begin{aligned} y_{i} &= g^{-1} \left(\mathbf{x}_{i}' \mathbf{\beta} + \left(\mathbf{d}_{i}^{A}\right)^\prime \mathbf{\delta}^{A} + \left(\mathbf{d}_{i}^{B}\right)^\prime \mathbf{\delta}^{B} + \ldots \right) + \epsilon_{i}, \end{aligned}

or in matrix form

\begin{aligned} \mathbf{y} &= g^{-1} \left(\mathbf{X \beta} + \mathbf{D}^{A} \mathbf{\delta}^{A} + \mathbf{D}^{B} \mathbf{\delta}^{B} + \ldots \right) + \mathbf{\epsilon} \\ &= g^{-1} ( \mathbf{ \eta } ) + \mathbf{\epsilon}, \end{aligned}

such that

\begin{aligned} E(\mathbf{y}) &= \mathbf{\mu} = g^{-1} (\mathbf{\eta} ) \quad \text{and} \quad \\ E(y_{i}) &= \mu_{i} = g^{-1} ( \eta_{i} ), \end{aligned}

where subscript $i$ denotes the observation, $g(\cdot)$ is the link function, $y_i$ denotes the dependent variable, $\mathbf{x}_i$ denotes the vector of explanatory variables with corresponding parameter vector $\mathbf{\beta}$, and $\mathbf{d}_i$ denoting dummies where superscripts $A, B, \ldots$ index the arbitrary number of fixed effects. $\mathbf{\delta}$ are the conforming parameters.

\begin{aligned} \hat{\mathbf{\beta}}: \quad & \sum_{i} \frac{y_i - \mu_i }{V(y_i)} \frac{\partial \mu_{i}}{\partial \hat{\mathbf{\beta}}} = \mathbf{0}, \\ \hat{\delta}^{a}: \quad & \sum_{i} \frac{y_{i} - \mu_{i}}{V(y_i)} \frac{\partial \mu_{i}}{\partial \hat{\delta}^{a}} = 0,\\ & \ldots \end{aligned}

where $\hat{\delta}^{a}$ is the $a$th element of $\hat{\mathbf{\delta}}^{A}$.

Importantly, note that the inner derivative of $\mu_{i}$ with respect to $\hat{\delta}^{a}$ is always equal to $d_{i}^{a}$ , which is the $a$th element of $\mathbf{d}_{i}^{A}$ . Therefore, in the corresponding first-order conditions, it will always suffice to consider the elements of the sum for which $d_{i}^{a}=1$, i.e. it can equivalently be written as:

\begin{aligned} \hat{\delta}^{a}: \quad & \sum_{i | d_{i}^{a} = 1} \frac{y_{i} - \mu_{i}}{V(y_{i})} = 0. \end{aligned}

For certain distribution and link combinations this yields explicit solutions for the estimated coefficient for the fixed effects $\hat{\delta}^{a}$, given estimates for beta and the other deltas. Specifically, this is the case for the Gaussian distribution with identity and log link, and for the Poisson, Gamma and Inverse Gaussian distributions with log link. This makes it possible to update the fixed effects separately from the estimation of the coefficients on variables of interest in every iteration of the IRLS procedure used to estimate beta, dramatically increasing the speed of the estimation procedure.

For more detail on the inner workings see the technical note. A Stata implementation is coming soon.



# find all destinations that country o exported to you in y
trade_data[year == x$y & iso_o == x$o & trade == T][["iso_d"]]
}

# find
get_eg_values <- function(y, o, d, p_o, gravity_data) {
return(gravity_data[year == y &
iso3_o %in% p_o[] &
iso3_d == d, .(distw_eg = min(distw, na.rm = T),
contig_eg = max(contig, na.rm = T),
comlang_off_eg = max(comlang_off, na.rm = T),
rta_eg = max(fta_hmr, na.rm = T),
comcur_eg = max(comcur, na.rm = T)
)])
}

###
# generate extended gravity variables for aggregate data
###