The emergence of Machine Learning has led to a significant influx of computer science professionals into the field of econometrics. Although the blend of different disciplines is often advantageous, there is usually a necessary period of adjustment, during which misunderstandings are frequent. Evidence and examples supporting these observations can be found in the literature cited below, especially in the work of Iskhakov et al. (2020).

#### The issue...

While I was always conscious of these potential misunderstandings, the extent of their incompatibility became apparent to me while I was working on a research paper about Instrumental Variables (IV).

Now, many theoretical works applying machine learning to econometrics focus on a particular issue connected to IV, namely measurement error. In econometrics, "measurement error" refers to inaccuracies in the data used for analysis. For instance, if you're studying the relationship between income and education, but the income data is not precisely recorded (maybe because some people underreport their earnings), that's a measurement error.

IVs come into play to correct for these errors. Think of an IV as a tool that helps find the true relationship between variables (like income and education) by using additional information that is related to the cause (education) but not affected by the measurement errors in the outcome (income). This helps in getting a clearer, more accurate picture of the actual relationship, free from the distortions caused by measurement errors.

*A famous example of IV is provided in Angrist (1990). Specifically, he proposed using the draft lottery for military service during the Vietnam War as an instrument for veteran status.*

The use of IV in econometrics involves meeting several important assumptions. However, many studies on IV conducted by machine learning experts tend to concentrate primarily on one particular assumption known as the exclusion restriction.

As I previously pointed out, there are more assumptions that need to be satisfied for an instrument to be effective, and these are often crucial!

*Following the example of Angrist (1990), Card and Lemieux (2001) indeed showed that the draft lottery was related to the outcome not only via the treatment (veteran status) but also via unobservables. As explained in Huber (2023), the draft lottery also induced college enrollment, since military service could be postponed for college students. Being enrolled in college, in turn, affected the outcome variable, earnings.*

We will now explore the main assumptions of IV using the following standard notation. Y is the outcome of interest, X are covariates, Z is the instrument (which we will assume to be binary for the sake of clarity) and D is the treatment status, which depends on the instrument (namely, D(z) is the potential treatment decision when the instrument Z takes value z). As we will see, to identify the main estimand of IV, the Local Average Treatment Effect (LATE), we need to focus on a particular group of people, identified by how their treatment status responds to the instrument. Namely, we will focus on compliers, i.e. those who get treated when Z=1 and do not when Z=0 (i.e. D(1)=1, D(0)=0).

Note: there are cases in which the LATE coincides with the Average Treatment Effect (ATE). We will briefly discuss this case in what follows.

#### The recap (assumptions)

In our journey through econometric methodologies, we’ve arrived at a pivotal station: the IV approach, particularly its application in identifying the LATE. To grasp the full potential of IV, we must first understand the concepts of compliers, defiers, always takers and never takers:

**Compliers**: These are the go-with-the-flow folks. If the nudge suggests taking the treatment, they'll take it; if there's no nudge, they won't. They're the key group for IV analysis because their behavior helps us understand the true effect of the treatment. They have D(1)=1, D(0)=0.

**Defiers**: The rebels of the group. They do the opposite of what the nudge suggests. If it says "take the treatment," they won't, and vice versa. They're like the kids who refuse to eat their veggies just because they're told to. For them D(1)=0, D(0)=1.

**Always Takers**: No matter what the nudge says, these individuals will always take the treatment. They're like the person who always eats dessert, regardless of whether it's recommended. For them D(1)=1, D(0)=1.

**Never Takers**: Contrary to the Always Takers, these individuals will never take the treatment, nudge or no nudge. Think of someone who never exercises, even when their smartwatch is constantly reminding them to. For them D(1)=0, D(0)=0.
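To make the four groups concrete, here is a minimal Python sketch that maps a pair of potential treatments (D(0), D(1)) to its compliance type. The names `COMPLIANCE_TYPES`, `compliance_type` and `observed_treatment` are my own hypothetical helpers; only the D(0)/D(1) notation comes from the text above.

```python
# Hypothetical helper names; only the D(0)/D(1) notation comes from the post.
COMPLIANCE_TYPES = {
    (0, 1): "complier",      # D(0)=0, D(1)=1
    (1, 0): "defier",        # D(0)=1, D(1)=0
    (1, 1): "always taker",  # D(0)=1, D(1)=1
    (0, 0): "never taker",   # D(0)=0, D(1)=0
}


def compliance_type(d0: int, d1: int) -> str:
    """Return the compliance type implied by the potential treatments."""
    return COMPLIANCE_TYPES[(d0, d1)]


def observed_treatment(z: int, d0: int, d1: int) -> int:
    """Only D(Z) is observed: D(1) when Z=1 and D(0) when Z=0."""
    return d1 if z == 1 else d0


for (d0, d1), name in COMPLIANCE_TYPES.items():
    print(f"D(0)={d0}, D(1)={d1}: {name}, "
          f"observed D is {observed_treatment(0, d0, d1)} when Z=0 "
          f"and {observed_treatment(1, d0, d1)} when Z=1")
```

Since we only ever see one of the two potential treatments per person, the type of a given individual is not directly observable: someone treated under Z=1 could be a complier or an always taker.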

In IV analysis, we're especially interested in the Compliers because they help us isolate the effect of the treatment from other factors. By focusing on them, we can get a clearer picture of whether the treatment itself is effective. This is why some assumptions (and especially monotonicity) are needed.

IV, in particular, comes with three core assumptions:

**Relevance**: The chosen instrument must be correlated with the treatment variable, meaning it actually influences whether individuals receive the treatment.

**Exclusion restriction**: The instrument affects the outcome exclusively through its impact on the treatment, not through other pathways.

**Monotonicity**: The instrument should move individuals in one direction regarding treatment, either towards receiving it or away from it, but not both ways for different individuals. In simpler terms, this means that there are no defiers. To see this, let's assume that the instrument Z is binary. In this case, monotonicity requires that an individual's treatment status D never moves from 1 to 0 when the instrument moves from 0 to 1 (i.e. Pr(D(1)≥D(0))=1). Hence it is not possible to find defiers, i.e. individuals whose treatment status moves from 1 to 0 when the instrument moves from 0 to 1.
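As a rough illustration of relevance and monotonicity, the sketch below checks both on a toy population where, unlike in real data, we pretend to know everybody's pair (D(0), D(1)). The function names and the toy population are assumptions of mine, not something taken from the literature cited here.

```python
def monotonicity_holds(potential_treatments):
    """True if D(1) >= D(0) for every individual, i.e. there are no defiers."""
    return all(d1 >= d0 for d0, d1 in potential_treatments)


def first_stage_effect(potential_treatments):
    """Average of D(1) - D(0): share of compliers minus share of defiers."""
    total = sum(d1 - d0 for d0, d1 in potential_treatments)
    return total / len(potential_treatments)


# Toy population of (D(0), D(1)) pairs:
# 5 compliers, 3 always takers and 2 never takers.
population = [(0, 1)] * 5 + [(1, 1)] * 3 + [(0, 0)] * 2

print("monotonicity:", monotonicity_holds(population))
print("first stage:", first_stage_effect(population))  # nonzero means relevance

# Adding a single defier breaks monotonicity.
with_defier = population + [(1, 0)]
print("monotonicity with a defier:", monotonicity_holds(with_defier))
```

In practice the potential treatments are never jointly observed, so monotonicity cannot be checked this way on real data; it has to be argued from the context of the application.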

These assumptions are like the rules of the road for navigating the causal pathways from treatment to outcome. Often, in practical applications, it is not credible that such assumptions hold unconditionally (i.e. without controlling for covariates). Let's again provide an example:

*We try to estimate the causal effect of education on earnings, using proximity to a college as an instrumental variable. If living near a college doesn't significantly influence educational attainment due to factors (covariates, X) like widespread online education, the relevance of the instrument is compromised. Moreover, the exogeneity assumption (the instrument being as good as randomly assigned) may be breached if areas close to colleges have inherent characteristics like better job markets or affluent communities that directly impact earnings, regardless of educational level. Additionally, the exclusion restriction is violated if proximity to a college offers benefits like enhanced networking opportunities that boost earnings independently of education.*

In all these cases, **controlling for X is crucial**. Controlling for X means basically that the assumptions hold "within the groups sharing the same value of X".
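Here is a small simulation sketch of what "within the groups sharing the same value of X" buys us. Everything in it is made up for illustration: a binary X that both makes the instrument more likely and raises the outcome directly, a population of compliers and never takers only, and a true effect of 2 for compliers. The helper names (`simulate`, `wald`) are hypothetical.

```python
import random


def simulate(n=200_000, seed=1):
    """Rows of (x, z, d, y) where Z is valid only conditional on X."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        # The instrument is more likely to be switched on when X=1 ...
        z = 1 if rng.random() < (0.7 if x == 1 else 0.3) else 0
        complier = rng.random() < 0.5  # everyone else is a never taker
        d = 1 if (complier and z == 1) else 0
        # ... and X also shifts the outcome directly.
        y = 3.0 * x + 2.0 * d + rng.gauss(0.0, 1.0)
        rows.append((x, z, d, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald(rows):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0])."""
    itt = (mean(y for _, z, _, y in rows if z == 1)
           - mean(y for _, z, _, y in rows if z == 0))
    first_stage = (mean(d for _, z, d, _ in rows if z == 1)
                   - mean(d for _, z, d, _ in rows if z == 0))
    return itt / first_stage


rows = simulate()
pooled = wald(rows)
by_stratum = {x: wald([r for r in rows if r[0] == x]) for x in (0, 1)}

print(f"ignoring X: {pooled:.2f}")
for x, estimate in by_stratum.items():
    print(f"within X={x}: {estimate:.2f}")
```

In this toy setup the pooled ratio lands far from 2, because part of the difference in outcomes between Z=1 and Z=0 is really a difference in X. Within each value of X the ratio is close to the true complier effect.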

At the heart of this methodology is the treatment effect on a special group known as 'compliers'—individuals who follow the lead of the instrument. If the instrument indicates treatment, they take it; if not, they abstain. They are the target audience of our IV analysis, the ones who help us see the true effect of the treatment.

To estimate LATE effectively, we can sometimes relax the strictness of the instrument’s correlation with the treatment. The focus shifts to ensuring the exclusion restriction stands firm—our instrument must not stray off course and affect the outcome through any side roads.

In mathematical terms, the LATE for compliers is identified via the above assumptions as:

$$\Delta_{D(1)=1,\,D(0)=0} = E[Y(1)-Y(0) \mid D(1)=1, D(0)=0]$$

where "D(1)=1, D(0)=0" indicates that we are focusing on compliers. In practical applications, conditioning on X makes the expression of LATE appear more complicated. Specifically, the LATE is written as the ratio between the intention-to-treat (ITT) effect and the first-stage effect, both conditional on X. This expression is easier to understand following Huber (2023), who explains that the first stage, in our scenario, represents the share of compliers, since D(1)-D(0) is 1 for compliers and 0 for the remaining compliance types (remember that defiers are excluded by assumption). The effect of the instrument on the outcome is therefore the LATE multiplied by the share of compliers conditional on X (which is to say that the instrument moves the outcome only through compliers), i.e.:

$$E[Y \mid Z=1, X] - E[Y \mid Z=0, X] = \Delta_{D(1)=1,\,D(0)=0} \cdot \big(E[D \mid Z=1, X] - E[D \mid Z=0, X]\big)$$

from which the LATE ($\Delta_{D(1)=1,\,D(0)=0}$) is recovered. This expression is seen in many applications.
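Spelling out the last step: dividing both sides of the equality above by the first-stage difference (which relevance guarantees to be nonzero) gives the ratio form, with the ITT effect in the numerator and the first stage in the denominator:

```latex
\Delta_{D(1)=1,\,D(0)=0}
  = \frac{E[Y \mid Z=1, X] - E[Y \mid Z=0, X]}
         {E[D \mid Z=1, X] - E[D \mid Z=0, X]}
```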

This equation represents the average treatment effect for this compliant subgroup, honing in on the causal impact we’re interested in measuring.
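To see the ratio at work, here is a minimal simulation sketch (leaving X out for simplicity). All the numbers are made up: shares of 50% compliers, 20% always takers and 30% never takers, no defiers, a randomly assigned binary instrument, and treatment effects that differ by group. The names `TYPES`, `simulate` and `wald_estimate` are hypothetical.

```python
import random

# (share, D(0), D(1), effect of treatment on Y) for each compliance type.
# All numbers are made up for illustration; there are no defiers.
TYPES = [
    (0.5, 0, 1, 2.0),  # compliers
    (0.2, 1, 1, 4.0),  # always takers
    (0.3, 0, 0, 0.5),  # never takers
]


def simulate(n=200_000, seed=0):
    """Rows of (z, d, y) with a randomly assigned binary instrument."""
    rng = random.Random(seed)
    people = rng.choices(TYPES, weights=[t[0] for t in TYPES], k=n)
    rows = []
    for _, d0, d1, effect in people:
        z = 1 if rng.random() < 0.5 else 0
        d = d1 if z == 1 else d0  # observed treatment D(Z)
        y = rng.gauss(0.0, 1.0) + effect * d
        rows.append((z, d, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_estimate(rows):
    """ITT effect divided by the first-stage effect."""
    itt = (mean(y for z, _, y in rows if z == 1)
           - mean(y for z, _, y in rows if z == 0))
    first_stage = (mean(d for z, d, _ in rows if z == 1)
                   - mean(d for z, d, _ in rows if z == 0))
    return itt / first_stage


late_hat = wald_estimate(simulate())
print(f"Wald estimate: {late_hat:.2f} (true effect on compliers: 2.0)")
```

Even though always takers have a larger effect in this toy population, the ratio picks up the complier effect only: always takers and never takers do not change their treatment when Z changes, so they drop out of both numerator and denominator.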

*As you see it's not only about the exclusion restriction!*

**TECHNICAL ASIDE:**

As I previously mentioned, there are circumstances under which the LATE coincides with the ATE. This is the case when further assumptions are made, for instance that the average effects are homogeneous across the compliance groups conditional on X. This basically means that the effects are on average the same **regardless** of the compliance group we focus on. So even though the effects estimated via LATE are focused on compliers, since by assumption the same average effects hold for always takers and never takers (defiers being ruled out by monotonicity), we can average over the groups to obtain the ATE.
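The averaging argument can be written out in one line of algebra. In the sketch below, T is my own shorthand for the compliance type (it is not notation used elsewhere in this post), the sum runs over the types present in the population, and the second equality is exactly the homogeneity assumption:

```latex
\begin{aligned}
E[Y(1)-Y(0) \mid X]
  &= \sum_{t} \Pr(T=t \mid X)\, E[Y(1)-Y(0) \mid T=t, X] \\
  &= E[Y(1)-Y(0) \mid D(1)=1, D(0)=0, X] \sum_{t} \Pr(T=t \mid X) \\
  &= E[Y(1)-Y(0) \mid D(1)=1, D(0)=0, X]
\end{aligned}
```

The left-hand side is the ATE conditional on X and the right-hand side is the LATE conditional on X, since the type probabilities sum to one.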

#### The recap (LATE)

In the realm of econometrics, the LATE shines a spotlight on a specific group known as 'compliers.' These are individuals who only take the treatment if prompted by an instrument—a nudge or condition that influences their decision. This focus makes LATE 'local' because it specifically measures the average effect of the treatment on this particular group, the compliers, rather than on everyone.

The fundamental equation (leaving out the conditioning on X for the moment) that drives our understanding of LATE as we have seen is:

$$\Delta_{D(1)=1,\,D(0)=0} = E[Y(1)-Y(0) \mid D(1)=1, D(0)=0]$$

This equation is crucial because it captures the average effect of the treatment on our compliers. It's the cornerstone of LATE and hinges on the three core assumptions mentioned above (relevance, exclusion and monotonicity).

In this section we want to understand where each of the assumptions (and not only the exclusion restriction) comes into play in determining the expression of the LATE:

**Relevance**: This assumption is crucial for identification. It ensures that the instrument (e.g., a policy change or random assignment) actually affects the likelihood that an individual receives the treatment. In other words, there must be a significant correlation between the instrument and the treatment. Without relevance, the instrument wouldn't be able to create the variation in treatment necessary to identify its causal effect.

**Exclusion**: For estimation, the exclusion restriction is vital. It ensures that the instrument affects the outcome only through the treatment, not via any other path. This allows us to attribute changes in the outcome directly to changes in the treatment caused by the instrument. If the instrument influenced the outcome through another channel, it would confound the estimation of the treatment's effect, making it difficult to isolate the impact of the treatment itself.

**Monotonicity**: This assumption is important for both identification and estimation. It implies that individuals' treatment status moves in only one direction in response to the instrument. It ensures that there are no "defiers," or people who do the opposite of what the instrument indicates. Monotonicity makes the relationship between the instrument and the treatment clear-cut, which is necessary for identifying the complier subgroup and estimating the average effect of the treatment for this group accurately. If this assumption does not hold, then we cannot be sure whether some compensation effects among compliance groups are at work and, therefore, we cannot be sure of what the LATE is estimating (is it the effect on compliers, partly offset by the opposite contribution of defiers? How would we disentangle the two?).
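The compensation problem under a monotonicity violation can be illustrated with a few lines of arithmetic. The sketch below assumes that relevance and exclusion hold and that the instrument is as good as randomly assigned, so that the ITT effect equals the complier contribution minus the defier contribution. The function name and the numbers are hypothetical.

```python
def wald_estimand(share_compliers, effect_compliers,
                  share_defiers=0.0, effect_defiers=0.0):
    """Population value of the ITT / first-stage ratio when defiers may exist.

    Assumes relevance and exclusion hold and Z is as good as random.
    Compliers move into treatment when Z goes from 0 to 1, defiers move out,
    so the two groups enter with opposite signs.
    """
    itt = share_compliers * effect_compliers - share_defiers * effect_defiers
    first_stage = share_compliers - share_defiers
    return itt / first_stage


# No defiers: the ratio is the effect on compliers.
no_defiers = wald_estimand(0.5, 2.0)

# 20% defiers with a large positive effect: the two groups offset each other.
with_defiers = wald_estimand(0.5, 2.0, share_defiers=0.2, effect_defiers=5.0)

print(f"without defiers: {no_defiers:.2f}")
print(f"with defiers:    {with_defiers:.2f}")
```

In the second case the ratio is essentially zero even though the treatment helps everybody, which is the kind of ambiguity monotonicity is meant to rule out.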

By satisfying these assumptions, we can use the instrument to create a clear comparison between those who do and do not receive the treatment, due to the instrument, and thus estimate the LATE, which is the average effect of the treatment on the compliers.

**HOW TO ESTIMATE IV?**

This kinda goes beyond the scope of this post but for the sake of completeness I would like to briefly introduce the main estimation technique(s).

To apply the IV approach robustly, we often use the Two-Stage Least Squares (2SLS) method, which unfolds in two parts. In the first stage, we regress the treatment on the instrument to see how strongly they're connected. In the second stage, we regress the outcome on the treatment values predicted by the first stage, so that only the variation in treatment induced by the instrument is used to estimate the treatment's effect on the outcome.

The 2SLS can be distilled into the following model:

$$Y = \beta_0 + \beta_1 D + \epsilon$$

$$D = \pi_0 + \pi_1 Z + u$$

$$\Rightarrow Y = \beta_0 + \beta_1(\pi_0 + \pi_1 Z + u) + \epsilon$$
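Here is a bare-bones sketch of the two stages on simulated data, using only the Python standard library. The data-generating process is made up: β1 = 1.5, π1 = 1, and an error ϵ that is correlated with u, so that a plain regression of Y on D is biased. To match the linear equations above, D is continuous here rather than binary. The helper names `ols` and `two_stage_least_squares` are hypothetical.

```python
import random


def ols(x, y):
    """Intercept and slope of a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(y, d, z):
    """First stage: D on Z. Second stage: Y on the fitted values of D."""
    pi0, pi1 = ols(z, d)
    d_hat = [pi0 + pi1 * zi for zi in z]
    return ols(d_hat, y)


rng = random.Random(0)
z, d, y = [], [], []
for _ in range(100_000):
    zi = 1 if rng.random() < 0.5 else 0
    u = rng.gauss(0.0, 1.0)
    eps = 0.8 * u + 0.6 * rng.gauss(0.0, 1.0)  # correlated with u: D is endogenous
    di = 1.0 + 1.0 * zi + u                    # pi0 = 1, pi1 = 1
    yi = 2.0 + 1.5 * di + eps                  # beta0 = 2, beta1 = 1.5
    z.append(zi)
    d.append(di)
    y.append(yi)

_, beta1_ols = ols(d, y)
_, beta1_2sls = two_stage_least_squares(y, d, z)
print(f"OLS slope:  {beta1_ols:.2f}")
print(f"2SLS slope: {beta1_2sls:.2f} (true beta1: 1.5)")
```

Running the two regressions by hand like this gives the right point estimate, but the standard errors from the second regression would be wrong because they ignore that the fitted values are themselves estimated. In real work a dedicated IV routine is the safer choice.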

The robustness of the relationship between the instrument Z and the treatment D in the first stage is vital. If this relationship is weak, we encounter the 'weak instrument problem,' where our estimates may become unreliable.

To counteract this, researchers have formulated strategies for the case of weak instruments, ensuring that our instruments have enough pull to provide trustworthy conclusions about the treatment's effect.
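One common diagnostic, which I add here only as an illustration, is the F-statistic of the first-stage regression; a widely used rule of thumb treats values below about 10 as a warning sign. The sketch below computes it for a single instrument, where it equals the squared t-statistic of π1. The function names and simulated numbers are hypothetical.

```python
import random


def first_stage_f(z, d):
    """F-statistic for pi1 = 0 in the regression D = pi0 + pi1 * Z + u."""
    n = len(z)
    mean_z, mean_d = sum(z) / n, sum(d) / n
    szz = sum((a - mean_z) ** 2 for a in z)
    pi1 = sum((a - mean_z) * (b - mean_d) for a, b in zip(z, d)) / szz
    pi0 = mean_d - pi1 * mean_z
    ssr = sum((b - pi0 - pi1 * a) ** 2 for a, b in zip(z, d))
    standard_error = (ssr / (n - 2) / szz) ** 0.5
    return (pi1 / standard_error) ** 2  # with one instrument, F = t squared


def simulate_first_stage(pi1, n=1_000, seed=0):
    """Binary instrument and a treatment that responds to it with slope pi1."""
    rng = random.Random(seed)
    z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    d = [pi1 * zi + rng.gauss(0.0, 1.0) for zi in z]
    return z, d


f_strong = first_stage_f(*simulate_first_stage(pi1=1.0))
f_weak = first_stage_f(*simulate_first_stage(pi1=0.02))
print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument:   {f_weak:.1f}")
```

This is only a diagnostic sketch under homoskedastic errors; it does not replace the weak-instrument-robust methods developed in the literature.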

#### References:

Athey, S. and Imbens, G.W., 2019. Machine learning methods that economists should know about. *Annual Review of Economics*, *11*, pp.685-725.

Iskhakov, F., Rust, J. and Schjerning, B., 2020. Machine learning and structural econometrics: contrasts and synergies. *The Econometrics Journal*, *23*(3), pp.S81-S124.

Mullainathan, S. and Spiess, J., 2017. Machine learning: an applied econometric approach. *Journal of Economic Perspectives*, *31*(2), pp.87-106.
