Set identification

In statistics and econometrics, set identification (or partial identification) extends the concept of identifiability (or "point identification") in statistical models to environments where the model and the distribution of observable variables are not sufficient to determine a unique value for the model parameters, but instead constrain the parameters to lie in a strict subset of the parameter space. Statistical models that are set (or partially) identified arise in a variety of settings in economics, including game theory and the Rubin causal model. Unlike approaches that deliver point identification of the model parameters, methods from the literature on partial identification are used to obtain set estimates that are valid under weaker modelling assumptions.

History

Early works containing the main ideas of set identification included Template:Harvtxt and Template:Harvtxt. However, the methods were significantly developed and promoted by Charles Manski, beginning with Template:Harvtxt and Template:Harvtxt.

Partial identification continues to be a major theme in research in econometrics. Template:Harvtxt named partial identification as an example of theoretical progress in the econometrics literature, and Template:Harvtxt list partial identification as "one of the most prominent recent themes in econometrics."

Definition

Let $U \in \mathcal{U} \subseteq \mathbb{R}^{d_u}$ denote a vector of latent variables, let $Z \in \mathcal{Z} \subseteq \mathbb{R}^{d_z}$ denote a vector of observed (possibly endogenous) explanatory variables, and let $Y \in \mathcal{Y} \subseteq \mathbb{R}^{d_y}$ denote a vector of observed endogenous outcome variables. A structure is a pair $s = (h, \mathcal{P}_{U \mid Z})$, where $\mathcal{P}_{U \mid Z}$ represents a collection of conditional distributions, and $h$ is a structural function such that $h(y,z,u) = 0$ for all realizations $(y,z,u)$ of the random vectors $(Y,Z,U)$. A model is a collection of admissible (i.e. possible) structures $s$.[1][2]

Let $\mathcal{P}_{Y \mid Z}(s)$ denote the collection of conditional distributions of $Y \mid Z$ consistent with the structure $s$. The admissible structures $s$ and $s'$ are said to be observationally equivalent if $\mathcal{P}_{Y \mid Z}(s) = \mathcal{P}_{Y \mid Z}(s')$.[1][2] Let $s^\star$ denote the true (i.e. data-generating) structure. The model is said to be point-identified if for every admissible $s \neq s^\star$ we have $\mathcal{P}_{Y \mid Z}(s) \neq \mathcal{P}_{Y \mid Z}(s^\star)$. More generally, the model is said to be set (or partially) identified if there exists at least one admissible $s \neq s^\star$ such that $\mathcal{P}_{Y \mid Z}(s) \neq \mathcal{P}_{Y \mid Z}(s^\star)$. The identified set of structures is the collection of admissible structures that are observationally equivalent to $s^\star$.

In most cases the definition can be substantially simplified. In particular, when $U$ is independent of $Z$ and has a known (up to some finite-dimensional parameter) distribution, and when $h$ is known up to some finite-dimensional vector of parameters, each structure $s$ can be characterized by a finite-dimensional parameter vector $\theta \in \Theta \subset \mathbb{R}^{d_\theta}$. If $\theta_0$ denotes the true (i.e. data-generating) vector of parameters, then the identified set, often denoted as $\Theta_I \subset \Theta$, is the set of parameter values that are observationally equivalent to $\theta_0$.
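The parametric definition can be illustrated with a minimal sketch. The toy model below is hypothetical and not taken from the literature cited here: a parameter vector theta = (a, b) on the unit square implies that a single binary observable equals 1 with probability a * b, so only the product is pinned down by the data. The function names, the grid, and the value of the true parameter are all assumptions made for illustration.

```python
import itertools

def implied_success_prob(theta):
    """Observable distribution implied by theta = (a, b): X ~ Bernoulli(a * b)."""
    a, b = theta
    return a * b

def identified_set(theta0, grid, tol=1e-9):
    """Return the grid points that are observationally equivalent to theta0,
    i.e. that imply the same distribution for the observable X."""
    target = implied_success_prob(theta0)
    return [theta for theta in grid
            if abs(implied_success_prob(theta) - target) <= tol]

# Hypothetical parameter space: a coarse grid on [0, 1] x [0, 1].
points = [i / 10 for i in range(11)]
grid = list(itertools.product(points, points))

theta0 = (0.5, 0.4)  # assumed true parameter, so P(X = 1) = 0.2
theta_I = identified_set(theta0, grid)
```

On this grid the identified set contains the true parameter together with every other grid point whose product is 0.2, such as (1.0, 0.2). It is a strict subset of the grid but not a single point, which is what set identification means in this simplified setting.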

Example: missing data

This example is due to Template:Harvtxt. Suppose there are two binary random variables, $Y$ and $Z$. The econometrician is interested in $P(Y=1)$. There is a missing data problem, however: $Y$ can only be observed if $Z=1$.

By the law of total probability,

$$P(Y=1) = P(Y=1 \mid Z=1)\,P(Z=1) + P(Y=1 \mid Z=0)\,P(Z=0).$$

The only unknown object is $P(Y=1 \mid Z=0)$, which is constrained to lie between 0 and 1. Therefore, the identified set is

$$\Theta_I = \{\, p \in [0,1] : p = P(Y=1 \mid Z=1)\,P(Z=1) + q\,P(Z=0) \text{ for some } q \in [0,1] \,\}.$$

Given the missing data constraint, the econometrician can only say that $P(Y=1) \in \Theta_I$. This makes use of all available information.
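The identified set in this example is an interval whose endpoints correspond to q = 0 and q = 1, so it can be estimated by replacing the observable probabilities with sample frequencies. The following is a minimal sketch of that plug-in calculation. The function name, the encoding of missing outcomes as `None`, and the sample data are assumptions made for illustration.

```python
def worst_case_bounds(y, z):
    """Plug-in estimate of the identified set for P(Y=1).

    y: sequence of 0/1 outcomes, with None where the outcome is missing
    z: sequence of 0/1 indicators; y[i] is observed only if z[i] == 1
    """
    n = len(z)
    if n == 0 or len(y) != n:
        raise ValueError("y and z must be non-empty and of equal length")
    # Sample analogue of P(Y=1 | Z=1) P(Z=1) = P(Y=1, Z=1).
    p_y1_z1 = sum(1 for yi, zi in zip(y, z) if zi == 1 and yi == 1) / n
    # Sample analogue of P(Z=0).
    p_z0 = sum(1 for zi in z if zi == 0) / n
    # q = 0 gives the lower endpoint, q = 1 the upper endpoint.
    return p_y1_z1, p_y1_z1 + p_z0

# Made-up sample: 10 units, 3 of which have a missing outcome.
y = [1, 0, 1, None, None, 1, 0, None, 1, 0]
z = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
lower, upper = worst_case_bounds(y, z)
```

For this sample the estimated bounds are approximately 0.4 and 0.7. The width of the interval equals the estimated share of missing outcomes, so the bounds collapse to a point only when no data are missing.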

Statistical inference

Set estimation cannot rely on the usual tools for statistical inference developed for point estimation. A literature in statistics and econometrics studies methods for statistical inference in the context of set-identified models, focusing on constructing confidence intervals or confidence regions with appropriate properties. For example, a method developed by Template:Harvtxt constructs confidence regions that cover the identified set with a given probability.
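To make the idea of a confidence region for the identified set concrete, the sketch below continues the missing data example, in which the identified set is an interval with lower endpoint P(Y=1, Z=1) and upper endpoint P(Y=1, Z=1) + P(Z=0). It is not the method of the work cited above. It is a simple, conservative Bonferroni-style construction that assumes an i.i.d. sample and a normal approximation: each estimated endpoint is a sample proportion, and each is pushed outward by a one-sided margin at level alpha / 2. The function name and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def conservative_interval(n_y1_z1, n_z0, n, alpha=0.05):
    """Interval intended to cover the whole identified set [L, U] with
    asymptotic probability at least 1 - alpha (Bonferroni argument).

    n_y1_z1: number of units with Z = 1 and Y = 1
    n_z0:    number of units with Z = 0 (outcome missing)
    n:       sample size
    """
    if n <= 0:
        raise ValueError("n must be positive")
    lower_hat = n_y1_z1 / n            # estimate of L = P(Y=1, Z=1)
    upper_hat = (n_y1_z1 + n_z0) / n   # estimate of U = L + P(Z=0)
    # Both endpoints are proportions of binary indicators.
    se_lower = sqrt(lower_hat * (1 - lower_hat) / n)
    se_upper = sqrt(upper_hat * (1 - upper_hat) / n)
    # Each one-sided margin fails with probability about alpha / 2.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return (max(0.0, lower_hat - crit * se_lower),
            min(1.0, upper_hat + crit * se_upper))

# Made-up counts: 1000 units, 400 with (Y=1, Z=1), 300 with Z=0.
ci_lower, ci_upper = conservative_interval(400, 300, 1000)
```

The resulting interval is wider than the estimated bounds [0.4, 0.7] on both sides. Covering the entire identified set is a stronger requirement than covering the true parameter, so intervals of this kind can be conservative for the parameter itself.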
