E-graph

Template:Short description

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

Let $Σ$ be a set of uninterpreted functions, where $Σ_{n}$ is the subset of $Σ$ consisting of functions of arity $n$ . Let $𝕚 𝕕$ be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of $f \in Σ_{n}$ to e-class IDs $i_{1}, i_{2}, \dots, i_{n} \in 𝕚 𝕕$ is denoted $f (i_{1}, i_{2}, \dots, i_{n})$ and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:^[1]

A union-find structure $U$ representing equivalence classes of e-class IDs, with the usual operations $f i n d$ , $a d d$ and $m e r g e$ . An e-class ID $e$ is canonical if $f i n d (U, e) = e$ ; an e-node $f (i_{1}, \dots, i_{n})$ is canonical if each $i_{j}$ is canonical ( $j$ in $1, \dots, n$ ).
An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
- a hashcons $H$ (i.e. a mapping) from canonical e-nodes to e-class IDs, and
- an e-class map $M$ that maps e-class IDs to e-classes, such that $M$ maps equivalent IDs to the same set of e-nodes: $\forall i, j \in 𝕚 𝕕, M [i] = M [j] \Leftrightarrow f i n d (U, i) = f i n d (U, j)$

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants.^[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes $f (i_{1}, \dots, i_{n}), f (j_{1}, \dots, j_{n})$ are congruent when $f i n d (U, i_{k}) = f i n d (U, j_{k}), k \in {1, \dots, n}$ . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

Template:Expand section

E-graphs expose wrappers around the $a d d$ , $f i n d$ , and $m e r g e$ operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.

Equivalent formulations

An e-graph can also be formulated as a bipartite graph $G = (N ⊎ i d, E)$ where

$i d$ is the set of e-class IDs (as above),
$N$ is the set of e-nodes, and
$E \subseteq (i d \times N) \cup (N \times i d)$ is a set of directed edges.

There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.^[3]

E-matching

Let $V$ be a set of variables and let $T e r m (Σ, V)$ be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, $T e r m (Σ, V)$ is the smallest set such that $V \subset T e r m (Σ, V)$ , $Σ_{0} \subset T e r m (Σ, V)$ , and when $x_{1}, \dots, x_{n} \in T e r m (Σ, V)$ and $f \in Σ_{n}$ , then $f (x_{1}, \dots, x_{n}) \in T e r m (Σ, V)$ . A term containing variables is called a pattern, a term without variables is called ground.

An e-graph $E$ represents a ground term $t \in T e r m (Σ, \emptyset)$ if one of its e-classes represents $t$ . An e-class $C$ represents $t$ if some e-node $f (i_{1}, \dots, i_{n}) \in C$ does. An e-node $f (i_{1}, \dots, i_{n}) \in C$ represents a term $g (j_{1}, \dots, j_{n})$ if $f = g$ and each e-class $M [i_{k}]$ represents the term $j_{k}$ ( $k$ in $1, \dots, n$ ).

e-matching is an operation that takes a pattern $p \in T e r m (Σ, V)$ and an e-graph $E$ , and yields all pairs $(σ, C)$ where $σ \subset V \times 𝕚 𝕕$ is a substitution mapping the variables in $p$ to e-class IDs and $C \in 𝕚 𝕕$ is an e-class ID such that the term $σ (p)$ is represented by $C$ . There are several known algorithms for e-matching,^[4]^[5] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.^[6]

Extraction

Given an e-class and a cost function that maps each function symbol in $Σ$ to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard.^[7] There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.^[8]

Complexity

An e-graph with n equalities can be constructed in O(n log n) time.^[9]

Equality saturation

Template:Expand section

Equality saturation is a technique for building optimizing compilers using e-graphs.^[10] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.

Applications

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3^[11] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.^[12] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.^[13] E-graphs are also used in the Simplify theorem prover of ESC/Java.^[14]

Equality saturation is used in specialized optimizing compilers,^[15] e.g. for deep learning^[16] and linear algebra.^[17] Equality saturation has also been used for translation validation applied to the LLVM toolchain.^[18]

E-graphs have been applied to several problems in program analysis, including fuzzing,^[19] abstract interpretation,^[20] and library learning.^[21]

References

Template:Reflist

External links

Template:Program analysis

[1] Template:Harv

[2] Template:Harv

[3] Template:Harv

[4] Template:Harv

[5] Template:Cite journal

[6] Template:Cite journal

[7] Template:Cite thesis

[8] Template:Harv

[9] Template:Harv

[10] Template:Harv

[11] Template:Cite book

[12] Template:Cite conference

[13] Template:Harv

[14] Template:Cite journal

[15] Template:Cite journal

[16] Template:Cite arXiv

[17] Template:Cite arXiv

[18] Template:Cite book

[19] Template:Cite web

[20] Template:Cite arXiv
Template:Cite arXiv

[21] Template:Cite journal

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

E-graph

Contents

Definition and operations

Invariants

Operations

Equivalent formulations

E-matching

Extraction

Complexity

Equality saturation

Applications

References

External links

Navigation menu

E-graph

Definition and operations

Invariants

Operations

Equivalent formulations

E-matching

Extraction

Complexity

Equality saturation

Applications

References

External links

Navigation menu

Search