Chomsky–Schützenberger representation theorem

From testwiki
Jump to navigation Jump to search

In formal language theory, the Chomsky–Schützenberger representation theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger in 1959[1] about representing a given context-free language in terms of two simpler languages. These two simpler languages, namely a regular language and a Dyck language, are combined by means of an intersection and a homomorphism.

The theorem Proofs of this theorem are found in several textbooks, e.g. Template:Harvtxt or Template:Harvtxt.

Mathematics

Notation

A few notions from formal language theory are in order.

A context-free language is regular, if it can be described by a regular expression, or, equivalently, if it is accepted by a finite automaton.

A homomorphism is based on a function h which maps symbols from an alphabet Γ to words over another alphabet Σ; If the domain of this function is extended to words over Γ in the natural way, by letting h(xy)=h(x)h(y) for all words x and y, this yields a homomorphism h:Γ*Σ*.

A matched alphabet TT is an alphabet with two equal-sized sets; it is convenient to think of it as a set of parentheses types, where T contains the opening parenthesis symbols, whereas the symbols in T contains the closing parenthesis symbols. For a matched alphabet TT, the typed Dyck language DT is given by

DT={w(TT)*w is a correctly nested sequence of parentheses}.

For example, the following is a valid sentence in the 3-typed Dyck language:

( [ [ ] { } ] ( ) { ( ) } )

Theorem

A language L over the alphabet Σ is context-free if and only if there exists

  • a matched alphabet TT
  • a regular language R over TT,
  • and a homomorphism h:(TT)*Σ*
such that L=h(DTR).

We can interpret this as saying that any CFG language can be generated by first generating a typed Dyck language, filtering it by a regular grammar, and finally converting each bracket into a word in the CFG language.

References

Template:Reflist

Template:Noam Chomsky