Chomsky–Schützenberger representation theorem

In formal language theory, the Chomsky–Schützenberger representation theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger in 1959 about representing a given context-free language in terms of two simpler languages. These two simpler languages, namely a regular language and a Dyck language, are combined by means of an intersection and a homomorphism.

Proofs of this theorem are found in several textbooks, e.g. Autebert, Berstel & Boasson (1997) or Davis, Sigal & Weyuker (1994).

Mathematics

Notation

A few notions from formal language theory are in order.

A context-free language is regular if it can be described by a regular expression or, equivalently, if it is accepted by a finite automaton.
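As a small illustration of the two equivalent descriptions, the sketch below compares a regular expression with a hand-written finite automaton for the same language. The language `(ab)*`, the names `PATTERN`, `DELTA`, `dfa_accepts` and `agree_up_to`, and the length bound are all assumptions chosen for the example; a bounded check like this is a sanity test, not a proof of equivalence.

```python
import re
from itertools import product

# Hypothetical example language: all repetitions of "ab" (including the empty word).
PATTERN = re.compile(r"(ab)*")

# Deterministic finite automaton for the same language.
# State 0 is both the start state and the only accepting state;
# any transition missing from DELTA leads to rejection.
DELTA = {(0, "a"): 1, (1, "b"): 0}


def dfa_accepts(word):
    """Run the automaton on word and report whether it ends in state 0."""
    state = 0
    for symbol in word:
        state = DELTA.get((state, symbol))
        if state is None:
            return False
    return state == 0


def agree_up_to(max_len):
    """Check that regex and automaton agree on every word over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if bool(PATTERN.fullmatch(word)) != dfa_accepts(word):
                return False
    return True
```

Calling `agree_up_to(8)` compares the two descriptions on all 511 words of length at most 8.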

A homomorphism is based on a function $h$ which maps symbols from an alphabet $\Gamma$ to words over another alphabet $\Sigma$. If the domain of this function is extended to words over $\Gamma$ in the natural way, by letting $h(xy)=h(x)h(y)$ for all words $x$ and $y$, this yields a homomorphism $h:\Gamma^{*}\to \Sigma^{*}$.
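The extension from symbols to words can be sketched in a few lines. The alphabets, the particular map `h`, and the helper name `extend` below are assumptions made up for the illustration; the only point is that the lifted function satisfies $h(xy)=h(x)h(y)$ by construction.

```python
def extend(h):
    """Lift a symbol-to-word map h (a dict) to a function on whole words."""
    def h_star(word):
        # Apply h symbol by symbol and concatenate the resulting words.
        return "".join(h[symbol] for symbol in word)
    return h_star


# Hypothetical map from Gamma = {"0", "1"} to words over Sigma = {"a", "b"}.
# A symbol may be sent to the empty word.
h = {"0": "ab", "1": ""}
h_star = extend(h)

x, y = "010", "10"
# The defining property of a homomorphism holds for any split of a word.
assert h_star(x + y) == h_star(x) + h_star(y)
```

Representing `h` as a dictionary keeps the example close to the definition: the function on symbols is the data, and the homomorphism is derived from it.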

A matched alphabet $T\cup\overline{T}$ is an alphabet made up of two equal-sized sets; it is convenient to think of it as a set of parenthesis types, where $T$ contains the opening parenthesis symbols and $\overline{T}$ contains the closing parenthesis symbols. For a matched alphabet $T\cup\overline{T}$, the typed Dyck language $D_{T}$ is given by

$$D_{T}=\{\,w\in (T\cup \overline{T})^{*}\mid w\text{ is a correctly nested sequence of parentheses}\,\}.$$

For example, the following is a valid sentence in the 3-typed Dyck language:

( [ { } ] ( ) { ( ) } )
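The "correctly nested" condition can be tested with a stack: push the expected closing symbol for every opening symbol, and require each closing symbol to match the top of the stack. The following is a minimal sketch under the assumption that the matched alphabet is given as a dictionary from opening to closing symbols; the names `is_dyck` and `PAIRS` are hypothetical.

```python
def is_dyck(word, pairs):
    """Return True if word is correctly nested with respect to pairs.

    pairs maps each opening symbol (an element of T) to its closing symbol.
    """
    closing = set(pairs.values())
    stack = []
    for symbol in word:
        if symbol in pairs:
            # Opening symbol: remember which closing symbol must follow.
            stack.append(pairs[symbol])
        elif symbol in closing:
            # Closing symbol: it must match the most recent open bracket.
            if not stack or stack.pop() != symbol:
                return False
        else:
            # Symbol outside the matched alphabet.
            return False
    # Every opened bracket must have been closed.
    return not stack


# A matched alphabet with three parenthesis types.
PAIRS = {"(": ")", "[": "]", "{": "}"}
```

For instance, `is_dyck("([]{})", PAIRS)` holds, while `is_dyck("(]", PAIRS)` and `is_dyck("(()", PAIRS)` do not.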

Theorem

A language $L$ over the alphabet $\Sigma$ is context-free if and only if there exist

  • a matched alphabet $T\cup\overline{T}$,
  • a regular language $R$ over $T\cup\overline{T}$,
  • and a homomorphism $h:(T\cup\overline{T})^{*}\to \Sigma^{*}$
such that $L=h(D_{T}\cap R)$.

We can interpret this as saying that any context-free language can be generated by first generating a typed Dyck language, filtering it with a regular language (keeping only the words that also belong to $R$), and finally using $h$ to replace each bracket with a word over $\Sigma$.
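To make this reading concrete, the sketch below works through one hand-picked instance: the context-free language $\{a^{n}b^{n}\mid n\geq 0\}$. The choices are assumptions for the example, not taken from the article: a single parenthesis type, the regular language `\(*\)*` (all opening symbols before all closing symbols), and the homomorphism sending `(` to `a` and `)` to `b`. The code enumerates short words only, so it illustrates the representation rather than proving it.

```python
import re
from itertools import product

OPEN, CLOSE = "(", ")"


def is_dyck(word):
    """One-type Dyck check: depth never goes negative and ends at zero."""
    depth = 0
    for symbol in word:
        depth += 1 if symbol == OPEN else -1
        if depth < 0:
            return False
    return depth == 0


# Assumed regular language R: every opening symbol precedes every closing symbol.
R = re.compile(r"\(*\)*")

# Assumed homomorphism h on symbols.
h = {OPEN: "a", CLOSE: "b"}


def image(max_len):
    """Compute h(D_T intersect R), restricted to bracket words of length <= max_len."""
    result = set()
    for n in range(max_len + 1):
        for letters in product(OPEN + CLOSE, repeat=n):
            word = "".join(letters)
            if is_dyck(word) and R.fullmatch(word):
                result.add("".join(h[symbol] for symbol in word))
    return result


# Bracket words of length <= 6 yield exactly a^n b^n for n = 0..3.
expected = {"a" * n + "b" * n for n in range(4)}
assert image(6) == expected
```

Here the Dyck language supplies the matching of each `a` with a `b`, while the regular language only enforces the order of the symbols; neither condition alone describes the target language.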

References

  1. Chomsky, N.; Schützenberger, M. P. (1959-01-01), Braffort, P.; Hirschberg, D. (eds.), "The Algebraic Theory of Context-Free Languages*", Studies in Logic and the Foundations of Mathematics, Computer Programming and Formal Systems, vol. 26, Elsevier, pp. 118–161, doi:10.1016/S0049-237X(09)70104-1, ISBN 978-0-444-53391-3, retrieved 2024-09-28