Literal movement grammar

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.

In linguistics and theoretical computer science, literal movement grammars (LMGs) are a grammar formalism intended to characterize certain extraposition phenomena of natural language, such as topicalization and cross-serial dependencies. LMGs extend the class of context-free grammars (CFGs) by introducing pattern-matched, function-like rewrite semantics, as well as the operations of variable binding and slash deletion. LMGs were introduced by A. V. Groenink in 1995.

Description

The basic rewrite operation of an LMG is very similar to that of a CFG, with the addition of arguments to the non-terminal symbols. Where a context-free rewrite rule obeys the general schema $S \to \alpha$ for some non-terminal $S$ and some string of terminals and/or non-terminals $\alpha$, an LMG rewrite rule obeys the general schema $X(x_1, \ldots, x_n) \to \alpha$, where $X$ is a non-terminal with arity $n$ (called a predicate in LMG terminology), and $\alpha$ is a string of "items", as defined below. The arguments $x_i$ are strings of terminal symbols and/or variable symbols defining an argument pattern. When an argument pattern contains several adjacent variable symbols, it matches the actual argument under every way of partitioning that argument among those variables. Thus, if the pattern is $f(xy)$ and the actual predicate is $f(ab)$, there are three valid matches: $x = \epsilon,\ y = ab$; $x = a,\ y = b$; and $x = ab,\ y = \epsilon$. In this way, a single rule actually stands for a family of alternatives.
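Matching an argument pattern against an actual argument can be sketched as a backtracking search over split points. This is a minimal Python sketch, not taken from Groenink's paper: the function name `match_pattern`, the representation of a pattern as a list of tokens, and the treatment of a repeated variable (assumed here to have to match the same string each time) are choices made for illustration only.

```python
from typing import Dict, Iterator, List, Set


def match_pattern(pattern: List[str], value: str,
                  variables: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield every binding of the pattern's variables that makes the pattern
    spell out `value`. Tokens listed in `variables` are variables; any other
    token is a terminal that must appear literally. (Hypothetical helper.)"""

    def go(i: int, pos: int, env: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(pattern):
            if pos == len(value):  # the whole argument must be consumed
                yield dict(env)
            return
        tok = pattern[i]
        if tok in variables:
            if tok in env:
                # Assumption: a repeated variable must match the same string again.
                bound = env[tok]
                if value.startswith(bound, pos):
                    yield from go(i + 1, pos + len(bound), env)
                return
            # Try every split point: the variable takes value[pos:end].
            for end in range(pos, len(value) + 1):
                env[tok] = value[pos:end]
                yield from go(i + 1, end, env)
                del env[tok]
        elif value.startswith(tok, pos):
            # Terminal token: it must occur literally at this position.
            yield from go(i + 1, pos + len(tok), env)

    yield from go(0, 0, {})
```

For the pattern $xy$ and the argument $ab$, `list(match_pattern(["x", "y"], "ab", {"x", "y"}))` yields exactly the three bindings listed above, in order of increasing length of $x$.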

An "item" in a literal movement grammar is one of

  • $f(x_1, \ldots, x_n)$, a predicate of arity $n$,
  • $x{:}f(x_1, \ldots, x_n)$, a variable binding $x$ to the string produced by $f(x_1, \ldots, x_n)$, or
  • $f(x_1, \ldots, x_n)/\alpha$, a slash deletion of $f(x_1, \ldots, x_n)$ by the string of terminals and/or variables $\alpha$.

In a rule like $f(x_1, \ldots, x_m) \to \alpha\ y{:}g(z_1, \ldots, z_n)\ \beta$, the variable $y$ is bound to whatever terminal string the predicate $g$ produces; all occurrences of $y$ in $\alpha$ and $\beta$ are replaced by that string, and $\alpha$ and $\beta$ are then produced as if that terminal string had always been there.
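The effect of a binding item can be illustrated with a small substitution routine. In this Python sketch the classes `Var` and `Pred` and the functions `substitute` and `apply_binding` are hypothetical names chosen here; the string produced by $g$ is passed in directly rather than derived, and occurrences of the bound variable inside predicate arguments are treated as occurrences too.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A variable symbol (hypothetical representation)."""
    name: str


@dataclass(frozen=True)
class Pred:
    """A predicate item; each argument is a tuple of terminals (str) and Vars."""
    name: str
    args: Tuple[Tuple[Union[str, Var], ...], ...]


Item = Union[str, Var, Pred]


def substitute(items: List[Item], env: Dict[str, str]) -> List[Item]:
    """Replace every bound variable, including those inside predicate
    arguments, by its terminal string; unbound variables are left alone."""
    out: List[Item] = []
    for it in items:
        if isinstance(it, Var) and it.name in env:
            out.append(env[it.name])
        elif isinstance(it, Pred):
            new_args = tuple(tuple(substitute(list(arg), env)) for arg in it.args)
            out.append(Pred(it.name, new_args))
        else:
            out.append(it)
    return out


def apply_binding(alpha: List[Item], var: str, produced: str,
                  beta: List[Item]) -> List[Item]:
    """Rewrite `alpha var:g(...) beta`, given that g(...) produced `produced`:
    the binding item becomes that string, and var is substituted in alpha and beta."""
    env = {var: produced}
    return substitute(alpha, env) + [produced] + substitute(beta, env)
```

For instance, `apply_binding([], "x", "aa", [Pred("B", ((Var("x"),),))])` returns `["aa", Pred("B", (("aa",),))]`, mirroring the step $x{:}aa\ B(x) \to aa\ B(aa)$ in the derivation of the example grammar.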

An item $x/y$, where $x$ is something that produces a terminal string (either a terminal string itself or some predicate) and $y$ is a string of terminals and/or variables, is rewritten as the empty string ($\epsilon$) if and only if the string produced by $x$ is identical to $y$ (with each variable in $y$ replaced by its bound value), and otherwise cannot be rewritten at all.
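The slash-deletion condition can be sketched as an equality test. This Python sketch assumes that $x$ has already been rewritten to a single terminal string `produced` and that every variable in $y$ is already bound in `env`; the names `slash_delete` and `surviving_splits` are hypothetical. `surviving_splits` applies the test to the rule $B(xy) \to a/x\ b\ B(y)\ c$ used in the example grammar.

```python
from typing import Dict, List, Optional, Set, Tuple


def slash_delete(produced: str, y: List[str], variables: Set[str],
                 env: Dict[str, str]) -> Optional[str]:
    """Rewrite the item x/y. `produced` is the terminal string x yields; `y` is
    a sequence of terminals and variable names. Returns "" (epsilon) when the
    two strings are identical and None when the item cannot be rewritten."""
    # Substitute bound values for variables (assumed: every variable in y is bound).
    target = "".join(env[t] if t in variables else t for t in y)
    return "" if produced == target else None


def surviving_splits(value: str) -> List[Tuple[str, str]]:
    """For B(xy) -> a/x b B(y) c applied to B(value): the splits (x, y) of
    value for which the item a/x can be rewritten to epsilon."""
    splits = [(value[:i], value[i:]) for i in range(len(value) + 1)]
    return [(x, y) for x, y in splits
            if slash_delete("a", ["x"], {"x"}, {"x": x}) is not None]
```

`surviving_splits("aa")` returns `[("a", "a")]`: of the three ways to split $aa$, only $x = a$ lets $a/x$ vanish, so each application of that rule consumes exactly one $a$ from its argument.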

Example

LMGs can characterize the non-context-free language $\{a^n b^n c^n : n \geq 0\}$ as follows:

$S() \to x{:}A()\ B(x)$
$A() \to a\ A()$
$A() \to \epsilon$
$B(xy) \to a/x\ b\ B(y)\ c$
$B(\epsilon) \to \epsilon$
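The grammar can be checked by brute-force generation. The Python sketch below is not a general LMG parser: it hand-codes the five rules above as generator functions (`gen_A`, `gen_B`, `gen_S` are hypothetical names), and because $A()$ can recurse indefinitely it cuts $A()$ off at a caller-supplied maximum length `max_n`. Since $A() \to \epsilon$ is a rule, the grammar also produces the empty string (the case $n = 0$).

```python
from typing import Iterator


def gen_A(max_n: int) -> Iterator[str]:
    # A() -> a A() | epsilon, so A() produces a^k; stop at k = max_n to stay finite.
    for k in range(max_n + 1):
        yield "a" * k


def gen_B(arg: str) -> Iterator[str]:
    # B(epsilon) -> epsilon applies only when the argument is empty.
    if arg == "":
        yield ""
    # B(xy) -> a/x b B(y) c: try every split of arg into x and y.
    for i in range(len(arg) + 1):
        x, y = arg[:i], arg[i:]
        # Slash deletion: a/x rewrites to epsilon only if x is exactly "a".
        if x == "a":
            for inner in gen_B(y):
                yield "b" + inner + "c"


def gen_S(max_n: int) -> Iterator[str]:
    # S() -> x:A() B(x): bind x to A()'s output, emit it, then expand B(x).
    for x in gen_A(max_n):
        for tail in gen_B(x):
            yield x + tail
```

With this cutoff, `list(gen_S(3))` evaluates to `["", "abc", "aabbcc", "aaabbbccc"]`. Hand-coding the rules keeps the sketch short; a general interpreter would combine argument-pattern matching, binding substitution and slash deletion over an arbitrary rule set.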

The derivation of $aabbcc$, using parentheses also for grouping, is:

$S() \to x{:}A()\ B(x) \to x{:}(a\ A())\ B(x) \to x{:}(aa\ A())\ B(x) \to x{:}aa\ B(x) \to aa\ B(aa)$

$\to aa\ a/a\ b\ B(a)\ c \to aab\ B(a)\ c \to aab\ a/a\ b\ B(\epsilon)\ cc \to aabb\ B(\epsilon)\ cc \to aabbcc$

Computational power

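The languages generated by LMGs properly include the context-free languages. Every CFG is an LMG in which all predicates have arity 0 and no production rule contains variable bindings or slash deletions, so every context-free language is generated by some LMG; the inclusion is proper because LMGs also generate non-context-free languages, such as the language $a^n b^n c^n$ of the example above.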

References

  1. Groenink, Annius V. 1995. Literal Movement Grammars. In Proceedings of the 7th EACL Conference.