Inside–outside algorithm

Parameter estimation method for probabilistic context-free grammars

In computer science, the inside–outside algorithm is a way of re-estimating production probabilities in a probabilistic context-free grammar. It was introduced by James K. Baker in 1979 as a generalization of the forward–backward algorithm for parameter estimation in hidden Markov models to stochastic context-free grammars. It is used to compute expectations, for example as part of the expectation–maximization algorithm (an unsupervised learning algorithm).

Inside and outside probabilities

The inside probability $\beta_j(p,q)$ is the total probability of generating the words $w_p \cdots w_q$, given the root nonterminal $N^j$ and a grammar $G$:

$\beta_j(p,q) = P(w_{pq} \mid N^j_{pq}, G)$

The outside probability $\alpha_j(p,q)$ is the total probability of beginning with the start symbol $N^1$ and generating the nonterminal $N^j_{pq}$ and all the words outside $w_p \cdots w_q$, given a grammar $G$:

$\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)n} \mid G)$

where $n$ is the length of the sentence.

Computing inside probabilities

Base case:

$\beta_j(p,p) = P(w_p \mid N^j, G)$

General case:

Suppose there is a rule $N^j \rightarrow N^r N^s$ in the grammar. Then the probability of generating $w_p \cdots w_q$ starting with a subtree rooted at $N^j$ whose top-level expansion is this rule is:

$\sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\,\beta_r(p,k)\,\beta_s(k+1,q)$

The inside probability $\beta_j(p,q)$ is the sum over all such possible rules:

$\beta_j(p,q) = \sum_{N^r,N^s} \sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\,\beta_r(p,k)\,\beta_s(k+1,q)$
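
The recursion can be sketched as a bottom-up dynamic program over spans of increasing length. The following is a minimal illustration (not from the article), assuming a grammar in Chomsky normal form with hypothetical `lexical` and `binary` rule-probability dictionaries, and using 0-based, inclusive word indices:

```python
from collections import defaultdict

def inside_probabilities(words, lexical, binary, nonterminals):
    """Bottom-up computation of inside probabilities beta[(A, p, q)]
    for a PCFG in Chomsky normal form (0-based, inclusive spans).

    lexical: dict mapping (A, word)  -> P(A -> word)
    binary:  dict mapping (A, B, C)  -> P(A -> B C)
    """
    n = len(words)
    beta = defaultdict(float)
    # Base case: a single word w_p is generated directly by A -> w_p.
    for p in range(n):
        for A in nonterminals:
            beta[A, p, p] = lexical.get((A, words[p]), 0.0)
    # General case: sum over rules A -> B C and split points k,
    # with B covering w_p..w_k and C covering w_{k+1}..w_q.
    for length in range(2, n + 1):
        for p in range(n - length + 1):
            q = p + length - 1
            for (A, B, C), prob in binary.items():
                for k in range(p, q):
                    beta[A, p, q] += prob * beta[B, p, k] * beta[C, k + 1, q]
    return beta
```

For example, with a single-nonterminal grammar having $S \rightarrow S\,S$ with probability 0.3 and $S \rightarrow \text{a}$ with probability 0.7, the two-word sentence "a a" has inside probability $\beta_S(0,1) = 0.3 \times 0.7 \times 0.7 = 0.147$.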

Computing outside probabilities

Base case:

$\alpha_j(1,n) = \begin{cases} 1 & \text{if } j=1 \\ 0 & \text{otherwise} \end{cases}$

Here the start symbol is $N^1$.

General case:

Suppose there is a rule $N^r \rightarrow N^j N^s$ in the grammar, in which $N^j$ appears as the left child. Then the left contribution of that rule to the outside probability $\alpha_j(p,q)$ is:

$\sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\,\alpha_r(p,k)\,\beta_s(q+1,k)$

Now suppose there is a rule $N^r \rightarrow N^s N^j$ in the grammar, in which $N^j$ appears as the right child. Then the right contribution of that rule to the outside probability $\alpha_j(p,q)$ is:

$\sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\,\alpha_r(k,q)\,\beta_s(k,p-1)$

The outside probability $\alpha_j(p,q)$ is the sum of the left and right contributions over all such rules:

$\alpha_j(p,q) = \sum_{N^r,N^s} \sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\,\alpha_r(p,k)\,\beta_s(q+1,k) + \sum_{N^r,N^s} \sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\,\alpha_r(k,q)\,\beta_s(k,p-1)$
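
The outside recursion runs top-down, from the full sentence span to shorter spans, consuming a precomputed table of inside probabilities. The following is a minimal sketch (not from the article), assuming a hypothetical `binary` rule-probability dictionary and an inside table `beta` keyed as `(A, p, q)` with 0-based, inclusive spans:

```python
from collections import defaultdict

def outside_probabilities(n, binary, beta, start="S"):
    """Top-down computation of outside probabilities alpha[(A, p, q)]
    for a sentence of length n, given inside probabilities beta.

    binary: dict mapping (A, B, C) -> P(A -> B C)
    beta:   dict mapping (A, p, q) -> inside probability
    """
    alpha = defaultdict(float)
    alpha[start, 0, n - 1] = 1.0  # base case: start symbol spans the sentence
    for length in range(n - 1, 0, -1):  # longer spans feed shorter ones
        for p in range(n - length + 1):
            q = p + length - 1
            for (A, B, C), prob in binary.items():
                # Left contribution: span p..q is the left child B,
                # with sibling C covering q+1..k.
                for k in range(q + 1, n):
                    alpha[B, p, q] += prob * alpha[A, p, k] * beta.get((C, q + 1, k), 0.0)
                # Right contribution: span p..q is the right child C,
                # with sibling B covering k..p-1.
                for k in range(p):
                    alpha[C, p, q] += prob * alpha[A, k, q] * beta.get((B, k, p - 1), 0.0)
    return alpha
```

A useful sanity check: for any position $p$, summing $\alpha_j(p,p)\,\beta_j(p,p)$ over all nonterminals $N^j$ recovers the total probability of the sentence, since every parse tree has exactly one nonterminal covering each word.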

References

  1. Manning, Christopher D.; Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. pp. 388–402. ISBN 0-262-13360-1.
