Ubuntu Manpage: Marpa::R2::ASF - Marpa's abstract syntax forests (ASF's)

Provided by: libmarpa-r2-perl_12.000000-1_amd64

Name

       Marpa::R2::ASF - Marpa's abstract syntax forests (ASF's)

Synopsis

       We want to "diagram" the following sentence.

           my $sentence = 'a panda eats shoots and leaves.';

       Here's the result we are looking for.  It is in Penntag form:

           (S (NP (DT a) (NN panda))
              (VP (VBZ eats) (NP (NNS shoots) (CC and) (NNS leaves)))
              (. .))
           (S (NP (DT a) (NN panda))
              (VP (VP (VBZ eats) (NP (NNS shoots))) (CC and) (VP (VBZ leaves)))
              (. .))
           (S (NP (DT a) (NN panda))
              (VP (VP (VBZ eats)) (VP (VBZ shoots)) (CC and) (VP (VBZ leaves)))
              (. .))

       Here is the grammar.

           :default ::= action => [ values ] bless => ::lhs
           lexeme default = action => [ value ] bless => ::name

           S   ::= NP  VP  period  bless => S

           NP  ::= NN              bless => NP
               |   NNS          bless => NP
               |   DT  NN          bless => NP
               |   NN  NNS         bless => NP
               |   NNS CC NNS  bless => NP

           VP  ::= VBZ NP          bless => VP
               | VP VBZ NNS        bless => VP
               | VP CC VP bless => VP
               | VP VP CC VP bless => VP
               | VBZ bless => VP

           period ~ '.'

           :discard ~ whitespace
           whitespace ~ [\s]+

           CC ~ 'and'
           DT  ~ 'a' | 'an'
           NN  ~ 'panda'
           NNS  ~ 'shoots' | 'leaves'
           VBZ ~ 'eats' | 'shoots' | 'leaves'

       Here's the code. It actually does two traversals, one that produces the full result as shown above, and
       another which "prunes" the forest down to a single tree.

           my $panda_grammar = Marpa::R2::Scanless::G->new(
               { source => \$dsl, bless_package => 'PennTags', } );
           my $panda_recce = Marpa::R2::Scanless::R->new( { grammar => $panda_grammar } );
           $panda_recce->read( \$sentence );
           my $asf = Marpa::R2::ASF->new( { slr=>$panda_recce } );
           my $full_result = $asf->traverse( {}, \&full_traverser );
           my $pruned_result = $asf->traverse( {}, \&pruning_traverser );

       The code for the full traverser is in an appendix.  The pruning code is simpler.  Here it is:

           sub penn_tag {
              my ($symbol_name) = @_;
              return q{.} if $symbol_name eq 'period';
              return $symbol_name;
           }

           sub pruning_traverser {

           # This routine converts the glade into a list of Penn-tagged elements.  It is called recursively.
               my ( $glade, $scratch ) = @_;
               my $rule_id     = $glade->rule_id();
               my $symbol_id   = $glade->symbol_id();
               my $symbol_name = $panda_grammar->symbol_name($symbol_id);

               # A token is a single choice, and we know enough to fully Penn-tag it
               if ( not defined $rule_id ) {
                   my $literal  = $glade->literal();
                   my $penn_tag = penn_tag($symbol_name);
                   return "($penn_tag $literal)";
               }

               my $length = $glade->rh_length();
               my @return_value = map { $glade->rh_value($_) } 0 .. $length - 1;

               # Special case for the start rule
               return ( join q{ }, @return_value ) . "\n" if $symbol_name eq '[:start]';

               my $join_ws = q{ };
               $join_ws = qq{\n   } if $symbol_name eq 'S';
               my $penn_tag = penn_tag($symbol_name);
               return "($penn_tag " . ( join $join_ws, @return_value ) . ')';

           }

       Here is the "pruned" output:

           (S (NP (DT a) (NN panda))
              (VP (VBZ eats) (NP (NNS shoots) (CC and) (NNS leaves)))
              (. .))

About this document

       This document describes the abstract syntax forests (ASF's) of Marpa's SLIF interface.  An ASF is an
       efficient and practical way to represent multiple abstract syntax trees (AST's).

Constructor

   new()
           my $asf = Marpa::R2::ASF->new( { slr => $slr } );
           die 'No ASF' if not defined $asf;

       Creates a new ASF object.  Must be called with a list of one or more hashes of named arguments.
       Currently only one named argument is allowed, the "slr" argument, and that argument is required.  The
       value of the "slr" argument must be a SLIF recognizer object.

       Returns the new ASF object, or "undef" if there was a problem.

Accessor

   grammar()
           my $grammar     = $asf->grammar();

       Returns the SLIF grammar associated with the ASF.  This can be convenient when using SLIF grammar methods
       while examining an ASF.  All failures are thrown as exceptions.

The traverser method

   traverse()
           my $full_result = $asf->traverse( {}, \&full_traverser );

       Performs a traversal of the ASF.  Returns the value of the traversal, which is computed as described
       below.  It requires two arguments.  The first is a per-traversal object, which must be a Perl reference.
       The second argument must be a reference to a traverser function, Discussion of how to write a traverser
       follows.  The traverse() method may be called repeatedly for an ASF, with the same traverser, or with
       different ones.

How to write a traverser

The process of writing a traverser will be familiar if you have experience with traversing trees. The
traverser may be called at every node of the forest. (These nodes are called glades.) The traverser
must return a value, which may not be an "undef". The value returned by the traverser becomes the value
of the glade. The value of the topmost glade (called the peak) becomes the value of the traversal, and
will be the value returned by the traverse() method.

The traverser is always invoked once for the peak. The traverser is also invoked once for any glade
whose value is required. It may or may not be invoked for other glades. The traverser is never invoked
twice for the same glade. If more than one attempt to made to retrieve the value of a glade, the
traverser will only be invoked for the first one -- all subsequent attempts will return a memoized value.

The traverser is always invoked with two arguments. The first argument will be a glade object. Methods
of the glade object are used to find information about the glade, and to move around in it.

The second of the two arguments to a traverser is the per-traversal object, which will be shared by all
calls in the traversal. The per-traversal object may be used as a "scratch pad" for information that it
is not convenient to pass via return values. Prefer the per-traversal object to the use of globals.

"Moving around" in a glade means visiting its parse alternatives. (Parse alternatives are usually called
alternatives, when the meaning is clear.) If a glade has exactly one alternative, it is called a trivial
glade. When invoked, the traverser points at the first alternative. Alternatives after the first may be
visited using the next() glade method.

Parse alternatives may be either token alternatives or rule alternatives. Rule and token alternatives
may be distnguished with the rule_id() glade method, which returns "undef" if and only if the glade is
positioned at a token alternative.

As a special case, a glade representing a nulled symbol is always a trivial glade, containing only one
token alternative. This means that a nulled symbol is always treated as a token in this context, even
when it actually is the LHS symbol of a nulled rule.

At all alternatives, the span() and the literal() glade methods are of use. The symbol_id() glade method
is also always of use, although its meaning varies. At token alteratives, the symbol_id() method returns
the ID of the token symbol. At rule alteratives, the symbol_id() method returns the ID of the LHS symbol
of the rule.

At rule alternatives, the rh_length() and the rh_value() glade methods are of use. The rh_length()
method returns the length of the RHS, and the rh_value() method returns the value of one of the RHS
children, as determined using the traverser.

At the peak of the ASF, the symbol will be named '"[:start]"'. This case often requires special
treatment. Note that it is entirely possible for the peak glade to be non-trivial.

Glade methods

       These are methods of the glade object.  Glade objects are passed as arguments to the traversal routine,
       and are only valid within its scope.

   literal()
           my $literal = $glade->literal();

       Returns the glade literal, a string in the input which corresponds to this glade.  The glade literal
       remains constant inside a glade.  The literal() method accepts no arguments.

   span()
           my ( $start, $length ) = $glade->span();
           my $end = $start + $length - 1;

       Returns the glade span, two numbers which describe the location which corresponds to this glade.  The
       first number will be the start of the span, as an offset in the input stream.  The second number will be
       its length.  The glade span remains constant within a glade.  The span() method accepts no arguments.

       The "end" character of the span, when defined, may be calculated as its start plus its length, minus one.
       Applications should note that glades representing nulled symbols are special cases.  They will have a
       length of zero and, properly speaking, their literals are zero length and do not have defined first
       (start) and last (end) characters.

   symbol_id()
           my $symbol_id   = $glade->symbol_id();

       Returns the glade symbol.  For a token alternative, the glade symbol is the token symbol.  For a rule
       alternative, the glade symbol is the LHS symbol of the rule.  The symbol ID remains constant within a
       glade.  The symbol_id() method accepts no arguments.

   rule_id()
           my $rule_id     = $glade->rule_id();

       Returns the ID of the rule for the current alternative.  The ID will be a non-negative number, or
       "undef".  (Note that, for alternatives in this interface, both zero and "undef" are considered valid rule
       IDs.)  Returns "undef" if and only if the current alternative is a token alternative.  The rule_id()
       method accepts no arguments.

   rh_length()
           my $length = $glade->rh_length();

       Returns the number of RHS children of the current rule.  On success, this will always be an integer
       greater than zero.  The rh_length() method accepts no arguments.  It is a fatal error to call rh_length()
       for a glade that currently points to a token alternative.

   rh_value()
           my $child_value = $glade->rh_value($rh_ix);

       Requires exactly one argument, $rh_ix, which must be the zero-based index of a RHS child of the current
       rule instance.  Returns the value of the child at index $rh_ix of the current rule instance.  For
       convenient iteration, returns "undef" if the value of the $rh_ix is greater than or equal to the RHS
       length.  It is a fatal error to call rh_value() for a glade that currently points to a token alternative.

   rh_values()
           my @return_value = $glade->rh_values();

       Returns the RHS children of the current rule.  The rh_values() method accepts no arguments.  It is a
       fatal error to call rh_values() for a glade that currently points to a token alternative.

   all_choices()
           my @results = $glade->all_choices();

       all_choices() expects to be called by a traverser which always return its parse results as a reference to
       an array of choices.  all_choices() returns the Cartesian product of the lists of choices at the current
       position.

       For example, suppose there were 3 child values, and

       •   "A" was the only choice for the first child value;

       •   "B", "C" and "D" were the 3 choices for the value of the second child; and

       •   "E" was the only choice for the third child value.

       We would then expect the traverser to return parse results that were references to arrays of the choices:

           [A]
           [B,C,D]
           [E]

       and the result of all_choices() would be their Cartesian product:

           [A,B,E]
           [A,C,E]
           [A,D,E]

       If a parse result is not an unblessed array, all_choices() will treat it as if it was the only choice for
       that  child value.  That is, if the parse result is "X", where "X" is not an unblessed array, then effect
       would be the same as if "[ X ]", a reference to an array whose only element is "X", had  been  the  parse
       result.

   next()
           last CHOICE if not defined $glade->next();

       Points the glade at the next alternative.  If there is no next alternative, returns "undef".  On success,
       returns  a defined value.  One of the values returned on success may be the integer zero, so applications
       using this method to control an interation should be careful to check for a Perl defined value,  and  not
       for a Perl true value.

       In addition, because the rule_id() method remains constant only within a symch, and the next() method may
       change  the  current  symch,  rule_id()  method  must always be called to obtain the current rule ID in a
       "while" loop that is controlled by the next() method.

Details

This section contains additional explanations, not essential to understanding the rest of this document.
Often they are formal or mathematical. While some people find these helpful, others find them
distracting, which is why they are segregated here.

Symches and factorings
Symch and factoring are terms which are useful for some advanced applications. For the purposes of this
document, the reader can consider the term "factoring" as a synonym for "parse alternative". A symch is
either a rule symch or a token alternative. A rule symch is a series of rule alternatives (factorings)
which share the same rule ID and the same glade. A glade's token alternative is a symch all by itself.
The term symch is shorthand for "symbolic choice".

The value that each glade accessor returns can be classified as

• remaining constant inside a glade;

• remaining constant within a symch; or

• potentially varying with each factoring.

The values of the literal(), span(), and symbol_id() methods remain constant inside each glade. The
rule_id() method remains constant within a symch -- in fact, the rule ID and the glade define a symch.
(Recall that for alternatives in the ASF interface, "undef" is considered a rule ID.) The values of the
rh_length() method and the values of the rh_value() method method may vary with each alternative
(factoring).

When moving through a glade using the next() method, alternatives within the same symch are visited as a
group. More precisely, let the "current rule ID" be defined as the rule ID of the alternative at which
the glade is currently pointing. The next() glade method guarantees that, before any alternative with a
rule ID different from the current rule ID is visited, all of the so-far-unvisited alternatives that
share the current rule ID will be visited.

Appendix: full traverser code

       This code is not fully generalized.  Applications will often not  be  able  to  use  it  exactly  as  is.
       Instead,  this  full_traverser(),  and the pruning_traverser(), are intended to be used as templates.  An
       application adopting this full_traverser() will often want to introduce some pruning,  as  well  as  some
       processing  of each child value.  For example, the volumes of data processed is often quite large, and an
       application may wish to compact each child value in some way.

           sub full_traverser {

               # This routine converts the glade into a list of Penn-tagged elements.
               # It is called recursively.
               my ( $glade, $scratch ) = @_;
               my $rule_id     = $glade->rule_id();
               my $symbol_id   = $glade->symbol_id();
               my $symbol_name = $panda_grammar->symbol_name($symbol_id);

               # A token is a single choice, and we know enough to fully Penn-tag it
               if ( not defined $rule_id ) {
                   my $literal  = $glade->literal();
                   my $penn_tag = penn_tag($symbol_name);
                   return ["($penn_tag $literal)"];
               } ## end if ( not defined $rule_id )

               # Our result will be a list of choices
               my @return_value = ();

             CHOICE: while (1) {

                   # The results at each position are a list of choices, so
                   # to produce a new result list, we need to take a Cartesian
                   # product of all the choices
                   my $length = $glade->rh_length();
                   my @results = ( [] );
                   for my $rh_ix ( 0 .. $length - 1 ) {
                       my @new_results = ();
                       for my $old_result (@results) {
                           my $child_value = $glade->rh_value($rh_ix);
                           for my $new_value ( @{$child_value} ) {
                               push @new_results, [ @{$old_result}, $new_value ];
                           }
                       }
                       @results = @new_results;
                   } ## end for my $rh_ix ( 0 .. $length - 1 )

                   # Special case for the start rule
                   if ( $symbol_name eq '[:start]' ) {
                       return [ map { join q{}, @{$_} } @results ];
                   }

                   # Now we have a list of choices, as a list of lists.  Each sub list
                   # is a list of Penn-tagged elements, which we need to join into
                   # a single Penn-tagged element.  The result will be to collapse
                   # one level of lists, and leave us with a list of Penn-tagged
                   # elements
                   my $join_ws = q{ };
                   $join_ws = qq{\n   } if $symbol_name eq 'S';
                   push @return_value, map {
                           '('
                         . penn_tag($symbol_name) . q{ }
                         . ( join $join_ws, @{$_} ) . ')'
                   } @results;

                   # Look at the next alternative in this glade, or end the
                   # loop if there is none
                   last CHOICE if not defined $glade->next();

               } ## end CHOICE: while (1)

               # Return the list of Penn-tagged elements for this glade
               return \@return_value;
           }

Copyright and License

         Copyright 2022 Jeffrey Kegler
         This file is part of Marpa::R2.  Marpa::R2 is free software: you can
         redistribute it and/or modify it under the terms of the GNU Lesser
         General Public License as published by the Free Software Foundation,
         either version 3 of the License, or (at your option) any later version.

         Marpa::R2 is distributed in the hope that it will be useful,
         but WITHOUT ANY WARRANTY; without even the implied warranty of
         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
         Lesser General Public License for more details.

         You should have received a copy of the GNU Lesser
         General Public License along with Marpa::R2.  If not, see
         http://www.gnu.org/licenses/.

perl v5.40.1                                       2025-06-25                                Marpa::R2::ASF(3pm)