Revision as of 19:39, 8 May 2006
Perl 6 is the next version of the Perl programming language, currently under development. The vision for Perl 6 is more than simply a rewrite of Perl 5.
Perl 6 is not intended to be backwards-compatible, though there will be a compatibility mode. Larry Wall, the creator of Perl, has called Perl 6 "the community's rewrite of Perl", because he has based the changes largely on 361 "requests for change" submitted by the Perl community in 2000. He is outlining these changes in a series of long essays, called Apocalypses, which are numbered to correspond to chapters in Programming Perl ("The Camel Book"). The current, unfinalized, specification of Perl 6 is encapsulated in design documents called Synopses, which are numbered to correspond to Apocalypses.
Implementations
Pugs is an implementation of Perl 6 in the Haskell programming language that will be used for bootstrapping. Pugs's goal is to write the Perl 6 compiler in Perl 6 itself, possibly by translating its source code to Perl 6. After that, Perl 6 will be self-hosted—it will be used to compile itself. Much of the implementation of Perl will then be exposed, making it possible to, for example, extend the parser.
Pugs can execute Perl 6 code directly, as well as compile Perl 6 to JavaScript, Perl 5 or Parrot bytecode.
Parrot is a virtual machine designed for interpreted languages, primarily for Perl 6. The self-hosting Perl 6 compiler will target (and also run on) Parrot.
Major changes from Perl 5
Perl 5 and Perl 6 differ fundamentally, though in general the intent has been to "keep Perl 6 Perl". Most of the changes are intended to normalize the language, to make it easier for novice and expert programmers alike to understand, and to make "easy things easier and hard things more possible".
A Specification
A major, but non-technical, difference between Perl 5 and Perl 6 is that Perl 6 began as a specification. This means that Perl 6 can be re-implemented if needed, and it also means that programmers don't have to read the source code for the ultimate authority on any given feature. While Perl 5's documentation was regarded as excellent, even outside of the Perl community, if the documentation and the source code of the Perl 5 interpreter disagreed, it was the documentation that would be changed.
A Type System
In Perl 6, the dynamic type system of Perl 5 has been augmented by the addition of static types. For example:
my int $i = 0;
my num $n = 3.141;
my str $s = "Hello, world";
However, as with Perl 5, programmers can do most things without any explicit typing at all:
my $i = "25" + 10;
Static typing is beneficial for reducing subtle errors and increasing maintainability, especially in large software projects. But static typing is a burden when writing quick scripts, one-liners or "one-off" code (i.e., code that is written to achieve some temporary purpose, run once and retired), purposes that have long been a mainstay of Perl. A currently unresolved debate in the Perl 6 design community regards how code written without types interacts with code using types.
Formal Subroutine Parameter Lists
Perl 5 defined subroutines without formal parameter lists at all (though simple parameter counting and some very loose type checking can be done using Perl 5's "prototypes"). Subroutine arguments passed in became aliases into elements of the array @_. If @_ were modified, the changes would be reflected in the original data:
# Perl 5 code
sub incr { $_[0]++ }
my $x = 1;
incr($x);    # $x is now 2
incr(3);     # runtime error: "Modification of a read-only value attempted"
Perl 6 introduces true formal parameters to the language. In Perl 6, a subroutine declaration looks something like this:
sub do_something(Str $thing, Int $other) { ... }
As in Perl 5, the formal parameters (i.e., the pseudo-variables in the parameter list) are aliases to the actual parameters (the values passed in), but by default the aliases are marked is readonly (similar in meaning to constant), so they cannot be modified:
sub incr(Num $x) {
    $x++;    # compile-time error
}
If a formal parameter is followed by is copy or is rw, however, it can be modified. In the is copy case, Perl 6 copies the actual parameter's data rather than aliasing it, so it can be modified, but changes are local to the subroutine. In the is rw case (rw stands for "read-write"), the alias is not marked readonly. This change also catches, at compile time, errors such as the incr(3) call in the Perl 5 example above:
sub incr(Num $x is rw) { $x++ }
incr(3);    # compile-time error
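Only the is rw case is illustrated above. The following is a minimal sketch of the is copy case, assuming the parameter-trait syntax of the later Synopses (the specification is unfinalized, so details may differ); the subroutine name countdown is hypothetical. The subroutine freely decrements its parameter, while the caller's variable keeps its value.

```raku
# Sketch; assumes Synopsis-style signatures. `countdown` is a hypothetical name.
# `is copy` gives the subroutine its own modifiable copy of the argument.
sub countdown(Int $n is copy) {
    my @seen;
    while $n > 0 {
        @seen.push($n);
        $n--;                # changes only the local copy
    }
    return @seen;
}

my $start = 3;
say countdown($start);       # [3 2 1]
say $start;                  # 3 (the caller's variable is untouched)
```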
There are a number of features in parameter lists which make parameter passing much more powerful than in Perl 5:
- = after the parameter can be used to assign default values.
- ? after the parameter indicates optional arguments.
- : before the parameter indicates a named argument (passed like an element of a hash).
- where can supply a condition that the actual parameter must match.
For example:
sub my_split(Rule $pat? = rx/\s+/, Str $expr? = $_, Int $lim? where $^lim >= 0) { ... }
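The example above does not use the leading colon for named arguments. A small sketch of a named parameter with a default value follows, assuming the syntax of the later Synopses; the subroutine greet and its parameters are hypothetical. Named arguments can be supplied in any order after the positional ones, either as a pair or in colon-pair form.

```raku
# Sketch; `greet` and `:$greeting` are hypothetical names.
# `$name` is positional; `:$greeting` is named and defaults to "Hello".
sub greet(Str $name, Str :$greeting = "Hello") {
    return "$greeting, $name!";
}

say greet("Larry");                        # Hello, Larry!
say greet("Larry", greeting => "Howdy");   # Howdy, Larry!
say greet("Larry", :greeting<Hi>);         # Hi, Larry!
```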
Sigil invariance
In Perl 5, sigils—the punctuation characters that precede a variable name—changed depending on how the variable was used:
# Perl 5 code
my @array = (0, 1, 2, 3);
my $element = $array[1];    # $element equals 1
In Perl 6, sigils are invariant:
my @array = (0, 1, 2, 3);
my $element = @array[1];    # $element equals 1
This change is meant to reduce the cognitive load of recognizing that a variable spelled $array[...] is actually the variable @array.
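The same rule applies to hashes: an element of %color is written with the % sigil, where Perl 5 would have written $color{'green'}. A short sketch, assuming the syntax of the later Synopses; the variable names are hypothetical.

```raku
# Sketch; @primes and %color are hypothetical names.
my @primes = 2, 3, 5, 7;
my %color  = red => 0xFF0000, green => 0x00FF00;

# The sigil stays the same whether the whole container or one element is meant.
say @primes[2];          # 5
say %color{'green'};     # 65280
```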
Object orientation
Perl 5 supported object orientation via a mechanism known as blessing. Any reference could be blessed into being an object of a particular class, as follows:
# Perl 5 code
my $object = bless $reference, 'Class';
A blessed object could then have methods invoked on it using the "arrow syntax":
# Perl 5 code
$object->method();
which would cause Perl to locate ("dispatch") an appropriate subroutine named method, and call it with $object as its first argument.
While extremely powerful—virtually any other computer language's object model could be simulated using this simple facility—it made the most common case of object orientation, a struct-like object with some associated code, unnecessarily difficult. In addition, because Perl could make no assumptions about the object model in use, method invocation could not be optimized very well.
In the spirit of making the "easy things easy but hard things possible", Perl 6 retains the blessing model for programmers who desire unusual behavior, but supplies a more robust object model for the common cases. For example, a class to encapsulate a Cartesian point could be written:
class Point is rw {
    has $.x;
    has $.y;
}
and then used:
my Point $point .= new;
$point.x = 1.2;
$point.y = -3.7;
The dot replaces the arrow in a nod to the many other languages, e.g. C++, Java, Python, and Ruby, that have coalesced around dot as the syntax for method invocation.
Note that the methods "x" and "y" are not explicitly declared. They are called auto-accessors. The "is rw" modifier on the class definition allows all of its public member attributes to be writable by default, using the auto-accessors.
Attributes may be declared in the following ways:
has $.a;          # default accessor mode (usually read only)
has $.b is rw;    # read/write accessor
has $!c;          # no public accessor (private)
has $d;           # same as $!d
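To see how these declarations behave together, here is a minimal sketch assuming the class syntax of the later Synopses; the class Counter and its attributes are hypothetical. The read-only $.label is set once through the constructor, $.step can be reassigned because it is marked is rw, and $!count is reachable only from inside the class.

```raku
# Sketch; `Counter`, `label`, `step` and `count` are hypothetical names.
class Counter {
    has $.label;             # public, read-only accessor
    has $.step is rw = 1;    # public, read/write accessor
    has $!count = 0;         # private: no accessor is generated

    method tick {
        $!count += $.step;   # private state is reached through the $! twigil
        return $!count;
    }
}

my $c = Counter.new(label => "clicks");
$c.tick;                     # count is now 1
$c.step = 5;                 # allowed because of `is rw`
say $c.tick;                 # 6
say $c.label;                # clicks
```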
Regular expressions
Perl's regular expression and string-processing support has always been one of its defining features. Unlike most other languages, in which regular expressions are provided by a library, Perl has pattern matching facilities built-in to the language. Since Perl's pattern-matching constructs have exceeded the capabilities of formal regular expressions for some time, Perl 6 documentation will exclusively refer to them as regexes, distancing the term from the formal definition.
Perl 6 provides a superset of Perl 5 features with respect to regexes, folding them into a larger framework called "rules", which provide the capabilities of a parsing expression grammar as well as acting as a closure with respect to their lexical scope. Rules are introduced with the rule keyword, which has a usage quite similar to subroutine definition. Anonymous rules can also be introduced with the regex (or rx) keyword, or they can simply be used inline as regexes were in Perl 5 via the m (matching) or s (search and replace) operators.
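A minimal sketch of a named pattern used inline with m and of s for substitution, assuming the syntax of the later Synopses, in which a named pattern can be declared with the regex keyword much like a subroutine (the rule keyword may treat whitespace differently). The name ymd and the sample strings are hypothetical.

```raku
# Sketch; `ymd` is a hypothetical named regex, declared much like a subroutine.
my regex ymd { (\d ** 4) '-' (\d ** 2) '-' (\d ** 2) }

# Inline use with m//; whitespace inside the pattern is insignificant.
if "Released on 2006-05-08" ~~ m/ <ymd> / {
    say "year: ",  ~$<ymd>[0];    # year: 2006
    say "month: ", ~$<ymd>[1];    # month: 05
}

# Search and replace with s///, applied to a variable via smartmatch.
my $spelling = "colour";
$spelling ~~ s/ou/o/;
say $spelling;                    # color
```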
In Apocalypse 5, Larry Wall enumerated 20 problems with "current regex culture". Among these were that Perl's regexes were "too compact and 'cute'", had "too much reliance on too few metacharacters", "little support for named captures", "little support for grammars", and "poor integration with 'real' language". He then proceeded to lay out what were then the most radical changes to the language yet.
It may be most telling that there are only six unchanged features from Perl 5's regexes:
- Literals: word characters such as "A" and underscore will be matched literally.
- Capturing: (...)
- Alternatives: |
- Backslash escape: \
- Repetition quantifiers: *, +, and ?
- Minimal matching suffix: *?, +?, ??
A few of the most powerful additions include:
- Simplified non-capturing groups: [...], which are the same as Perl 5's (?:...)
- Commit at various levels of scope via :, ::, :::, and <commit>
- Simplified code assertions: <?{...}>
- Perl 5's /x is now the default.
Examples:
rx { a ( d | e ) f : g }
rx { ( ab* ) <?{ $0.chars % 2 == 0 }> }
That last is identical to:
rx { ( ab[bb]* ) }
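The equivalence of the last two patterns (an "a" followed by b's, with an even total length, is an "a" followed by an odd number of b's) can be checked mechanically. A sketch, assuming the code-assertion syntax <?{...}> and numbered captures starting at $0 as in the later Synopses; the anchors ^ and $ are added only so that whole strings are compared, and the variable names are hypothetical.

```raku
# Sketch; both patterns are anchored so whole strings are compared.
my $with_assertion = rx{ ^ ( ab* ) <?{ $0.chars % 2 == 0 }> $ };
my $pure_pattern   = rx{ ^ ( ab[bb]* ) $ };

# Prints whether each string matches the first and the second pattern.
for <a ab abb abbb abbbb> -> $s {
    say $s, ": ", so($s ~~ $with_assertion), " ", so($s ~~ $pure_pattern);
}
```

Both columns agree for every input: "ab" and "abbb" match, the others do not.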
Syntactic simplification
The parentheses (round brackets) required in control flow constructs in Perl 5 are now optional:
if is_true() {
for @array {
...
}
}
The three dots above (...) are syntactically valid in Perl 6 and are called the "yadda-yadda operator". "..." can be used as a placeholder for code to be inserted later. If a running program attempts to execute "...", however, an exception is thrown. This operator is useful for abstract methods, or for marking places where the programmer intends to insert code later.
Chained comparisons
New programmers often expect chained comparisons like the following to work:
if 1 <= $die1 == $die2 <= 6 { say "Doubles!" }
In Perl 6, this code now works as expected, in the spirit of DWIM (Do What I Mean), and is executed as if written:
if 1 <= $die1 and $die1 == $die2 and $die2 <= 6 { say "Doubles!" }
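Because the chained form is equivalent to the and-joined form, every link must hold. The sketch below, assuming the syntax of the later Synopses, runs a few hypothetical dice rolls through it; the pair 7 and 7 is rejected because the final link, $die2 <= 6, fails.

```raku
# Sketch; the dice values are hypothetical and are taken two at a time.
for 3, 3, 2, 5, 7, 7 -> $die1, $die2 {
    my $verdict = 1 <= $die1 == $die2 <= 6 ?? "Doubles!" !! "no";
    say "$die1 and $die2: $verdict";
}
# Prints:
#   3 and 3: Doubles!
#   2 and 5: no
#   7 and 7: no
```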
Lazy evaluation
Perl 6 gains the lazy evaluation of lists that has been a feature of some functional programming languages such as Haskell:
my int @integers = 0..Inf; # integers from 0 to infinity
for @integers -> $counter {
say "$counter";
last if $counter >= 10;
}
The code above will not crash by attempting to assign a list of infinite size to the array @integers, nor will it hang indefinitely in attempting to expand the list in the for loop. Instead, it will print the integers from 0 to 10 and then leave the loop.
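The same on-demand behavior extends to list operations applied to an infinite range. A sketch, assuming the list methods grep, map and head of the later Synopses; the variable names are hypothetical. Only as many elements as are actually requested are ever computed.

```raku
# Sketch; $evens and @squares are hypothetical names.
my $evens = (0..Inf).grep(* %% 2);   # lazy: the infinite range is not expanded
say $evens.head(5);                  # (0 2 4 6 8)

my @squares = (1..Inf).map(* ** 2);  # the array stays lazy, like @integers above
say @squares[^4];                    # (1 4 9 16)
```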
Because of this behavior, the well-known Perl idiom for reading from the files named on the command line, or from standard input if there are none:
# Perl 5 code
while (<>) {
print;
}
is replaced by
for =<> {
print;
}
The prefix = operator turns a filehandle or a file name into an iterator.
This is no longer an idiom, because, unlike the Perl 5 code, which was automagically expanded into
# Perl 5 code
while (defined ($_ = <>)) {
print;
}
Perl 6's lazy evaluation of for (along with a rule terminating the lazy input list <> at end-of-file) means that the Perl 6 for loop above does not have to be subjected to any special conversion to work as expected.
Junctions
Perl 6 introduces the concept of junctions: values that are composites of other values. In the earliest days of Perl 6's design, these were called "superpositions", by analogy to the concept in quantum physics of quantum superpositions: waveforms that can simultaneously occupy several states until observation "collapses" them. A Perl 5 module released in 2000 by Damian Conway called Quantum::Superpositions provided an initial proof of concept. While such superpositional values at first seemed merely a programmatic curiosity, over time their utility and intuitiveness became widely recognized, and junctions now occupy a central place in Perl 6's design.
In their simplest form, junctions are created by combining a set of values with junctive operators:
my $even_digit = 0|2|4|6|8; # any(0, 2, 4, 6, 8)
my $odd_digits = 1&3&5&7&9; # all(1, 3, 5, 7, 9)
my $not_zero = none(0);
These values can be used arithmetically:
my $junction = 1|2|3;
$junction += 4; # junction now equals 5|6|7
$junction += (1&2); # junction now equals (6|7|8)&(7|8|9)
or in comparisons:
if $grade eq any('A'..'D') { say "pass" }
or even in subscripting:
if %person{any('first_name','nickname')} eq "Joe" { say "What do you know, Joe?" }
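Junctions built from an array are a compact way to test every element at once. A sketch, assuming the any, all and none functions of the later Synopses; the scores are hypothetical. The comparison is applied to each member, and the boolean context of if collapses the resulting junction to a single truth value.

```raku
# Sketch; @scores holds hypothetical values.
my @scores = 72, 88, 95;

say "everyone passed"    if all(@scores) >= 70;    # printed
say "someone scored 100" if any(@scores) == 100;   # not printed
say "no failing score"   if none(@scores) < 70;    # printed
```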
Junctions can also be used to more richly augment the type system:
class RGB_Color is Tuple & Color { ... }
sub get_tint (RGB_Color|CMYK_Color $color, num $opacity where 0 <= $^opacity <= 1) { ... }
sub store_record (Record&Storable $rec) { ... }
Junctions are unordered; 1|2|3 and 3|2|1 represent the same value. This lack of ordering means that the Perl 6 compiler can choose to evaluate junctive expressions in parallel. In fact, many in the Perl 6 community believe that junctions could supersede explicit multithreading as the ordinary way to achieve parallelism in Perl 6. For instance, the code
for all(@array) { ... }
would indicate to the compiler that the for loop should be run in parallel rather than in serial.
Macros
In low-level languages, the concept of macros has become synonymous with textual substitution of source code due to the widespread use of the C preprocessor. However, high-level languages such as Lisp pre-dated C in their use of macros that were far more powerful. It is this Lisp-like macro concept that Perl 6 will take advantage of. The power of this sort of macro stems from the fact that it operates on the program as a high-level data structure, rather than as simple text, and has the full capabilities of the programming language at its disposal.
A Perl 6 macro definition will look like a subroutine or method definition, and can operate on unparsed strings, an AST representing pre-parsed code, or a combination of the two. For example:
macro hello($what) {
q:code { say "Hello { {{{$what}}} }" };
}
In this particular example, the macro is no more complex than a C-style textual substitution, but because parsing of the macro parameter occurs before the macro operates on the calling code, diagnostic messages would be far more informative. Moreover, because the body of a macro is executed at compile time each time it is used, many optimization techniques can be employed. It is even possible to eliminate complex computations from resulting programs entirely by performing the work at compile time.
Hello world
The hello world program in Perl 6 can be written
say "Hello world"
though there's more than one way to do it. say is new to Perl 6: it is like print, but appends a newline to its output, similar to REXX's say, Pascal's writeln, or Ruby's and C's puts.
JAPH
As in Perl 5, JAPHs (Perl programs that print "Just another Perl hacker,") are a good way to experiment with Perl 6.
sub japh (Str $lang) { say "just another $lang hacker"; }
my &perl6Japh := &japh.assuming("Perl6");
perl6Japh();