------------------------------
10/02/06: beazley
The last Lexer object built by lex() can be found in lex.lexer.
The last Parser object built by yacc() can be found in yacc.parser.
10/02/06: beazley
New example added: examples/yply
This example uses PLY to convert Unix-yacc specification files to
PLY programs with the same grammar. This may be useful if you
want to convert a grammar from bison/yacc to use with PLY.
10/02/06: beazley
Added support for a start symbol to be specified in the yacc
input file itself. Just do this:
    start = 'name'

where 'name' matches some grammar rule. For example:

    def p_name(p):
        'name : A B C'
        ...
This mirrors the functionality of the yacc %start specifier.
09/30/06: beazley
Some new examples added:

    examples/GardenSnake : A simple indentation-based language similar
                           to Python. Shows how you might handle
                           whitespace. Contributed by Andrew Dalke.

    examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                           Contributed by Dave against his better
                           judgement.
09/28/06: beazley
Minor patch to allow named groups to be used in lex regular
expression rules. For example:
    t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''
Patch submitted by Adam Ring.
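The patched behavior can be checked with the standard re module alone; lex compiles its rules in verbose mode, so re.VERBOSE is used here as well (a standalone sketch, not PLY code):

```python
import re

# The t_QSTRING rule above, compiled directly with the re module.
# The named group 'quote' captures the opening quote character, and
# the backreference (?P=quote) requires the same character to close.
pattern = re.compile(r'''(?P<quote>['"]).*?(?P=quote)''', re.VERBOSE)

m = pattern.match('"hello" and more')
```
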
09/28/06: beazley
LALR(1) is now the default parsing method. To use SLR, use
yacc.yacc(method="SLR"). Note: there is no performance impact
on parsing when using LALR(1) instead of SLR. However, constructing
the parsing tables will take a little longer.
09/26/06: beazley
Change to line number tracking. To modify line numbers, modify
the line number of the lexer itself. For example:
    def t_NEWLINE(t):
        r'\n'
        t.lexer.lineno += 1
This modification is both cleanup and a performance optimization.
In past versions, lex was monitoring every token for changes in
the line number. This extra processing is unnecessary for a vast
majority of tokens. Thus, this new approach cleans it up a bit.
*** POTENTIAL INCOMPATIBILITY ***
You will need to change code in your lexer that updates the line
number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"
09/26/06: beazley
Added the lexing position to tokens as an attribute lexpos. This
is the raw index into the input text at which a token appears.
This information can be used to compute column numbers and other
details (e.g., scan backwards from lexpos to the first newline
to get a column position).
09/25/06: beazley
Changed the name of the __copy__() method on the Lexer class
to clone(). This is used to clone a Lexer object (e.g., if
you're running different lexers at the same time).
09/21/06: beazley
Limitations related to the use of the re module have been eliminated.
Several users reported problems with regular expressions that exceed
100 named groups. To solve this, lex.py is now capable
of automatically splitting its master regular expression into
smaller expressions as needed. This should, in theory, make it
possible to specify an arbitrarily large number of tokens.
09/21/06: beazley
Improved error checking in lex.py. Rules that match the empty string
are now rejected (otherwise they cause the lexer to enter an infinite
loop). An extra check for rules containing '#' has also been added.
Since lex compiles regular expressions in verbose mode, '#' is interpreted
as a regex comment; to match a literal '#' it is critical to use '\#' instead.
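The verbose-mode pitfall can be demonstrated with the re module alone: an unescaped '#' comments out the rest of the pattern, which then matches the empty string — exactly the kind of rule lex now rejects:

```python
import re

bad  = re.compile(r'#[0-9]+', re.VERBOSE)   # '#' starts a comment, so the
                                            # pattern degenerates to ''
good = re.compile(r'\#[0-9]+', re.VERBOSE)  # '\#' matches a literal '#'
```
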
09/18/06: beazley
Added a TOKEN decorator function to lex.py that can be used to
define token rules where the documentation string might be computed
in some way.
    digit      = r'([0-9])'
    nondigit   = r'([_A-Za-z])'
    identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

    from ply.lex import TOKEN

    @TOKEN(identifier)
    def t_ID(t):
        # Do whatever
The TOKEN decorator merely sets the documentation string of the
associated token function as needed for lex to work.
Note: An alternative solution is the following:
    def t_ID(t):
        # Do whatever

    t_ID.__doc__ = identifier
Note: Decorators require the use of Python 2.4 or later. If compatibility
with old versions is needed, use the latter solution.
The need for this feature was suggested by Cem Karan.
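Since the decorator merely sets the docstring, its effect can be sketched in a few lines of plain Python (an illustration of the idea, not PLY's actual source):

```python
def TOKEN(pattern):
    # Attach the regex string as the decorated function's docstring,
    # which is where lex looks for a rule's pattern.
    def decorate(func):
        func.__doc__ = pattern
        return func
    return decorate

identifier = r'([_A-Za-z])([0-9]|[_A-Za-z])*'

@TOKEN(identifier)
def t_ID(t):
    return t
```
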
09/14/06: beazley
Support for single-character literal tokens has been added to yacc.
These literals must be enclosed in quotes. For example:
    def p_expr(p):
        "expr : expr '+' expr"
        ...

    def p_expr(p):
        'expr : expr "-" expr'
        ...
In addition to this, it is necessary to tell the lexer module about
literal characters. This is done by defining the variable 'literals'
as a list of characters. This should be defined in the module that
invokes the lex.lex() function. For example:
    literals = ['+','-','*','/','(',')','=']

or simply

    literals = '+-*/()='
It is important to note that literals can only be a single character.
When the lexer fails to match a token using its normal regular expression
rules, it will check the current character against the literal list.
If found, it will be returned with a token type set to match the literal
character. Otherwise, an illegal character will be signalled.
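The fallback behavior described above can be sketched in plain Python (a hypothetical helper for illustration, not PLY's implementation):

```python
literals = '+-*/()='

def match_literal(text, pos):
    # When no regular-expression rule matches, check the current
    # character against the literals; the token type and value are
    # both the character itself.
    ch = text[pos]
    if ch in literals:
        return (ch, ch)          # (token type, token value)
    raise SyntaxError("Illegal character %r" % ch)
```
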
09/14/06: beazley
Modified PLY to install itself as a proper Python package called 'ply'.
This will make it a little more friendly to other modules. This
changes the usage of PLY only slightly. Just do this to import the
modules
    import ply.lex as lex
    import ply.yacc as yacc

Alternatively, you can do this:

    from ply import *

which imports both the lex and yacc modules.
Change suggested by Lee June.
09/13/06: beazley
Changed the handling of negative indices when used in production rules.
A negative production index now accesses already parsed symbols on the
parsing stack. For example,
    def p_foo(p):
        "foo : A B C D"
        print p[1]       # Value of 'A' symbol
        print p[2]       # Value of 'B' symbol
        print p[-1]      # Value of whatever symbol appears before A
                         # on the parsing stack.
        p[0] = some_val  # Sets the value of the 'foo' grammar symbol
This behavior makes it easier to work with embedded actions within the
parsing rules. For example, in C-yacc, it is possible to write code like
this:
    bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }
In this example, the printf() code executes immediately after A has been
parsed. Within the embedded action code, $1 refers to the A symbol on
the stack.
To perform this equivalent action in PLY, you need to write a pair
of rules like this:
    def p_bar(p):
        "bar : A seen_A B"
        do_stuff

    def p_seen_A(p):
        "seen_A :"
        print "seen an A =", p[-1]
The second rule "seen_A" is merely an empty production which should be
reduced as soon as A is parsed in the "bar" rule above. The negative
index p[-1] is used to access whatever symbol appeared before the
seen_A symbol.
This feature also makes it possible to support inherited attributes.
For example:
    def p_decl(p):
        "decl : scope name"

    def p_scope(p):
        """scope : GLOBAL
                 | LOCAL"""
        p[0] = p[1]

    def p_name(p):
        "name : ID"
        if p[-1] == "GLOBAL":
            ...
        elif p[-1] == "LOCAL":
            ...
In this case, the name rule is inheriting an attribute from the
scope declaration that precedes it.
*** POTENTIAL INCOMPATIBILITY ***
If you are currently using negative indices within existing grammar rules,
your code will break. This should be extremely rare, if it occurs at all.
The argument to a grammar rule is not usually processed in the same way
as a list of items.