Pyparsing

Latest version: v3.1.2

2.4.1.1

-------------------------------
This is a re-release of version 2.4.1 to restore the release history
in PyPI, since the 2.4.1 release was deleted.

There are 3 known issues in this release, which are fixed in

2.4.1

--------------------------
- NOTE: Deprecated functions and features that will be dropped
in pyparsing 2.5.0 (planned next release):

. support for Python 2 - ongoing users running with
Python 2 can continue to use pyparsing 2.4.1

. ParseResults.asXML() - if used for debugging, switch
to using ParseResults.dump(); if used for data transfer,
use ParseResults.asDict() to convert to a nested Python
dict, which can then be converted to XML or JSON or
other transfer format

. operatorPrecedence synonym for infixNotation -
convert to calling infixNotation

. commaSeparatedList - convert to using
pyparsing_common.comma_separated_list

. upcaseTokens and downcaseTokens - convert to using
pyparsing_common.upcaseTokens and downcaseTokens

. __compat__.collect_all_And_tokens will not be settable to
False to revert to pre-2.3.1 results name behavior -
review use of names for MatchFirst and Or expressions
containing And expressions, as they will return the
complete list of parsed tokens, not just the first one.
Use __diag__.warn_multiple_tokens_in_named_alternation
(described below) to help identify those expressions
in your parsers that will have changed as a result.

- A new shorthand notation has been added for repetition
expressions: expr[min, max], with '...' valid as a min
or max value:
- expr[...] is equivalent to OneOrMore(expr)
- expr[0, ...] is equivalent to ZeroOrMore(expr)
- expr[1, ...] is equivalent to OneOrMore(expr)
- expr[n, ...] or expr[n,] is equivalent
to expr*n + ZeroOrMore(expr)
(read as "n or more instances of expr")
- expr[..., n] is equivalent to expr*(0, n)
- expr[m, n] is equivalent to expr*(m, n)
Note that expr[..., n] and expr[m, n] do not raise an exception
if more than n exprs exist in the input stream. If this
behavior is desired, then write expr[..., n] + ~expr.
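
As a minimal sketch of the notation above (the `word` expression and the
input strings are illustrative only), the explicit-bounds forms behave
like this:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

one_or_more = word[1, ...]   # same as pp.OneOrMore(word)
zero_or_more = word[0, ...]  # same as pp.ZeroOrMore(word)
at_most_two = word[..., 2]   # same as word*(0, 2)

some = one_or_more.parseString("a b c").asList()
none = zero_or_more.parseString("123").asList()
first_two = at_most_two.parseString("a b c").asList()

# Appending ~word turns a third word into an error instead of
# silently leaving it unparsed.
strict = word[..., 2] + ~word
try:
    strict.parseString("a b c")
    rejected = False
except pp.ParseException:
    rejected = True

print(some, none, first_two, rejected)
```

The sketch sticks to the forms with an explicit minimum, which read the
same way regardless of how the bare expr[...] form is interpreted.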

- '...' can also be used as shorthand for SkipTo when adding parse
expressions together to compose an And expression.

Literal('start') + ... + Literal('end')
And(['start', ..., 'end'])

are both equivalent to:

Literal('start') + SkipTo('end')("_skipped*") + Literal('end')

The '...' form has the added benefit of not requiring repeating
the skip target expression. Note that the skipped text is
returned with '_skipped' as a results name, and that the contents of
`_skipped` will contain a list of text from all `...`s in the expression.
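
A small sketch of reading the skipped text back out of the results (the
input string is made up for illustration; the skipped text may carry
surrounding whitespace, so it is stripped here):

```python
import pyparsing as pp

expr = pp.Literal("start") + ... + pp.Literal("end")
result = expr.parseString("start 123 456 end")

tokens = result.asList()
# '_skipped' collects the text consumed by each '...' in the expression
skipped = "".join(result["_skipped"]).strip()

print(tokens)
print(repr(skipped))
```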

- '...' can also be used as a "skip forward in case of error" expression:

expr = "start" + (Word(nums).setName("int") | ...) + "end"

expr.parseString("start 456 end")
['start', '456', 'end']

expr.parseString("start 456 foo 789 end")
['start', '456', 'foo 789 ', 'end']
- _skipped: ['foo 789 ']

expr.parseString("start foo end")
['start', 'foo ', 'end']
- _skipped: ['foo ']

expr.parseString("start end")
['start', '', 'end']
- _skipped: ['missing <int>']

Note that in all the error cases, the '_skipped' results name is
present, showing a list of the extra or missing items.

This form is only valid when used with the '|' operator.

- Improved exception messages to show what was actually found, not
just what was expected.

word = pp.Word(pp.alphas)
pp.OneOrMore(word).parseString("aaa bbb 123", parseAll=True)

Former exception message:

pyparsing.ParseException: Expected end of text (at char 8), (line:1, col:9)

New exception message:

pyparsing.ParseException: Expected end of text, found '1' (at char 8), (line:1, col:9)

- Added diagnostic switches to help detect and warn about common
parser construction mistakes, or enable additional parse
debugging. Switches are attached to the pyparsing.__diag__
namespace object:
- warn_multiple_tokens_in_named_alternation - flag to enable warnings when a results
name is defined on a MatchFirst or Or expression with one or more And subexpressions
(default=True)
- warn_ungrouped_named_tokens_in_collection - flag to enable warnings when a results
name is defined on a containing expression with ungrouped subexpressions that also
have results names (default=True)
- warn_name_set_on_empty_Forward - flag to enable warnings when a Forward is defined
with a results name, but has no contents defined (default=False)
- warn_on_multiple_string_args_to_oneof - flag to enable warnings when oneOf is
incorrectly called with multiple str arguments (default=True)
- enable_debug_on_named_expressions - flag to auto-enable debug on all subsequent
calls to ParserElement.setName() (default=False)

warn_multiple_tokens_in_named_alternation is intended to help
those who currently have set __compat__.collect_all_And_tokens to
False as a workaround for using the pre-2.3.1 code with named
MatchFirst or Or expressions containing an And expression.
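
A hedged sketch of turning on one of these switches and observing the
warning; the switch is set explicitly here rather than relying on its
default, and the mistaken oneOf call is contrived for illustration:

```python
import warnings

import pyparsing as pp

# Set the switch explicitly instead of depending on its default value.
pp.__diag__.warn_on_multiple_string_args_to_oneof = True

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Mistake: the choices are passed as separate arguments, so "b" is
    # taken as the second positional argument instead of as a choice.
    pp.oneOf("a", "b")

messages = [str(w.message) for w in caught]
print(messages)
```

The intended call would be oneOf("a b") or oneOf(["a", "b"]).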

- Added ParseResults.from_dict classmethod, to simplify creation
of a ParseResults with results names using a dict, which may be nested.
This makes it easy to add a sub-level of named items to the parsed
tokens in a parse action.
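
A minimal sketch of from_dict with a nested dict (the keys and values
are placeholders):

```python
import pyparsing as pp

# Build a ParseResults whose results names mirror the dict structure.
result = pp.ParseResults.from_dict({"a": 1, "b": {"c": 2}})

a_value = result["a"]
c_value = result["b"]["c"]  # nested dicts become nested named results

print(a_value, c_value)
```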

- Added asKeyword argument (default=False) to oneOf, to force
keyword-style matching on the generated expressions.
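
To illustrate the difference (the word list and input text are invented
for this sketch), compare the matches found with and without asKeyword:

```python
import pyparsing as pp

text = "in int if iffy"

# Keyword-style matching: 'in' and 'if' must stand alone as whole words.
keyword_hits = pp.oneOf("if in", asKeyword=True).searchString(text).asList()

# Default literal matching also finds 'in' inside 'int' and 'if' inside 'iffy'.
literal_hits = pp.oneOf("if in").searchString(text).asList()

print(keyword_hits)
print(len(literal_hits))
```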

- ParserElement.runTests now accepts an optional 'file' argument to
redirect test output to a file-like object (such as a StringIO,
or opened file). Default is to write to sys.stdout.
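
A short sketch of capturing runTests output in a StringIO instead of
printing it (the `integer` expression and test strings are illustrative):

```python
import io

import pyparsing as pp

integer = pp.Word(pp.nums)

report = io.StringIO()
success, results = integer.runTests(
    """
    100
    42
    """,
    file=report,
)

# Nothing is written to sys.stdout; the report is available as a string.
captured = report.getvalue()
print(success)
```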

- conditionAsParseAction is a helper method for constructing a
parse action method from a predicate function that simply
returns a boolean result. Useful for those places where a
predicate cannot be added using addCondition, but must be
converted to a parse action (such as in infixNotation). May be
used as a decorator if default message and exception types
can be used. See ParserElement.addCondition for more details
about the expected signature and behavior for predicate condition
methods.
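
A minimal sketch of wrapping a predicate as a parse action; the
`is_small` predicate and its message are hypothetical names used only
for this example:

```python
import pyparsing as pp

def is_small(tokens):
    # Predicate: returns a plain bool, raises nothing itself.
    return int(tokens[0]) < 100

small_int = pp.Word(pp.nums).addParseAction(
    pp.conditionAsParseAction(is_small, message="value must be < 100")
)

accepted = small_int.parseString("42").asList()

try:
    small_int.parseString("420")
    rejected = False
except pp.ParseException:
    rejected = True

print(accepted, rejected)
```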

- While investigating issue 93, I found that Or and
addCondition could interact to select an alternative that
is not the longest match. This is because Or first checks
all alternatives for matches without running attached
parse actions or conditions, orders by longest match, and
then rechecks for matches with conditions and parse actions.
Some expressions, when checking with conditions, may end
up matching on a shorter token list than originally matched,
but would be selected because of its original priority.
This matching code has been expanded to do more extensive
searching for matches when a second-pass check matches a
smaller list than in the first pass.

- Fixed issue 87, a regression in indented block.
Reported by Renz Bagaporo, who submitted a very nice repro
example, which makes the bug-fixing process a lot easier,
thanks!

- Fixed MemoryError issues 85 and 91 with str generation for
Forwards. Thanks decalage2 and Harmon758 for your patience.

- Modified setParseAction to accept None as an argument,
indicating that all previously-defined parse actions for the
expression should be cleared.
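
For example, a sketch of clearing a conversion parse action from an
expression (the `integer` expression is illustrative):

```python
import pyparsing as pp

integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
converted = integer.parseString("42")[0]

# Passing None removes all parse actions previously set on the expression.
integer.setParseAction(None)
unconverted = integer.parseString("42")[0]

print(repr(converted), repr(unconverted))
```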

- Modified pyparsing_common.real and sci_real to parse reals
without leading integer digits before the decimal point,
consistent with Python real number formats. Original PR 98
submitted by ansobolev.
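
A quick sketch of the inputs this change allows (the sample values are
chosen only for illustration):

```python
import pyparsing as pp

ppc = pp.pyparsing_common

# No integer digits are required before the decimal point.
half = ppc.real.parseString(".5")[0]
fifty = ppc.sci_real.parseString(".5e2")[0]

print(half, fifty)
```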

- Modified runTests to call postParse function before dumping out
the parsed results - allows for postParse to add further results,
such as indications of additional validation success/failure.
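
A hedged sketch of a postParse callback that adds a validation note to
each test's output; `check_even` is a hypothetical helper, assumed to
receive the test string and the parsed results and to return text for
the report:

```python
import io

import pyparsing as pp

integer = pp.Word(pp.nums)

def check_even(test_string, parsed):
    # Extra validation on the parsed results, reported alongside each test.
    value = int(parsed[0])
    return "{} is {}".format(value, "even" if value % 2 == 0 else "odd")

report = io.StringIO()
success, _ = integer.runTests(["10", "7"], postParse=check_even, file=report)

captured = report.getvalue()
print(captured)
```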

- Updated statemachine example: refactored state transitions to use
overridden classmethods; added <statename>Mixin class to simplify
definition of application classes that "own" the state object and
delegate to it to model state-specific properties and behavior.

- Added example nested_markup.py, showing a simple wiki markup with
nested markup directives, and illustrating the use of '...' for
skipping over input to match the next expression. (This example
uses syntax that is not valid under Python 2.)

- Rewrote delta_time.py example (renamed from deltaTime.py) to
fix some omitted formats and upgrade to latest pyparsing idioms,
beginning with writing an actual BNF.

- With help and encouragement from several contributors, including
Matěj Cepl and Cengiz Kaygusuz, I've started cleaning up the internal
coding styles in core pyparsing, bringing it up to modern coding
practices from pyparsing's early development days dating back to
2003. Whitespace has been largely standardized along PEP8 guidelines,
removing extra spaces around parentheses, and adding them around
arithmetic operators and after colons and commas. I was going to hold
off on doing this work until after 2.4.1, but after cleaning up a
few trial classes, the difference was so significant that I continued
on to the rest of the core code base. This should facilitate future
work and submitted PRs, allowing them to focus on substantive code
changes, and not get sidetracked by whitespace issues.

2.4.0

---------------------------
- Well, it looks like the API change that was introduced in 2.3.1 was more
drastic than expected, so for a friendlier forward upgrade path, this
release:
. Bumps the current version number to 2.4.0, to reflect this
incompatible change.
. Adds a pyparsing.__compat__ object for specifying compatibility with
future breaking changes.
. Conditionalizes the API-breaking behavior, based on the value
pyparsing.__compat__.collect_all_And_tokens. By default, this value
will be set to True, reflecting the new bugfixed behavior. To set this
value to False, add to your code:

import pyparsing
pyparsing.__compat__.collect_all_And_tokens = False

. User code that is dependent on the pre-bugfix behavior can restore
it by setting this value to False.

In 2.5 and later versions, the conditional code will be removed and
setting the flag to True or False in these later versions will have no
effect.

- Updated unitTests.py and simple_unit_tests.py to be compatible with
"python setup.py test". To run tests using setup, do:

python setup.py test
python setup.py test -s unitTests.suite
python setup.py test -s simple_unit_tests.suite

Prompted by issue 83 and PR submitted by bdragon28, thanks.

- Fixed bug in runTests handling '\n' literals in quoted strings.

- Added tag_body attribute to the start tag expressions generated by
makeHTMLTags, so that you can avoid using SkipTo to roll your own
tag body expression:

import pyparsing as pp

# html_page is a str containing the HTML source to scan
a, aEnd = pp.makeHTMLTags('a')
link = a + a.tag_body("displayed_text") + aEnd
for t in link.searchString(html_page):
    print(t.displayed_text, '->', t.startA.href)

- indentedBlock failure handling was improved; PR submitted by TMiguelT,
thanks!

- Addressed Py2 incompatibility in simpleUnitTests, plus explain() and
Forward str() cleanup; PRs graciously provided by eswald.

- Fixed docstring with embedded '\w', which creates SyntaxWarnings in
Py3.8, issue 80.

- Examples:

- Added example parser for rosettacode.org tutorial compiler.

- Added example to show how an HTML table can be parsed into a
collection of Python lists or dicts, one per row.

- Updated SimpleSQL.py example to handle nested selects, reworked
'where' expression to use infixNotation.

- Added include_preprocessor.py, similar to macroExpander.py.

- Examples using makeHTMLTags use new tag_body expression when
retrieving a tag's body text.

- Updated examples that are runnable as unit tests:

python setup.py test -s examples.antlr_grammar_tests
python setup.py test -s examples.test_bibparse

2.3.1

-----------------------------
- POSSIBLE API CHANGE: this release fixes a bug when results names were
attached to a MatchFirst or Or object containing an And object.
Previously, a results name on an And object within an enclosing MatchFirst
or Or could return just the first token in the And. Now, all the tokens
matched by the And are correctly returned. This may result in subtle
changes in the tokens returned if you have this condition in your pyparsing
scripts.

- New staticmethod ParseException.explain() to help diagnose parse exceptions
by showing the failing input line and the trace of ParserElements in
the parser leading up to the exception. explain() returns a multiline
string listing each element by name. (This is still an experimental
method, and the method signature and format of the returned string may
evolve over the next few releases.)

Example:
import pyparsing as pp

# define a parser to parse an integer followed by an
# alphabetic word
expr = (pp.Word(pp.nums).setName("int")
        + pp.Word(pp.alphas).setName("word"))
try:
    # parse a string with a numeric second value instead of alpha
    expr.parseString("123 355")
except pp.ParseException as pe:
    print(pp.ParseException.explain(pe))

Prints:
123 355
    ^
ParseException: Expected word (at char 4), (line:1, col:5)
__main__.ExplainExceptionTest
pyparsing.And - {int word}
pyparsing.Word - word

explain() will accept any exception type and will list the function
names and parse expressions in the stack trace. This is especially
useful when an exception is raised in a parse action.

Note: explain() is only supported under Python 3.

- Fixed bug in dictOf which could match an empty sequence, causing an
infinite loop if wrapped in a OneOrMore.

- Added unicode sets to pyparsing_unicode for Latin-A and Latin-B ranges.

- Added ability to define custom unicode sets as combinations of other sets
using multiple inheritance.

class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA):
    pass

turkish_word = pp.Word(Turkish_set.alphas)

- Updated state machine import examples, with state machine demos for:
. traffic light
. library book checkin/checkout
. document review/approval

In the traffic light example, you can use the custom 'statemachine' keyword
to define the states for a traffic light, and have the state classes
auto-generated for you:

statemachine TrafficLightState:
    Red -> Green
    Green -> Yellow
    Yellow -> Red

Similar for state machines with named transitions, like the library book
state example:

statemachine LibraryBookState:
    New -(shelve)-> Available
    Available -(reserve)-> OnHold
    OnHold -(release)-> Available
    Available -(checkout)-> CheckedOut
    CheckedOut -(checkin)-> Available

Once the classes are defined, then additional Python code can reference those
classes to add class attributes, instance methods, etc.

See the examples in examples/statemachine

- Added an example parser for the decaf language. This language is used in
CS compiler classes in many colleges and universities.

- Fixup of docstrings to Sphinx format, inclusion of test files in the source
package, and conversion of markdown to rst throughout the distribution, great
job by Matěj Cepl!

- Expanded the whitespace characters recognized by the White class to include
all unicode defined spaces. Suggested in Issue 51 by rtkjbillo.

- Added optional postParse argument to ParserElement.runTests() to add a
custom callback to be called for test strings that parse successfully. Useful
for running tests that do additional validation or processing on the parsed
results. See updated chemicalFormulas.py example.

- Removed distutils fallback in setup.py. If installing the package fails,
please update to the latest version of setuptools. Plus overall project code
cleanup (CRLFs, whitespace, imports, etc.), thanks Jon Dufresne!

- Fixed bug in CaselessKeyword, to make its behavior consistent with
Keyword(caseless=True). Fixes Issue 65 reported by telesphore.

2.3.0

-----------------------------
- NEW SUPPORT FOR UNICODE CHARACTER RANGES
This release introduces the pyparsing_unicode namespace class, defining
a series of language character sets to simplify the definition of alphas,
nums, alphanums, and printables in the following language sets:
. Arabic
. Chinese
. Cyrillic
. Devanagari
. Greek
. Hebrew
. Japanese (including Kanji, Katakana, and Hiragana subsets)
. Korean
. Latin1 (includes 7 and 8-bit Latin characters)
. Thai
. CJK (combination of Chinese, Japanese, and Korean sets)

For example, your code can define words using:

korean_word = Word(pyparsing_unicode.Korean.alphas)

See their use in the updated examples greetingInGreek.py and
greetingInKorean.py.

This namespace class also offers access to these sets using their
unicode identifiers.
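
As a further sketch along the same lines (the Greek sample text is
illustrative), a word expression built from one of these sets can be
used like any other Word:

```python
import pyparsing as pp

greek_word = pp.Word(pp.pyparsing_unicode.Greek.alphas)

# Parse a run of whitespace-separated Greek words.
greeting = pp.OneOrMore(greek_word).parseString("Καλημέρα κόσμε").asList()

print(greeting)
```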

- POSSIBLE API CHANGE: Fixed bug where a parse action that explicitly
returned the input ParseResults could add another nesting level in
the results if the current expression had a results name.

import pyparsing as pp

vals = pp.OneOrMore(pp.pyparsing_common.integer)("int_values")

def add_total(tokens):
    tokens['total'] = sum(tokens)
    return tokens  # this line can be removed

vals.addParseAction(add_total)
print(vals.parseString("244 23 13 2343").dump())

Before the fix, this code would print (note the extra nesting level):

[244, 23, 13, 2343]
- int_values: [244, 23, 13, 2343]
  - int_values: [244, 23, 13, 2343]
  - total: 2623
- total: 2623

With the fix, this code now prints:

[244, 23, 13, 2343]
- int_values: [244, 23, 13, 2343]
- total: 2623

This fix will change the structure of ParseResults returned if a
program defines a parse action that returns the tokens that were
sent in. This is not necessary, and statements like "return tokens"
in the example above can be safely deleted prior to upgrading to
this release, in order to avoid the bug and get the new behavior.

Reported by seron in Issue 22, nice catch!

- POSSIBLE API CHANGE: Fixed a related bug where a results name
erroneously created a second level of hierarchy in the returned
ParseResults. The intent for accumulating results names into ParseResults
is that, in the absence of Group'ing, all names get merged into a
common namespace. This allows us to write:

key_value_expr = (Word(alphas)("key") + '=' + Word(nums)("value"))
result = key_value_expr.parseString("a = 100")

and have result structured as {"key": "a", "value": "100"}
instead of [{"key": "a"}, {"value": "100"}].

However, if a named expression is used in a higher-level non-Group
expression that *also* has a name, a false sub-level would be created
in the namespace:

num = pp.Word(pp.nums)
num_pair = ("[" + (num("A") + num("B"))("values") + "]")
U = num_pair.parseString("[ 10 20 ]")
print(U.dump())

Since there is no grouping, "A", "B", and "values" should all appear
at the same level in the results, as:

['[', '10', '20', ']']
- A: '10'
- B: '20'
- values: ['10', '20']

Instead, an extra level of "A" and "B" show up under "values":

['[', '10', '20', ']']
- A: '10'
- B: '20'
- values: ['10', '20']
  - A: '10'
  - B: '20'

This bug has been fixed. Now, if this hierarchy is desired, then a
Group should be added:

num_pair = ("[" + pp.Group(num("A") + num("B"))("values") + "]")

Giving:

['[', ['10', '20'], ']']
- values: ['10', '20']
  - A: '10'
  - B: '20'

In no case should "A" and "B" appear at multiple levels; this fix
ensures that they no longer do.

If you have current code which relies on this behavior, then add or remove
Groups as necessary to get your intended results structure.

Reported by Athanasios Anastasiou.

- IndexError's raised in parse actions will get explicitly reraised
as ParseExceptions that wrap the original IndexError. Since
IndexError sometimes occurs as part of pyparsing's normal parsing
logic, IndexErrors that are raised during a parse action may have
gotten silently reinterpreted as parsing errors. To retain the
information from the IndexError, these exceptions will now be
raised as ParseExceptions that reference the original IndexError.
This wrapping will only be visible when run under Python3, since it
emulates "raise ... from ..." syntax.

Addresses Issue 4, reported by guswns0528.
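
A sketch of how the original IndexError can be recovered under Python 3;
`bad_action` is a deliberately buggy, hypothetical parse action:

```python
import pyparsing as pp

def bad_action(tokens):
    return tokens[5]  # bug: only one token was parsed

expr = pp.Word(pp.nums).setParseAction(bad_action)

try:
    expr.parseString("123")
    cause = None
except pp.ParseException as exc:
    # The wrapped IndexError is attached as the exception's cause.
    cause = exc.__cause__

print(type(cause).__name__)
```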

- Added Char class to simplify defining expressions of a single
character. (Char("abc") is equivalent to Word("abc", exact=1))
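
A minimal comparison of Char and Word on the same input (the character
set and input are illustrative):

```python
import pyparsing as pp

# Char matches exactly one character from the set.
single = pp.Char("abc").parseString("cab").asList()

# Word matches the whole run of characters from the set.
run = pp.Word("abc").parseString("cab").asList()

print(single, run)
```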

- Added class PrecededBy to perform lookbehind tests. PrecededBy is
used in the same way as FollowedBy, passing in an expression that
must occur just prior to the current parse location.

For fixed-length expressions like a Literal, Keyword, Char, or a
Word with an `exact` or `maxLen` length given, `PrecededBy(expr)`
is sufficient. For varying length expressions like a Word with no
given maximum length, `PrecededBy` must be constructed with an
integer `retreat` argument, as in
`PrecededBy(Word(alphas, nums), retreat=10)`, to specify the maximum
number of characters pyparsing must look backward to make a match.
pyparsing will check all the values from 1 up to retreat characters
back from the current parse location.

When stepping backwards through the input string, PrecededBy does
*not* skip over whitespace.

PrecededBy can be created with a results name so that, even though
it always returns an empty parse result, the result *can* include
named results.

Idea first suggested in Issue 30 by Freakwill.
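
A hedged sketch of a fixed-length lookbehind, picking out only the
numbers that directly follow a '$' (the `price` expression and input
are made up for this example):

```python
import pyparsing as pp

# "$" is a fixed-length expression, so no retreat argument is needed.
price = pp.PrecededBy("$") + pp.Word(pp.nums)

amounts = price.searchString("$10 20 $30").asList()

print(amounts)
```

The '$' itself is not part of the match; PrecededBy only checks that it
is present just before the current parse location.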

- Updated FollowedBy to accept expressions that contain named results,
so that results names defined in the lookahead expression will be
returned, even though FollowedBy always returns an empty list.
Inspired by the same feature implemented in PrecededBy.

2.2.2

-------------------------------
- Fixed bug in SkipTo: if a SkipTo expression was skipping to
an expression that returned a list (such as an And), and the
SkipTo was saved as a named result, the named result could be
saved as a ParseResults; it should always be saved as a string.
Issue 28, reported by seron.

- Added simple_unit_tests.py, as a collection of easy-to-follow unit
tests for various classes and features of the pyparsing library.
The primary intent is to be instructional rather than to test rigorously.
Complex tests can still be added in the unitTests.py file.

- New features added to the Regex class:
- optional asGroupList parameter, returns all the capture groups as
a list
- optional asMatch parameter, returns the raw re.match result
- new sub(repl) method, which adds a parse action calling
re.sub(pattern, repl, parsed_result). Simplifies creating
Regex expressions to be used with transformString. Like re.sub,
repl may be an ordinary string (similar to using pyparsing's
replaceWith), or may contain references to capture groups by group
number, or may be a callable that takes an re match group and
returns a string.

For instance:
expr = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2</\1>")
expr.transformString("h1: This is the title")

will return
<h1>This is the title</h1>

- Fixed omission of LICENSE file in source tarball, also added
CODE_OF_CONDUCT.md per GitHub community standards.
