-----------------------------
- API ENHANCEMENT: `Optional(expr)` may now be written as `expr | ""`
This allows code such as:
    "{" + Optional(Literal("A") | Literal("a")) + "}"
to be written as:
    "{" + (Literal("A") | Literal("a") | "") + "}"
Some related changes implemented as part of this work:
- `Literal("")` now internally generates an `Empty()` (and no longer raises an exception)
- `Empty` is now a subclass of `Literal`
Suggested by Antony Lee (issue 412), PR (413) by Devin J. Pohly.
- Added new class property `identifier` to all Unicode set classes in `pyparsing.unicode`,
using the class's values for `cls.identchars` and `cls.identbodychars`. Unicode-aware
parsers that formerly wrote:
    ppu = pyparsing.unicode
    ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
can now write:
    ident = ppu.Greek.identifier
or
    ident = ppu.Ελληνικά.identifier
- `ParseResults` now has a new method `deepcopy()`, in addition to the current
`copy()` method. `copy()` makes only a shallow copy: any contained `ParseResults`
are copied as references, so changes made to them through the copy are also seen in the original.
In many cases, a shallow copy is sufficient, but some applications require a deep copy.
`deepcopy()` makes a deeper copy: any contained `ParseResults` or other mappings or
containers are built with copies from the original, and do not get changed if the
original is later changed. Addresses issue 463, reported by Bryn Pickering.
- Reworked `delimited_list` function into the new `DelimitedList` class.
`DelimitedList` has the same constructor interface as `delimited_list`, and
in this release, `delimited_list` changes from a function to a synonym for
`DelimitedList`. `delimited_list` and the older `delimitedList` function will be
deprecated in a future release, in favor of `DelimitedList`.
- Error messages from `MatchFirst` and `Or` expressions will try to give more details
if one of the alternatives matches better than the others, but still fails.
Question raised in Issue 464 by msdemlei, thanks!
- Added new class method `ParserElement.using_each`, to simplify code
that creates a sequence of `Literals`, `Keywords`, or other `ParserElement`
subclasses.
For instance, to define suppressible punctuation, you would previously
write:
    LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
You can now write:
    LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
`using_each` will also accept optional keyword args, which it will
pass through to the class initializer. Here is an expression for
single-letter variable names that might be used in an algebraic
expression:
    algebra_var = MatchFirst(
        Char.using_each(string.ascii_lowercase, as_keyword=True)
    )
- Added new builtin `python_quoted_string`, which will match any form
of single-line or multiline quoted strings defined in Python. (Inspired
by discussion with Andreas Schörgenhumer in Issue 421.)
- Extended `expr[]` notation for repetition of `expr` to accept a
slice, where the slice's stop value indicates a `stop_on`
expression:
    test = "BEGIN aaa bbb ccc END"
    BEGIN, END = Keyword.using_each("BEGIN END".split())
    body_word = Word(alphas)
    expr = BEGIN + Group(body_word[...:END]) + END
    # equivalent to
    # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
    print(expr.parse_string(test))
Prints:
    ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
- `ParserElement.validate()` is deprecated. It predates the support for left-recursive
parsers, and was prone to false positives (warning that a grammar was invalid when
it was in fact valid). It will be removed in a future pyparsing release. In its
place, developers should use debugging and analytical tools, such as `ParserElement.set_debug()`
and `ParserElement.create_diagram()`.
(Raised in Issue 444, thanks Andrea Micheli!)
- Added bool `embed` argument to `ParserElement.create_diagram()`.
When passed as True, the resulting diagram will omit the `<!DOCTYPE>`,
`<HEAD>`, and `<BODY>` tags so that it can be embedded in other
HTML source. (Useful when embedding a call to `create_diagram()` in
a PyScript HTML page.)
- Added `recurse` argument to `ParserElement.set_debug` to set the
debug flag on an expression and all of its sub-expressions. Requested
by multimeric in Issue 399.
- Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.
- Fixed bug in `Word` when `max=2`. Also added performance enhancement
when specifying `exact` argument. Reported in issue 409 by
panda-34, nice catch!
- `Word` now validates that `min` <= `max` when both arguments are given, and
raises `ValueError` if they are not.
- Fixed bug in `srange` when parsing escaped '/' and '\' inside a
range set.
- Fixed exception messages for some `ParserElements` with custom names,
which instead showed their contained expression names.
- Fixed bug in `pyparsing.common.url` when the input URL is not alone
on an input line. Fixes Issue 459, reported by David Kennedy.
- Multiple added and corrected type annotations. With much help from
Stephen Rosen, thanks!
- Some documentation and error message clarifications on pyparsing's
keyword logic, cited by Basil Peace.
- General docstring cleanup for Sphinx doc generation, PRs submitted
by Devin J. Pohly. A dirty job, but someone has to do it - much
appreciated!
- `invRegex.py` example renamed to `inv_regex.py` and updated to PEP-8
variable and method naming. PR submitted by Ross J. Duff, thanks!
- Removed examples `sparser.py` and `pymicko.py`, since each included its
own GPL license in the header. Since this conflicts with pyparsing's
MIT license, they were removed from the distribution to avoid
confusion among those making use of them in their own projects.