================
Features added
--------------
* Passing the ``unicode`` type as ``encoding`` to ``tostring()`` will
serialise to unicode. The ``tounicode()`` function is now
deprecated.
* ``XMLSchema()`` and ``RelaxNG()`` can parse from StringIO.
* ``makeparser()`` function in ``lxml.objectify`` to create a new
parser with the usual objectify setup.
* Plain ASCII XPath string results are no longer forced into unicode
objects as in 2.0beta1, but are returned as plain strings as before.
* All XPath string results are 'smart' objects that have a
``getparent()`` method to retrieve their parent Element.
* ``with_tail`` option in serialiser functions.
* More accurate exception messages in validator creation.
* Parse-time XML schema validation (``schema`` parser keyword).
* XPath string results of the ``text()`` function and attribute
selection make their Element container accessible through a
``getparent()`` method. As a side-effect, they are now always
unicode objects (even ASCII strings).
* ``XSLT`` objects are usable in any thread - at the cost of a deep
copy if they were not created in that thread.
* Invalid entity names and character references will be rejected by
the ``Entity()`` factory.
* ``entity.text`` returns the textual representation of the entity,
e.g. ``&amp;``.
* New properties ``position`` and ``code`` on ParseError exception (as
in ET 1.3)
* Rich comparison of ``element.attrib`` proxies.
* ElementTree compatible TreeBuilder class.
* Use default prefixes for some common XML namespaces.
* ``lxml.html.clean.Cleaner`` now allows for a ``host_whitelist``, and
two overridable methods: ``allow_embedded_url(el, url)`` and the
more general ``allow_element(el)``.
* Extended slicing of Elements as in ``element[1:-1:2]``, both in
etree and in objectify
* Resolvers can now provide a ``base_url`` keyword argument when
resolving a document as string data.
* When using ``lxml.doctestcompare`` you can give the doctest option
``NOPARSE_MARKUP`` (like ``# doctest: +NOPARSE_MARKUP``) to suppress
the special checking for one test.
* Separate ``feed_error_log`` property for the feed parser interface.
The normal parser interface and ``iterparse`` continue to use
``error_log``.
* The normal parsers and the feed parser interface are now separated
and can be used concurrently on the same parser instance.
* ``fromstringlist()`` and ``tostringlist()`` functions as in
ElementTree 1.3
* ``iterparse()`` accepts an ``html`` boolean keyword argument for
parsing with the HTML parser (note that this interface may be
subject to change)
* Parsers accept an ``encoding`` keyword argument that overrides the encoding
of the parsed documents.
* New C-API function ``hasChild()`` to test for children
* ``annotate()`` function in objectify can annotate with Python types and XSI
types in one step. Accompanied by ``xsiannotate()`` and ``pyannotate()``.
* ``ET.write()``, ``tostring()`` and ``tounicode()`` now accept a keyword
argument ``method`` that can be one of 'xml' (or None), 'html' or 'text' to
serialise as XML, HTML or plain text content.
* ``iterfind()`` method on Elements returns an iterator equivalent to
``findall()``
* ``itertext()`` method on Elements
* Setting a QName object as the value of the ``.text`` property or as an
attribute value will resolve its prefix in the respective context
* ElementTree-like parser target interface as described in
http://effbot.org/elementtree/elementtree-xmlparser.htm
* ElementTree-like feed parser interface on XMLParser and HTMLParser
(``feed()`` and ``close()`` methods)
* Reimplemented ``objectify.E`` for better performance and improved
integration with objectify. Provides extended type support based on
registered PyTypes.
* XSLT objects now support deep copying
* New ``makeSubElement()`` C-API function that allows creating a new
subelement straight with text, tail and attributes.
* XPath extension functions can now access the current context node
(``context.context_node``) and use a context dictionary
(``context.eval_context``) from the context provided in their first
parameter
* HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup``
* New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified
doctests based on XML/HTML output. Use by importing ``lxml.usedoctest`` or
``lxml.html.usedoctest`` from within a doctest.
* New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with CSS
selectors.
* New package ``lxml.html`` written by Ian Bicking for advanced HTML
treatment.
* Namespace class setup is now local to the ``ElementNamespaceClassLookup``
instance and no longer global.
* Schematron validation (incomplete in libxml2)
* Additional ``stringify`` argument to ``objectify.PyType()`` takes a
conversion function to strings to support setting text values from arbitrary
types.
* Entity support through an ``Entity`` factory and element classes. XML
parsers now have a ``resolve_entities`` keyword argument that can be set to
False to keep entities in the document.
* ``column`` field on error log entries to accompany the ``line`` field
* Error-specific messages in XPath parsing and evaluation.
NOTE: evaluation errors now raise an ``XPathEvalError`` instead of
an ``XPathSyntaxError``. To catch both, use ``except XPathError``.
* The regular expression functions in XPath now support passing a node-set
instead of a string
* Extended type annotation in objectify: new ``xsiannotate()`` function
* EXSLT RegExp support in standard XPath (not only XSLT)
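
The ElementTree 1.3 compatible additions listed above (``fromstringlist()``, ``tostringlist()``, ``iterfind()``, ``itertext()``, the ``method`` keyword and the ``feed()``/``close()`` parser interface) could be exercised roughly as follows. This is a minimal sketch rather than an excerpt from the lxml documentation. It assumes Python 3, where the string ``"unicode"`` is passed as the encoding to request text output, and it falls back to the standard library's ``xml.etree.ElementTree`` when lxml is not installed. The sample documents and variable names are made up for illustration.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module offers the same ElementTree 1.3 subset.
    import xml.etree.ElementTree as etree

# Build a tree from a sequence of string fragments.
root = etree.fromstringlist(["<root><a>one</a>", "<b>two<c>three</c></b></root>"])

# iterfind() yields the same Elements that findall() would return as a list.
found_tags = [el.tag for el in root.iterfind("b/c")]

# itertext() walks all text content in document order.
all_text = "".join(root.itertext())

# method="text" serialises only the text content; "unicode" requests a str.
as_text = etree.tostring(root, method="text", encoding="unicode")

# tostringlist() returns the serialised document as a list of chunks.
chunks = etree.tostringlist(root)

# Feed parser interface: pass data incrementally, then close() to get the root.
parser = etree.XMLParser()
parser.feed("<doc><item/>")
parser.feed("</doc>")
fed_root = parser.close()
```

Joining the chunks from ``tostringlist()`` gives the same bytes as a single ``tostring()`` call, so the list form is mainly useful when the output is written piece by piece.
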
Bugs fixed
----------
* Missing import in ``lxml.html.clean``.
* Some Python 2.4-isms prevented lxml from building/running under
Python 2.3.
* XPath on ElementTrees could crash when selecting the virtual root
node of the ElementTree.
* Compilation ``--without-threading`` was buggy in alpha5/6.
* Memory leak in the ``parse()`` function.
* Minor bugs in XSLT error message formatting.
* Result document memory leak in target parser.
* Target parser failed to report comments.
* In the ``lxml.html`` ``iter_links`` method, links in ``<object>``
tags weren't recognized. (Note: plugin-specific link parameters
still aren't recognized.) Also, the ``<embed>`` tag, though not
standard, is now included in ``lxml.html.defs.special_inline_tags``.
* Using custom resolvers on XSLT stylesheets parsed from a string
could request ill-formed URLs.
* With ``lxml.doctestcompare``, writing ``<tag xmlns="...">`` in your
expected output now makes the comparison namespace-neutral (previously,
the ellipsis was treated as a real namespace).
* AttributeError in feed parser on parse errors
* XML feed parser setup problem
* Type annotation for unicode strings in ``DataElement()``
* lxml failed to serialise namespace declarations of elements other than the
root node of a tree
* Race condition in XSLT where the resolver context leaked between concurrent
XSLT calls
* lxml.etree did not check tag/attribute names
* The XML parser did not report undefined entities as errors
* The text in exceptions raised by XML parsers, validators and XPath
evaluators now reports the first error that occurred instead of the last
* Passing '' as XPath namespace prefix did not raise an error
* Thread safety in XPath evaluators
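
The undefined-entity fix above, together with the ``position`` and ``code`` properties on ``ParseError`` from the feature list, can be illustrated with a short sketch. The input document is hypothetical, and the code assumes Python 3 with a fallback to the standard library's ``xml.etree.ElementTree``, which exposes a ``ParseError`` with the same two attributes. The exact message text and column number differ between the two parsers, so neither is relied on here.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib fallback exposes the same ParseError attributes.
    import xml.etree.ElementTree as etree

# Hypothetical input: the entity is referenced but never declared.
broken = "<root>&undefined;</root>"

try:
    etree.fromstring(broken)
except etree.ParseError as exc:
    error_position = exc.position  # (line, column) tuple
    error_code = exc.code          # numeric parser error code
    message = str(exc)
else:
    # Only reached if the parser accepted the undefined entity.
    error_position = error_code = message = None
```
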
Other changes
-------------
* Exceptions carry only the part of the error log that is related to
the operation that caused the error.
* ``XMLSchema()`` and ``RelaxNG()`` now enforce passing the source
file/filename through the ``file`` keyword argument.
* The test suite now skips most doctests under Python 2.3.
* ``make clean`` no longer removes the .c files (use ``make
realclean`` instead)
* Minor performance tweaks for Element instantiation and subelement
creation
* Various places in the XPath, XSLT and iteration APIs now require
keyword-only arguments.
* The argument order in ``element.itersiblings()`` was changed to
match the order used in all other iteration methods. The second
argument ('preceding') is now a keyword-only argument.
* The ``getiterator()`` method on Elements and ElementTrees was
reverted to return an iterator as it did in lxml 1.x. The ET API
specification allows it to return either a sequence or an iterator,
and it traditionally returned a sequence in ET and an iterator in
lxml. However, it is now deprecated in favour of the ``iter()``
method, which should be used in new code wherever possible.
* The 'pretty printed' serialisation of ElementTree objects now
inserts newlines at the root level between processing instructions,
comments and the root tag.
* A 'pretty printed' serialisation is now terminated with a newline.
* Second argument to ``lxml.etree.Extension()`` helper is no longer
required, third argument is now a keyword-only argument ``ns``.
* ``lxml.html.tostring`` takes an ``encoding`` argument.
* The module source files were renamed to "lxml.*.pyx", such as
"lxml.etree.pyx". This was changed for consistency with the way
Pyrex commonly handles package imports. The main effect is that
classes now know about their fully qualified class name, including
the package name of their module.
* Keyword-only arguments in some API functions, especially in the
parsers and serialisers.
* Tag name validation in lxml.etree (and lxml.html) now distinguishes
between HTML tags and XML tags based on the parser that was used to
parse or create them. HTML tags no longer reject any non-ASCII
characters in tag names but only spaces and the special characters
``<>&/"'``.
* lxml.etree now emits a warning if you use XPath with libxml2 2.6.27
(which can crash on certain XPath errors)
* Type annotation in objectify now preserves the already annotated type by
default to prevent losing type information that is already there.
* ``element.getiterator()`` returns a list, use ``element.iter()`` to retrieve
an iterator (ElementTree 1.3 compatible behaviour)
* objectify.PyType for None is now called "NoneType"
* ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3 -
original name is still available as alias
* In the public C-API, ``findOrBuildNodeNs()`` was replaced by the more
generic ``findOrBuildNodeNsPrefix()``
* Major refactoring in XPath/XSLT extension function code
* Network access in parsers disabled by default