===========================
This release adds support for parallel processing, both by using
multiple processes on a single machine and by running "build clients"
on any number of machines, which run jobs managed by a central queue.
Parsing of PDF files has been improved by the :py:class:`.PDFReader`
and :py:class:`.PDFAnalyzer` (new) classes. See :ref:`pdfreader`.
In addition, many of the included repositories have been
overhauled. The general repos :py:class:`.MediaWiki` and
:py:class:`~ferenda.sources.general.Keyword` should be usable for most
projects by creating a subclass and configuring it.
Backwards-incompatible changes:
-------------------------------
* DocumentRepository and all derived classes now take an optional
first config argument. If present, this should be a LayeredConfig
object that contains the repo configuration. If not provided, a
blank LayeredConfig object is created. All other optional keyword
arguments are then added to the config object. If you have
overridden __init__ for your docrepo, you'll need to make sure to
handle this first argument.
* The Newscriteria class has been removed, and
DocumentRepository.news_criteria with it. The Facet framework is now
used to define news feeds (as well as TOC pages, the ReST API and
fulltext indexing).
* The PDFReader constructor now takes, as first argument, a list of
pdfreader.Page objects. Normally, a client won't have these but must
instead provide a filename of a PDF file through the filename
argument (which used to be the first argument, but must now be
specified as a named argument).
* The getfont() method of pdfreader.Textbox objects used to return a
plain dict of strings. It has been replaced with a font property,
which is a LayeredConfig object with proper typing. Code like
"int(textbox.getfont()['size'])" should now be written as
"textbox.font.size".
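
The first item above affects any docrepo that overrides ``__init__``. The following is a minimal sketch of the pattern, not ferenda code: ``BaseRepo`` is a stand-in for DocumentRepository, ``SimpleNamespace`` is a stand-in for a LayeredConfig object, and ``MyRepo`` and ``session_name`` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


class BaseRepo:
    """Stand-in for DocumentRepository (illustration only, not ferenda code)."""

    def __init__(self, config=None, **kwargs):
        # No config given: start from a blank one, as the notes describe.
        if config is None:
            config = SimpleNamespace()
        # Remaining keyword arguments are added to the config object.
        for key, value in kwargs.items():
            setattr(config, key, value)
        self.config = config


class MyRepo(BaseRepo):
    """Hypothetical docrepo that overrides __init__."""

    def __init__(self, config=None, **kwargs):
        # Accept the new first argument and pass it through unchanged.
        super().__init__(config, **kwargs)
        # Repo-specific setup goes after the base class has built self.config.
        self.session_name = getattr(self.config, "session_name", "default")
```

The key point is that the override accepts ``config`` as its first positional argument and forwards it, so that a caller passing a shared configuration object and a caller passing only keyword arguments both keep working.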
New features:
-------------
* The default serialization of Element objects to XHTML now inserts
appropriate dcterms:isPartOf statements when one element with a URI
is contained within another element with another URI. Custom element
classes can change this by changing the partrelation property of the
included document.
* Serialization of Element documents to XHTML now omits namespaces
that are defined in self.namespaces but never actually occur in the
data.
* CitationParser.parse_string and .parse_recursive now have an optional
predicate argument that determines the RDF predicate between the
referring and the referred resources (by default, this is
dcterms:references).
* manager (and by extension ./ferenda-build.py) has new commands that
allow processing jobs in parallel (see Advanced > Parallel
processing).
* The ferenda.sources.general.wiki can now transform mediawiki markup
to Element objects.
* The ferenda.sources.general.keyword can be used to build keyword
hubs from all concepts that your documents point to through a
dcterms:subject property (as well as things in a wiki docrepo, and
configurable other sources).
* The ferenda.sources.legal.se docrepos have been generally updated
and are now close to being able to replicate the feature set of
https://lagen.nu/ (which was the main motivation for this codebase
all along).
* ferenda.testutil.assertEqualXML now has a tidy_xhtml argument which
runs the XML documents to be compared through HTML tidy (in XML
mode) in order to produce easier-to-read diffs.
* Transformer now outputs the equivalent xsltproc command if the
environment variable FERENDA_TRANSFORMDEBUG is set.
* The relate() action now uses dependency management to avoid costly
re-indexing if no changes have been made to a document.
* TOC and newsfeed generation now use dependency management to avoid
re-generating if no changes in the underlying data have occurred.
* Documentation in general has been improved (readers, testing).
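
The dependency management mentioned for relate(), TOC and newsfeed generation can be pictured as a timestamp check: skip the work when the output is newer than everything it was built from. The sketch below shows that general idea only. It is an assumption about the technique, not ferenda's actual implementation, and ``needs_rebuild`` is a hypothetical helper.

```python
import os


def needs_rebuild(outfile, dependencies):
    """Return True if outfile is missing or older than any dependency.

    Hypothetical helper for illustration. A missing dependency raises
    FileNotFoundError from os.path.getmtime.
    """
    if not os.path.exists(outfile):
        return True
    out_mtime = os.path.getmtime(outfile)
    # Rebuild as soon as one input has been modified after the output.
    return any(os.path.getmtime(dep) > out_mtime for dep in dependencies)
```

A generation step would call such a check first and return early when it yields ``False``, which is what makes repeated runs over unchanged data cheap.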
Infrastructural changes:
------------------------
* Ferenda now uses the CI service Appveyor to automatically run the
entire test suite under Windows on every commit.
* LayeredConfig is now a separate package and not included with
Ferenda. It has been generalized and can take any number of
configuration sources (in the form of object instances) as
initialization arguments. Classes that provide configuration sources
from code defaults, INI files, command line arguments, environment
variables and more are included. It also has two new class methods,
.set and .get.
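
The idea of layering several configuration sources can be illustrated with the standard library alone. The sketch below uses ``collections.ChainMap`` as a stand-in and does not use or reproduce the LayeredConfig API. The keys and values are hypothetical, and the precedence order shown is the one ChainMap uses, which is not necessarily how LayeredConfig orders its sources.

```python
from collections import ChainMap

# Hypothetical values; in a real project these would come from code
# defaults, an INI file and the command line respectively.
defaults = {"datadir": "data", "loglevel": "INFO", "processes": 1}
inifile = {"loglevel": "DEBUG"}
commandline = {"processes": 4}

# ChainMap searches left to right, so the highest-priority source goes first.
config = ChainMap(commandline, inifile, defaults)

print(config["datadir"])    # only set in defaults
print(config["loglevel"])   # the INI value shadows the default
print(config["processes"])  # the command line value shadows the default
```

Each lookup falls through the sources until one of them defines the key, so a setting only needs to be stated in the layer where it differs from the defaults.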