* Tesseract:
+ Don't insist that language codes are always 3 letters long.
(Tesseract 4 provides training data files for whole scripts, such as
“Latin”, with names that don't follow that pattern.)
https://github.com/jwilk/ocrodjvu/issues/29
Thanks to vltavskachobotnice for the bug report.
+ Make it possible to pass arbitrary options to Tesseract.
For example, to force the legacy OCR engine, use:
-X extra_args='--oem 0'
https://github.com/jwilk/ocrodjvu/issues/30
Thanks to Janusz S. Bień for the bug report.
+ Speed up extraction of character-level details for Tesseract ≥ 3.04.
* Limit the number of OMP threads (used by Tesseract), so that the overall
number of threads doesn't exceed the number specified by -j.
Thanks to Alexey Shipunov for the bug report.
* Require subprocess32, even when no parallelism was requested by the user.
This could fix rare deadlocks.
* Reduce memory consumption by keeping OCR results in memory only as long as
necessary.
* Stop honoring the “tessdata” environment variable.
The variable has been deprecated since ocrodjvu 0.2.1.
Use --language instead.
* Suggest using -e/--engine if the (implicitly selected) default OCR engine
was not found.
https://github.com/jwilk/ocrodjvu/issues/20
Thanks to libBletchley for the bug report.
* Improve error handling.
+ Fix error handling when PyICU is missing.
Regression introduced in 0.10.2.
+ Check for existence of djvused early.
* Improve documentation:
+ Use HTTPS for lxml homepage URL.
+ Update hOCR specification URL.
+ Update GOCR homepage URL.
+ Improve ocrodjvu manual page:
+ Describe -j in more detail.
+ Remove 0.1 compatibility notes.
These versions are long obsolete.
+ Fix typos.
+ Make the dependency list less verbose.
+ Mention that argparse is only needed for Python 2.6.
+ Document lxml version requirement.
* Improve the test suite.
-- Jakub Wilk <jwilk@jwilk.net> Mon, 11 Feb 2019 18:43:02 +0100