Ocrodjvu

Latest version: v0.13.2

0.13.1

  [ FriedrichFroebel ]
  * Avoid deprecation warnings with the latest `lxml` versions.

-- FriedrichFroebel <> Wed, 15 May 2024 16:42:10 +0200

0.13

  [ FriedrichFroebel ]
  * Port to Python 3. At least Python 3.6 is now required.
  * Migrate tests from `nose` to the plain `unittest` stdlib module.
  * Conform to the PEP 8 coding style.
  * Use a standardized `setup.py`-based approach.
  * Clean up the directory structure to no longer use a generic `lib` directory.
  * Drop support for the outdated ocropus/ocropy engine.

-- FriedrichFroebel <> Sun, 15 Jan 2023 17:13:54 +0100

0.12.1

*

-- Jakub Wilk <jwilk@jwilk.net> Sat, 29 May 2021 14:16:01 +0200

0.12

  [ Jakub Wilk ]
  * Remake the build system.
    The new build system is based on makefiles.
  * Add “-j auto” to start as many OCR threads as the number of available
    CPUs.
  * Disallow -j without arguments. (It did the same thing as “-j auto”.)
  * Disallow “-j 0” or “-j N” for negative N.
  * Redirect unused file descriptors to /dev/null when executing OCR engines.
  * Tesseract: be more aggressive in filtering out uninteresting messages on
    stderr.
  * Improve documentation:
    + Fix punctuation in the OCR engine list.
    + Remove .txt extensions.
  * Improve the help message:
    + Include the most important options in the “usage: …” part.
    + Update the description for -j.
    + Fix a typo.
  * Improve error handling.
  * Improve the test suite.

  [ Pieter-Tjerk de Boer ]
  * Fix page counting when a file identifier cannot be converted to the
    locale encoding.

-- Jakub Wilk <jwilk@jwilk.net> Sat, 08 Aug 2020 17:26:59 +0200

0.11

  * Tesseract:
    + Don't insist that language codes are always three letters long.
      (Tesseract 4 provides training data files for whole scripts, such as
      “Latin”, with names that don't follow that pattern.)
      https://github.com/jwilk/ocrodjvu/issues/29
      Thanks to vltavskachobotnice for the bug report.
    + Make it possible to pass arbitrary options to Tesseract.
      For example, to force the legacy OCR engine, use:
        -X extra_args='--oem 0'
      https://github.com/jwilk/ocrodjvu/issues/30
      Thanks to Janusz S. Bień for the bug report.
    + Speed up extraction of character-level details for Tesseract ≥ 3.04.
  * Limit the number of OpenMP threads (used by Tesseract), so that the overall
    number of threads doesn't exceed the number specified by -j.
    Thanks to Alexey Shipunov for the bug report.
  * Require subprocess32, even when no parallelism was requested by the user.
    This could fix rare deadlocks.
  * Reduce memory consumption by keeping OCR results in memory only as long as
    necessary.
  * Stop honoring the “tessdata” environment variable.
    The variable has been deprecated since ocrodjvu 0.2.1.
    Use --language instead.
  * Suggest using -e/--engine if the (implicitly selected) default OCR engine
    is not found.
    https://github.com/jwilk/ocrodjvu/issues/20
    Thanks to libBletchley for the bug report.
  * Improve error handling:
    + Fix error handling when PyICU is missing.
      Regression introduced in 0.10.2.
    + Check for the existence of djvused early.
  * Improve documentation:
    + Use HTTPS for the lxml homepage URL.
    + Update the hOCR specification URL.
    + Update the GOCR homepage URL.
    + Improve the ocrodjvu manual page:
      + Describe -j in more detail.
      + Remove 0.1 compatibility notes.
        These versions are long obsolete.
      + Fix typos.
    + Make the dependency list less verbose.
    + Mention that argparse is only needed for Python 2.6.
    + Document the lxml version requirement.
  * Improve the test suite.

-- Jakub Wilk <jwilk@jwilk.net> Mon, 11 Feb 2019 18:43:02 +0100

0.10.4

  * Fix handling of input files with non-ASCII names.
    Regression introduced in 0.9.1.
    https://github.com/jwilk/ocrodjvu/issues/23
    Thanks to derrikF for the bug report.
  * Improve documentation:
    + Fix punctuation.
    + Clarify Python version requirements.
    + Update the credits file to make it clear that the project is no longer
      being funded.

-- Jakub Wilk <jwilk@jwilk.net> Thu, 12 Jul 2018 21:07:40 +0200
