Ocrodjvu

Latest version: v0.13.2

0.7.15

* Strip trailing whitespace from text zones bigger than words (lines,
paragraphs, …).
* Fix compatibility with Tesseract 3.02.
Thanks to Janusz S. Bień for the bug report.
* ocrodjvu:
+ Make it possible to pass multiple languages to Tesseract ≥ 3.02.
https://github.com/jwilk/ocrodjvu/issues/3
Thanks to Janusz S. Bień for the bug report.
+ Cuneiform: rename mixed Russian-English language code:
“rus-eng” → “rus+eng”. This is for consistency with Tesseract.
+ Tesseract: fix support for Chinese language pack.
+ Tesseract: make it possible to pass the -psm option in order to
customize layout analysis. For example, to enable OSD, use:
-X extra_args='-psm 1'
+ Make --list-languages output sorted.
+ Tesseract: remove “osd” from language list.
+ Accept both ISO 639-2/T and ISO 639-2/B language codes.
+ Add the --save-raw-ocr option.
+ Add the --raw-ocr-filename-template option.
+ Improve documentation of the --ocr-only option.
* Require Python ≥ 2.6.
* Fix compatibility with nose 1.2.
Thanks to Kyrill Detinov for the bug report.

-- Jakub Wilk <jwilk@jwilk.net> Wed, 17 Apr 2013 00:59:23 +0200
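
The 0.7.15 entry mentions accepting both ISO 639-2/T and ISO 639-2/B language codes and joining several languages with “+”. The sketch below illustrates one way such a specification could be normalized. It is not ocrodjvu's actual code: the function name normalize_languages, the B_TO_T table (deliberately incomplete), and the choice of the /T form as the canonical one are all assumptions made for illustration.

```python
# Illustrative sketch only; this is not taken from ocrodjvu.
# Hypothetical, incomplete mapping from ISO 639-2/B (bibliographic)
# codes to their ISO 639-2/T (terminologic) equivalents.
B_TO_T = {
    "chi": "zho",
    "cze": "ces",
    "dut": "nld",
    "fre": "fra",
    "ger": "deu",
    "gre": "ell",
}


def normalize_languages(spec):
    """Normalize a '+'-separated language spec to ISO 639-2/T codes.

    Codes that are not in B_TO_T are passed through unchanged.
    """
    codes = [code.strip().lower() for code in spec.split("+")]
    if not all(codes):
        raise ValueError("empty language code in %r" % (spec,))
    return "+".join(B_TO_T.get(code, code) for code in codes)


if __name__ == "__main__":
    print(normalize_languages("ger+eng"))
    print(normalize_languages("rus+eng"))
```

Codes missing from the table pass through untouched, so a specification that is already in /T form is returned as given.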

0.7.14

* Document which versions of OCRopus are supported.
* Document that PyICU and html5lib are only required for some optional
features.
* Document what software is needed to rebuild the manual pages from source.
* djvu2hocr:
+ Add the --title option.
+ Add the --css option.
+ Document the -p/--pages option.

-- Jakub Wilk <jwilk@jwilk.net> Fri, 15 Mar 2013 13:54:05 +0100

0.7.13

* Abort early if one tries to use an incompatible Python version.
* Improve the manual pages, as per man-pages(7) recommendations:
+ Remove the “AUTHOR” sections.
+ Rename the “REPORTING BUGS” sections as “BUGS”.
* Improve the test suite.
* Make “setup.py clean -a” remove compiled manual pages (unless they were
built by “setup.py sdist”).

-- Jakub Wilk <jwilk@jwilk.net> Thu, 14 Feb 2013 23:28:56 +0100
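
The 0.7.13 entry mentions aborting early on an incompatible Python version. A minimal sketch of that general pattern follows. The function name check_python_version and its parameters are hypothetical, and the entry does not say which versions were considered compatible, so the minimum is left as an argument.

```python
import sys


def check_python_version(minimum, current=None):
    """Exit with a message if `current` is older than `minimum`.

    Both arguments are (major, minor) tuples. When `current` is omitted,
    the running interpreter's version is used. This is an illustrative
    helper, not ocrodjvu's actual check.
    """
    if current is None:
        current = sys.version_info[:2]
    current = tuple(current[:2])
    if current < tuple(minimum):
        raise SystemExit("Python >= %d.%d is required" % tuple(minimum))
    return current


if __name__ == "__main__":
    # The (2, 6) minimum here is only an example value.
    check_python_version((2, 6))
```

Running such a check before any other imports means an unsupported interpreter fails with a clear message and not with an unrelated syntax or import error.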

0.7.12

* Don't let “-X fix-html=1” break HTML snippets ocrodjvu generates itself
for the “-t chars” Tesseract support.
Thanks to Janusz S. Bień for the test case.

-- Jakub Wilk <jwilk@jwilk.net> Wed, 15 Aug 2012 19:32:57 +0200

0.7.11

* hocr2djvused:
+ Allow processing multiple hOCR documents at once.
https://github.com/jwilk/ocrodjvu/issues/1
Thanks to Thomas Koch for the bug report and the initial patch.
* Fix merging results of two Tesseract runs.
Thanks to Janusz S. Bień for the bug report.

-- Jakub Wilk <jwilk@jwilk.net> Mon, 28 May 2012 19:43:22 +0200

0.7.10

* Improve error handling.
* ocrodjvu:
+ Attempt to fix encoding issues and eliminate unwanted control characters
in files produced by Tesseract and Cuneiform.
https://bugs.debian.org/671764
Thanks to Thomas Koch for the bug report.
* hocr2djvused:
+ Add the --fix-utf8 option.
* djvu2hocr:
+ Translate DjVu “region” to <div class="ocrx_block"> (instead of <span…>,
which was causing XHTML validity errors).
* Tests: fix compatibility with PIL ≥ 1.2.
* Include example scans2djvu+hocr script.
* Fix merging results of two Tesseract runs.
Thanks to Janusz S. Bień for the bug report.
* Use RFC 3339 date format in the manual page. Don't call external programs
to build it.

-- Jakub Wilk <jwilk@jwilk.net> Sat, 12 May 2012 00:37:50 +0200
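
The 0.7.10 entry mentions fixing encoding issues and eliminating unwanted control characters in OCR engine output. The sketch below shows one possible approach, assuming UTF-8 input: undecodable bytes are replaced with U+FFFD, and every control character other than newline and tab is dropped. The function name clean_ocr_output and the set of characters to keep are assumptions; ocrodjvu's real handling may differ.

```python
import unicodedata


def clean_ocr_output(raw, encoding="utf-8"):
    """Decode OCR output bytes and drop unwanted control characters.

    Illustrative helper, not ocrodjvu's implementation. Bytes that are
    invalid in `encoding` become U+FFFD; characters in Unicode category
    "Cc" are removed, except for newline and tab.
    """
    text = raw.decode(encoding, errors="replace")
    keep = {"\n", "\t"}
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


if __name__ == "__main__":
    # A form feed (0x0C) between words is removed; the newline is kept.
    print(repr(clean_ocr_output(b"ab\x0ccd\n")))
```

Replacing invalid bytes, as opposed to raising an error, keeps one damaged page from aborting a whole run, at the cost of leaving visible replacement characters in the text.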
