Borb

Latest version: v2.1.25


Page 10 of 12

1.9.0

This release adds several new features:
- OCR
- Pantone colors
- Markdown to PDF conversion

It also includes some minor improvements to the general layout logic:
- `Tables` are now completed automatically (with empty `Paragraph` objects)
- Support for heterogeneous paragraphs (see the `ChunksOfText` object)
- The layout package has been refactored into separate classes
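
The automatic table completion can be pictured as padding the list of cells until the grid is full. The sketch below is not `pText`'s implementation: `complete_table` is a hypothetical helper, and empty strings stand in for empty `Paragraph` objects.

```python
from typing import List


def complete_table(cells: List[str], rows: int, cols: int) -> List[List[str]]:
    """Pad a flat list of cells with empty strings and return it as a grid.

    Hypothetical helper: empty strings stand in for empty Paragraph objects.
    """
    capacity = rows * cols
    if len(cells) > capacity:
        raise ValueError("more cells than the table can hold")
    padded = cells + [""] * (capacity - len(cells))
    # Slice the flat list into rows of `cols` cells each.
    return [padded[r * cols:(r + 1) * cols] for r in range(rows)]


grid = complete_table(["a", "b", "c"], rows=2, cols=2)
print(grid)  # [['a', 'b'], ['c', '']]
```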

OCR

Using `Tesseract` (or rather `pytesseract`), `pText` is now able to handle scanned images in a PDF.
A scanned document is typically a PDF whose pages contain nothing but an image of the page.
`pText` can now restore text to such PDF documents.

The OCR capabilities have been integrated nicely with the existing `EventListener` framework. New events have been added to represent scanned text being recognized.
Two extra implementations of `EventListener` deal with OCR:

- `OCRImageRenderEventListener`: triggered whenever an image is detected in the PDF; it scans the image and produces `OCREvent` objects
- `OCRAsOptionalContentGroup`: extends `OCRImageRenderEventListener` and adds optional (invisible) content to the PDF, representing the recognized text

`pytesseract` is not added as a dependency in the setup script.
If you do choose to use OCR, you should install `pytesseract` and download the `Tesseract` data directories.
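
To give a rough idea of how an image-triggered listener can turn images into text events, here is a minimal sketch of the pattern. It does not use `pText`'s actual classes or signatures: `ImageEvent`, `RecognizedTextEvent`, `ImageOCRListener` and `fake_recognizer` are all hypothetical, and the recognizer is a stub so the example runs without Tesseract.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImageEvent:
    """Hypothetical event: an image was found on a page."""
    page: int
    image: bytes


@dataclass
class RecognizedTextEvent:
    """Hypothetical stand-in for an OCR event carrying recognized text."""
    page: int
    text: str


class ImageOCRListener:
    """Listens for images, runs a recognizer on each, and collects the results."""

    def __init__(self, recognize: Callable[[bytes], str]) -> None:
        self._recognize = recognize
        self.events: List[RecognizedTextEvent] = []

    def event_occurred(self, event: object) -> None:
        # Ignore everything that is not an image.
        if isinstance(event, ImageEvent):
            text = self._recognize(event.image)
            self.events.append(RecognizedTextEvent(event.page, text))


def fake_recognizer(image: bytes) -> str:
    # Stub recognizer; real code would hand the image to an OCR engine.
    return image.decode("ascii")


listener = ImageOCRListener(fake_recognizer)
listener.event_occurred(ImageEvent(page=0, image=b"hello"))
listener.event_occurred("not an image")  # silently ignored
print(listener.events)
```

With `pytesseract` installed, the stub could be replaced by something along the lines of `pytesseract.image_to_string`, applied to an image object built from the raw bytes.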

Pantone colors

Pantone colors are now supported. Similar to `X11Color`, `Pantone` has a dictionary of names mapped to hexadecimal strings.
When constructing a `Pantone` object, simply pass a valid color name, and you'll receive its corresponding `HexColor` object.
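
The name-to-hex lookup can be sketched as follows. This is an illustration of the idea only, not the `Pantone` class itself: `DEMO_COLORS`, `hex_to_rgb` and `lookup_color` are hypothetical, and the two entries are placeholders rather than real Pantone names or values.

```python
from typing import Dict, Tuple

# Placeholder entries for illustration only; these are NOT real Pantone names or values.
DEMO_COLORS: Dict[str, str] = {
    "demo-red": "#FF0000",
    "demo-teal": "#008080",
}


def hex_to_rgb(hex_string: str) -> Tuple[int, ...]:
    """Convert '#RRGGBB' to an (r, g, b) tuple of integers in 0..255."""
    value = hex_string.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_string!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def lookup_color(name: str) -> Tuple[int, ...]:
    """Resolve a color name to RGB, raising KeyError with a clear message if unknown."""
    try:
        return hex_to_rgb(DEMO_COLORS[name])
    except KeyError:
        raise KeyError(f"unknown color name: {name!r}") from None


print(lookup_color("demo-teal"))  # (0, 128, 128)
```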

Markdown to PDF

`pText` can now convert (simple) Markdown to PDF.
It does not (yet) support HTML, since that would require an entire HTML engine.

`pText` supports:
- Headers
- Tables
- Ordered lists (not nested)
- Unordered lists (not nested)
- Code snippets (indented and fenced)
- Blockquotes
- Images
- Paragraphs
- Horizontal rules

Check the examples and tests for a better idea of what is supported; they include a demo document and its matching output.
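
For readers unsure which Markdown constructs the list above refers to, the sketch below classifies single lines into those block types. It is a deliberately rough, line-by-line illustration and not `pText`'s parser; `classify_line` is a hypothetical helper, and real Markdown parsing needs multi-line context.

```python
import re

FENCE = "`" * 3  # three backticks, the marker that opens or closes a fenced code block


def classify_line(line: str) -> str:
    """Return a rough block type for a single Markdown line (illustrative only)."""
    if line.strip() == "":
        return "blank"
    if re.match(r"^#{1,6}\s+\S", line):
        return "header"
    # Three or more of the same marker (-, * or _), optionally separated by spaces.
    if re.match(r"^\s{0,3}([-*_])(\s*\1){2,}\s*$", line):
        return "horizontal rule"
    if re.match(r"^[-*+]\s+\S", line):
        return "unordered list item"
    if re.match(r"^\d+\.\s+\S", line):
        return "ordered list item"
    if line.startswith(">"):
        return "blockquote"
    if line.startswith(FENCE):
        return "code fence"
    if line.startswith("    "):
        return "indented code"
    if re.match(r"^!\[[^\]]*\]\([^)]*\)\s*$", line):
        return "image"
    if line.startswith("|"):
        return "table row"
    return "paragraph"


for sample in ["# Title", "---", "- item", "1. first", "> quote", "| a | b |", "plain text"]:
    print(f"{sample!r}: {classify_line(sample)}")
```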

1.8.9

This release features a few non-essential updates to the pText codebase that are mostly related to testing.
This includes:
- All tests have been refactored to follow the same format, with a small table atop the resulting `PDF` describing the test, when the test was run, etc.
- All tests (attempt to) follow the same color-scheme (making them look more professional and consistent)
- Tests against the entire corpus have been limited to the essentials, with extensive reporting

Performance Boost

There are a few minor tweaks that have boosted the performance of `pText` as a whole.
The main one is a change to the copy behaviour of `Font` objects in the `CanvasGraphicsState`, which has resulted in a speed-up of nearly 33%.

Fonts

I have also implemented some minor fixes to the whole `Font` logic, ensuring font-sizes are now handled properly,
regardless of whether they are passed as an argument to the `Tf` operator or via the text-matrix in the `CanvasGraphicsState`.
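
As a rough illustration of why both sources matter: in PDF, the rendered size depends on the `Tf` operand multiplied by the scaling of the text matrix. The sketch below is a simplification, not `pText`'s code; `effective_font_size` is a hypothetical helper, and the current transformation matrix is ignored.

```python
import math
from typing import Sequence


def effective_font_size(tf_size: float, text_matrix: Sequence[float]) -> float:
    """Combine the Tf operand with the vertical scale of a text matrix [a b c d e f].

    Simplification: the current transformation matrix is ignored.
    """
    a, b, c, d, e, f = text_matrix
    # The unit vector (0, 1) maps to (c, d), so its length is the vertical scale factor.
    vertical_scale = math.hypot(c, d)
    return tf_size * vertical_scale


# "/F1 12 Tf" with an identity-scaled text matrix ...
print(effective_font_size(12, [1, 0, 0, 1, 72, 700]))   # 12.0
# ... and "/F1 1 Tf" with the size carried by the matrix "12 0 0 12 72 700 Tm".
print(effective_font_size(1, [12, 0, 0, 12, 72, 700]))  # 12.0
```

Both content streams describe text of the same size, which is why the size has to be resolved from the two values together.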

I have also started implementing OCR; more on that in a future release.

Redaction

Finally, this release includes everything needed to perform redaction.
This is the process of:
- marking content to be removed (but not removing it, enabling review by a third party)
- removing content that has been marked

This functionality integrates nicely into the existing `pText` framework of `Page` annotations.
Check the examples for more details (look for "adding redaction annotations to a PDF").
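
The two-step process (mark first, remove later) can be sketched on plain text. This is a conceptual illustration only and does not use `pText`'s annotation API; `RedactableText` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RedactableText:
    """Hypothetical container: text plus the spans marked for redaction."""
    text: str
    marks: List[Tuple[int, int]] = field(default_factory=list)

    def mark(self, start: int, end: int) -> None:
        # Step 1: record the span; the text itself is left untouched for review.
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span out of range")
        self.marks.append((start, end))

    def apply(self) -> None:
        # Step 2: overwrite every marked span, then clear the marks.
        chars = list(self.text)
        for start, end in self.marks:
            chars[start:end] = "X" * (end - start)
        self.text = "".join(chars)
        self.marks.clear()


doc = RedactableText("Call Alice on 555-0100")
doc.mark(5, 10)
print(doc.text)  # still "Call Alice on 555-0100"
doc.apply()
print(doc.text)  # "Call XXXXX on 555-0100"
```

Keeping the two steps separate is what allows a third party to review the marked spans before anything is actually removed.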

1.8.8

Major overhaul of all font-related functionality.
This release features:
- a speed-up in copying fonts (an operation often performed when processing pages)
- the ability to use custom (TTF) fonts

Furthermore, all public methods have been documented.

1.8.7

This release features:
- more documentation
- `setup.py` and `requirements.txt` have changed to ensure pText can easily be installed
- support for embedded files in PDF

1.8.6

This is a documentation release.
Documentation coverage is now 90%.

1.8.3

- Bugfix release
- Improvements to layout algorithm
- Improvements to IO (enabling a Document to be saved multiple times)
