Borb

Latest version: v2.1.25

1.9.0

This release adds several new features:
- OCR
- Pantone colors
- Markdown to PDF conversion

It also features some minor improvements to general layout logic:
- `Tables` are now automatically completed (with empty `Paragraph` objects)
- support for heterogeneous paragraphs (see `ChunksOfText` object)
- layout package refactor to separate classes
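
The automatic table completion mentioned above can be pictured as padding a flat list of cells until the grid is full. The following is only a minimal sketch of that idea: `Cell` and `complete_table` are hypothetical names for illustration, not `pText` classes or functions, and the real layout code may work differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    """Hypothetical stand-in for a table cell holding paragraph text."""
    text: str = ""


def complete_table(cells: List[Cell], n_rows: int, n_cols: int) -> List[List[Cell]]:
    """Pad `cells` with empty cells up to n_rows * n_cols and return rows."""
    capacity = n_rows * n_cols
    if len(cells) > capacity:
        raise ValueError(f"{len(cells)} cells do not fit in a {n_rows}x{n_cols} table")
    padded = cells + [Cell() for _ in range(capacity - len(cells))]
    # Slice the flat list into rows of n_cols cells each.
    return [padded[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


rows = complete_table([Cell("a"), Cell("b"), Cell("c")], n_rows=2, n_cols=2)
print([[c.text for c in row] for row in rows])
```

Here a 2x2 table that received only three cells gets one empty cell appended, so every row has the same number of columns.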

OCR

Using `Tesseract` (or rather `pytesseract`), `pText` is now able to handle scanned images in a PDF.
Typically, a scanned document is a PDF whose pages contain nothing but an image of the page.
`pText` can now restore text to such PDF documents.

The OCR capabilities have been integrated nicely with the existing `EventListener` framework. New events have been added to represent scanned text being recognized.
Two extra implementations of `EventListener` deal with OCR:

- `OCRImageRenderEventListener`: triggered whenever an image is detected in the PDF; scans the image and produces `OCREvent` objects
- `OCRAsOptionalContentGroup`: extends `OCRImageRenderEventListener` and adds optional (invisible) content to the PDF, representing the recognized text
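
The relationship between the two listeners can be modelled with a toy example. This is a sketch under loud assumptions: every class and method name below (`ImageOCRListener`, `HiddenTextLayerListener`, `on_image`, `on_ocr_event`) is hypothetical and none of it is the `pText` API. The recognizer is injected as a plain function, so no OCR engine is needed to run it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OCREvent:
    """Toy event: one piece of recognized text (not pText's real class)."""
    text: str


class ImageOCRListener:
    """Toy listener: runs a recognizer on each image and records events."""

    def __init__(self, recognize: Callable[[bytes], str]):
        self.recognize = recognize  # injected, so no OCR engine is required
        self.events: List[OCREvent] = []

    def on_image(self, image: bytes) -> None:
        event = OCREvent(self.recognize(image))
        self.events.append(event)
        self.on_ocr_event(event)

    def on_ocr_event(self, event: OCREvent) -> None:
        """Hook for subclasses; the base listener does nothing extra."""


class HiddenTextLayerListener(ImageOCRListener):
    """Toy subclass: also collects text destined for an invisible layer."""

    def __init__(self, recognize: Callable[[bytes], str]):
        super().__init__(recognize)
        self.hidden_layer: List[str] = []

    def on_ocr_event(self, event: OCREvent) -> None:
        self.hidden_layer.append(event.text)


# A fake recognizer that simply decodes the bytes stands in for real OCR.
listener = HiddenTextLayerListener(lambda image: image.decode("ascii"))
listener.on_image(b"scanned text")
print(listener.hidden_layer)
```

The base class only produces events, while the subclass reuses that behaviour and additionally accumulates the text it would write back into the document.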

`pytesseract` is not added as a dependency in the setup script.
If you do choose to use OCR, you should install `pytesseract` and download the `Tesseract` data directories.
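
Because the OCR dependency is optional, it can be useful to check for it before attempting OCR. The sketch below uses only the Python standard library. The helper name `ocr_prerequisites` is hypothetical, and it assumes the Tesseract executable is named `tesseract` and is on the `PATH`, which may not hold on every installation.

```python
import importlib.util
import shutil


def ocr_prerequisites() -> dict:
    """Report whether the optional OCR pieces appear to be installed."""
    return {
        # The Python wrapper, installable with `pip install pytesseract`.
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The Tesseract executable that the wrapper invokes.
        "tesseract_binary": shutil.which("tesseract") is not None,
    }


status = ocr_prerequisites()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

This check does not verify that the Tesseract language data is present; it only reports whether the wrapper module and the executable can be located.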

Pantone colors

Pantone colors are now supported. Similar to `X11Color`, `Pantone` has a dictionary of names mapped to hexadecimal strings.
When constructing a `Pantone` object, simply pass a valid color name, and you'll receive its corresponding `HexColor` object.
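
The name-to-hexadecimal lookup described above can be illustrated without the library. This is a minimal sketch, not the `Pantone` class itself: `EXAMPLE_COLORS`, `hex_to_rgb`, and `color_by_name` are hypothetical, and the table entries are placeholders rather than real Pantone data.

```python
from typing import Dict, Tuple

# Illustrative entries only: the names and values are placeholders,
# not real Pantone data.
EXAMPLE_COLORS: Dict[str, str] = {
    "example-red": "#FF0000",
    "example-teal": "#008080",
}


def hex_to_rgb(hex_string: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB' (leading '#' optional) to an (r, g, b) tuple."""
    digits = hex_string.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_string!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_by_name(name: str) -> Tuple[int, int, int]:
    """Look up a color name and return its RGB components."""
    try:
        return hex_to_rgb(EXAMPLE_COLORS[name])
    except KeyError:
        raise ValueError(f"unknown color name: {name!r}") from None


print(color_by_name("example-teal"))  # (0, 128, 128)
```

An unknown name raises a `ValueError` here; how `Pantone` reacts to an invalid name is not stated in these notes.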

Markdown to PDF

`pText` can now convert (simple) Markdown to PDF.
It does not (yet) support HTML, since that would require an entire HTML engine.

`pText` supports:
- Headers
- Tables
- Ordered lists (not nested)
- Unordered lists (not nested)
- Code snippets (indented and fenced)
- Blockquotes
- Images
- Paragraphs
- Horizontal rules

Check the examples and tests to get a better idea of what is supported, and find a demo-document and its matching output.

1.8.9

This release features a few non-essential updates to the pText codebase that are mostly related to testing.
This includes:
- All tests have been refactored to follow the same format, with a small table atop the resulting `PDF` describing the test, when the test was run, etc.
- All tests (attempt to) follow the same color-scheme (making them look more professional and consistent)
- Tests against the entire corpus have been limited to the essentials, with extensive reporting

:arrow_up: Performance Boost

There are a few minor tweaks that have boosted the performance of `pText` as a whole.
One of them changes the copy behaviour of `Font` objects in the `CanvasGraphicsState`, which has yielded a speed-up of nearly 33%.

:page_facing_up: Fonts

I have also implemented some minor fixes to the whole `Font` logic, ensuring font-sizes are now handled properly,
regardless of whether they are passed as an argument to the `Tf` operator or via the text-matrix in the `CanvasGraphicsState`.
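
The two ways of specifying a font size can be reconciled with a little arithmetic. In PDF, the size given to `Tf` is scaled by the text matrix `[a b c d e f]`, whose vertical scale is the length of the vector `(c, d)`. The sketch below is an illustration of that relationship only: the function name `effective_font_size` is hypothetical, it ignores the current transformation matrix, and it is not taken from the `pText` source.

```python
import math
from typing import Sequence


def effective_font_size(tf_size: float, text_matrix: Sequence[float]) -> float:
    """Font size after applying the vertical scale of the text matrix.

    `text_matrix` is the six-number form [a, b, c, d, e, f]. The unit
    vertical vector maps to (c, d), so its length is the vertical scale.
    """
    a, b, c, d, e, f = text_matrix
    return tf_size * math.hypot(c, d)


# Size given through the Tf operator, with an unscaled text matrix.
print(effective_font_size(12, [1, 0, 0, 1, 72, 720]))   # 12.0
# Size given through the text matrix, combined with `1 Tf`.
print(effective_font_size(1, [12, 0, 0, 12, 72, 720]))  # 12.0
```

Both inputs describe text rendered at 12 units, which is why a reader of the content stream has to consider the operator argument and the matrix together.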

I have also started implementing OCR. But more on that in a future release.

:lock: Redaction

Finally, this release includes everything needed to perform redaction.
This is the process of:
- marking content to be removed (but not removing it, enabling review by a third party)
- removing content that has been marked

This functionality integrates nicely in the existing `pText` framework of `Page` annotations.
Check the examples for more details (look for "adding redaction annotations to a PDF").
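
The two-step nature of redaction (mark first, remove later) can be shown on a plain string. This is a toy model under stated assumptions: `mark` and `apply_redactions` are hypothetical helpers operating on character offsets, whereas `pText` works with `Page` annotations on a PDF.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def mark(marks: List[Span], start: int, end: int) -> List[Span]:
    """Step 1: record a span for redaction without touching the text."""
    return marks + [(start, end)]


def apply_redactions(text: str, marks: List[Span], filler: str = "#") -> str:
    """Step 2: overwrite every marked span with filler characters."""
    chars = list(text)
    for start, end in marks:
        for i in range(start, min(end, len(chars))):
            chars[i] = filler
    return "".join(chars)


text = "Call Jane at 555-0100"
marks = mark([], 5, 9)  # marks "Jane"; the text is still intact for review
print(text)
print(apply_redactions(text, marks))  # Call #### at 555-0100
```

Keeping the marks separate from the text is what allows a third party to review them before anything is removed.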

1.8.8

Major overhaul of all font-related functionality.
This release features:
- a speedup in copying fonts (often performed when processing pages)
- the ability to use custom (TTF) fonts

Furthermore, all public methods have been documented.

1.8.7

This release features:
- more documentation
- `setup.py` and `requirements.txt` have changed to ensure pText can easily be installed
- support for embedded files in PDF

1.8.6

This is a documentation release.
Documentation coverage is now 90%.

1.8.3

- Bugfix release
- Improvements to layout algorithm
- Improvements to IO (enabling a Document to be saved multiple times)
