Dedoc

Latest version: v2.3.2

Page 1 of 4

2.3.2

* Improve merging multi-page tables in `PdfTabbyReader`.
* Stop parsing after client disconnection (for API usage, see [issue 488](https://github.com/ispras/dedoc/issues/488)).

2.3.1

* Fix bug with bold lines in `DocxReader` (see [issue 479](https://github.com/ispras/dedoc/issues/479)).
* Upgraded `requirements.txt` (`beautifulsoup4` to version 4.12.3).
* Added support for external grobid (added the `Authorization` parameter).
* Added GOST (Russian government standard) frame recognition in `PdfTabbyReader` (`need_gost_frame_analysis` parameter).
* Update documentation (added GOST frame recognition).
* Added multi-page table handling to `PdfTabbyReader`.

2.3

* [Dedoc telegram chat](https://t.me/dedoc_chat) created.
* Added `patterns` parameter for configuring default structure type.
* Added notebooks with Dedoc usage (see [issue 484](https://github.com/ispras/dedoc/issues/484)).
* Fix bug `OutOfMemoryError: Java heap space` in `PdfTabbyReader` (see [issue 489](https://github.com/ispras/dedoc/issues/489)).
* Fix bug with numbering in `DocxReader` (see [issue 494](https://github.com/ispras/dedoc/issues/494)).
* Added GOST (Russian government standard) frame recognition in `PdfImageReader` and `PdfTxtlayerReader` (`need_gost_frame_analysis` parameter).

2.2.7

* Fix bugs with `start` and `end` of `BBoxAnnotation` in `PdfTabbyReader`.
* Improve column classification and orientation detection for PDF and images (`is_one_column_document` and `document_orientation` parameters).
* Upgrade `docker`: `docker-compose` is no longer supported, use `docker compose` instead.
* Fix table parsing bug in `DocxReader` (see [issue 478](https://github.com/ispras/dedoc/issues/478)).
* Added simple textual layer detection in `PdfAutoReader` (`fast_textual_layer_detection` parameter).
* Improve paragraph extraction from PDF documents and images.
* Retrain a classifier for diplomas (`document_type="diploma"`) on a new dataset.
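
The parameters named in this release can be collected into a single parsing-options dictionary. The following is a minimal sketch, not the project's reference usage. It assumes that options are passed as string values (`"true"`, `"false"`, `"auto"`), that `pdf_with_text_layer="auto"` is what selects `PdfAutoReader`, and that `DedocManager.parse` accepts such a dictionary. The helper names `build_parameters` and `parse_with_dedoc` are hypothetical.

```python
from typing import Dict


def build_parameters(one_column: bool, fast_detection: bool) -> Dict[str, str]:
    """Collect the 2.2.7 options into one dict (hypothetical helper)."""
    return {
        # Assumption: boolean-like options are given as "true"/"false" strings.
        "is_one_column_document": str(one_column).lower(),
        "fast_textual_layer_detection": str(fast_detection).lower(),
        # Assumption: "auto" asks Dedoc to detect the page orientation itself.
        "document_orientation": "auto",
        # Assumption: "auto" selects PdfAutoReader, which uses the fast detection.
        "pdf_with_text_layer": "auto",
        # Structure type served by the retrained diploma classifier.
        "document_type": "diploma",
    }


def parse_with_dedoc(file_path: str, parameters: Dict[str, str]):
    """Parse a file with Dedoc; requires the dedoc package to be installed."""
    # Imported lazily so the dict can be built without dedoc available.
    from dedoc import DedocManager

    manager = DedocManager()
    return manager.parse(file_path, parameters=parameters)


params = build_parameters(one_column=True, fast_detection=True)
print(params)
```

Keeping the dictionary construction separate from the parsing call makes it possible to inspect or log the options before a document is processed.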

2.2.6

* Upgrade dependencies: `numpy<2.0` and `dedoc-utils==0.3.7`.

2.2.5

* Added internal functions and classes to support integration of Dedoc into [langchain](https://github.com/langchain-ai/langchain).
* Upgrade some dependencies, in particular `xgboost>=1.6.0`, `pandas`, and `pdfminer.six`.
