Trafilatura

Latest version: v1.12.2

1.2.2

- more efficient rules for extraction
- metadata: further attributes used (with felipehertzer)
- better baseline extraction
- issues fixed: 202, 204, 205
- evaluation updated

1.2.1

- ``--precision`` and ``--recall`` arguments added to the CLI
- better text cleaning: paywalls and comments
- improvements for Chinese websites (with glacierck & immortal-autumn): 186, 187, 188
- further bugs fixed: 189, 192 (with felipehertzer), 200
- efficiency: faster module loading and improved RAM footprint

1.2.0

- efficiency: replaced module readability-lxml by trimmed fork
- bugs fixed: 179, 180, 183, 184
- improved baseline extraction
- cleaner metadata (with felipehertzer)

1.1.0

- encodings: better detection, output NFC-normalized Unicode
- maintenance and performance: more efficient code
- bugs fixed (119, 136, 147, 160, 161, 162, 164, 167 and others)
- prepare compatibility with upcoming Python 3.11
- changed default settings
- extended documentation
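
The "NFC-normalized Unicode" item above refers to Unicode Normalization Form C, in which a base character and its combining marks are merged into a single precomposed code point where one exists. The following is a minimal standard-library sketch of that effect, not Trafilatura's own code; the helper name `to_nfc` is hypothetical.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition).

    Hypothetical helper for illustration; not part of Trafilatura's API.
    """
    return unicodedata.normalize("NFC", text)


# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one glyph.
decomposed = "Cafe\u0301"
composed = to_nfc(decomposed)

# Both strings render identically, but only the NFC form compares equal
# to the precomposed spelling with U+00E9.
print(len(decomposed), len(composed))  # 5 4
print(composed == "Caf\u00e9")         # True
```

Normalizing output this way means that visually identical strings also compare equal, which matters for deduplication and search on extracted text.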

1.0.0

- compress HTML backup files & seamlessly open .gz files
- support JSON web feeds
- graphical user interface integrated into main package
- faster downloads: reviewed backoff, compressed data
- optional modules: downloads with `pycurl`, language identification with `py3langid`
- bugs fixed (111, 125, 132, 136, 140)
- minor optimizations and fixes by vbarbaresi in [124](https://github.com/adbar/trafilatura/pull/124) & [130](https://github.com/adbar/trafilatura/pull/130)
- fixed handling of arrays with single or multiple entries in the JSON extractor, by felipehertzer in [143](https://github.com/adbar/trafilatura/pull/143)
- code base refactored with sourcery-ai [121](https://github.com/adbar/trafilatura/pull/121), improved and optimized for Python 3.6+
- drop support for Python 3.5
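
To illustrate the first item in this list (compressed HTML backups and opening `.gz` files seamlessly), here is a rough standard-library sketch of the general technique: write backups with `gzip` and, on reading, check for the gzip magic bytes so that compressed and plain files go through the same call. The function names `write_backup` and `read_html` are hypothetical and are not Trafilatura's API; UTF-8 content is assumed.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def write_backup(path, html: str) -> None:
    """Write an HTML string as a gzip-compressed backup (hypothetical helper)."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(html)


def read_html(path) -> str:
    """Read HTML from a plain or gzip-compressed file (hypothetical helper).

    Detection relies on the file content rather than the extension,
    so a misnamed file is still handled. Assumes UTF-8 text.
    """
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


sample = "<html><body><p>Hello, world.</p></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "page.html.gz"
    plain_path = Path(tmp) / "page.html"

    write_backup(gz_path, sample)
    plain_path.write_text(sample, encoding="utf-8")

    # Both files are read through the same function.
    from_gz = read_html(gz_path)
    from_plain = read_html(plain_path)

print(from_gz == from_plain == sample)  # True
```

Checking the magic bytes instead of the file name is one possible design choice; a library could equally dispatch on the `.gz` suffix.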

0.9.3

- better, faster encoding detection: replaced `chardet` with `charset_normalizer`
- faster execution: updated `justext` to 3.0
- better extraction of sub-elements in tables (78, 90)
- more robust web feed parsing
- further defined precision- and recall-oriented settings
- license extraction in footers (118)
