Tosixinch

Latest version: v0.10.0

0.10.0
-------------------

**Add:**

* Add 'clean' option (specify how to clean html (both, head, body, none)).

* Update Selenium headless option code (for 4.8.0 and later, where the syntax changed)

* Add styles_to_retain option

* Update lxml usage (use the now-independent lxml-html-clean library)

**Fix:**

* Fix regression: selenium downloader errors when given multiple URLs.

* Fix wikipedia sample config (math formulas)

0.9.0
-------------------

Many filename-related API changes.

The terms themselves (both in code and docs) have changed.
Basically::

    url -> rsrc (resource)
    fname, download_file -> dfile
    fnew, extracted_file -> efile

This is such a large change that
I recommend using a brand-new working directory (a new '_htmls' directory).

The previous syntax created many directories for files that are now redundant,
so if you have many old files (cache) in the '_htmls' directory,
it gets very confusing.

**Change:**

* **!!** Change name syntax of dfile, using suffix '.f'

  Given URL::

      https://en.wikipedia.org/wiki/XPath

  Old dfile::

      _htmls/en.wikipedia.org/wiki/XPath/_

  Now::

      _htmls/en.wikipedia.org/wiki/XPath

  (Only when there is a file-directory conflict on the filesystem)::

      _htmls/en.wikipedia.org/wiki/XPath.f

  NOTE: The change is always to the file (not the directory),
  e.g. when trying to write a dfile ('a/b/c') but there is already a dfile 'a/b',
  the latter changes to 'a/b.f'.

* **!!** Change name syntax of efile, using suffix '.orig'

  The efile reuses the same name as the dfile,
  and the dfile is renamed with the suffix.

  Given URL::

      https://en.wikipedia.org/wiki/XPath

  Old::

      _htmls/en.wikipedia.org/wiki/XPath (dfile)
      _htmls/en.wikipedia.org/wiki/XPath~.html (efile)

  Now::

      _htmls/en.wikipedia.org/wiki/XPath.orig (dfile)
      _htmls/en.wikipedia.org/wiki/XPath (efile)

* **!!** Change default resource filename 'urls.txt' -> 'rsrcs.txt'
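
The dfile and efile naming rules above can be sketched roughly as follows. This is only an illustration, not tosixinch's actual code: the function names (``url_to_dfile``, ``efile_names``, ``place_dfile``) are hypothetical, the URL mapping ignores queries and special characters, and the '.f' rule is modelled on an in-memory set of paths rather than the real filesystem.

```python
import posixpath
from urllib.parse import urlsplit

DOWNLOAD_DIR = "_htmls"  # working directory named in the changelog


def url_to_dfile(url):
    """Map a URL to a dfile path (simplified: host and path only)."""
    parts = urlsplit(url)
    return posixpath.join(DOWNLOAD_DIR, parts.netloc, parts.path.lstrip("/"))


def efile_names(dfile):
    """Return (renamed dfile, efile) under the '.orig' scheme."""
    return dfile + ".orig", dfile


def place_dfile(new_file, files):
    """Add new_file to the set `files`, applying the '.f' rule.

    `files` is an in-memory stand-in (assumption) for files already on disk.
    """
    # An existing file that now has to act as a directory is renamed '<name>.f'.
    segments = new_file.split("/")
    for i in range(1, len(segments)):
        ancestor = "/".join(segments[:i])
        if ancestor in files:
            files.remove(ancestor)
            files.add(ancestor + ".f")
    # If the new name is already used as a directory, the new file takes '.f'.
    if any(f.startswith(new_file + "/") for f in files):
        new_file += ".f"
    files.add(new_file)
    return new_file
```

In both conflict directions it is the file, never the directory, that receives the '.f' suffix, matching the NOTE above.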

**Add:**

* Use temporary file when downloading (with suffix '.part')
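
A common way to implement this kind of '.part' download is to write to the temporary name first and rename only on success. The sketch below assumes that pattern; ``save_download`` is a hypothetical helper, and the changelog does not say how tosixinch itself handles directories or failures.

```python
import os


def save_download(chunks, dest):
    """Write an iterable of byte chunks to dest via a '.part' temporary file."""
    part = dest + ".part"
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    with open(part, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    # Rename only after the whole body is written, so an interrupted
    # download never leaves a truncated file under the final name.
    os.replace(part, dest)
    return dest
```

If the process is interrupted mid-download, only the '.part' file remains, so a later run can tell complete files from incomplete ones.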

0.8.0
-------------------

Cut dependencies I haven't used myself for years.

**Change:**

* **!!** Cut lib: chardet

* **!!** Cut lib: Qt (webkit and webengine options)

* **!!** Cut lib: readability

* **!!** Cut lib: wkhtmltopdf

* **!!** Cut support: Windows

0.7.0
-------------------

Incremental Improvements

**Change:**

* Change page and <body> margin slightly (for weasyprint)

* Add --inspect action, cutting --link and --news

* Add --headless option, cutting --javascript

* Add '//article' to guess option defaults

* Change '_add_index' from URL to Map class

Very slightly change the handling of '/' and '?' in dfile name

**Add:**

* Add github discussions to sample

* Add loose heuristic for too big images (for inside <table>)

* Add more exclusion of secondary boxes (sample wikipedia)

* Update and Change slugify, add more word separation ('-')

* Add minus number feature to --trimdirs option

* Add --timeout option

* Add --interval option

* Add KeepExtract (no-selection extractor, '--keep-html')

* Add IDTable (fragment link re-resolution when merging htmls)

* Expose loc_index and loc_appendix to the command line

* Add reading tosixinch.ini and site.ini from current dir

* Add download_dir option

* Add overwrite_html option

* Add current directory search for css and css2 options

**Fix:**

* Fix: add charset to BLANK_HTML (for weasyprint)

* Fix toc title (able to use unicode)

* Fix link when baseurl (<base> tag) is provided

* Fix merge_htmls (no css references for weasyprint)

0.6.0
-------------------

I had held back from uploading local changes,
waiting for a time when I could look into them closely.
But that time has not come for a long while,
so let's publish the update now.

**Change:**

* When the name of a dfile is too long
  (if it has a path segment longer than 255 characters),
  the filename is hashed,
  and it now takes the form '_htmls/_hash/<sha1-hexdigit>'.

* Change browser_engine option default: webkit -> selenium-firefox

* Skip cleaning possible MathJax tag attributes
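
The long-filename rule in the first item above can be sketched as below. Several details are assumptions, since the changelog does not specify them: that the whole dfile path is what gets hashed, that it is encoded as UTF-8, and that length is counted in characters. ``shorten_dfile`` is a hypothetical name.

```python
import hashlib
import posixpath

MAX_SEGMENT = 255          # limit stated in the changelog
HASH_DIR = "_htmls/_hash"  # directory stated in the changelog


def shorten_dfile(dfile):
    """Return dfile unchanged, or a hashed name if any segment is too long."""
    if all(len(segment) <= MAX_SEGMENT for segment in dfile.split("/")):
        return dfile
    # Assumption: the SHA-1 hex digest of the full path is used as the name.
    digest = hashlib.sha1(dfile.encode("utf-8")).hexdigest()
    return posixpath.join(HASH_DIR, digest)
```

A SHA-1 hex digest is always 40 characters, so the resulting name stays well under common filesystem limits.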

**Add:**

* Add 'html5prescan' encoding option

* Add elements_to_keep_attrs option

* Add selenium downloading (browser_engine option)

* Add dprocess option

**Fix:**

* Fix ignore any errors in component download

* Fix --browser option error (url was not percent escaped)

0.5.0
-------------------

Most changes are just internal refactorings.

**Change:**

* Add latin_1 to default encoding option

  From::

      utf-8, cp1252

  To::

      utf-8, cp1252, latin_1

  This means decoding the input no longer raises encoding errors,
  which should generally be preferable.

* Rename method ``_get_relpath`` to ``_get_relative_url``
  in ``tosixinch.pcode._pygments.PygmentsCode``.

* Change application config data files to normal INI format

  (``data/tosixinch.ini`` and ``data/site.ini``)

  Previously the program exposed foreign FINI format files,
  which are specific to the configfetch library.

* Cut Python 3.5
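
Why adding latin_1 to the default encoding option removes decoding errors can be seen in a small sketch. It assumes the encodings are tried in the listed order and the first success wins; ``decode_with_fallback`` is a hypothetical helper, not the program's own function.

```python
ENCODINGS = ("utf-8", "cp1252", "latin_1")  # new default from the changelog


def decode_with_fallback(data, encodings=ENCODINGS):
    """Try each encoding in order; return (text, encoding used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no encoding in the list could decode the input")
```

Python's latin_1 codec maps every byte value 0 to 255 to a character, so as the last entry it always succeeds. For example, the byte ``0x81`` is invalid as UTF-8 and undefined in Python's cp1252, but still decodes with latin_1. The trade-off is that the result may be mojibake rather than an error.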

**Add:**

* Add urlno.py and urlmap.py (internal)

('urlno' means url normalization)

* Add lxml_html.py (internal)

* Add action.py (internal)

**Fix:**

* Fix and change user package import

  Previously, if the user's Python environment included a library
  that also had, say, a 'script' package,
  the program aborted.
