ArchiveBox

Latest version: v0.7.2




0.7.2

Not secure
<sup>Note: this is not packaged using "proper" Debian techniques like 0.6.2 was; instead, it's just a wrapper that executes `pip install archivebox` with a few extras. This is because ArchiveBox relies on some binary and dynamic dependencies (node, chrome, playwright, ffmpeg, yt-dlp, etc.) that aren't allowed in Debian packages.<br/>
(Launchpad `apt` `ppa` & `brew` updates are coming eventually; packaging all the vendored binaries that ArchiveBox depends on has gotten harder lately.)</sup>

---

<img width="300" alt="CLI version screenshot" align="right" src="https://github.com/ArchiveBox/ArchiveBox/assets/511499/df00a0b8-a42f-4236-8e12-5e881c47d44e"/>

```bash
# Then run this to upgrade an existing collection data dir to 0.7.2
cd ~/path/to/data/dir
archivebox init
```


What's Changed

- add `--tag=tag1,tag2,tag3` support to `archivebox schedule` command
- allow `PGID=0` root-group ownership of data dir (but PUID=0 is still not allowed)
- improve error messages, hints, and logging about permissions issues in Docker
- notify users when a new ArchiveBox version is available on GitHub (thanks benmuth!)
- bump dependency versions (yt-dlp, chrome, readability, node, python)
- warn when Docker `/` or `/data` volume mounts don't have any space available
- limit compatible Python versions to >= 3.8 and <= 3.11
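The Python version range and the `PUID`/`PGID` rule above are easy to check before installing or starting a container. A minimal preflight sketch, assuming bash; `python_supported` and `check_ids` are hypothetical helper names and are not part of ArchiveBox:

```bash
# Hypothetical preflight helpers mirroring the 0.7.2 constraints; not shipped with ArchiveBox.

# Succeed if a "MAJOR.MINOR[.PATCH]" version string is within 3.8 .. 3.11 (inclusive).
python_supported() {
    local major minor rest
    IFS=. read -r major minor rest <<< "$1"
    # Reject anything that doesn't start with two numeric components
    [[ "$major" =~ ^[0-9]+$ && "$minor" =~ ^[0-9]+$ ]] || return 1
    # 10# forces base-10 so components like "08" aren't parsed as octal
    (( 10#$major == 3 && 10#$minor >= 8 && 10#$minor <= 11 ))
}

# Succeed if the PUID/PGID pair is acceptable:
# PGID=0 (root group) is allowed, PUID=0 (root user) is not.
check_ids() {
    local puid="$1" pgid="$2"
    [[ "$puid" =~ ^[0-9]+$ && "$pgid" =~ ^[0-9]+$ ]] || return 1
    if (( 10#$puid == 0 )); then
        echo "PUID=0 (root) is not allowed; pick a non-root user id" >&2
        return 1
    fi
    return 0
}
```

For example, `python_supported "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"` should succeed on Python 3.8 through 3.11, and `check_ids "$PUID" "$PGID"` rejects a root user id while still accepting a root group id.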

Bug Fixes

- fix action buttons in Snapshot admin page not showing up correctly
- tag links immediately in first stage of `archivebox add` instead of at the end (so that imports that are paused or interrupted still get tagged correctly)
- fix config variables in `CHROME_USER_AGENT` format string not getting interpolated properly
- switch readability to prefer Chrome DOM dumps for article text instead of singlefile (because singlefile output is often huge and crashes readability/times out)
- make Docker image smaller by removing unneeded docs files
- improve current version detection, remove the annoying `+editable` string, and add `BUILD_TIME`
- fix `/browsers/*` does not exist warning on startup

0.7.1

Not secure
```bash
# Get it with brew on macOS (amd64, arm64)
brew tap archivebox/archivebox
brew install archivebox
```

```bash
# Get it with apt on Ubuntu/Debian based systems (any)
wget 'https://github.com/ArchiveBox/debian-archivebox/raw/main/archivebox-0.7.1.deb'
apt install ./archivebox-0.7.1.deb
# OR
dpkg -i ./archivebox-0.7.1.deb
```

<sup>Note: this is not packaged using "proper" Debian techniques like 0.6.2 was; instead, it's just a wrapper that executes `pip install archivebox` with a few extras. This is because ArchiveBox relies on some binary and dynamic dependencies (node, chrome, playwright, ffmpeg, yt-dlp, etc.) that aren't allowed in Debian packages.<br/>
(A Launchpad `apt` `ppa` update is coming eventually; packaging for `apt` has gotten harder lately.)</sup>

---

```bash
# Then run this to upgrade an existing collection data dir to 0.7.1
cd ~/path/to/data/dir
archivebox init
```


---

What's Changed

Lots of bugfixes, speedups, and small convenience features.

* fix bookmarklet script by dryrain39 in https://github.com/ArchiveBox/ArchiveBox/pull/708
* point to master image, not latest by FiddlyRumpus in https://github.com/ArchiveBox/ArchiveBox/pull/739
* Docs: Improve spelling on readme by Namdrib in https://github.com/ArchiveBox/ArchiveBox/pull/766
* Exempt /add route from CSRF by tjhorner in https://github.com/ArchiveBox/ArchiveBox/pull/777
* Bump ws from 5.2.2 to 5.2.3 by dependabot in https://github.com/ArchiveBox/ArchiveBox/pull/784
* Discard Referer header from iframe and link to original URL by Inndy in https://github.com/ArchiveBox/ArchiveBox/pull/799
* Update setup.sh in https://github.com/ArchiveBox/ArchiveBox/pull/804
* Fix Pinboard RSS parsing valid links as `None` by overhacked in https://github.com/ArchiveBox/ArchiveBox/pull/822
* healthcheck endpoint by ajgon in https://github.com/ArchiveBox/ArchiveBox/pull/873
* Update README.md by adamwolf in https://github.com/ArchiveBox/ArchiveBox/pull/884
* Fixes Add button behavior on Safari by adamwolf in https://github.com/ArchiveBox/ArchiveBox/pull/886
* Tweak JS so Safari can choose admin actions by adamwolf in https://github.com/ArchiveBox/ArchiveBox/pull/885
* Avoid KeyError on Pocket API parser by bltavares in https://github.com/ArchiveBox/ArchiveBox/pull/843
* (847) Decode error output hints to string if needed by TheCakeIsNaOH in https://github.com/ArchiveBox/ArchiveBox/pull/904
* Change logfile open to write mode only by tuupola in https://github.com/ArchiveBox/ArchiveBox/pull/906
* Fix 725 - correctly parse tags on json import by hannah98 in https://github.com/ArchiveBox/ArchiveBox/pull/908
* Bump ansi-regex from 5.0.0 to 5.0.1 by dependabot in https://github.com/ArchiveBox/ArchiveBox/pull/910
* Bump jszip from 3.6.0 to 3.7.1 by dependabot in https://github.com/ArchiveBox/ArchiveBox/pull/909
* Added TAG_SEPARATOR_PATTERN option for splitting tags by hannah98 in https://github.com/ArchiveBox/ArchiveBox/pull/911
* Fix typo: volumes section in docker-compose.yml should use array notation by akhilleusuggo in https://github.com/ArchiveBox/ArchiveBox/pull/918
* Fix broken URI fragment in README.md by xfq in https://github.com/ArchiveBox/ArchiveBox/pull/942
* Fix typo in README.md by hyfen in https://github.com/ArchiveBox/ArchiveBox/pull/932
* Fix bin_version: set LANG=C when calling executables to avoid parsing localized output by pellaeon in https://github.com/ArchiveBox/ArchiveBox/pull/936
* Fix arch installation command by CrazyPython in https://github.com/ArchiveBox/ArchiveBox/pull/923
* Update pywb entrypoint by kusold in https://github.com/ArchiveBox/ArchiveBox/pull/961
* Fix missing input redirection in a hint text by rossvor in https://github.com/ArchiveBox/ArchiveBox/pull/967
* improve title extractor by prnake in https://github.com/ArchiveBox/ArchiveBox/pull/924
* Bump node-fetch from 2.6.1 to 2.6.7 by dependabot in https://github.com/ArchiveBox/ArchiveBox/pull/969
* Add PikaPods as commercial hosting option by m3nu in https://github.com/ArchiveBox/ArchiveBox/pull/974
* Attempted to warn on 984 and 1014 by turian in https://github.com/ArchiveBox/ArchiveBox/pull/1020
* Method typo? by EsEnZeT in https://github.com/ArchiveBox/ArchiveBox/pull/1048
* Added standalone dockerfile instructions by turian in https://github.com/ArchiveBox/ArchiveBox/pull/1023
* Add missing migration 0021 by turian in https://github.com/ArchiveBox/ArchiveBox/pull/1027
* get setup.sh to run on FreeBSD again (13.x) by mwestza in https://github.com/ArchiveBox/ArchiveBox/pull/1068
* Warn on broken steps, use yt-dlp to avoid youtube-dl errors, and don't crash on bad UTF-8 by turian in https://github.com/ArchiveBox/ArchiveBox/pull/1026
* Add SINGLEFILE_ARGS to control single-file arguments by notevenaperson in https://github.com/ArchiveBox/ArchiveBox/pull/1021
* Support for Reverse Proxy authentication backends (like authelia) by ajgon in https://github.com/ArchiveBox/ArchiveBox/pull/866
* Bump moment from 2.29.3 to 2.29.4 by dependabot in https://github.com/ArchiveBox/ArchiveBox/pull/1081
* Install the CodeSee workflow. by codesee-maps in https://github.com/ArchiveBox/ArchiveBox/pull/1103
* Revert "Install the CodeSee workflow." by pirate in https://github.com/ArchiveBox/ArchiveBox/pull/1104
* add systemd config by fa0311 in https://github.com/ArchiveBox/ArchiveBox/pull/1115
* add CHROME_TIMEOUT args by fa0311 in https://github.com/ArchiveBox/ArchiveBox/pull/1120
* add explicitly specify --headless=new by fa0311 in https://github.com/ArchiveBox/ArchiveBox/pull/1123
* Add missing closing quote to style attribute by tejr in https://github.com/ArchiveBox/ArchiveBox/pull/1128
* Fix for Issue 1008 by dcalano in https://github.com/ArchiveBox/ArchiveBox/pull/1131
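The `TAG_SEPARATOR_PATTERN` option added in PR 911 controls how a tag string is split into individual tags using a regex. To illustrate what pattern-based splitting does, here is a rough sketch in bash + awk; the helper `split_tags` and the example pattern `[,;]` are assumptions for illustration, not ArchiveBox's implementation or its default value:

```bash
# Hypothetical helper: split a tag string on a regex pattern, printing one tag per line.
# Surrounding whitespace is trimmed and empty entries (e.g. from ",,") are dropped.
split_tags() {
    local tags="$1" pattern="$2"
    printf '%s\n' "$tags" | awk -v pat="$pattern" '{
        # awk treats a multi-character separator string as an extended regex
        n = split($0, parts, pat)
        for (i = 1; i <= n; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", parts[i])
            if (parts[i] != "") print parts[i]
        }
    }'
}
```

With the example pattern, `split_tags 'news, tech;;python ' '[,;]'` prints `news`, `tech`, and `python` on separate lines.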

New Contributors

<details><summary><h2>Expand to see the list...</h2></summary>

* dryrain39 made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/708
* FiddlyRumpus made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/739
* Namdrib made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/766
* tjhorner made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/777
* Inndy made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/799
* ajgon made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/873
* TheCakeIsNaOH made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/904
* tuupola made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/906
* akhilleusuggo made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/918
* xfq made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/942
* hyfen made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/932
* pellaeon made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/936
* CrazyPython made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/923
* kusold made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/961
* rossvor made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/967
* prnake made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/924
* m3nu made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/974
* turian made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/1020
* EsEnZeT made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/1048
* mwestza made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/1068
* notevenaperson made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/1021
* codesee-maps made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/1103
* fa0311 made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/1115
* tejr made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/1128
* dcalano made their first contribution in https://github.com/ArchiveBox/ArchiveBox/pull/1131

</details>

**Full Changelog**: https://github.com/ArchiveBox/ArchiveBox/compare/v0.6.2...v0.7.1

0.6.2

Not secure
New features

- new ArchiveResult log in the admin web UI, with full editing ability of individual extractor outputs + list of outputs under each Snapshot admin entry
- ability to save multiple snapshots of the same URL over time using new `Re-snapshot` button
- add `init --quick` and `server --quick-init` options to quickly update the db version without doing a full re-init (for users with large archive collections this will make version upgrades a lot faster / less painful)
- add new `archivebox setup` command and `archivebox init --setup` flag to aid in automatically installing dependencies and creating a superuser during initial setup
- new `SNAPSHOTS_PER_PAGE=40` and `MEDIA_MAX_SIZE=750m` config options
- allow hotlinking directly to specific extractor output on the snapshot detail page using the URL `hash`, e.g. `/archive/<timestamp>/index.html#git`
- add ability to view the snapshot matching a given URL by visiting `/archive/https://example.com/some/url` -> redirects to -> `/archive/<timestamp>/index.html` (also works without the scheme: `/archive/example.com`)
- #660 add ability to tag URLs while adding them via the web UI and via the CLI using `archivebox add --tag=tag1,tag2,tag3 ...`
- #659 add back ability to override visual styling with custom HTML and CSS using new config option `CUSTOM_TEMPLATES_DIR`
- ability to add and remove multiple tags at once from the snapshot admin using autocompleting dropdown
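The URL-based snapshot lookup and per-extractor hotlinking via the URL hash can both be generated from a script. A minimal sketch, assuming the web UI is reachable at a base URL you supply (e.g. `http://127.0.0.1:8000`); the helper names are hypothetical:

```bash
# Hypothetical helpers for building ArchiveBox links; the base URL is supplied by the caller.

# Link that should redirect to the snapshot of an original URL:
#   <base>/archive/<original url>  ->  <base>/archive/<timestamp>/index.html
snapshot_lookup_url() {
    local base="${1%/}" url="$2"   # drop one trailing slash from the base
    printf '%s/archive/%s\n' "$base" "$url"
}

# Link to one extractor's output on a snapshot detail page, selected via the URL hash.
extractor_hotlink() {
    local base="${1%/}" timestamp="$2" extractor="$3"
    printf '%s/archive/%s/index.html#%s\n' "$base" "$timestamp" "$extractor"
}
```

Opening the output of `snapshot_lookup_url http://127.0.0.1:8000 https://example.com/some/url` in a browser should land on the matching snapshot page. The helpers do no URL-encoding, so original URLs containing spaces or `#` would need escaping first.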

Enhancements

- lots of performance improvements! (in testing with 100k entries, the main index was brought down from 10-14 second load times to ~110ms once cache warms up)
- full text search now works on the public snapshot list
- dates and times are now localized to your browser's timezone instead of showing in UTC
- integrity and correctness improvements to readability, mercury, warc, and other extractors
- video subtitles and description are now added to the full-text search index as well (including youtube's autogenerated transcripts in all languages)
- log all errors with full tracebacks to new `data/logs/errors.log` file (so users no longer have to run in --debug mode to see error details)
- better `archivebox schedule` logging and changed logfile location to `./logs/schedule.log`
- better docker-compose setup experience with sonic config example in `docker-compose.yml`
- add Django Debug Toolbar + `djdt_flamegraph` for developers to profile UI performance
- add `--overwrite` flag support to `archivebox schedule`; scheduled URLs get added the same way as with `add --overwrite`
- #644 remove Bootstrap and jQuery network requests to CDNs by inlining them instead
- #647 allow filtering by ArchiveResult status in the Snapshot admin UI to select only links that have or have not been archived
- #550 kill all orphan child processes after each extractor finishes to prevent dangling chromium/node subprocesses and memory leaks
- 3276434 add new `SEARCH_BACKEND_TIMEOUT` config option to tune amount of time search backend can take before it gives up
- more diagnostic info added to the Snapshot admin view, including most recent status code, content type, detected server, etc.
- make the order of the table columns, layout, and spacing the same on the public view and private view (also remove DataTable, we're not using it)
- better snapshot grid page (faster load times, nicer CSS for tags and cards, more actions supported and metadata shown)
- added `Cache-Control` headers to dramatically speed up load times by caching favicons, screenshots, etc. in browsers/upstreams
- new project releases page https://releases.archivebox.io and demo url https://demo.archivebox.io
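New options such as `SEARCH_BACKEND_TIMEOUT` are ordinary config keys, which `archivebox config --set KEY=VALUE` writes into a collection's config. The sketch below only prints those commands (a dry run) so they can be reviewed before running them inside the data dir; `config_set_cmds` is a hypothetical helper, not part of ArchiveBox:

```bash
# Hypothetical dry-run helper: print one `archivebox config --set` command per KEY=VALUE argument.
# Nothing is executed here; review the output, then run it from inside the collection's data dir.
config_set_cmds() {
    local pair
    for pair in "$@"; do
        # Skip anything that is not of the form KEY=VALUE
        if [[ "$pair" != *=* ]]; then
            echo "skipping malformed setting: $pair" >&2
            continue
        fi
        # %q shell-quotes the argument so values with spaces survive the round trip
        printf 'archivebox config --set %q\n' "$pair"
    done
}
```

For example, `config_set_cmds SNAPSHOTS_PER_PAGE=40 MEDIA_MAX_SIZE=750m SEARCH_BACKEND_TIMEOUT=60 | bash`, run from inside the data dir, would apply all three; `60` is an arbitrary example value, not a recommended or default setting.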

Bugfixes

- #673 fix searching by URL substring in Snapshot admin list
- #658 fix Snapshot admin action buttons not working in Safari and some other browsers
- #678 fix `AssertionError` when ArchiveBox would attempt to archive with `CHROME_BINARY=None` when Chrome was not found on the host system
- #654 fix some issues with sonic attempting to index massive text blobs or binary blobs on some pages and hanging
- #674 fix UTF-8 encoding problems with file reading/writing on Windows (supporting a Python pkg on Windows is unreasonably painful y'all)
- #433 fix deleted items sometimes reappearing on next import/update
- #473 fix issue preventing use of the archivebox Python API inside a raw REPL (not using `archivebox shell`)
- fix stdin/stdout/stderr handling for some edge cases in Docker/Docker-Compose

![image](https://user-images.githubusercontent.com/511499/114269685-661a9000-99d6-11eb-8bbd-1b370b9eddf0.png)
![image](https://user-images.githubusercontent.com/511499/113627981-3d8b4280-9632-11eb-94fa-06f310e916f5.png)

0.5.6

- add ARMv7 and ARMv8 CPU support for `apt` / `deb` distribution on Launchpad PPA
- fix nodesource apt repo not supported on i386 b90afc8
- fix handling of skipped ArchiveResult entries with null output 0aea5ed
- catch exception on import of old index.json into ArchiveResult 171bbeb
- move debsign to release not build 66fb5b2
- skip tests during debian build a32eac3
- fix empty strings in cmd_version causing exception a49884a
- automate deb dist better and bump version 0e6ac39
- fix assertion 6705354
- change wording of db not found error 683a087

0.5.4

Not secure
Thank you to the contributors who helped with the 181 commits in this release!
cdvv7788, jdcaballerov, thedanbob, aggroskater, mAAdhaTTah, mario-campos, mikaelf

- fix migration failing due to null cmd_versions in older archives a3008c8
- Publish minor & major version tags to DockerHub and set up CodeQL (`codeql-analysis.yml`) c5b7d9f, bbb6cc8
- fix DATABASE_NAME posixpath, and dependencies dict bug 02bdb3b, 5c7842f
- use relative imports for `.util` to fix windows import clash 72e2c7b
- fix `COOKIES_FILE` config param breaking in wget ef7711f
- Refactor `should_save_extractor` methods to accept `overwrite` parameter 5420903
- Fix issue 617 by using mark_safe in combination with format_html … 1989275
- make permission chowning on docker start less fancy, respect PUID/PGID 635
- add createsuperuser flag to server command 39ec77e
- fix file icon styling and use the db exclusively for rendering them, instead of the filesystem f004058, 7d8fe66, 5c54bcc, 534ead2
- limit youtubedl download size to 750m and stop splitting out audio files 3227f54
- also search url, timestamp, tags on public index 8a4edb4
- fix trailing slash problems and wget not detecting download path 9764a8e
- add response status code to headers.json c089501
- fix singlefile path used for sonic 24e2493
- cleanup template layout in filesystem, new snapshot detail page UI
<img width="2046" alt="Screen Shot 2021-01-30 at 9 53 22 p" src="https://user-images.githubusercontent.com/511499/106431311-126e5200-643b-11eb-9cd9-05c3102d9945.png">

0.5.3

Not secure
- ArchiveResult moved to SQLite3 DB for performance, courtesy of cdvv7788
- lots of assorted bugfixes and improvements courtesy of cdvv7788 and jdcaballerov
- new full-text search support with ripgrep and sonic courtesy of jdcaballerov
- new `archivebox oneshot` command for downloading a single site without starting a whole collection
- new Pocket API importer courtesy of mAAdhaTTah
- new Wallabag importer courtesy of ehainry
- new extractor options on Add page courtesy of BlipRanger
- new apt/deb/homebrew/pip packaging setup, split into separate repos under the new GitHub org https://github.com/ArchiveBox
- new official PPA and Docker Hub accounts https://hub.docker.com/r/archivebox/archivebox (with automatic armv7 builds courtesy of chrismeller)
- new Snapshot grid view courtesy of jdcaballerov
![image](https://user-images.githubusercontent.com/511499/104231980-ad1bd800-541d-11eb-9bfb-58fdfd4ce1ef.png)


© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.