---------------------
This is a major pre-release of what will become 3.0.
It includes many changes, including improved plugins, speed and detection.
These are not yet fully documented, but the release can be used for testing.
API changes:
- Command line API
- `--diag` option renamed to `--license-diag`
- the `--format <format code>` option has been replaced by one option per
format, such as `--format-csv` and `--format-json`, and multiple formats
can be requested at once
- new experimental `--cache-dir` option and `SCANCODE_CACHE` environment variable
to set the cache directory, and `--temp-dir` option and `SCANCODE_TMP`
environment variable to set the temp directory.
- JSON data output format: no major changes
- programmatic API in scancode/api.py:
- get_urls(location, threshold=50): new threshold argument
- get_emails(location, threshold=50): new threshold argument
- get_file_infos renamed to get_file_info
- Resource moved to scancode.resource and significantly updated
- get_package_infos renamed to get_package_info
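To illustrate what the new `threshold` argument of `get_urls` and `get_emails` is for, here is a minimal sketch of capping the number of results collected from one file. It is not the code in scancode/api.py: `collect_emails` and `EMAIL_RE` are hypothetical names, the pattern is deliberately simplistic, and it assumes the threshold is a plain upper bound on the number of results kept.

```python
import re
from itertools import islice

# Simplified email pattern for illustration only (an assumption, not ScanCode's).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def collect_emails(text, threshold=50):
    """Return at most `threshold` email-like strings from `text`, in order."""
    matches = (m.group(0) for m in EMAIL_RE.finditer(text))
    # islice stops consuming matches as soon as the cap is reached.
    return list(islice(matches, threshold))


# A file-like text holding 80 addresses.
sample = " ".join("user{}@example.com".format(i) for i in range(80))

print(len(collect_emails(sample)))               # capped by the default of 50
print(len(collect_emails(sample, threshold=5)))  # capped at 5
```

Stopping early rather than collecting everything and truncating keeps the cost bounded on files that contain thousands of matches.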
Command line:
- You can select multiple outputs at once (e.g. JSON and CSV) 789
- There is a new capability to reload a JSON scan to reprocess it with postscan
plugins and/or convert it to CSV or another format.
Licenses:
- There are several new and improved licenses and license detection rules 799 774 589
- Licenses data now contains the full name as well as the short name.
- License matches have a notion of "coverage", which is the number of matched
words compared to the number of words in the matched rule.
- The license cache is no longer checked for consistency once created, which
improves startup times (unless you are using a Git checkout and you are
developing with a SCANCODE_DEV_MODE tag file present).
- License category names have been improved
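As a worked example of the "coverage" notion described above, the following sketch computes the ratio of matched words to rule words. The function name `match_coverage` is hypothetical, and expressing the result as a percentage rounded to two decimals is an assumption made for this illustration.

```python
def match_coverage(matched_words, rule_words):
    """Return the share of the rule's words that were matched, as a percentage.

    Assumption: coverage is reported as a percentage between 0 and 100.
    """
    if rule_words <= 0:
        raise ValueError("rule_words must be positive")
    return round(100.0 * matched_words / rule_words, 2)


# 45 words matched out of a 60-word rule.
print(match_coverage(45, 60))  # 75.0
```

A match that covers every word of its rule would score 100 under this definition, while a partial match of a long rule scores lower even if many words matched.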
Copyrights:
- Copyright detection in binary files has been improved
- There are several improvements to the copyright detection quality fixing these
tickets: 795 677 305
- There is a new post scan plugin that can be used to ignore certain copyrights in
the results
Summaries:
- Add new support for copyright summaries using smart holder deduplication 930
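The changelog does not describe how holder deduplication works, so the following is only one plausible sketch of the idea: group holders whose names differ by case, punctuation or a trailing company suffix, then count them. The names `holder_key`, `summarize_holders` and the `SUFFIXES` set are hypothetical and not taken from ScanCode.

```python
import re
from collections import Counter

# Hypothetical suffixes ignored when comparing holders (an assumption).
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc"}


def holder_key(holder):
    """Normalize a holder name into a comparison key."""
    # Lowercase, replace punctuation with spaces, then split into words.
    words = re.sub(r"[^\w\s]", " ", holder.lower()).split()
    # Drop trailing company suffixes such as "Inc." or "Corp".
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def summarize_holders(holders):
    """Return (first seen spelling, count) pairs, most frequent first."""
    counts = Counter()
    first_seen = {}
    for holder in holders:
        key = holder_key(holder)
        if not key:
            continue
        counts[key] += 1
        first_seen.setdefault(key, holder)
    return [(first_seen[key], count) for key, count in counts.most_common()]


holders = ["Example Corp.", "example corp", "Example, Inc.", "Jane Doe"]
print(summarize_holders(holders))
# [('Example Corp.', 3), ('Jane Doe', 1)]
```

Keeping the first seen spelling as the display name avoids showing the normalized key, which is only meant for comparison.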
Misc:
- Add options to limit the number of emails and urls that are collected from
each file (with a default of 50) 384
- When configuring in dev mode, VS Code settings are created
- Archive detection has been improved
- There is a new cache and temporary file configuration with --cache-dir and
--temp-dir CLI options. The --no-cache option has been removed
- Add new --examples option to show usage examples
- Move essential configuration to a scancode_config.py module
- Only read a few pages from PDF files by default
- Improve handling of files with unusual characters in their names on all OSes
- Improve detection of archive vs. compressed files
- Make all copyright tests data driven using YAML files like for license tests
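The cache and temporary directory settings listed above can come from either a command line option (`--cache-dir`, `--temp-dir`) or an environment variable (`SCANCODE_CACHE`, `SCANCODE_TMP`). The sketch below shows one way such a setting could be resolved. The precedence order (command line first, then environment, then a default), the helper name `resolve_dir` and the default path are all assumptions, not documented ScanCode behavior.

```python
import os
import tempfile


def resolve_dir(cli_value, env_var, default, environ=None):
    """Pick a directory from a CLI value, an environment variable or a default.

    Assumed precedence: CLI option, then environment variable, then default.
    """
    environ = os.environ if environ is None else environ
    return cli_value or environ.get(env_var) or default


# Hypothetical default location, used only for this illustration.
default_cache = os.path.join(tempfile.gettempdir(), "scancode-cache")

# Passing an explicit mapping keeps the example independent of the real environment.
env = {"SCANCODE_CACHE": "/var/cache/scancode"}
print(resolve_dir("/data/cache", "SCANCODE_CACHE", default_cache, environ=env))
print(resolve_dir(None, "SCANCODE_CACHE", default_cache, environ=env))
print(resolve_dir(None, "SCANCODE_CACHE", default_cache, environ={}))
```

The three calls return the command line value, the environment value and the default, in that order.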
Plugins:
- Prescan plugins can now exclude files from the scans
- Plugins can now contribute arbitrary command line options 787 and 748
- There is a new plugin stage called output_filter to optionally filter a scan before output.
One example is to keep "only findings" 787
- The core processing is centered now on a Codebase and Resource abstraction
that represents the scanned filesystem in memory 717 736
All plugins operate on this abstraction
- All scanners are now plugins 698: everything is a plugin, including the scans
- The interface for output plugins is the same as other plugins 715
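To make the Codebase and Resource abstraction and the output_filter stage more concrete, here is a rough sketch of an "only findings" filter. The classes below are simplified stand-ins written for this example: their fields, the `has_findings` method and the `only_findings` function are hypothetical and do not reflect the actual ScanCode plugin interface.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical in-memory record for one scanned file."""
    path: str
    licenses: list = field(default_factory=list)
    copyrights: list = field(default_factory=list)

    def has_findings(self):
        # A resource has findings if any scan returned a result for it.
        return bool(self.licenses or self.copyrights)


@dataclass
class Codebase:
    """Hypothetical container for all resources of a scan."""
    resources: list = field(default_factory=list)


def only_findings(codebase):
    """Output filter sketch: keep only resources that have findings."""
    return [res for res in codebase.resources if res.has_findings()]


codebase = Codebase(resources=[
    Resource("src/main.c", licenses=["mit"]),
    Resource("README", copyrights=["Copyright (c) Example"]),
    Resource("data/empty.bin"),
])
print([res.path for res in only_findings(codebase)])
# ['src/main.c', 'README']
```

Because every stage works on the same in-memory objects, a filter like this can run just before output without knowing which scanners produced the findings.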
Credits: Many thanks to everyone who contributed to this release with code and bug reports
(this list is likely missing some)
- SaravananOffl
- jpopelka
- yashdsaraf
- haikoschol
- jdaguil
- ajeans
- DennisClark
- susg
- pombredanne
- mjherzog
- Sidsharik
- nishakm
- yasharmaster
- techytushar
- JonoYang
- majurg
- aviral1701
- chinyeungli
- vivonk
- Chaitya62
- inishchith