Scancode-toolkit

Latest version: v32.2.1

31.0.0
-----------------------

This is a major release with important bug and security fixes, new and improved
features and API changes.

Note that we no longer support Python 3.6. Use Python 3.7+ instead.


Important API changes:
~~~~~~~~~~~~~~~~~~~~~~~~

- The data structure of the JSON output has changed for copyrights, authors
and holders. We now use a proper name for attributes and not a generic "value".

- The data structure of the JSON output has changed for packages. We now
return "package_data" package information at the manifest file-level
rather than "packages". This has all the data attributes of a "package_data"
field plus others: "package_uuid", "package_data_files" and "files".

- There is a new top-level "packages" attribute that contains package
instances that can aggregate data from multiple manifests.

- There is a new top-level "dependencies" attribute that contains each
dependency instance. These can be standalone or related to a package.
These contain a new "extra_data" object.

- There is a new resource-level attribute "for_packages" which refers to
packages through package_uuids (pURL + uuid string).

- The data structure for HTML output has been changed to include emails and
urls under the "infos" object. The HTML template displays output for holders,
authors, emails, and urls into separate tables like "licenses" and "copyrights".

- The data structure for CSV output has been changed to rename the Resource
column to "path". "copyright_holder" has been renamed to "holder".

- The license clarity scoring plugin has been overhauled to show new license
clarity criteria. More details of the new scoring criteria are provided below.

- The functionality of the summary plugin has been improved to provide declared
origin and license information for the codebase being scanned. The previous
summary plugin functionality has been preserved in the new ``tallies`` plugin.
More details are provided below.

- ScanCode has adopted the new code skeleton from https://github.com/nexB/skeleton.
The key change is the location of the virtual environment. It used to be
created at the root of the scancode-toolkit directory. It is now created
under the ``venv`` subdirectory. You must be aware of this if you use ScanCode
from a git clone.

- ``DatafileHandler.assemble()``, ``DatafileHandler.assemble_from_many()``, and
the ``.assemble()`` methods of the other Package handlers in packagedcode
have been updated to yield Package items before Dependency or Resource items.
This is particularly important when calling the ``assemble()`` method outside
of the scancode-toolkit context, where we need to ensure that a Package exists
before we associate a Resource or Dependency with it.
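
The ordering guarantee above matters to code that consumes ``assemble()``
results in a single pass. The following is a minimal sketch of such a consumer.
It assumes hypothetical stand-in classes (``Package``, ``Dependency``) and a
hypothetical ``fake_assemble()`` generator; none of these are the real
packagedcode types or signatures.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    # Hypothetical stand-in for a packagedcode Package.
    uid: str
    dependencies: list = field(default_factory=list)


@dataclass
class Dependency:
    # Hypothetical stand-in for a packagedcode Dependency.
    purl: str
    for_package_uid: str


def fake_assemble():
    # Mimics the documented order: the Package is yielded first, then the
    # items that refer to it.
    uid = "pkg:pypi/example@1.0?uuid=1234"
    yield Package(uid=uid)
    yield Dependency(purl="pkg:pypi/click", for_package_uid=uid)


def collect(items):
    # Single pass over the stream. The lookup in the Dependency branch only
    # succeeds because the Package was already seen.
    packages = {}
    for item in items:
        if isinstance(item, Package):
            packages[item.uid] = item
        elif isinstance(item, Dependency):
            packages[item.for_package_uid].dependencies.append(item)
    return packages


packages = collect(fake_assemble())
```

If the Dependency were yielded first, the dictionary lookup in ``collect()``
would raise a ``KeyError``, which is the situation the new ordering avoids.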

Copyright detection:
~~~~~~~~~~~~~~~~~~~~

- The data structure in the JSON is now using consistently named attributes as
opposed to plain values.
- Several copyright detection bugs have been fixed.
- French and German copyright detection is improved.
- Some spurious trailing dots in holders are now stripped.
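
For integrators holding results in the older shape, the rename from a generic
"value" to named attributes can be sketched as below. The new attribute names
used here (``copyright``, ``holder``, ``author``) are an assumption, since these
notes only say that "a proper name" is used; check them against real scan output
before relying on them. ``upgrade_entries`` is a hypothetical helper.

```python
# Assumed mapping from each list name to its new attribute name.
NAMED_ATTRIBUTES = {
    "copyrights": "copyright",
    "holders": "holder",
    "authors": "author",
}


def upgrade_entries(list_name, entries):
    """Return copies of the entries with the generic "value" key renamed."""
    new_key = NAMED_ATTRIBUTES[list_name]
    upgraded = []
    for entry in entries:
        entry = dict(entry)  # copy so the caller's data is not mutated
        if "value" in entry:
            entry[new_key] = entry.pop("value")
        upgraded.append(entry)
    return upgraded


# Made-up sample entry in the old shape.
old_holders = [{"value": "Example Corp.", "start_line": 1, "end_line": 1}]
new_holders = upgrade_entries("holders", old_holders)
```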


License detection:
~~~~~~~~~~~~~~~~~~~

- There have been significant updates to licenses and license detection rules:

  - 107 new licenses have been added (total is now 1954).
  - 6780 new license detection rules have been added (total is now 32259).
  - 6753 existing false positive license rules have been removed (see below).
  - The SPDX license list has been updated to the latest v3.17.

- The rule attribute "only_known_words" has been renamed to "is_continuous" and its
meaning has been updated and expanded. A rule tagged as "is_continuous" can only
be matched if there are no gaps between matched words, be they stopwords, extra
unknown or known words. This improves several false positive license detections.
The processing for "is_continuous" has been merged into the "key phrases"
processing described below.

- Key phrases can now be defined in a RULE text by surrounding one or more words
with double curly braces `{{` and `}}`. When key phrases are defined, a RULE will
only match when the key phrases match exactly. When all the text of a rule is a
"key phrase", this is the same as being "is_continuous".

- The "--unknown-licenses" option now also detects unknown licenses using a
simple and effective ngrams-based matching in areas that are not matched or
weakly matched. This helps detect things that look like a license but are not
yet known as licenses.

- False positive detection of "license lists" like the lists seen in license and
package management tools has been entirely reworked. Rather than using
thousands of small false positive rules, there is a new filter to detect a
long run of license references and tags that is typical of license lists.
As a result, thousands of rules have been replaced by a simpler filter, and
the license detection is more accurate, faster and has fewer false
positives.

- The new license flag "is_generic" tags licenses that are "generic" licenses
such as "other-permissive" or "other-copyleft". This is not yet
returned in the JSON API.

- When scanning binary files, the detection of single word rules is filtered when
surrounded by gibberish or mixed case. For instance $%$GpL$ is a false
positive and is no longer reported.

- Several rules were incorrectly tagged as is_license_notice when they were
references, and have been requalified as is_license_reference. All rules made
of a single word have been requalified as is_license_reference if they were
not already qualified this way.

- Matches to small license rules (with small defined as under 15 words)
that are scattered over too many lines are now filtered as false matches.

- Small, two-word matches that overlap the previous or next match on the word
"license" or a similar word are now filtered as false matches.

- The new --licenses-reference option adds a new "licenses_reference" top-level
attribute to a scan when using the JSON and YAML outputs. This contains
all the details and the full text of every license seen in a file or
package license expression of a scan. This can be added after the fact
using the --from-json option.

- New experimental support for non-English licenses. Use the command
./scancode --reindex-licenses-for-all-languages to index all known non-English
licenses and rules. From that point on, they will be detected. Because of this,
some licenses that were not previously tagged with their language are now
correctly tagged, and they may not be detected unless you activate this new
indexing feature.
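
The key phrase markup described above can be illustrated with a small sketch.
This is not ScanCode's matching implementation: the word splitting, the
function names and the sample texts are all assumptions made for illustration.
The sketch only shows the idea that every phrase wrapped in `{{` and `}}` must
appear as an uninterrupted run of words.

```python
import re

# A key phrase is the text between "{{" and "}}" in a rule text.
KEY_PHRASE = re.compile(r"\{\{(.+?)\}\}", re.DOTALL)
WORD = re.compile(r"[a-z0-9]+")


def words(text):
    # Simplified tokenizer (assumption): lowercase alphanumeric runs.
    return WORD.findall(text.lower())


def key_phrases(rule_text):
    # Each key phrase becomes a list of words.
    return [words(phrase) for phrase in KEY_PHRASE.findall(rule_text)]


def contains_run(haystack, needle):
    # True if needle occurs in haystack as a contiguous run of words.
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def key_phrases_match(rule_text, candidate_text):
    candidate = words(candidate_text)
    return all(contains_run(candidate, phrase) for phrase in key_phrases(rule_text))


rule = "Licensed under the {{Apache License}}, version 2.0"
exact = key_phrases_match(rule, "licensed under the Apache License version 2.0")
gapped = key_phrases_match(rule, "licensed under the Apache Software License")
```

Here ``exact`` is true, while ``gapped`` is false because the extra word
"Software" breaks the key phrase.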


Package detection:
~~~~~~~~~~~~~~~~~~

- Package detection and reporting have changed significantly. A codebase-level
attribute `packages` now reports each package, with one or more `package_data`
and the files for the package. The specific changes are:

- The resource level attribute `packages` has been renamed to `package_data`,
as these are really package data that are being detected, such as manifests,
lockfiles or other package data. This has the data attributes of a `package_data`
field plus others: `package_uuid`, `package_data_files` and `files`.

- A new top-level attribute `packages` has been added which contains package
instances created from `package_data` detected in the codebase.

- A new codebase level attribute `dependencies` has been added which contains dependency
instances created from lockfiles detected in the codebase.

- The package attribute `root_path` has been deleted from `package_data` in favour
of the new format where there is no root conceptually, just a list of files for each
package.

- There is a new resource-level attribute `for_packages` which refers to
packages through package_uids (pURL + uuid string). A `package_adder`
function is now used to associate a Package to a Resource that is part of
it. This gives us the flexibility to use the packagedcode Package handlers
in other contexts where `for_packages` on Resource is not implemented in the
same way as scancode-toolkit.

- The package_data attribute `dependencies` (which is a list of DependentPackages)
now has a new attribute `resolved_package` with a package data mapping.
Also the `requirement` attribute is renamed to `extracted_requirement`.
There is a new `extra_data` attribute to collect extra data as needed.

- For PyPI packages, `python_requires` is treated as a package dependency.
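
The relationship between the top-level `packages` and the resource-level
`for_packages` can be sketched as a simple grouping. This is an illustration
on a made-up scan dictionary, not a documented API: the top-level `files` list
and the identifier key are assumptions. Because these notes spell the
identifier in more than one way, the hypothetical helper takes the key name as
a parameter.

```python
def files_by_package(scan, uid_key="package_uid"):
    """Map each top-level package identifier to the paths of its files."""
    # Start with every declared package so packages without files still appear.
    grouped = {package[uid_key]: [] for package in scan.get("packages", [])}
    for resource in scan.get("files", []):
        for uid in resource.get("for_packages", []):
            grouped.setdefault(uid, []).append(resource["path"])
    return grouped


# Made-up sample data shaped after the attributes named above.
scan = {
    "packages": [{"package_uid": "pkg:npm/demo@1.0.0?uuid=abc"}],
    "files": [
        {"path": "demo/package.json", "for_packages": ["pkg:npm/demo@1.0.0?uuid=abc"]},
        {"path": "demo/index.js", "for_packages": ["pkg:npm/demo@1.0.0?uuid=abc"]},
        {"path": "README", "for_packages": []},
    ],
}
grouped = files_by_package(scan)
```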


License Clarity Scoring Update:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- We are moving away from the original license clarity scoring designed for
ClearlyDefined in the license clarity score plugin. The previous license
clarity scoring logic produced a score that was misleading when it would
return a low score due to the stringent scoring criteria. We are now using
more general criteria to get a sense of what provenance information has been
provided and whether or not there is a conflict in licensing between what
licenses were declared at the top-level key files and what licenses have been
detected in the files under the top-level.

- The license clarity score is a value from 0-100 calculated by combining the
weighted values determined for each of the scoring elements:

  - Declared license:

    - When true, indicates that the software package licensing is documented at
      top-level or well-known locations in the software project, typically in a
      package manifest, NOTICE, LICENSE, COPYING or README file.
    - Scoring Weight = 40

  - Identification precision:

    - Indicates how well the license statement(s) of the software identify known
      licenses that can be designated by precise keys (identifiers) as provided in
      a publicly available license list, such as the ScanCode LicenseDB, the SPDX
      license list, the OSI license list, or a URL pointing to a specific license
      text in a project or organization website.
    - Scoring Weight = 40

  - License texts:

    - License texts are provided to support the declared license expression in
      files such as a package manifest, NOTICE, LICENSE, COPYING or README.
    - Scoring Weight = 10

  - Declared copyright:

    - When true, indicates that the software package copyright is documented at
      top-level or well-known locations in the software project, typically in a
      package manifest, NOTICE, LICENSE, COPYING or README file.
    - Scoring Weight = 10

  - Ambiguous compound licensing:

    - When true, indicates that the software has a license declaration that
      makes it difficult to construct a reliable license expression, such as in
      the case of multiple licenses where the conjunctive versus disjunctive
      relationship is not well defined.
    - Scoring Weight = -10

  - Conflicting license categories:

    - When true, indicates that the declared license expression of the software
      is in the permissive category, but that other potentially conflicting
      categories, such as copyleft and proprietary, have been detected in lower
      level code.
    - Scoring Weight = -20
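
As a rough illustration of how the weights above could combine, here is a
minimal sketch. It assumes that every scoring element is a boolean, that the
score is the plain sum of the weights of the elements that are true, and that
the result is clamped to the 0-100 range; these notes do not spell out those
details. The dictionary keys are illustrative names, and ``clarity_score`` is
a hypothetical helper rather than the plugin's code.

```python
# Weights as listed above; the key names are illustrative.
WEIGHTS = {
    "declared_license": 40,
    "identification_precision": 40,
    "has_license_text": 10,
    "declared_copyrights": 10,
    "ambiguous_compound_licensing": -10,
    "conflicting_license_categories": -20,
}


def clarity_score(elements):
    """Sum the weights of the true elements, clamped to 0..100 (assumption)."""
    total = sum(weight for name, weight in WEIGHTS.items() if elements.get(name))
    return max(0, min(100, total))


# A well-documented project with a copyleft file found in lower-level code.
example = clarity_score({
    "declared_license": True,
    "identification_precision": True,
    "has_license_text": True,
    "declared_copyrights": True,
    "conflicting_license_categories": True,
})
```

Under these assumptions the example evaluates to 80: the four positive
elements add up to 100 and the conflict subtracts 20.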


Summary Plugin Update:
~~~~~~~~~~~~~~~~~~~~~~

- The summary plugin's behavior has been changed. Previously, it provided a
count of the detected license expressions, copyrights, holders, authors, and
programming languages from a scan.

All functionality of the previous summary plugin has been preserved in a new
plugin called ``tallies``.

- The new summary plugin now attempts to determine a declared license expression,
a declared holder, and the primary programming language from a scan. The
updated license clarity score provides context on the quality of the license
information provided in the codebase key files.

- The new summary plugin also returns lists of tallies for the other "secondary"
detected license expressions, copyright holders, and programming languages.

All summary information is provided at the codebase-level attribute named ``summary``.


Outputs:
~~~~~~~~

- Added new outputs for the CycloneDX format.
The CLI now exposes options to produce CycloneDX BOMs in either JSON or XML format.

- A new field ``warnings`` has been added to the headers of ScanCode toolkit output
that contains any warning messages that occur during a scan.

- The CSV output format (the --csv option) is now deprecated. It will be replaced
by new CSV and tabular output formats in the next ScanCode release.
Visit https://github.com/nexB/scancode-toolkit/issues/3043 to provide input
and feedback.


Output version
--------------

Scancode Data Output Version is now 2.0.0.


Changes:

- Rename resource level attribute `packages` to `package_data`.
- Add top-level attribute `packages`.
- Add top-level attribute `dependencies`.
- Add resource-level attribute `for_packages`.
- Remove `package_data` attribute `root_path`.
- The fields of the license clarity scoring plugin have been replaced with the
following fields. An overview of the new fields can be found in the "License
Clarity Scoring Update" section above.

- `score`
- `declared_license`
- `identification_precision`
- `has_license_text`
- `declared_copyrights`
- `conflicting_license_categories`
- `ambiguous_compound_licensing`

- The fields of the summary plugin have been replaced with the following fields.
An overview of the new fields can be found in the "Summary Plugin Update"
section above.

- `declared_license_expression`
- `license_clarity_score`
- `declared_holder`
- `primary_language`
- `other_license_expressions`
- `other_holders`
- `other_languages`

- A new field ``run_order`` has been added to ``BasePlugin`` and set on all
ScanCode plugins. Plugin run order and output order are now set independently
of one another.
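
A consumer of the new output could pick up the summary fields listed above
along these lines. This is a sketch only: it assumes the summary lives under a
top-level ``summary`` key of the JSON document, as stated in the "Summary Plugin
Update" section, and the sample document is made up. ``read_summary`` is a
hypothetical helper.

```python
import json

# Field names as listed in the changes above.
SUMMARY_FIELDS = (
    "declared_license_expression",
    "license_clarity_score",
    "declared_holder",
    "primary_language",
    "other_license_expressions",
    "other_holders",
    "other_languages",
)


def read_summary(scan_json_text):
    """Return the known summary fields, with None for any that are absent."""
    scan = json.loads(scan_json_text)
    summary = scan.get("summary") or {}
    return {name: summary.get(name) for name in SUMMARY_FIELDS}


# Made-up, heavily trimmed sample document.
sample = json.dumps({
    "summary": {
        "declared_license_expression": "apache-2.0",
        "declared_holder": "Example Corp.",
        "primary_language": "Python",
    }
})
fields = read_summary(sample)
```

Returning ``None`` for missing fields keeps the caller working against scans
produced without the summary plugin enabled.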


Documentation Update
~~~~~~~~~~~~~~~~~~~~~~~~

- Various documentation files have been updated to reflect API changes and
correct minor documentation issues.


Development environment and Code API changes:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- The main package API function `get_package_infos` is deprecated, and
replaced by `get_package_data`.

- Resource paths are now always the same regardless of the strip-root or
full-root arguments.

- The license cache consistency is no longer checked when you are using a git
checkout. The SCANCODE_DEV_MODE tag file has been removed entirely. Use the
--reindex-licenses option instead to rebuild the license index.

- We can now regenerate test fixtures using the new SCANCODE_REGEN_TEST_FIXTURES
environment variable. There is no need to replace the regen=False with
regen=True in the code.
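
The environment-variable approach to regenerating fixtures can be sketched as
follows. Only the variable name comes from these notes. The helper names, the
JSON fixture format and the rule that any non-empty value enables regeneration
are assumptions; this is not ScanCode's test utility code.

```python
import json
import os
import tempfile


def regen_requested(environ=os.environ):
    # Assumption: any non-empty value turns regeneration on.
    return bool(environ.get("SCANCODE_REGEN_TEST_FIXTURES"))


def check_against_fixture(result, fixture_path, environ=os.environ):
    # Rewrite the expected file when asked to, then compare as usual.
    if regen_requested(environ):
        with open(fixture_path, "w") as fixture:
            json.dump(result, fixture, indent=2)
    with open(fixture_path) as fixture:
        expected = json.load(fixture)
    assert result == expected


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "expected.json")
    # A run with the variable set writes the fixture...
    check_against_fixture(
        {"files": 1}, path, environ={"SCANCODE_REGEN_TEST_FIXTURES": "yes"}
    )
    # ...and later runs without it compare against the stored file.
    check_against_fixture({"files": 1}, path, environ={})
```

Passing the environment as a parameter keeps the sketch testable without
changing the real process environment.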


Miscellaneous
~~~~~~~~~~~~~~~~~~~~~~~~

- Added support for the following shortcut flags:

  - `-A` or `--about`
  - `-q` or `--quiet`
  - `-v` or `--verbose`
  - `-V` or `--version`

30.1.0
--------------------

This is a bug fix release for these bugs:

- https://github.com/nexB/scancode-toolkit/issues/2717

We now return the package in the summaries as before.

There is also a minor API change: we no longer return a count of "null" empty
values in the summaries for license, copyrights, etc.


Thank you to:
- Thomas Druez tdruez

30.0.1
--------------------

This is a minor bug fix release for these bugs:

- https://github.com/nexB/commoncode/issues/31
- https://github.com/nexB/scancode-toolkit/issues/2713

We now correctly work with all supported Click versions.

Thank you to:
- Konstantin Kochin vznncv
- Thomas Druez tdruez

30.0.0
--------------------

This is a major release with new features, and several bug fixes and
improvements including major updates to the license detection.

We have dropped calendar-based versions and have switched back to semver
versioning. To ensure that there is no ambiguity, the new major version has been
updated from 21 to 30. The primary reason is that calver was not helping
integrators to track major version changes like semver does.
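
The jump from 21 to 30 keeps version ordering monotonic across the switch. A
small sketch, using only version strings that appear in this changelog and a
deliberately simple numeric comparison (an assumption; real tools usually rely
on a full version parser), shows that the calver releases still sort before
the semver ones. ``version_key`` is a hypothetical helper.

```python
def version_key(version):
    # Compare versions numerically, part by part ("21.7.30" -> (21, 7, 30)).
    return tuple(int(part) for part in version.split("."))


releases = ["30.0.0", "21.8.4", "31.0.0", "21.7.30", "30.1.0", "30.0.1"]
ordered = sorted(releases, key=version_key)
```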

We also have introduced a new JSON output format version based on semver to
version the JSON output format data structure and have documented the new
versioning approach.


Package detection:
~~~~~~~~~~~~~~~~~~

- The Debian packages declared license detection in machine readable copyright
files and unstructured copyright has been significantly improved with the
tracking of the detection start and end line of a license match. This is not
yet exposed outside of tests but has been essential to help improve detection.

- Debian copyright license detection has been significantly improved with new
license detection rules.

- Support for Windows packages has been improved (and in particular the handling
of Windows packages detection in the Windows registry).

- Support for Cocoapod packages has been significantly revamped and is now
working as expected.

- Support for PyPI packages has been refined, in particular package descriptions.



Copyright detection:
~~~~~~~~~~~~~~~~~~~~

- The copyright detection accuracy has been improved and several bugs have been
fixed.


License detection:
~~~~~~~~~~~~~~~~~~~

There have been some significant updates in license detection. We now track
34,164 licenses and license notices:

- 84 new licenses have been added,
- 34 existing license metadata have been updated,
- 2765 new license detection rules have been added, and
- 2041 existing license rules have been updated.


- Several license detection bugs have been fixed.

- The SPDX license list 3.14 is now supported and has been synced with the
licensedb. We also include the version of the SPDX license list in the
ScanCode YAML, JSON and the SPDX outputs, as well as display it with the
"--version" command line option.

- Unknown licenses have a new flag "is_unknown" in their metadata to identify
them explicitly. Before that we were just relying on the naming convention of
having "unknown" as part of a license key.

- Rules that match at least one unknown license have a flag "has_unknown" set
and returned in the match results.

- Experimental: License detection can now "follow" license mentions that
reference another file such as "see license in COPYING" where we can relate
this mention to the actual license detected in the COPYING file. Use the new
"--unknown-licenses" command line option to test this new feature.
This feature will evolve significantly in the next version(s).


Outputs:
~~~~~~~~

- The SPDX output now has the mandatory ids attribute per the SPDX spec, and we
support SPDX 2.2 and SPDX license list 3.14.


Miscellaneous
~~~~~~~~~~~~~~~

- There is a new "--no-check-version" CLI option to scancode to bypass the live,
remote check for an outdated version on PyPI.

- The scan results and the CLI now display an outdated version warning when
the installed ScanCode version is older than 90 days. This is to warn users
that they are relying on outdated, likely buggy, insecure and inaccurate scan
results and encourage them to update to a newer version. This check is performed
entirely locally, based on date comparisons.

- The command line progress bar counters are displayed correctly again.

- A bug has been fixed in summarization.

- Generated code detection has been improved with several new keywords.
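
The local, date-based outdated check described above can be sketched like this.
Only the 90-day threshold and the fact that the check is a local date
comparison come from these notes. The function name and the sample release date
are hypothetical, and how ScanCode obtains its own release date is not covered
here.

```python
from datetime import date, timedelta

# Threshold stated in the notes: older than 90 days.
MAX_AGE = timedelta(days=90)


def is_outdated(release_date, today=None, max_age=MAX_AGE):
    """True when the release is strictly older than max_age."""
    today = today or date.today()
    return today - release_date > max_age


# Hypothetical release date, used for illustration only.
released = date(2021, 9, 1)
at_limit = is_outdated(released, today=date(2021, 11, 30))  # exactly 90 days
past_limit = is_outdated(released, today=date(2021, 12, 1))  # 91 days
```

No network access is involved, which matches the note that the comparison is
made entirely locally.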


Thank you!
~~~~~~~~~~~~

Many thanks to the many contributors that made this release possible and in
particular:

- Akanksha Garg akugarg
- Armijn Hemel armijnhemel
- Ayan Sinha Mahapatra AyanSinhaMahapatra
- Bryan Sutula sutula
- Chin-Yeung Li chinyeungli
- Dennis Clark DennisClark
- dyh yunhua-deng
- Dr. Frank Heimes FrankHeimes
- gunaztar gunaztar
- Helio Chissini de Castro heliocastro
- Henrik Sandklef hesa
- Jiyeong Seok dd-jy
- John M. Horan johnmhoran
- Jono Yang JonoYang
- Joseph Heck heckj
- Luis Villa tieguy
- Konrad Weihmann priv-kweihmann
- mapelpapel mapelpapel
- Maximilian Huber maxhbr
- Michael Herzog mjherzog
- MMarwedel MMarwedel
- Mikko Murto mmurto
- Nishchith Shetty inishchith
- Peter Gardfjäll petergardfjall
- Philippe Ombredanne pombredanne
- Rainer Bieniek rbieniek
- Roshan Thomas Thomshan
- Sadhana s4-2
- Sarita Singh itssingh
- Siddhant Khare Siddhant-K-code
- Soim Kim soimkim
- Thomas Druez tdruez
- Thorsten Godau tgodau
- Yunus Rahbar yns88

21.8.4
---------

This is a minor bug fix release primarily for Windows installation.
There is no feature change.

Installation:
~~~~~~~~~~~~~~~~~~

- Application installation on Windows works again. This fixes #2610.
- We now build and test app bundles on all supported Python versions: 3.6 to 3.9.


Thank you to gunaztar for reporting bug #2610.

Documentation:
~~~~~~~~~~~~~~~~~~

- Documentation is updated to reference supported Python versions 3.6 to 3.9

21.7.30
---------

This is a minor release with several bug fixes, major performance improvements
and support for new and improved package formats.


Many thanks to all the contributors that made this possible and in particular:

- Abhigya Verma abhi27-web
- Ayan Sinha Mahapatra AyanSinhaMahapatra
- Dennis Clark DennisClark
- Jono Yang JonoYang
- Mayur Agarwal mrmayurgithub
- Philippe Ombredanne pombredanne
- Pierre Tardy tardyp


Outputs:
~~~~~~~~

- Add new YAML-formatted output. This is exactly the same data structure as for
the JSON output.
- Add new Debian machine readable copyright output.
- The CSV output "Resource" column has been renamed to "path".
- The SPDX output now has the mandatory DocumentNamespace attribute per the SPDX specs (#2344).


Copyright detection:
~~~~~~~~~~~~~~~~~~~~

- The copyright detection speed has been significantly improved, with the tests
taking roughly half the time to run. This is achieved mostly by replacing
NLTK with the minimal and simplified subset we need, in a new library named
pygmars.

License detection:
~~~~~~~~~~~~~~~~~~~

- Add new licenses: now tracking 1763 licenses
- Add new license detection rules: now tracking 29475 license detection rules
- We have also improved license expression parsing and processing


Package detection:
~~~~~~~~~~~~~~~~~~

- The Debian packages declared license detection has been significantly improved.
- The Alpine packages declared license detection has been significantly improved.
- There is new support for shell parsing and Alpine packages APKBUILD data collection.
- There is new support for detecting various Windows packages using multiple
techniques including MSI, Windows registry and several more.
- There is new support for Distroless Debian-like installed packages.
- There is new support for Dart Pub package manifests.
