Bootleg

Latest version: v1.1.0


1.1.1dev1

---------------------

1.1.0

---------------------
Changed
^^^^^^^^^
* We changed the architecture and switched to a biencoder model. This changes our task flow and data prep. The new model uses less CPU storage and the standard BERT architecture. Our entity encoder now takes a textual input for each entity that contains its title, description, KG relationships, and types.
* To support dumping predictions over larger files, we support adding an ``entity_emb_file`` to the model (extracted from ``extract_all_entities.py``). This will make evaluation faster. Further, we added ``dump_preds_num_data_splits`` to split a file before dumping. As each file pass gets a new dataloader object, this can mitigate any torch dataloader memory issues that happen over large files.
* Renamed ``eval_accumulation_steps`` to ``dump_preds_accumulation_steps``.
* Removed the ``dump_embs`` option. Users should use ``dump_preds`` instead. The output file will have an ``entity_ids`` attribute that indexes into the extracted entity embeddings.
* Restructured our ``entity_db`` data for faster loading. It uses Tries rather than JSON files to store the data in read-only mode. The KG relations are not backwards compatible.
* Moved to character spans for input data. Added ``utils.preprocessing.convert_to_char_spans`` as a helper function to convert from word offsets to character offsets.
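
The move from word offsets to character offsets can be illustrated with a small standalone sketch. It does not reproduce ``utils.preprocessing.convert_to_char_spans``; the function name ``word_spans_to_char_spans``, the whitespace tokenization, and the end-exclusive word spans are all assumptions made for illustration.

```python
import re


def word_spans_to_char_spans(sentence, word_spans):
    """Convert [start, end) word-index spans to [start, end) character spans.

    Hypothetical helper: assumes words are maximal runs of non-whitespace
    characters and that each word span is end-exclusive.
    """
    # Character (start, end) offsets of every whitespace-delimited word.
    word_offsets = [m.span() for m in re.finditer(r"\S+", sentence)]
    char_spans = []
    for start_word, end_word in word_spans:
        char_start = word_offsets[start_word][0]
        # end_word is exclusive, so the last word in the span is end_word - 1.
        char_end = word_offsets[end_word - 1][1]
        char_spans.append([char_start, char_end])
    return char_spans


sentence = "Lincoln was born in Kentucky"
spans = word_spans_to_char_spans(sentence, [[0, 1], [4, 5]])
print(spans)
print([sentence[s:e] for s, e in spans])
```

Slicing the sentence with the returned spans recovers the mention text directly, which is the main convenience of character offsets: they do not depend on how the text is later tokenized.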

Added
^^^^^^
* ``BOOTLEG_STRIP`` and ``BOOTLEG_LOWER`` environment variables for ``get_lnrm``.
* ``extract_all_entities.py`` as a way to extract all entity embeddings. These entity embeddings can be used in eval and downstream. Users can use ``get_eid`` from the ``EntityProfile`` to extract the row id for a specific entity.
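
As a rough illustration of how environment variables can toggle a normalization helper, consider the sketch below. It is not Bootleg's ``get_lnrm``: the function ``normalize_alias``, the accepted truthy values, the defaults, and the exact meaning of "strip" (here, removing ASCII punctuation and collapsing whitespace) are hypothetical. Only the variable names ``BOOTLEG_STRIP`` and ``BOOTLEG_LOWER`` come from the notes above.

```python
import os
import string

# Assumed set of values that count as "enabled"; the real parsing may differ.
_TRUE = {"1", "true", "yes"}
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def _env_flag(name, default):
    """Read a boolean flag from the environment, falling back to a default."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in _TRUE


def normalize_alias(text):
    """Hypothetical alias normalizer controlled by two environment variables."""
    strip = _env_flag("BOOTLEG_STRIP", default=True)
    lower = _env_flag("BOOTLEG_LOWER", default=True)
    if strip:
        # Drop ASCII punctuation, then collapse runs of whitespace.
        text = text.translate(_PUNCT_TABLE)
        text = " ".join(text.split())
    if lower:
        text = text.lower()
    return text


os.environ["BOOTLEG_STRIP"] = "true"
os.environ["BOOTLEG_LOWER"] = "false"
print(normalize_alias("  U.S.  Senate! "))
```

Reading the flags at call time rather than at import time means a change to the environment takes effect on the next call, at the cost of a dictionary lookup per call.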

1.0.5

---------------------
Fixed
^^^^^^^^
* Fixed -1 command line argparse error
* Adjusted requirements

1.0.4

---------------------
Added
^^^^^^
* Tutorial to generate contextualized entity embeddings that perform better downstream

Fixed
^^^^^^^^
* Bumped the Pydantic version to 1.7.4

1.0.3

---------------------
Fixed
^^^^^^^
* Corrected how custom candidates were handled in the ``BootlegAnnotator`` when using ``extracted_examples``
* Fixed memory leak in the ``BootlegAnnotator`` due to missing ``torch.no_grad()``

1.0.2

---------------------

Added
^^^^^^
* Support for ``min_alias_len`` in ``extract_mentions`` and the ``BootlegAnnotator``.
* ``return_embs`` flag to pass into ``BootlegAnnotator`` that will return the contextualized embeddings of the entity (using key ``embs``) and entity candidates (using key ``cand_embs``).

Changed
^^^^^^^^^
* Removed condition that aliases for eval must appear in candidate lists. We now allow for eval to not have known aliases and always mark these as incorrect. When dumping predictions, these get "-1" candidates and null probabilities.
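
The behavior described above can be sketched as follows. This is an illustrative stand-in rather than Bootleg's evaluation code: the function ``build_prediction``, the record keys, the ``Q``-style ids, and the scoring callback are hypothetical, and representing "null probabilities" as one ``None`` per placeholder candidate is an assumption.

```python
import json


def build_prediction(alias, gold_id, alias_to_candidates, score_fn):
    """Build one dumped prediction record for a mention (hypothetical format)."""
    candidates = alias_to_candidates.get(alias)
    if not candidates:
        # Unknown alias: placeholder candidate, null probability, always wrong.
        return {"alias": alias, "cands": ["-1"], "probs": [None], "correct": False}
    probs = score_fn(alias, candidates)
    # Pick the candidate with the highest score.
    best_index = max(range(len(candidates)), key=lambda i: probs[i])
    return {
        "alias": alias,
        "cands": candidates,
        "probs": probs,
        "correct": candidates[best_index] == gold_id,
    }


def toy_scores(alias, candidates):
    # Stand-in scorer: prefers earlier candidates.
    total = sum(range(1, len(candidates) + 1))
    return [(len(candidates) - i) / total for i in range(len(candidates))]


alias_map = {"lincoln": ["Q1", "Q2"]}
known = build_prediction("lincoln", "Q1", alias_map, toy_scores)
unknown = build_prediction("linkoln", "Q1", alias_map, toy_scores)
print(json.dumps(known))
print(json.dumps(unknown))
```

Keeping unknown aliases in the output, instead of rejecting them up front, means every evaluation mention yields a record and counts toward the error total.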

Fixed
^^^^^^^
* Corrected ``fit_to_profile`` to rebuild the title embeddings for the new entities.
