Scribe-data

Latest version: v4.1.0

4.1.0

✨ Features

- Queries for noun genders and other properties that require the Wikidata label service now return their English labels rather than the auto label, which was returning just the Wikidata QID.
- SPARQL queries for English and Portuguese prepositions were added to allow the CLI to query these types of data.
- The convert functionality once again works for lists of languages and for all of their data types.
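
As a rough illustration of what converting between output types involves, the sketch below turns a JSON export into CSV or TSV text using only the Python standard library. It is not Scribe-Data's implementation: the function name `json_records_to_delimited` and the sample record layout are assumptions made for this example.

```python
import csv
import io
import json


def json_records_to_delimited(json_text: str, delimiter: str = ",") -> str:
    """Convert a JSON array of flat objects into comma- or tab-delimited text."""
    records = json.loads(json_text)

    # Collect column names in first-seen order so the output is stable.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # Keys missing from a record become empty cells.
    return buffer.getvalue()


# Hypothetical sample data, not real Scribe-Data output.
sample = json.dumps(
    [
        {"lexemeID": "L1", "preposition": "with"},
        {"lexemeID": "L2", "preposition": "from", "grammaticalCase": "dative"},
    ]
)
print(json_records_to_delimited(sample, delimiter="\t"))
```

Passing `delimiter=","` (the default) yields CSV and `delimiter="\t"` yields TSV, so one helper covers both text formats.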

🐞 Bug Fixes

- SQLite conversion was fixed for all queries ([527](https://github.com/scribe-org/Scribe-Data/issues/527)).
- The data conversion process outputs were improved, including capitalizing language names, and repeat notices to the user were removed.
- The CLI's `get` command now returns all data types if none is passed.
- The Portuguese verbs query was fixed as it wasn't formatted correctly.
- The emoji keyword functionality was fixed given the new lexeme ID based form of the data.
- Arguments that were breaking the functionality were fixed.
- Language names shown to the user are now capitalized.
- `case` has been renamed `grammaticalCase` in preposition queries to ensure that SQLite reserved keywords are not used.
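
The motivation for the `grammaticalCase` rename can be reproduced with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a hypothetical `prepositions` table that is not the package's real schema: an unquoted column named `case` is rejected because `CASE` is an SQLite keyword, while the renamed column works without quoting.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# `case` is an SQLite keyword, so using it unquoted as a column name fails.
try:
    connection.execute("CREATE TABLE prepositions (preposition TEXT, case TEXT)")
    unquoted_case_failed = False
except sqlite3.OperationalError:
    unquoted_case_failed = True

# The renamed column needs no quoting. The table name is hypothetical.
connection.execute(
    "CREATE TABLE prepositions (preposition TEXT, grammaticalCase TEXT)"
)
connection.execute("INSERT INTO prepositions VALUES (?, ?)", ("mit", "dative"))
rows = connection.execute(
    "SELECT preposition, grammaticalCase FROM prepositions"
).fetchall()
connection.close()

print(unquoted_case_failed, rows)
```

Quoting the identifier as `"case"` would also work, but a non-reserved column name avoids needing quotes in every downstream query.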

4.0.0

✨ Features

- Queries for countless data types for countless languages were expanded and added ❤️
- Scribe-Data is now a fully functional CLI.
- Querying Wikidata lexicographical data can be done via the `get` command ([159](https://github.com/scribe-org/Scribe-Data/issues/159)).
- The output type of queries can be JSON, CSV, TSV or SQLite, with converting between output types also being possible ([145](https://github.com/scribe-org/Scribe-Data/issues/145), [#146](https://github.com/scribe-org/Scribe-Data/issues/146)).
- Output paths can be set for query results ([144](https://github.com/scribe-org/Scribe-Data/issues/144)).
- The version of the CLI can be printed to the command line and the CLI can further be used to upgrade itself ([186](https://github.com/scribe-org/Scribe-Data/issues/186), [#157](https://github.com/scribe-org/Scribe-Data/issues/157)).
- Total Wikidata lexemes for languages and data types can be derived with the `total` command ([147](https://github.com/scribe-org/Scribe-Data/issues/147)).
- Interactive and total commands can be used via an interactive mode with the `--interactive` argument ([158](https://github.com/scribe-org/Scribe-Data/issues/158), [#203](https://github.com/scribe-org/Scribe-Data/issues/203)).
- Outputs were standardized to ensure that the CLI experience is consistent.
- The machine translation process has been removed to make way for the Wiktionary based implementation ([292](https://github.com/scribe-org/Scribe-Data/issues/292)).
- Package metadata files were standardized for languages, data types and Wikidata lexeme forms.
- CLI commands have an argument check that can suggest correct languages and data types ([341](https://github.com/scribe-org/Scribe-Data/issues/341)).
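
An argument check that suggests corrections can be approximated with the standard library's `difflib`. The sketch below is an assumption about how such a check could work, not the CLI's actual code: `SUPPORTED_LANGUAGES` is a made-up subset and `suggest_language` is a hypothetical helper.

```python
import difflib
from typing import Optional

# Hypothetical subset of supported values, not the package's real metadata.
SUPPORTED_LANGUAGES = ["english", "french", "german", "portuguese", "spanish"]


def suggest_language(user_input: str) -> Optional[str]:
    """Return the closest supported language name, or None if nothing is close."""
    matches = difflib.get_close_matches(
        user_input.lower(), SUPPORTED_LANGUAGES, n=1, cutoff=0.6
    )
    return matches[0] if matches else None


suggestion = suggest_language("Englsh")
if suggestion is not None:
    print(f"Did you mean '{suggestion}'?")
```

The `cutoff` value trades off helpfulness against false suggestions; 0.6 is `difflib`'s default and is used here only as a starting point.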

🐞 Bug Fixes

- Wikidata query process stages no longer trigger the tqdm progress bar when they're unsuccessful ([155](https://github.com/scribe-org/Scribe-Data/issues/155)).

✅ Tests

- Tests have been written for the CLI to ensure that its functionality remains consistent.
- Workflows were created to check that the Wikidata queries and project structure are consistent, which helps ensure package functionality ([339](https://github.com/scribe-org/Scribe-Data/issues/339), [#357](https://github.com/scribe-org/Scribe-Data/issues/357)).
- Project queries and structure have been updated to match the rules developed for the checks.

📝 Documentation

- The CLI's functionality has been fully documented ([152](https://github.com/scribe-org/Scribe-Data/issues/152), [#208](https://github.com/scribe-org/Scribe-Data/issues/208)).
- Documentation was created to show how to write Scribe-Data queries ([395](https://github.com/scribe-org/Scribe-Data/issues/395)).

♻️ Code Refactoring

- `word_type` has been switched to `data_type` throughout the codebase ([160](https://github.com/scribe-org/Scribe-Data/issues/160)).
- Case, gender and annotation utility functions were removed as the formatting process that used them has changed.
- The SPARQLWrapper access method has been extracted to the Wikidata utils and is imported into the files that need it ([164](https://github.com/scribe-org/Scribe-Data/issues/164)).
- Export data paths have been converted to centrally saved variables to reduce hard coded string repetition.
- Many files were renamed, including `update_data.py` being renamed `query_data.py`.
- Paths within the package have been updated to work for all operating systems via `pathlib` ([125](https://github.com/scribe-org/Scribe-Data/issues/125)).
- The language formatting scripts have been dramatically simplified now that all export paths are the same.
- The `update_files` directory was removed in preparation for other means of showing data totals.
- The `language_data_extraction` directory was moved under the Wikidata directory as it's only used for those processes now ([446](https://github.com/scribe-org/Scribe-Data/issues/446)).
- The emoji keyword process was centralized to simplify project maintenance ([359](https://github.com/scribe-org/Scribe-Data/issues/359)).
- PyICU was removed as a dependency and a process was made to install it and its needed dependencies given the operating system of the user ([196](https://github.com/scribe-org/Scribe-Data/issues/196)).
- The data formatting step was centralized such that we only have one for all languages ([142](https://github.com/scribe-org/Scribe-Data/issues/142)).
- Sub-query processes are no longer hard coded, so the total possible sub-queries no longer need to be maintained within the `query_data.py` process.
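
The `pathlib` change listed above is what makes paths portable. As a small sketch (the file layout here is hypothetical and only reuses the `scribe_data_json_export` directory name for illustration), the same path parts render with the correct separator on each platform:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Hypothetical export layout used only for this example.
relative_parts = ("scribe_data_json_export", "English", "nouns.json")

# Path picks the right flavor for the operating system the code runs on.
export_path = Path.cwd().joinpath(*relative_parts)

# The pure path classes show how the same parts render on each platform
# without touching the file system.
posix_rendering = str(PurePosixPath(*relative_parts))
windows_rendering = str(PureWindowsPath(*relative_parts))

print(posix_rendering)
print(windows_rendering)
```

Building paths from parts rather than concatenating strings with `/` or `\\` is what removes the operating system dependence.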

3.3.0

✨ Features

- The translation process has been updated to allow for translations from non-English languages ([72](https://github.com/scribe-org/Scribe-Data/issues/72), [#73](https://github.com/scribe-org/Scribe-Data/issues/73), [#74](https://github.com/scribe-org/Scribe-Data/issues/74), [#75](https://github.com/scribe-org/Scribe-Data/issues/75), [#76](https://github.com/scribe-org/Scribe-Data/issues/76), [#77](https://github.com/scribe-org/Scribe-Data/issues/77), [#78](https://github.com/scribe-org/Scribe-Data/issues/78), [#79](https://github.com/scribe-org/Scribe-Data/issues/79)).

🐞 Bug Fixes

- Annotation bugs such as repeated or empty values were fixed.
- Perfect tenses of Portuguese verbs were fixed via finding the appropriate PID ([68](https://github.com/scribe-org/Scribe-Data/issues/68)).
  - Note that the most common past perfect property is not the standard one, so this will need to be fixed.

📝 Documentation

- The documentation has been given a new layout with the logo in the top left ([90](https://github.com/scribe-org/Scribe-Data/issues/90)).
- The documentation now has links to the code at the top of each page ([91](https://github.com/scribe-org/Scribe-Data/issues/91)).

♻️ Code Refactoring

- [pre-commit](https://pre-commit.com/) hooks have been added to the repo to improve the development experience ([#137](https://github.com/scribe-org/Scribe-Data/issues/137)).
- Code formatting was shifted from [black](https://github.com/psf/black) to [Ruff](https://github.com/astral-sh/ruff).
- A Ruff based GitHub workflow was added to check the code formatting and lint the codebase on each pull request ([109](https://github.com/scribe-org/Scribe-Data/issues/109)).
- The `_update_files` directory was renamed `update_files` as these files are used in non-internal manners now ([57](https://github.com/scribe-org/Scribe-Data/issues/57)).
- A common function has been created to map Wikidata ids to noun genders ([69](https://github.com/scribe-org/Scribe-Data/issues/69)).
- The project is now installed locally for development and command line usage, so usages of `sys.path` have been removed from files ([122](https://github.com/scribe-org/Scribe-Data/issues/122)).
- The directory structure has been dramatically streamlined and includes folders for future projects where language data could come from other sources like Wiktionary ([139](https://github.com/scribe-org/Scribe-Data/issues/139)).
- Translation files are moved to their own directory.
- The `extract_transform` directory has been removed and all files within it have been moved one level up.
- The `languages` directory has been renamed `language_data_extraction`.
- All files within `wikidata/_resources` have been moved to the `resources` directory.
- The gender and case annotations for data formatting have now been commonly defined.
- All language directory `formatted_data` files have now been moved to the `scribe_data_json_export` directory, in preparation for outputs being directed to a directory outside of the package.
- Path computing has been refactored throughout the codebase, and unneeded functions for data transfers have been removed.
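
A common function for mapping Wikidata ids to noun genders could look roughly like the sketch below. The function name `map_gender` is hypothetical, and the three QIDs are believed to be Wikidata's masculine, feminine and neuter items but should be verified against Wikidata before being relied on.

```python
from typing import Optional

# QIDs believed to be Wikidata's grammatical gender items; verify before use.
GENDER_QIDS = {
    "Q499327": "masculine",
    "Q1775415": "feminine",
    "Q1775461": "neuter",
}


def map_gender(wikidata_value: str) -> Optional[str]:
    """Map a Wikidata entity URI or bare QID to a gender name, else None."""
    # Query results often carry full entity URIs, so keep only the last segment.
    qid = wikidata_value.rsplit("/", 1)[-1]
    return GENDER_QIDS.get(qid)


print(map_gender("http://www.wikidata.org/entity/Q499327"))
```

Keeping the mapping in one place means each language's formatting code does not need its own copy of the QID table.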

3.2.2

- Minor fixes to documentation index and file docstrings to fix errors.
- Revert change to package path definition to hopefully register the resources directory.

3.2.1

♻️ Code Refactoring

- The docs and tests were grafted into the package using `MANIFEST.in`.
- Minor fixes to file and function docstrings and documentation files.
- `include_package_data=True` is used in `setup.py` to hopefully include all files in the package distribution.

3.2.0

✨ Features

- The data and process needed for an English keyboard have been added ([39](https://github.com/scribe-org/Scribe-Data/issues/39)).
- The Wikidata queries for English have been updated to get all nouns and verbs.
- Formatting scripts have been written to prepare the queried data and load it into an SQLite database.
- The data update process has been cleaned up in preparation for future changes to Scribe-Data and to implement better practices.
- Language data was extracted into a JSON file for more succinct referencing ([52](https://github.com/scribe-org/Scribe-Data/issues/52)).
- Language codes are now checked with the package [langcodes](https://github.com/rspeer/langcodes) for easier expansion.
- A process has been created to check and update words that can be translated for each Scribe language ([44](https://github.com/scribe-org/Scribe-Data/issues/44)).
- The baseline data returned from Wikidata queries is now removed once a formatted data file is created.
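
Loading formatted data into an SQLite database, as the formatting scripts above do, can be sketched with the standard library alone. The table layout and sample data below are assumptions for illustration and do not reflect the package's actual schema.

```python
import json
import sqlite3

# Hypothetical formatted data: singular noun -> plural form.
formatted_nouns = json.loads('{"book": "books", "mouse": "mice"}')

# Use a file path instead of ":memory:" for a persistent database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE nouns (noun TEXT PRIMARY KEY, plural TEXT)")
connection.executemany(
    "INSERT INTO nouns (noun, plural) VALUES (?, ?)", formatted_nouns.items()
)
connection.commit()

# Read the rows back to confirm the load round-trips.
loaded = dict(connection.execute("SELECT noun, plural FROM nouns ORDER BY noun"))
connection.close()

print(loaded)
```

Parameterized `?` placeholders are used so that words containing quotes or other special characters are inserted safely.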

🐞 Bug Fixes

- Tensorflow was removed from the download wiki process to fix build problems on Macs.

✅ Tests

- A full testing suite has been added to run on GitHub Actions ([37](https://github.com/scribe-org/Scribe-Data/issues/37)).
- Unit tests have been added for Wikidata queries ([48](https://github.com/scribe-org/Scribe-Data/issues/48)) and utility functions ([#50](https://github.com/scribe-org/Scribe-Data/issues/50)).

♻️ Code Refactoring

- The Anaconda based virtual environment was removed and documentation was updated to reflect this.
- Language data processes were moved into the `src/scribe_data/extract_transform/languages` directory to clean up the structure.
- Code formatting processes were defined with common structures based on language and word type variables defined at the top of files.
