Demicode

Latest version: v1.4.0

1.4

This release updates Demicode with support for the upcoming release of Unicode 16.0. That includes the ability to run with prerelease data in general and to run code generation without requiring full access to the Unicode character database files (which creates a circular dependency and results in a crash).

Unicode 16.0 again makes substantial changes to the definition of grapheme clusters. Nonetheless, Demicode's implementation of grapheme cluster breaking passed *all* updated tests without requiring any changes. I see that as validation of Demicode's approach, which uses a clever encoding of Unicode properties as Unicode letters and a straightforward regular expression obtained by applying the encoding to the rules from [Unicode Standard Annex 29](http://www.unicode.org/reports/tr29/) on text segmentation.
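The encoding idea can be illustrated with a small sketch. It is not Demicode's actual code: the `classify` function below is a hand-written, heavily simplified stand-in for the Grapheme_Cluster_Break and Extended_Pictographic properties (no Hangul, Prepend, or SpacingMark, and only rough code point ranges for emoji), and the names `classify`, `CLUSTER`, and `grapheme_clusters` are made up for this example. Each code point becomes one letter, and a regular expression over those letters expresses a subset of the UAX #29 rules.

```python
import re
import unicodedata


def classify(ch: str) -> str:
    """Map a character to a one-letter code for a simplified break property."""
    cp = ord(ch)
    if ch == "\r":
        return "r"  # CR
    if ch == "\n":
        return "n"  # LF
    if cp == 0x200D:
        return "z"  # ZWJ
    if 0x1F1E6 <= cp <= 0x1F1FF:
        return "R"  # Regional_Indicator
    category = unicodedata.category(ch)
    if category in ("Mn", "Me") or 0x1F3FB <= cp <= 0x1F3FF:
        return "e"  # Extend (approximation: marks plus emoji modifiers)
    if category in ("Cc", "Cf", "Zl", "Zp"):
        return "c"  # Control (approximation)
    if 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF:
        return "P"  # Extended_Pictographic (rough approximation)
    return "o"  # everything else


# One alternative per group of rules; the pattern runs over the letters,
# not over the original text.
CLUSTER = re.compile(
    r"rn"                              # GB3: keep CR LF together
    r"|[rnc]"                          # GB4, GB5: break around controls
    r"|(?:RR|P(?:e*zP)*|[^rnc])[ez]*"  # GB12/13, GB11, then GB9
)


def grapheme_clusters(text: str) -> list[str]:
    """Split text into clusters using the letter encoding and CLUSTER."""
    encoded = "".join(classify(ch) for ch in text)
    # One letter per code point, so match offsets index into text directly.
    return [text[m.start():m.end()] for m in CLUSTER.finditer(encoded)]
```

Because the properties live in the encoding rather than in the pattern, new Unicode data mostly changes the classification table. The pattern itself only changes when the rules do.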

Since the preliminary files for version 16.0 of the Unicode Character Database have already been posted on Unicode's website, you too can run Demicode 1.4 with the prerelease data. Just add the `--ucd-version 16.0.0` option on the command line. Without that option, Demicode continues to default to Unicode 15.1—until the next weekly update check after the release of Unicode 16.0. By contrast, Demicode 1.3 fails with an error declaring that Unicode 16.0 is "from future." Well, with Demicode 1.4, the future is now! 🎉

1.3

This release greatly simplifies running demicode across several popular terminal emulators, at least on macOS. It also fixes 1.

1.2

With this release, demicode gains the ability to benchmark page rendering. [Initial results](https://github.com/apparebit/demicode/blob/boss/perf.json) for nine terminal applications suggest that all of them are reasonably fast at rendering styled text, taking 4–9ms for a 120×40 page on a four-year-old macOS laptop. But when demicode queries the terminal for the current column (once each for 38 of those 40 lines), the spread of average latencies explodes to 10–946ms. Judging by these results, it seems that a few terminals strongly oversell their nimbleness.
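For readers unfamiliar with such queries, here is a minimal sketch of asking a terminal for the cursor position and timing the round trip. It assumes a Unix terminal that answers the standard ANSI "device status report" request (`ESC [ 6 n`) with a reply of the form `ESC [ row ; column R`. The function names are made up for this example and the code is not taken from demicode.

```python
import os
import re
import sys
import time

# Cursor position report: ESC [ <row> ; <column> R
_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(response: bytes) -> tuple[int, int]:
    """Extract (row, column) from a cursor position report."""
    match = _CPR.search(response)
    if match is None:
        raise ValueError(f"not a cursor position report: {response!r}")
    return int(match.group(1)), int(match.group(2))


def query_cursor_position() -> tuple[tuple[int, int], float]:
    """Query the terminal; return ((row, column), elapsed seconds).

    Unix only; standard input and output must be attached to a terminal.
    """
    import termios
    import tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # deliver the reply without waiting for a newline
        start = time.perf_counter()
        sys.stdout.write("\x1b[6n")
        sys.stdout.flush()
        response = b""
        while not response.endswith(b"R"):
            chunk = os.read(fd, 1)
            if not chunk:
                raise EOFError("terminal closed before replying")
            response += chunk
        elapsed = time.perf_counter() - start
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return parse_cursor_report(response), elapsed
```

Each query is a full round trip through the terminal emulator, which is why its latency can differ so much from plain rendering speed.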

This release also improves the mirroring of UCD and CLDR data with a from-the-ground-up rewrite that uses an [explicit manifest](https://github.com/apparebit/demicode/blob/boss/ucd.manifest.json) to track what data has been mirrored. To see for yourself, run demicode with `--ucd-list-versions`, which lists the UCD versions included in the current mirror. The implementation is also more structured and performs more aggressive error checking. As of today, demicode is using GitHub Actions for CI, which hopefully ensures that demicode releases only become more robust.

1.1

User-Visible Changes

This release makes the following major changes:

* It fixes a crashing bug for mirrored CLDR files.
* It improves terminal input/output, notably with the `--incrementally`/`-I` option for displaying character blots incrementally. That does markedly slow down tool output. But it also allows for measuring the size of character blots by querying the terminal.

Internal Changes

This release also makes significant internal changes. Notably, the **UCD implementation** is becoming more uniform and more decoupled. The long-term goal is to provide a generally useful UCD abstraction that may not be the fastest but has excellent support for exploratory coding against the UCD.

The **development setup** has also been updated. Instead of mypy, demicode now uses [pyright](https://github.com/microsoft/pyright) for type-checking. In my experience, pyright is more accurate than mypy for the same annotations. It has also surfaced two very subtle bugs. They both are fixed.

The `runtest.py` script runs both type checker and unit tests. Tests are based on Python's `unittest` package because I find `pytest` too invasive and too magical, which always ends up interfering with tests in the long term. Unfortunately, `unittest` is rather baroque and hard to extend because (1) its interfaces are too wide and (2) it hides critical state. The `test.runtime` module introduces adapter classes that fix these issues for `unittest.TestCase` and `unittest.TestResult`. The test script uses them to provide more readable and helpful output.

1.0

This version adds support for Unicode 15.1. Notably, it incorporates the changes to the grapheme cluster breaking algorithm, which changed substantially since Unicode 15.0. The changes are automatically activated when `UnicodeCharacterDatabase` is instantiated with 15.1 and they are effectively no-ops for 15.0 and earlier.

The `--stats` option now prints the bit-width for Unicode properties, too. It also includes data on code points that have non-default values for both the `Indic_Conjunct_Break` and `Grapheme_Cluster_Break` properties. Such overlap matters because both properties help determine grapheme cluster breaks. If feasible, integrating both into the same enumeration, with single-letter values for the enumeration constants, simplifies the implementation of the break algorithm significantly.

1.0.b1

Demicode's user experience is much improved: It now pages back and forth. On Linux and macOS, it only takes a keypress to select the next page; take your pick: `‹left›`/`‹right›`, `b`/`f`, `p`/`n`, `‹tab›`/`‹shift-tab›`, `‹space›`/`‹delete›`. For now, Windows still requires you to type a letter or a command (`backward`/`forward` and `previous`/`next` work too) and then follow it with `‹return›`, though `‹return›` by itself continues to page forward as well.

This release has been tested with all known Unicode versions from 4.1 forward and does run with them. It also removes several unused Unicode properties that are likely to remain so and introduces several more, which will be needed for implementing grapheme cluster breaks according to the revised Unicode 15.1 algorithm.

The new `--with-ucd-extended-pictographic` command line option blots all characters that have the Extended_Pictographic property, including unassigned ones. Since that's quite the mouthful and the set of characters is especially important for fixed-width rendering, the much shorter `-x` works, too. Similarly, `--with-curation` has `-q` as an alias.

Internally, this release incorporates a significant refactor of the code for loading Unicode Character Database files. Much of the clutter and boilerplate has been eliminated, since I finally found a pattern that is both simple and also flexible enough to accommodate the loading of most files: It requires two lines, one for the context manager that mirrors and opens the file and one for the parser, with a callback constructing the desired datatype. The global `UCD` singleton instance has been eliminated as well. A direct beneficiary is statistics collection with `--stats`: It now uses its own private instance and can hence print counts for both the unoptimized and optimized internal representation in one run.
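The two-line loading pattern might look roughly like the sketch below. It is an assumption-laden illustration, not demicode's code: `parse_records` is a hypothetical parser for the common UCD property-file layout (a code point or `start..end` range, then semicolon-separated fields, then an optional `#` comment), and `io.StringIO` stands in for the context manager that mirrors and opens the real file. The sample lines are an abridged excerpt in the style of `GraphemeBreakProperty.txt`.

```python
import io
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def parse_records(
    lines: Iterable[str],
    construct: Callable[[range, list[str]], T],
) -> Iterator[T]:
    """Parse UCD-style property lines, building one value per record."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not data:
            continue
        first, *rest = (field.strip() for field in data.split(";"))
        start, _, stop = first.partition("..")
        # A lone code point is treated as a range of length one.
        code_points = range(int(start, 16), int(stop or start, 16) + 1)
        yield construct(code_points, rest)


SAMPLE = """\
# Abridged sample for illustration only
000D          ; CR # Cc       <control-000D>
0600..0605    ; Prepend # Cf   [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
"""

# The two lines: a context manager for the file, then parser plus callback.
with io.StringIO(SAMPLE) as lines:
    records = list(parse_records(lines, lambda cps, props: (cps, props[0])))
```

The callback keeps the parser generic: the same loop can produce tuples, dataclass instances, or dictionary entries, depending on what a given file needs. Files with other layouts, such as space-separated code point sequences, would need a different first-field parser.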

There are no more features to add nor modules to refactor. At least not in the short term. Once Unicode 15.1 has been released, I'll update the grapheme cluster breaking algorithm to account for Indic syllables as well. So please consider this first beta more or less a release candidate for the big 1.0.0, too.
