This has turned into a surprisingly big release, with a major refactor and a brand new
class (the first for 12 years!). There are also a couple of small, possibly breaking changes
detailed below; in particular, 'auto' initialising bitstrings from integers is now disallowed.
Speed increased with bitarray dependency.
The major weakness of bitstring has been its poor performance for computationally
intensive tasks relative to lower level alternatives. This was principally due to
relying on pure Python code to achieve things that the base language often didn't have
fast ways of doing.
This release starts to address that problem with a fairly extensive rewrite to replace
much of the pure Python low-level bit operations with methods from the bitarray package.
This is a package that does many of the same things as bitstring, and the two packages
have co-existed for a long time. While bitarray doesn't have all of the options and
facilities of bitstring it has the advantage of being very fast as it is implemented in C.
By replacing the internal datatypes I can speed up bitstring's operations while keeping
the same API.
Huge kudos to Ilan Schnell for all his work on bitarray.
New Array class for homogeneous data (beta)
If your data is all of the same type you can make use of the new Array class, which
mirrors much of the functionality of the standard array.array type, but doesn't restrict
you to just a dozen formats.
>>> from bitstring import Array
>>> a = Array('uint7', [9, 100, 3, 1])
>>> a.data
BitArray('0x1390181')
>>> b = Array('float16', a.tolist())
>>> b.append(0.25)
>>> b.tobytes()
b'H\x80V@B\x00<\x004\x00'
>>> b.tolist()
[9.0, 100.0, 3.0, 1.0, 0.25]
The data is stored efficiently in a BitArray object, and you can manipulate both the
data and the Array format freely. See the main documentation for more details. Note that
this feature carries the 'beta' flag so may change in future point versions.
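To see where the 0x1390181 in the example above comes from, the 7-bit packing can be
reproduced by hand in pure Python. This is only a sketch of the arithmetic: `pack_uints`
is a hypothetical helper, not part of bitstring, and it assumes nothing about the library's
internals beyond the values being concatenated in order, first value most significant.

```python
def pack_uints(values, width):
    """Concatenate unsigned integers, `width` bits each, first value most significant.

    Hypothetical helper for illustration only; not part of bitstring.
    Returns the packed integer and the total number of bits used.
    """
    packed = 0
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} unsigned bits")
        # Make room for the next value, then drop it into the low bits.
        packed = (packed << width) | v
    return packed, width * len(values)


packed, nbits = pack_uints([9, 100, 3, 1], 7)
print(nbits)                               # 28
print(format(packed, f"0{nbits // 4}x"))   # 1390181
```

Four 7-bit values take 28 bits, which is why the data prints as seven hex digits rather
than the eight you would get from four whole bytes.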
Other changes:
* Added two new floating point interpretations: float8_143 and float8_152. These are 8-bit
floating point formats, with very limited range and precision, but useful in some fields,
particularly machine learning. This is an experimental feature - the formats haven't
even been standardised yet.
>>> a = Bits(float8_143=16.5)
>>> a.bin
'01100000'
>>> a.float8_143
16.0
* Auto initialisation from ints has been removed and now raises a TypeError. Creating a
bitstring from an int still creates a zeroed bitstring of that length but ints won't
be promoted to bitstrings as that has been a constant source of errors and confusion.
>>> a = BitArray(100)  # Fine - create with 100 zeroed bits
>>> a += 0xff          # TypeError - previously this would have appended 0xff (=255) zero bits.
>>> a += '0xff'        # Probably what was meant - append eight '1' bits.
>>> a += Bits(255)     # Fine, append 255 zero bits.
This is a breaking change, but it breaks loudly with an exception, it is easily recoded,
and it removes a confusing wrinkle.
* Explicitly specifying the 'auto' parameter is now disallowed rather than discouraged.
It was always meant to be a positional-only parameter (and will be once I can drop
Python 3.7 support) but for now it's renamed to `__auto`. In the unlikely event
this breaks code, the fix should be just to delete the `auto=` if it's already the
first parameter.
>>> s = Bits(auto='0xff')  # Now raises a CreationError
>>> s = Bits('0xff')       # Fine, as always
* Deleting, replacing or inserting into a bitstring resets the bit position to 0 if the
bitstring's length has been changed. Previously the bit position was adjusted but
this was not well defined.
* Only empty bitstrings are now considered False in a boolean sense. Previously s was
False if no bits in s were set to 1, but this goes against what it means to be a
container in Python so I consider this to be a bug, even if it was documented. I'm
guessing it's related to `__nonzero__` in Python 2 becoming `__bool__` in Python 3, and
it's never been fixed before now.
* Casting to bytes now behaves as expected, so that `bytes(s)` gives the same result as
`s.tobytes()`. Previously it created a byte per bit.
* Pretty printing with the 'bytes' format now uses characters from the 'Latin Extended-A'
Unicode block for non-ASCII and unprintable characters instead of replacing them with '.'.
* When using struct-like codes you can now use '=' instead of '@' to signify
native-endianness. They behave identically, but the new '=' is now preferred.
* More fixes for LSB0 mode. There are now no known issues with this feature.
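The boolean and bytes() changes in the list above both come down to standard Python
container protocols, which can be sketched with a toy class. `TinyBits` is hypothetical
and is not how bitstring is implemented; it only shows that a container with `__len__`
and no `__bool__` is falsy only when empty, and that without a `__bytes__` method
bytes() falls back to iterating, which is one way a byte-per-bit result can arise.

```python
class TinyBits:
    """Toy bit container, for illustration only (not bitstring's implementation)."""

    def __init__(self, bits):
        self._bits = [bool(b) for b in bits]

    def __len__(self):
        # No __bool__ is defined, so Python uses __len__ for truth testing:
        # only an empty container is falsy, whatever its contents.
        return len(self._bits)

    def __iter__(self):
        return iter(self._bits)

    def __bytes__(self):
        # Pack eight bits per byte, zero-padding the final byte on the right.
        padded = self._bits + [False] * (-len(self._bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


zeros = TinyBits([0] * 8)
empty = TinyBits([])
print(bool(zeros), bool(empty))  # True False

s = TinyBits([1, 1, 1, 1, 0, 0, 0, 0])
print(bytes(s))        # b'\xf0'
# Iterating instead of packing gives one byte per bit:
print(bytes(list(s)))  # b'\x01\x01\x01\x01\x00\x00\x00\x00'
```

Code that relied on the old truthiness to test for any set bits would need an explicit
check instead; in bitstring that is presumably `s.any(True)`, though check the main
documentation for the exact call.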