Dataiter

Latest version: v1.0

1.0

========================

* Silence warnings about writing NPZ files with StringDType:
"UserWarning: Custom dtypes are saved as python objects using the
pickle protocol. Loading this file requires allow_pickle=True to be
set."

Dataiter can now be considered stable. If upgrading from <= 0.51,
please read the release notes for 0.99–0.9999.
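
For readers who hit the same warning in their own code, the sketch below shows one way a warning can be silenced by its message using only the standard library. It is not Dataiter's implementation: `save_with_custom_dtype` and `save_quietly` are hypothetical stand-ins, and the first one only emits the warning text quoted above instead of writing a file.

```python
import warnings

# Start of the message quoted in the release note. filterwarnings treats
# this as a regular expression matched against the start of the message.
NPZ_PICKLE_MESSAGE = "Custom dtypes are saved as python objects"


def save_with_custom_dtype():
    """Hypothetical stand-in for a save call that triggers the warning."""
    warnings.warn(
        "Custom dtypes are saved as python objects using the pickle "
        "protocol. Loading this file requires allow_pickle=True to be set.",
        UserWarning,
    )
    return "saved"


def save_quietly():
    """Run the save with only this one warning suppressed."""
    # catch_warnings restores the previous filters when the block exits,
    # so other warnings elsewhere in the program are unaffected.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=NPZ_PICKLE_MESSAGE,
            category=UserWarning,
        )
        return save_with_custom_dtype()


print(save_quietly())
```

Filtering on the message and category keeps the suppression narrow, so unrelated `UserWarning`s still surface.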

0.9999

===========================

* New module `dataiter.regex` for vectorized regular expressions
* Add proxy object `Vector.dt` for `dataiter.dt`
* Add proxy object `Vector.re` for `dataiter.regex`
* Add proxy object `Vector.str` for `numpy.strings`
* Use PyArrow instead of Pandas to read and write CSV files
* Replace Pandas dependency with PyArrow

This is likely to be a breaking change for some rare, oddly formatted
CSV files that Pandas and PyArrow parse differently, which can result in
differently guessed data types or differently detected missing value
markers. The note about stability under release 0.99 below still
applies.

0.999

==========================

* `DataFrame.from_arrow`: Remove `strings_as_object` argument
* `DataFrame.from_pandas`: Remove `strings_as_object` argument
* `DataFrame.read_csv`: Remove `strings_as_object` argument
* `DataFrame.read_parquet`: Remove `strings_as_object` argument
* `GeoJSON.read`: Remove `strings_as_object` argument
* `ListOfDicts.to_data_frame`: Remove `strings_as_object` argument
* `read_csv`: Remove `strings_as_object` argument
* `read_geojson`: Remove `strings_as_object` argument
* `read_parquet`: Remove `strings_as_object` argument
* `Vector.as_string`: Remove `length` argument
* `Vector.is_na`: Fix to work in multidimensional cases where the
elements of an object vector are arrays/vectors
* `Vector.rank`: Change default `method` to "min"
* `Vector.rank`: Remove `method` "average"

This is a breaking change to switch the string data type from the
fixed-width `str_` a.k.a. `<U` to the variable-width `StringDType`
introduced in NumPy 2.0. The main benefit is greatly reduced memory use,
making strings usable without needing to be careful or falling back to
`object`. The note about stability below release 0.99 still applies.
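
The memory argument can be illustrated with some back-of-the-envelope arithmetic. A fixed-width `<U` array pads every element to the longest string and spends 4 bytes per character, while variable-width storage grows with the actual text. The sketch below uses only plain Python; the helper names are hypothetical, and the variable-width figure counts UTF-8 text only, ignoring whatever per-element bookkeeping `StringDType` adds, so treat it as a rough lower bound.

```python
def fixed_width_bytes(strings):
    """Bytes used by a fixed-width `<U` array holding these strings.

    Every element is padded to the longest string (at least one
    character) and each character takes 4 bytes.
    """
    longest = max([len(s) for s in strings] + [1])
    return len(strings) * longest * 4


def utf8_payload_bytes(strings):
    """Bytes of UTF-8 text alone, without any per-element overhead."""
    return sum(len(s.encode("utf-8")) for s in strings)


names = ["a", "bb", "x" * 1000]
print(fixed_width_bytes(names))   # 3 elements * 1000 characters * 4 bytes
print(utf8_payload_bytes(names))  # 1 + 2 + 1000 bytes of text
```

A single long outlier is enough to inflate the fixed-width total, which is the situation where falling back to `object` used to be tempting.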

Note that as `StringDType` is only in NumPy >= 2.0, any NPZ or Pickle
files saved cannot be opened using Dataiter < 0.99 and NumPy < 2.0. If
you need that kind of interoperability, consider using the Parquet file
format.
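
The `Vector.rank` entries above change how ties are handled. As a plain-Python illustration of the two methods (not Dataiter's code; the function names are hypothetical, and the 1-based numbering is an assumption), "min" gives every tied value the lowest rank of its group, while "average" gives them the mean of the ranks they span:

```python
def rank_min(values):
    """Rank so that tied values all get the lowest rank of their group."""
    ordered = sorted(values)
    # list.index returns the first position of a value in the sorted
    # list, which is the lowest 1-based rank among its ties.
    return [ordered.index(v) + 1 for v in values]


def rank_average(values):
    """Rank so that tied values share the mean of the ranks they span."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        ranks.append((first + last) / 2)
    return ranks


scores = [10, 20, 20, 30]
print(rank_min(scores))      # [1, 2, 2, 4]
print(rank_average(scores))  # [1.0, 2.5, 2.5, 4.0]
```

Note that "average" produces fractional ranks, whereas "min" always yields integers. The quadratic lookups here are fine for a demonstration but not for large inputs.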

0.99

=========================

* Adapt to changes in NumPy 2.0
* Bump NumPy dependency to >= 2.0

This is a minimal change to be NumPy 2.0 compatible. In the 0.99+
releases, we plan to adopt the new NumPy string dtype and fix any
regressions that come up, leading to a 1.0 release when everything looks
to be working reliably (26). Anyone looking for extreme stability
should consider avoiding the 0.99+ releases and waiting for 1.0.

0.51

=========================

* Mark NumPy dependency as < 2.0

0.50

=========================

* `ListOfDicts.drop_na`: New method
* `ListOfDicts.keys`: New method
* `ListOfDicts.print_memory_use`: New method
* Fix tabular display of Unicode characters with width != 1
* Add dependency on wcwidth: https://pypi.org/project/wcwidth
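
The display fix matters because `len()` counts code points, not terminal columns. The sketch below approximates column width with the standard library's `unicodedata` module to show the difference. It is a simplified illustration, not the algorithm used by wcwidth or Dataiter, and `display_width` and `pad_cell` are hypothetical helper names.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for char in text:
        if unicodedata.combining(char):
            # Combining marks are drawn on top of the previous character.
            continue
        if unicodedata.east_asian_width(char) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two columns
        else:
            width += 1
    return width


def pad_cell(text, columns):
    """Left-justify text to a given number of terminal columns."""
    return text + " " * max(0, columns - display_width(text))


for word in ["abc", "日本語", "e\u0301"]:
    print(pad_cell(word, 8) + "|", len(word), display_width(word))
```

Padding by `len()` instead would misalign the closing `|` for the second and third words, since their character counts differ from the columns they occupy.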
