Vaex

Latest version: v4.17.0

4.3.0

* Performance
* Reuse filter data when slicing a dataframe [1287](https://github.com/vaexio/vaex/pull/1287)
* Features
* Cache task results, with support for Redis and diskcache [1393](https://github.com/vaexio/vaex/pull/1393)
* df.func.stack for stacking columns into Nd arrays [1287](https://github.com/vaexio/vaex/pull/1287)
* Sliding windows / shift / diff / sum [1287](https://github.com/vaexio/vaex/pull/1287)
* Embed join/groupby/shift in dataset (opt in via df._future(), will be default in vaex v5) [1287](https://github.com/vaexio/vaex/pull/1287)
* df.fingerprint() - a cross runtime unique key for caching [1287](https://github.com/vaexio/vaex/pull/1287)
* limit rows in groupby using early stop [1391](https://github.com/vaexio/vaex/pull/1391)
* Compare date columns to string values formatted in ISO 8601 format 621a341b54f9b4112f24e2ffd86612753df19fef
* Fixes
* df.concat did not copy functions [1287](https://github.com/vaexio/vaex/pull/1287)
* Filters with column names equal to function names a159777e2dc13ec762914c51c8b5550efec5f845
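
The caching and fingerprint entries above fit together: a cache shared between processes needs a key that every process computes identically. Python's built-in `hash()` is randomized per process for strings, so it cannot serve as that key. The following is a minimal sketch of the idea using only the standard library; the `fingerprint` and `cached_sum` helpers and the in-memory `_cache` dict are hypothetical stand-ins, not vaex's implementation or its Redis/diskcache backends.

```python
import hashlib
import json


def fingerprint(description: dict) -> str:
    """Hash a JSON-serializable task description into a stable key (hypothetical helper)."""
    # sort_keys makes the encoding independent of dict insertion order,
    # so two processes that build the same description get the same key.
    encoded = json.dumps(description, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


_cache = {}  # stand-in for an external cache backend
calls = 0    # counts how often the sum is actually computed


def cached_sum(column_name, values):
    """Return sum(values), computing it only on a cache miss."""
    global calls
    key = fingerprint({"task": "sum", "column": column_name, "data": values})
    if key not in _cache:
        calls += 1
        _cache[key] = sum(values)
    return _cache[key]


first = cached_sum("x", [1, 2, 3])
second = cached_sum("x", [1, 2, 3])  # served from the cache
```

The second call finds the key already present, so the sum is computed only once.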

4.2.0

* Performance
* Perform groupby in a sparse way, reducing memory usage and improving performance (up to 250x faster) [1381](https://github.com/vaexio/vaex/pull/1381)
* Features
* Sorted groupby [1339](https://github.com/vaexio/vaex/pull/1339)
* Fixes
* Proper use of logging framework [1384](https://github.com/vaexio/vaex/pull/1384)
* Aggregating with 'count' would ignore custom names [1345](https://github.com/vaexio/vaex/pull/1345)
* Join supports datetime column
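
To illustrate why a sparse groupby can use less memory, the sketch below contrasts a dense result (one slot for every possible key combination) with a sparse one (one slot per combination that occurs in the data). It is a conceptual illustration in plain Python with made-up data and hypothetical function names; it does not reproduce the implementation in the pull request above.

```python
from collections import Counter
from itertools import product


def dense_groupby_count(rows, keys_a, keys_b):
    """Count rows per (a, b) pair, allocating every possible pair up front."""
    grid = {combo: 0 for combo in product(keys_a, keys_b)}
    for a, b in rows:
        grid[(a, b)] += 1
    return grid


def sparse_groupby_count(rows):
    """Count rows per (a, b) pair, storing only pairs that actually occur."""
    return Counter(rows)


rows = [("nl", "bike"), ("nl", "bike"), ("us", "car")]
dense = dense_groupby_count(rows, ["nl", "us", "de"], ["bike", "car", "train"])
sparse = sparse_groupby_count(rows)
```

With three values per key the dense grid holds nine slots while the sparse result holds two; the gap grows with the product of the key cardinalities.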

4.1.0

* Features
* groupby datetime support [1265](https://github.com/vaexio/vaex/pull/1265)
* Fixes
* Improved fsspec support [1268](https://github.com/vaexio/vaex/pull/1268)
* Performance
* df.extract() uses mask instead of indices 398b682fe9042b3336120e9013e15bbd638620ed
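
The `df.extract()` entry distinguishes two ways of recording which rows survive a filter. The sketch below shows the general difference in plain Python, assuming "mask" means one boolean flag per row and "indices" means a list of kept row positions; the changelog does not spell this out, and the variable names are illustrative only.

```python
from itertools import compress

values = [10, 25, 3, 40, 8]

# Boolean mask: one flag per row, True where the row passes the filter.
keep = [v > 9 for v in values]

# Index list: the positions of the rows that pass the filter.
indices = [i for i, flag in enumerate(keep) if flag]

by_mask = list(compress(values, keep))
by_index = [values[i] for i in indices]
```

Both representations select the same rows; they differ in storage cost and access pattern.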

4.0.0

* Fixes
* Repeated dropna/dropnan/dropmissing could report cached length. [874](https://github.com/vaexio/vaex/pull/874)
* Trimming concatenated columns. [860](https://github.com/vaexio/vaex/pull/860)
* percentile_approx works for 0 and 100 percentile. [818](https://github.com/vaexio/vaex/pull/818)
* Expressions containing kwarg=True were treated as invalid. [861](https://github.com/vaexio/vaex/pull/861)
* Unicode column names fully supported [974](https://github.com/vaexio/vaex/issues/974)
* Features
* Datetime floor method [843](https://github.com/vaexio/vaex/pull/843)
* dropinf (similar to dropna) [821](https://github.com/vaexio/vaex/pull/821)
* Support for streaming from Google Cloud Storage. [898](https://github.com/vaexio/vaex/pull/898)
* IPython autocomplete support (e.g. `df['hom' (tab)`) [961](https://github.com/vaexio/vaex/pull/961)
* Out of core Parquet support using Arrow Dataset scanning [993](https://github.com/vaexio/vaex/pull/993)
* Refactor
* Use `arrow.compute` for several string functions/kernels. [885](https://github.com/vaexio/vaex/pull/885)
* Separate DataFrame and Dataset. [865](https://github.com/vaexio/vaex/pull/865)
* Performance
* concat (vaex.concat or df.concat) is about 100x faster. [994](https://github.com/vaexio/vaex/pull/994)
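
As a rough picture of what "dropinf (similar to dropna)" means, the following sketch removes rows containing positive or negative infinity from a list of tuples. It is a plain-Python analogy with a hypothetical `dropinf` function, not the vaex method or its signature. In this sketch NaN values are left alone, since they are a separate concern from infinities.

```python
import math


def dropinf(rows):
    """Keep only rows in which no value is +inf or -inf."""
    return [row for row in rows if not any(math.isinf(v) for v in row)]


rows = [(1.0, 2.0), (math.inf, 3.0), (4.0, -math.inf), (math.nan, 5.0)]
finite = dropinf(rows)  # the two rows holding an infinity are removed
```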

vaex-distributed (DEPRECATED)
This is now part of vaex-enterprise (was a proof of concept, never functional).

3.1.0

vaex-jupyter 0.5.2 (2020-6-12)
* Features
* Normalize histogram and change selection mode. [826](https://github.com/vaexio/vaex/pull/826)

3.0.0

* Breaking changes:
* Python 2 is not supported anymore
* Variables don't have access to pi and e anymore
* `df.rename_column` is now `df.rename` (and also renames variables)
* DataFrame uses a normal dict instead of OrderedDict, requiring Python >= 3.6
* Default limits (e.g. for plots) are minmax, so we don't miss outliers
* `df.get_column_names()` returns the aliased names (invalid identifiers), pass `alias=False` to get the internal column name
* Default value of `virtual` is True in method `df.export`, `df.to_dict`, `df.to_items`, `df.to_arrays`.
* df.dtype is a property; to get data types for expressions, use df.data_type(). df.expr.dtype still behaves the same.
* df.categorize takes min_value and max_value and no longer needs the check argument. The labels also do not have to be strings.
* vaex.open/from_csv etc does not copy the pandas index by default [756](https://github.com/vaexio/vaex/pull/756)
* df.categorize takes an inplace argument, similar to most methods, and returns the dataframe affected.
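
The switch from OrderedDict to a normal dict relies on plain dicts preserving insertion order, which CPython does from 3.6 onward (and which the language guarantees from 3.7). The short sketch below shows that property, along with one behavioural difference that remains: OrderedDict equality is order-sensitive, while dict equality is not. The `columns` mapping is a made-up stand-in for a column store.

```python
from collections import OrderedDict

# A plain dict keeps keys in the order they were inserted.
columns = {}
columns["x"] = [1, 2]
columns["y"] = [3, 4]
columns["r"] = [5, 6]
order = list(columns)

# Equality differs: dict ignores order, OrderedDict compares it.
same_plain = {"x": 1, "y": 2} == {"y": 2, "x": 1}
same_ordered = OrderedDict(x=1, y=2) == OrderedDict(y=2, x=1)
```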

vaex-core 2.0.0 (2020-5-24)
* Performance
* Printing out of dataframes done in 1 evaluate call, making remote dataframe printing faster. [571](https://github.com/vaexio/vaex/pull/557)
* Joining is faster and uses less memory (2x speedup measured) [586](https://github.com/vaexio/vaex/pull/586)
* Faster typechecks when adding columns of dtype=object (as often happens when coming from pandas) [612](https://github.com/vaexio/vaex/pull/612)
* Groupby 2x to 4x faster [730](https://github.com/vaexio/vaex/pull/730)
* Refactor
* Task system is refactored, with task execution on CPU being default, and makes (de)serialization easier. [571](https://github.com/vaexio/vaex/pull/557)
* Serialization/encoding of data structures is more flexible, allowing binary blobs and json over the wire. [571](https://github.com/vaexio/vaex/pull/557)
* Execution and tasks support async await [654](https://github.com/vaexio/vaex/pull/654)
* Fixes
* Renaming columns fixes [571](https://github.com/vaexio/vaex/pull/571)
* Joining with virtual columns but different data, and name collision fixes [570](https://github.com/vaexio/vaex/pull/570)
* Variables are treated similarly as columns, and respected in join [573](https://github.com/vaexio/vaex/pull/573)
* Arguments to lazy functions that are numpy arrays are put in the variables [573](https://github.com/vaexio/vaex/pull/573)
* Executor does not block after failed/interrupted tasks. [571](https://github.com/vaexio/vaex/pull/557)
* Default limits (e.g. for plots) are minmax, so we don't miss outliers [581](https://github.com/vaexio/vaex/pull/581)
* Do not fail printing out dataframe with 0 rows [582](https://github.com/vaexio/vaex/pull/582)
* Give proper NameError when using non-existing column names [299](https://github.com/vaexio/vaex/pull/299)
* Several fixes for concatenated dataframes. [590](https://github.com/vaexio/vaex/pull/590)
* dropna/nan/missing only dropped rows when all column values were missing, if no columns were specified. [600](https://github.com/vaexio/vaex/pull/600)
* Flaky test for RobustScaler skipped for p36 [614](https://github.com/vaexio/vaex/pull/614)
* Copying/printing sparse matrices [615](https://github.com/vaexio/vaex/pull/615)
* Sparse columns names with invalid identifiers are not rewritten. [617](https://github.com/vaexio/vaex/pull/617)
* Column names with invalid identifiers which are rewritten are shown when printing the dataframe. [617](https://github.com/vaexio/vaex/pull/617)
* Column name rewriting for invalid identifiers also works on virtual columns. [617](https://github.com/vaexio/vaex/pull/617)
* Fix the links to the example datasets. [609](https://github.com/vaexio/vaex/pull/609)
* Expression.isin supports dtype=object [669](https://github.com/vaexio/vaex/pull/669)
* Fix `column_count`, now only counts hidden columns if explicitly specified [593](https://github.com/vaexio/vaex/pull/593)
* df.values respects masked arrays [640](https://github.com/vaexio/vaex/pull/640)
* Rewriting a virtual column and doing a state transfer does not lead to `ValueError: list.remove(x): x not in list` [592](https://github.com/vaexio/vaex/pull/592)
* `df.<stat>(limits=...)` will now respect the selection [651](https://github.com/vaexio/vaex/pull/651)
* Using automatic names for aggregators led to many underscores in name [687](https://github.com/vaexio/vaex/pull/687)
* Support Python3.8 [559](https://github.com/vaexio/vaex/pull/559)
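
The dropna/nan/missing fix in the list above is about the difference between dropping a row when all of its values are missing and dropping it when any value is missing. The sketch below shows both policies on a small list of tuples, using `None` as the missing marker. It assumes the entry describes the old (buggy) behaviour as "all" and that the corrected behaviour is "any"; the `drop_rows` helper is hypothetical.

```python
def drop_rows(rows, how):
    """Drop rows whose values are missing (None) under an 'all' or 'any' policy."""
    test = all if how == "all" else any
    return [row for row in rows if not test(v is None for v in row)]


rows = [(1, 2), (None, 3), (None, None)]
dropped_if_all_missing = drop_rows(rows, "all")  # keeps the partially missing row
dropped_if_any_missing = drop_rows(rows, "any")  # keeps only complete rows
```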

* Features
* New lazy numpy wrappers: np.digitize and np.searchsorted [573](https://github.com/vaexio/vaex/pull/573)
* `df.to_arrow_table`/`to_pandas_df`/`to_items`/`df.to_dict`/`df.to_arrays` now take a chunk_size argument for chunked iterators [589](https://github.com/vaexio/vaex/pull/589) [699](https://github.com/vaexio/vaex/pull/699)
* Filtered datasets can be concatenated. [590](https://github.com/vaexio/vaex/pull/590)
* DataFrames/Executors are thread safe (meaning you can schedule/compute from any thread), which makes it work out of the box for Dash and Flask [670](https://github.com/vaexio/vaex/pull/670)
* `df.count/mean/std` etc can output in xarray.DataArray array type, makes plotting easier [671](https://github.com/vaexio/vaex/pull/671)
* Column names can have unicode, and we use str.isidentifier to test; also don't accidentally hide columns. [617](https://github.com/vaexio/vaex/pull/617)
* Percentile approx can take a sequence of percentages [527](https://github.com/vaexio/vaex/pull/527)
* Polygon testing, useful in combinations with geo/geojson data [685](https://github.com/vaexio/vaex/pull/685)
* Added dt.quarter property and dt.strftime method to expression (by Juho Lauri) [682](https://github.com/vaexio/vaex/pull/682)
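
The unicode column name entry above mentions `str.isidentifier` as the test. The sketch below shows how that standard-library method classifies a few sample names; the names themselves are made up, and how vaex rewrites the rejected ones is not shown here. Note that `str.isidentifier` accepts reserved words such as `class`, so a separate `keyword.iskeyword` check is needed if those matter.

```python
import keyword

names = ["x", "velocity_x", "température", "#", "my column", "class"]

# True for names that are syntactically valid Python identifiers.
valid = {name: name.isidentifier() for name in names}

# Reserved words pass isidentifier but cannot be used as attribute names.
reserved = [name for name in names if keyword.iskeyword(name)]
```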

vaex-server 0.3.0 (2020-5-24)
* Refactored server, can return multiple binary blobs, execute multiple tasks, cancel tasks, encoding/serialization is more flexible (like returning masked arrays). [571](https://github.com/vaexio/vaex/pull/557)

vaex-viz 0.4.0 (2020-5-24)
* Requirement of vaex-core >=2,<3

vaex-graphql 0.1.0 (2020-5-24)
* Requirement of vaex-core >=2,<3

vaex-astro 0.7.0 (2020-5-24)
* Requirement of vaex-core >=2,<3

vaex-hdf5 0.6.0 (2020-5-24)
* Requirement of vaex-core >=2,<3

vaex-ml 0.9.0 (2020-5-24)
* Requirement of vaex-core >=2,<3

vaex-arrow 0.5.0 (2020-5-24)
* Requirement of vaex-core >=2,<3
* Fixes
* Booleans were negated, and didn't respect offsets.

vaex-jupyter 0.5.0 (2020-5-24)
* Requirement of vaex-core >=2,<3
* Breaking changes
* vaex-jupyter is refactored [654](https://github.com/vaexio/vaex/pull/654)
