Polars-ds

Latest version: v0.4.3

0.19.13

3. FFT. Currently it is slow for larger series.
4. Column-wise Jaccard similarity + Pairwise list vs. list Jaccard similarity
5. Binarize a column
6. Computes t-statistics
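
For readers unfamiliar with the two Jaccard variants in item 4, here is a plain-Python sketch of the definition. It is not the library's implementation; the helper name `jaccard` and the sample data are made up for illustration, and returning 0.0 for two empty sets is a convention chosen for this sketch only.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two iterables, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption for this sketch: two empty sets give 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


# Column-wise: compare two whole columns as sets of values.
col_x = [1, 2, 3, 4]
col_y = [3, 4, 5, 6]
column_wise = jaccard(col_x, col_y)  # 2 shared values out of 6 distinct

# Pairwise list vs. list: one similarity per row of two list-typed columns.
lists_x = [["a", "b"], ["c"], []]
lists_y = [["b", "c"], ["c"], ["d"]]
row_wise = [jaccard(x, y) for x, y in zip(lists_x, lists_y)]
```

The column-wise form yields a single number for a pair of columns, while the pairwise form yields one number per row.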

**String:**

1. Frequency-based removal. Remove rare or very common words from a list of strings (the output of tokenize).
2. Infer and merge strings based on frequency
3. Extract numbers as list
7. Added a parallel option and documentation. When using str_ext functions, parallel is generally faster if it is the only expression you are running. It has the same caveat as the parallel parameter in .value_counts.

What's Changed
* Change statistical description of standard error by mishpat in https://github.com/abstractqqq/polars_ds_extension/pull/8
* Fix/import api and more feats by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/7
* feat/roc_auc by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/9
* updated readme by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/10

New Contributors
* mishpat made their first contribution in https://github.com/abstractqqq/polars_ds_extension/pull/8

**Full Changelog**: https://github.com/abstractqqq/polars_ds_extension/compare/v0.1.0...v0.1.1

0.4.3

What's Changed
* Kendall tau correlation coefficient by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/137
* added combined pds.corr expression by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/139
* new pca plot and lstsq plot by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/140
* fixed a bug in lstsq regarding group by by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/143
* added null dist by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/144
* Improve convolve perf, especially when filter size is small. added method option by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/145
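
As a reminder of what the Kendall tau coefficient from PR 137 measures, here is a plain-Python sketch of the tau-a variant, which counts concordant and discordant pairs and applies no tie correction. Which variant the library computes is not stated in these notes, so treat `kendall_tau_a` as a hypothetical helper for illustration only.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count as neither in tau-a.
    return (concordant - discordant) / (n * (n - 1) / 2)


tau_same = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
tau_mixed = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

This pairwise loop is quadratic in the number of rows, so it is only meant to show the definition.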


**Full Changelog**: https://github.com/abstractqqq/polars_ds_extension/compare/v0.4.2...v0.4.3

0.4.2

Highlights

In Diagnosis, you can now generate 2D principal component plots, which help you visualize higher-dimensional data.

```python
dia = DIA(df)
dia.plot_pc2(pl.all().exclude("species"), by="species")
```

In addition, the following PCA-related queries are available:

```python
df.select(
    pds.query_pca("a", "b")  # singular values and weight vectors
).unnest("a")

df.select(
    pds.query_singular_values("a", "b", center=True, as_explained_var=True)
)
```

The Xi correlation is also implemented:

```python
df.select(
    pds.xi_corr("x", "y")
)
```
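
For context, the Xi correlation (Chatterjee's coefficient) measures how well y can be described as a function of x. The sketch below implements the textbook formula for data without ties in plain Python. It is an assumption-laden illustration, not what `pds.xi_corr` does internally, and `xi_corr_no_ties` is a hypothetical name.

```python
def xi_corr_no_ties(x, y):
    """Xi correlation, assuming no ties in x or y.

    xi = 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1),
    where r are the ranks of y after sorting the pairs by x.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Order the y values by increasing x.
    y_by_x = [yi for _, yi in sorted(zip(x, y), key=lambda pair: pair[0])]
    # Rank of each y value (1 = smallest); unambiguous because ties are excluded.
    rank_of = {value: rank for rank, value in enumerate(sorted(y_by_x), start=1)}
    ranks = [rank_of[value] for value in y_by_x]
    total = sum(abs(b - a) for a, b in zip(ranks, ranks[1:]))
    return 1 - 3 * total / (n * n - 1)


xs = [1, 2, 3, 4, 5]
ys = [v * v for v in xs]  # y is a monotone function of x
xi = xi_corr_no_ties(xs, ys)
```

For a perfectly monotone relationship with n points and no ties, the formula reduces to 1 - 3 / (n + 1), so the value approaches 1 only as n grows.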


What's Changed
* try new maturin config by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/128
* added basic pca exprs by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/129
* added infer_prob by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/131
* More pca by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/132
* xi_corr by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/134


**Full Changelog**: https://github.com/abstractqqq/polars_ds_extension/compare/v0.4.1-release...v0.4.2

v0.4.1-release
[Original release note](https://github.com/abstractqqq/polars_ds_extension/releases/tag/v0.4.1)

What's Changed
* upgraded download action in CI by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/124
* removed unnecessary test-deps by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/126


**Full Changelog**: https://github.com/abstractqqq/polars_ds_extension/compare/v0.4.1-fix-release...v0.4.1-release

v0.4.1-fix-release

0.4.1

What's Changed
* updated packages. OLS now falls back to LU decomposition when X^TX is not strictly positive definite. By abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/122
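
The fallback idea can be sketched in plain Python: try a Cholesky solve of the normal equations first, and switch to an LU-style solve (Gaussian elimination with partial pivoting) if the matrix turns out not to be positive definite. This is only an illustration of the control flow under those assumptions. The library's actual implementation is not shown in these notes, and all function names below are hypothetical.

```python
import math


def cholesky_solve(a, b):
    """Solve a x = b for symmetric positive definite a, else raise ValueError."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    # Forward substitution: L z = b.
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    # Back substitution: L^T x = z.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def solve_normal_equations(xtx, xty):
    """Return (coefficients, method used), preferring Cholesky."""
    try:
        return cholesky_solve(xtx, xty), "cholesky"
    except ValueError:
        return lu_solve(xtx, xty), "lu"


coef_pd, method_pd = solve_normal_equations([[4.0, 2.0], [2.0, 3.0]], [10.0, 8.0])
# A symmetric indefinite matrix, used here only to exercise the fallback path.
coef_fb, method_fb = solve_normal_equations([[1.0, 2.0], [2.0, 1.0]], [5.0, 4.0])
```

A true X^TX is always positive semidefinite, so in practice the Cholesky step can only fail when the matrix is singular or nearly so. The second call uses an indefinite matrix purely to trigger the fallback in a small example.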

0.4.0

Breaking Changes

For almost all old functions that were invoked like

```python
pl.col("a").name_space.method_call(...)
```

you can now call them via

```python
import polars_ds as pds

pds.method_call(...)
```

Linters should now recognize the package and all its methods.


What's Changed
* added meta and str_stats for Diagnosis by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/110
* added iqr in Diagnosis by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/111
* Refactor F stats by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/112
* Sampling functionalities by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/114
* Added weighted corr, etc. by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/116
* Fix regex `SyntaxWarning` on Python 3.12+ by jorenham in https://github.com/abstractqqq/polars_ds_extension/pull/118
* Clean up codebase by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/119
* added heat map for corr in Diagnosis by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/121

New Contributors
* jorenham made their first contribution in https://github.com/abstractqqq/polars_ds_extension/pull/118

**Full Changelog**: https://github.com/abstractqqq/polars_ds_extension/compare/v0.3.5...v0.4.0

0.3.5

Breaking Changes:
1. There are now two types of random column generation methods:
   1. With a column reference. These methods are renamed from sample_xxx to rand_xxx. They behave the same as before and must be called with a reference column.
   2. Without a reference. These new methods generate random columns without any reference column, so they do not respect nulls and do not use the reference's statistics. They are, however, easier to use.

Using the new methods to get a random df:

```python
import polars as pl
import polars_ds as pds

df = pds.random_data(size=100_000, n_cols=1).select(
    pds.random(0.0, 12.0).alias("uniform_1"),
    pds.random(0.0, 1.0).alias("uniform_2"),
    pds.random_exp(0.5).alias("exp"),
    pds.random_normal(0.0, 1.0).alias("normal"),
    pds.random_normal(0.0, 1000.0).alias("fat_normal"),
)
df.head()
```




What's Changed
* Knn entropy by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/104
* diagnosis basics by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/105
* Better stats by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/106
* Add profile by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/107
* added exclude in dependency plots by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/108
* add Transfer Entropy and related measures by remiadon in https://github.com/abstractqqq/polars_ds_extension/pull/84

New Contributors
* remiadon made their first contribution in https://github.com/abstractqqq/polars_ds_extension/pull/84

**Full Changelog**: https://github.com/abstractqqq/polars_ds_extension/compare/v0.3.4-fix-release...v0.3.5

v0.3.4-fix-release
Highlights in v0.3.4 (Re-upload to PyPI)


```python
import polars_ds as pds
```

You can now access the following queries by calling them directly from pds (e.g. pds.query_lstsq), and your linter will recognize them:

1. All knn related queries
2. All lstsq (linear regression) related queries
3. All graph related queries
4. Most common metrics
5. Miscellaneous queries, whose names are self-explanatory

More will be added where appropriate. In most cases, arguments to pds.func() can be either str or pl.Expr. For example:

```python
df.select(
    pds.query_lstsq_report(
        pl.col("x1"), "x2",  # str | pl.Expr
        target="y",
        add_bias=False
    ).alias("report")
).unnest("report")
```

```
shape: (2, 5)
┌──────────┬───────┬────────────┬────────────┬───────┐
│ feat_idx ┆ coeff ┆ std_err    ┆ t          ┆ p>|t| │
│ ---      ┆ ---   ┆ ---        ┆ ---        ┆ ---   │
│ u16      ┆ f64   ┆ f64        ┆ f64        ┆ f64   │
╞══════════╪═══════╪════════════╪════════════╪═══════╡
│ 0        ┆ 2.0   ┆ 2.3854e-16 ┆ 8.3842e15  ┆ 0.0   │
│ 1        ┆ -1.0  ┆ 9.0158e-17 ┆ -1.1092e16 ┆ 0.0   │
└──────────┴───────┴────────────┴────────────┴───────┘
```


See more in [examples/basics.ipynb](https://github.com/abstractqqq/polars_ds_extension/blob/v0.3.4/examples/basics.ipynb)

BREAKING
* Graph queries are refactored to take in two columns of u32 as node and link. If node = 10, and link = 2, it means node 10 has an out-edge to node 2.
* Eigenvector Centrality is temporarily gone.
* KNN's default index is now of type u32 instead of u64
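
To make the node/link layout in the first item concrete, here is a plain-Python sketch that reads two parallel columns as directed edges and builds an adjacency mapping. The sample values and the helper name `out_edges` are made up for illustration and are not part of the library.

```python
from collections import defaultdict

# Two parallel columns: row i means node[i] has an out-edge to link[i].
node = [10, 10, 2, 3]
link = [2, 3, 3, 10]


def out_edges(node, link):
    """Group the (node, link) rows into {source: [targets...]}."""
    adjacency = defaultdict(list)
    for src, dst in zip(node, link):
        adjacency[src].append(dst)
    return dict(adjacency)


adjacency = out_edges(node, link)
```

Here the first row (node = 10, link = 2) is the out-edge from node 10 to node 2 described above.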

What's Changed
* use uv in building env by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/87
* update readme by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/88
* Update polars by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/89
* added power transform without estimating lambda by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/91
* added tversky index by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/92
* Better ux - part 1 by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/95
* added skip_null in lstsq by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/96
* fix lstsq by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/98
* faster ols by abstractqqq in https://github.com/abstractqqq/polars_ds_extension/pull/100

Special Thanks
* Special thanks to wukan1986 for pushing me to make changes to make lstsq better.

**Full Changelog**: https://github.com/abstractqqq/polars_ds_extension/compare/v0.3.3...v0.3.4
