Previously, pygwalker relied on the data computing capabilities provided by graphic-walker, and all calculations were done in the browser.
However, when analyzing slightly larger datasets, the browser could crash or become unresponsive.
In the new pygwalker, graphic-walker only loads the data it needs; the computation itself is performed on the Python side using DuckDB. DuckDB is an excellent database system: it offers a very simple invocation interface and much stronger computing power.
You can now use pygwalker to explore larger datasets.
Enable kernel computation mode (DuckDB) with:

```python
walker = pyg.walk(df, use_kernel_calc=True)
```
**Performance before (computation in JavaScript + Web Worker)**

Dataset with 300K rows:

https://github.com/Kanaries/pygwalker/assets/22167673/0a22ae41-e66d-4790-9cce-828cf48a2010

**Performance now (based on DuckDB)**

https://github.com/Kanaries/pygwalker/assets/22167673/12da8496-2658-4f6c-afb2-b600da48237a
**Feat**
* Add new calculation mode that uses DuckDB as the computing engine in the kernel. https://github.com/Kanaries/pygwalker/pull/193
* Update graphic-walker; add a new filter tool and support the limit feature. https://github.com/Kanaries/pygwalker/pull/193
**Refactor**
* Refactor data parser. https://github.com/Kanaries/pygwalker/pull/185
**Contributor**: longxiaofei
**Full Changelog**: https://github.com/Kanaries/pygwalker/compare/0.2.0...0.3.0