✨ Highlights ✨
📣 Pandera now supports validation of `polars.DataFrame` and `polars.LazyFrame` 🐻‍❄️!
You can now do this:
```python
import pandera.polars as pa
import polars as pl


class Schema(pa.DataFrameModel):
    state: str
    city: str
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})


lf = pl.LazyFrame(
    {
        'state': ['FL', 'FL', 'FL', 'CA', 'CA', 'CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
Schema.validate(lf).collect()
```
And of course you can do functional validation with decorators like so:
```python
from pandera.typing.polars import LazyFrame


# Validates the annotated input and output against Schema on every call.
@pa.check_types
def function(lf: LazyFrame[Schema]) -> LazyFrame[Schema]:
    return lf.filter(pl.col("state").eq("CA"))


function(lf).collect()
```
You can read more about the integration [here](https://pandera.readthedocs.io/en/stable/polars.html). Not all pandera features are supported at this point, but depending on community demand/contributions we'll slowly add them. To learn more about what's currently supported, check out [this table](https://pandera.readthedocs.io/en/stable/index.html#supported-features-by-dataframe-backend).
Special shoutout to AndriiG13 and FilipAisot for their contributions on the built-in checks and polars datatypes, respectively, and to evanrasmussen9, baldwinj30, obiii, Filimoa, philiporlando, r-bar, alkment, jjfantini, and robertdj for their early feedback and bug reports during the 0.19.0 beta.
What's Changed
* Support polars DataFrames, LazyFrames by cosmicBboy, AndriiG13, and FilipAisot in https://github.com/unionai-oss/pandera/pull/1373
* bugfix: optional columns in polars schema should no longer raise errors when not present by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1532
* `check_nullable` does not uselessly compute `isna()` anymore in pandas backend by smarie in https://github.com/unionai-oss/pandera/pull/1538
* Polars LazyFrames are validated at the schema-level by default by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1534
* Enable from_format_kwargs for dict format by ektar in https://github.com/unionai-oss/pandera/pull/1539
* Convert docs to myst by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1542
* fix README(tab to space) by np-yoe in https://github.com/unionai-oss/pandera/pull/1544
* pandas DataFrameModel accepts python generic types by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1547
* Backend registration happens at schema initialization by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1548
* do not format if test is not necessary by mattB1989 in https://github.com/unionai-oss/pandera/pull/1530
* Register default backends when restoring state by alkment in https://github.com/unionai-oss/pandera/pull/1550
* Bump actions/setup-python from 4 to 5 by dependabot in https://github.com/unionai-oss/pandera/pull/1452
* fix: prevent environment pollution when importing pyspark by sam-goodwin in https://github.com/unionai-oss/pandera/pull/1552
* use rst to speed up api docs generation by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1557
* Add _GenericAlias.__call__ patch by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1561
* support typeguard < 3 for better compatibility by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1563
* Add parse function to DataFrameModel in https://github.com/unionai-oss/pandera/pull/1181
* localize GenericAlias patch to DataFrameBase subclasses by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1571
* Bump idna from 3.4 to 3.7 by dependabot in https://github.com/unionai-oss/pandera/pull/1569
* docs: fix typo in env var name by alekseik1 in https://github.com/unionai-oss/pandera/pull/1562
* polars: fix element-wise checks, register backends by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1572
* remove pytest ignore on modin, dask, pyspark tests with pandas >= 2 by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1573
* make sure check name is propagated to error report by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1574
* update ci to run pyspark, modin, dask with pandas >= v2 by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1575
* use sphinx-design instead of sphinx-panels by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1581
* Update bug_report.md by philiporlando in https://github.com/unionai-oss/pandera/pull/1585
* bugfix: polars column core checks now return check output by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1586
* make pandera.typing.Series[TYPE] error in polars DataFrameModel more readable by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1588
* implement timezone agnostic polars_engine.DateTime type by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1589
* fix pyspark import error by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1591
* fix pyspark tests when run on full test suite by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1593
* Bugfix/1580 by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1596
* Set pandas_io.from_frictionless_schema to use a raw string for docs by mark-thm in https://github.com/unionai-oss/pandera/pull/1597
* Add a generic Series type for polars by baldwinj30 in https://github.com/unionai-oss/pandera/pull/1595
* Add StructType and DDL extraction from Pandera schemas by filipeo2-mck in https://github.com/unionai-oss/pandera/pull/1570
* Clean up typing for pandas GenericDtype by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1601
* Adding warning for unique in pyspark field and a test showing the issue as well as config when it works. by zippeurfou in https://github.com/unionai-oss/pandera/pull/1592
* bugfix/1607: coercion error should correctly report relevant failure cases by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1608
* Create a common DataFrameSchema class, update mypy used in pre-commit by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1609
* Dataframe column schema by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1611
* bugfix: column-level coercion is properly implemented by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1612
* update docs for polars by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1613
* fix: properly coerce dtypes for columns with regex=True by tesslinden in https://github.com/unionai-oss/pandera/pull/1602
* rewrite Check class docstrings to remove pandas assumption by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1614
* add tests for polars decorators by cosmicBboy in https://github.com/unionai-oss/pandera/pull/1615
New Contributors
* smarie made their first contribution in https://github.com/unionai-oss/pandera/pull/1538
* ektar made their first contribution in https://github.com/unionai-oss/pandera/pull/1539
* np-yoe made their first contribution in https://github.com/unionai-oss/pandera/pull/1544
* alkment made their first contribution in https://github.com/unionai-oss/pandera/pull/1550
* sam-goodwin made their first contribution in https://github.com/unionai-oss/pandera/pull/1552
* alekseik1 made their first contribution in https://github.com/unionai-oss/pandera/pull/1562
* philiporlando made their first contribution in https://github.com/unionai-oss/pandera/pull/1585
* mark-thm made their first contribution in https://github.com/unionai-oss/pandera/pull/1597
* baldwinj30 made their first contribution in https://github.com/unionai-oss/pandera/pull/1595
* zippeurfou made their first contribution in https://github.com/unionai-oss/pandera/pull/1592
* tesslinden made their first contribution in https://github.com/unionai-oss/pandera/pull/1602
**Full Changelog**: https://github.com/unionai-oss/pandera/compare/v0.18.3...v0.19.0