Pyogrio

Latest version: v0.8.0

0.8.0

Improvements

- Support for writing based on Arrow as the transfer mechanism of the data
from Python to GDAL (requires GDAL >= 3.8). This is provided through the
new `pyogrio.raw.write_arrow` function, or by using the `use_arrow=True`
option in `pyogrio.write_dataframe` (314, 346).
- Add support for `fids` filter to `read_arrow` and `open_arrow`, and to
`read_dataframe` with `use_arrow=True` (304).
- Add some missing properties to `read_info`, including layer name, geometry name
and FID column name (365).
- `read_arrow` and `open_arrow` now provide
[GeoArrow-compliant extension metadata](https://geoarrow.org/extension-types.html),
including the CRS, when using GDAL 3.8 or higher (366).
- The `open_arrow` function can now be used without a `pyarrow` dependency. By
default, it will now return a stream object implementing the
[Arrow PyCapsule Protocol](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html)
(i.e. having an `__arrow_c_stream__` method). This object can then be consumed
by your Arrow implementation of choice that supports this protocol. To keep
the previous behaviour of returning a `pyarrow.RecordBatchReader`, specify
`use_pyarrow=True` (349).
- Warn when reading from a multilayer file without specifying a layer (362).
- Allow writing to a new in-memory datasource using an `io.BytesIO` object (397).
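
As a rough illustration of the first two items, the sketch below writes a GeoDataFrame through the Arrow path and reads back selected features by FID. The helper name `write_then_read_by_fid` is hypothetical; it assumes pyogrio >= 0.8.0, GDAL >= 3.8, and an installed `pyarrow`.

```python
def write_then_read_by_fid(gdf, path, fids):
    """Write `gdf` through the Arrow path, then read back only `fids`.

    Hypothetical helper for illustration; not part of pyogrio.
    """
    # Imported lazily so this module can be loaded without the GDAL bindings.
    import pyogrio

    # use_arrow=True on write requires GDAL >= 3.8 (see the note above).
    pyogrio.write_dataframe(gdf, path, use_arrow=True)

    # The `fids` filter combined with use_arrow=True is new in 0.8.0.
    return pyogrio.read_dataframe(path, use_arrow=True, fids=fids)
```

A call such as `write_then_read_by_fid(gdf, "out.gpkg", [0, 2])` would then return only the requested features, with FID numbering following the driver's conventions.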

Bug fixes

- Fix error in `write_dataframe` if input has a date column and
non-consecutive index values (325).
- Fix encoding issues on Windows for some formats (e.g. ".csv") and always write ESRI
Shapefiles using UTF-8 by default on all platforms (361).
- Raise exception in `read_arrow` or `read_dataframe(..., use_arrow=True)` if
a boolean column is detected due to error in GDAL reading boolean values for
FlatGeobuf / GPKG drivers (335, 387); this has been fixed in GDAL >= 3.8.3.
- Properly ignore fields not listed in `columns` parameter when reading from
the data source not using the Arrow API (391).
- Properly handle decoding of ESRI Shapefiles with user-provided `encoding`
option for `read`, `read_dataframe`, and `open_arrow`, and correctly encode
Shapefile field names and text values to the user-provided `encoding` for
`write` and `write_dataframe` (384).
- Fixed bug preventing reading from bytes or file-like in `read_arrow` /
`open_arrow` (407).

Packaging

- The GDAL library included in the wheels is updated from 3.7.2 to GDAL 3.8.5.

Potentially breaking changes

- Using a `where` expression combined with a list of `columns` that does not include
the column referenced in the expression is not recommended and will now
return results based on driver-dependent behavior, which may include either
returning empty results (even if non-empty results are expected from the `where`
parameter) or raising an exception (391). Previous versions of pyogrio incorrectly
set ignored fields against the data source, allowing it to return non-empty
results in these cases.
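
One way to avoid the driver-dependent behavior is to make sure every field used in `where` is also listed in `columns`. The sketch below shows this under stated assumptions: `columns_including` and `read_filtered` are hypothetical helpers, and the caller names the fields used in the expression explicitly because the expression itself is not parsed.

```python
def columns_including(columns, *required):
    """Return `columns` plus any `required` names that are missing (order kept)."""
    result = list(columns)
    for name in required:
        if name not in result:
            result.append(name)
    return result


def read_filtered(path, columns, where, where_fields):
    """Read `columns`, making sure the fields used in `where` are requested too.

    Hypothetical helper for illustration; not part of pyogrio.
    """
    # Imported lazily so this module can be loaded without the GDAL bindings.
    from pyogrio import read_dataframe

    safe_columns = columns_including(columns, *where_fields)
    return read_dataframe(path, columns=safe_columns, where=where)
```

For example, `read_filtered(path, ["name"], "pop > 1000", ["pop"])` requests both `name` and `pop`, and the extra column can be dropped from the result afterwards if it is not wanted.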

0.7.2

Not secure
Bug fixes

- Add `packaging` as a dependency (320).
- Fix conversion of WKB to geometries with missing values when using
`pandas.ArrowDtype` (321).

0.7.1

Not secure
Bug fixes

- Fix unspecified dependency on `packaging` (318).

0.7.0

Not secure
Improvements

- Support reading and writing datetimes with timezones (253).
- Support writing dataframes without geometry column (267).
- Calculate feature count by iterating over features if GDAL returns an
unknown count for a data layer (e.g., OSM driver); this may have significant
performance impacts for some data sources that would otherwise return an
unknown count (count is used in `read_info`, `read`, `read_dataframe`) (271).
- Add `arrow_to_pandas_kwargs` parameter to `read_dataframe` and reduce memory usage
with `use_arrow=True` (273).
- In `read_info`, the result now also contains the `total_bounds` of the layer as well
as some extra `capabilities` of the data source driver (281).
- Raise error if `read` or `read_dataframe` is called with parameters to read no
columns, geometry, or fids (280).
- Automatically detect supported driver by extension for all available
write drivers and addition of `detect_write_driver` (270).
- Addition of `mask` parameter to `open_arrow`, `read`, `read_dataframe`,
and `read_bounds` functions to select only the features in the dataset that
intersect the mask geometry (285). Note: GDAL < 3.8.0 returns features that
intersect the bounding box of the mask when using the Arrow interface for
some drivers; this has been fixed in GDAL 3.8.0.
- Removed warning when no features are read from the data source (299).
- Add support for `force_2d=True` with `use_arrow=True` in `read_dataframe` (300).

Other changes

- Test suite requires Shapely >= 2.0.
- Using `skip_features` greater than the number of features available in a data
layer now returns empty arrays for `read` and an empty DataFrame for
`read_dataframe` instead of raising a `ValueError` (282).
- Enabled `skip_features` and `max_features` for `read_arrow` and
`read_dataframe(path, use_arrow=True)`. Note that this incurs overhead
because all features up to the next batch size above `max_features` (or size
of data layer) will be read prior to slicing out the requested range of
features (282).
- The `use_arrow=True` option can be enabled globally for testing using the
`PYOGRIO_USE_ARROW=1` environment variable (296).
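
For test suites, the environment variable can be set for a limited scope only. The sketch below uses only the standard library; the context manager name `pyogrio_use_arrow` is hypothetical, and it assumes pyogrio consults the variable when a read function is called rather than once at import time (set the variable before importing pyogrio if in doubt).

```python
import os
from contextlib import contextmanager

ENV_VAR = "PYOGRIO_USE_ARROW"


@contextmanager
def pyogrio_use_arrow():
    """Temporarily set PYOGRIO_USE_ARROW=1, restoring the old value on exit."""
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = "1"
    try:
        yield
    finally:
        if previous is None:
            # The variable was not set before: remove it again.
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous
```

Reads performed inside `with pyogrio_use_arrow(): ...` would then default to the Arrow path without passing `use_arrow=True` at every call site.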

Bug fixes

- Fix int32 overflow when reading int64 columns (260).
- Fix `fid_as_index=True` not setting the FID as the index in `read_dataframe` with
`use_arrow=True` (265).
- Fix errors reading OSM data due to invalid feature count and incorrect
reading of OSM layers beyond the first layer (271).
- Always raise an exception if there is an error when writing a data source
(284).

Potentially breaking changes

- In `read_info` (281):
  - the `features` property in the result will now be -1 if calculating the
    feature count is an expensive operation for this driver. You can force it to be
    calculated using the `force_feature_count` parameter.
  - for boolean values in the `capabilities` property, the values will now be
    booleans instead of 1 or 0.
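
Code that relied on `features` always being a real count may need a fallback. A minimal sketch, assuming pyogrio >= 0.7.0; the helper name `feature_count` is hypothetical:

```python
def feature_count(path, layer=None):
    """Return the feature count, forcing a full count only when needed.

    Hypothetical helper for illustration; not part of pyogrio.
    """
    # Imported lazily so this module can be loaded without the GDAL bindings.
    from pyogrio import read_info

    info = read_info(path, layer=layer)
    if info["features"] == -1:
        # Counting is expensive for this driver, so it was skipped; ask for it
        # explicitly. This may iterate over every feature in the layer.
        info = read_info(path, layer=layer, force_feature_count=True)
    return info["features"]
```

The cheap call is tried first so that drivers which report a count directly do not pay for a full scan.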

Packaging

- The GDAL library included in the wheels is updated from 3.6.4 to GDAL 3.7.2.

0.6.0

Not secure
Improvements

- Add automatic detection of 3D geometries in `write_dataframe` (223, 229)
- Add "driver" property to `read_info` result (224)
- Add support for dataset open options to `read`, `read_dataframe`, and
`read_info` (233)
- Add support for pandas' nullable data types in `write_dataframe`, or
specifying a mask manually for missing values in `write` (219)
- Standardized 3-dimensional geometry type labels from "2.5D <type>" to
"<type> Z" for consistency with well-known text (WKT) formats (234)
- Failure error messages from GDAL are no longer printed to stderr (they were
already translated into Python exceptions as well) (236).
- Failure and warning error messages from GDAL are no longer printed to
stderr: failures were already translated into Python exceptions
and warning messages are now translated into Python warnings (236, 242).
- Add access to low-level pyarrow `RecordBatchReader` via
`pyogrio.raw.open_arrow`, which allows iterating over batches of Arrow
tables (205).
- Add support for writing dataset and layer metadata (where supported by
driver) to `write` and `write_dataframe`, and add support for reading
dataset and layer metadata in `read_info` (237).
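
To illustrate batch-wise iteration with `pyogrio.raw.open_arrow`, the sketch below counts rows without loading the whole layer at once. The helper name `count_rows_in_batches` is hypothetical. It passes `use_pyarrow=True`, which the 0.8.0 notes above describe as the way to keep getting a `pyarrow.RecordBatchReader`; on versions before 0.8.0 that argument is presumably not accepted and should be omitted.

```python
def count_rows_in_batches(path):
    """Count the rows of a layer by streaming Arrow record batches.

    Hypothetical helper for illustration; not part of pyogrio.
    """
    # Imported lazily so this module can be loaded without the GDAL bindings.
    from pyogrio.raw import open_arrow

    total = 0
    # open_arrow is a context manager yielding (metadata dict, batch reader);
    # the reader is only valid inside the `with` block.
    with open_arrow(path, use_pyarrow=True) as (meta, reader):
        for batch in reader:
            total += batch.num_rows
    return total
```

Any per-batch processing should happen inside the `with` block, since the underlying dataset is closed on exit.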

Packaging

- The GDAL library included in the wheels is updated from 3.6.2 to GDAL 3.6.4.
- Wheels are now available for Linux aarch64 / arm64.

0.5.1

Not secure
Bug fixes

- Fix memory leak in reading files (207)
- Fix to only use transactions for writing records when supported by the
driver (203)

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.