daffodil Changelog

0.4.2

v0.4.2 (2024-05-01)
Modified packaging for distribution on PyPI, in the hope of making it compatible with installation into AWS Lambdas.
Tried to use pyproject.toml and flit, but flit seemed to parse toml poorly, and no suitable toml configuration was found.
Went back to setup.py and setuptools, but reorganized files into the daffodil/src folder, which is included in the distro.
To use --editable mode for local development, PYTHONPATH must refer to the daffodil/src folder.
That folder contains daffodil/daf.py and daffodil/lib/daf_(name).py
To import supporting py files from lib, use, for example, import daffodil.lib.daf_utils as utils.

0.4.1

v0.4.0 (2024-04-30)
v0.4.0 Better dtypes support; apply_dtypes(), flatten(), copy()
added disabling of garbage collection during timing, which gives more consistent results but does not explain the anomaly.
Improved philosophy of apply_dtypes() and flatten()
Upon loading a csv file, set dtypes and then call my_daf.apply_dtypes()
Before writing, use my_daf.flatten() to flatten any list or dict types, if applicable.
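
The load-then-apply_dtypes() and flatten()-before-writing workflow above can be illustrated with a minimal standalone sketch. It does not use daffodil itself: flatten_row() and apply_dtypes_row() are hypothetical helpers written for this note, and JSON is assumed as the flattened form of list and dict cells.

```python
import json

def flatten_row(row):
    """Serialize list/dict cells to JSON strings; leave other cells unchanged."""
    return [json.dumps(v) if isinstance(v, (list, dict)) else v for v in row]

def apply_dtypes_row(row, cols, dtypes):
    """Convert each cell to the type named for its column (hypothetical helper)."""
    out = []
    for col, val in zip(cols, row):
        want = dtypes.get(col, str)
        if want in (list, dict):
            # Unflatten: decode JSON text back into a Python object.
            out.append(json.loads(val) if isinstance(val, str) else val)
        elif want is str:
            out.append(val)
        else:
            out.append(want(val))
    return out

cols = ["name", "count", "tags"]
dtypes = {"name": str, "count": int, "tags": list}

loaded = ["alpha", "3", '["x", "y"]']          # as read from a csv file: all strings
typed = apply_dtypes_row(loaded, cols, dtypes)  # types applied after loading
flat = flatten_row(typed)                       # flattened again before writing
```

Here typed holds an int and a real list, while flat holds the list as a JSON string that can safely be written to a csv cell.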

apply_dtypes() now handles the entire array, and will skip str entries if initially_all_str is True.

unflatten_cols() DEPRECATED. use apply_dtypes()
unflatten_by_dtypes() DEPRECATED. use apply_dtypes()
flatten_cols() DEPRECATED. use flatten()
flatten_by_dtypes() Renamed: use flatten()

added optional dtypes parameter to apply_dtypes(), which is used to initialize dtypes in the daf object and
to convert types within the array.
Changed from la type to iterable in reduce()
added disabling of garbage collection in daf_benchmarks.py
deprecated functions dealing with hdlol type which was a precursor to daf.
added convert_type_value() to convert a single value to a desired type and unflatten if enabled.
removed use of set_dict_dtypes from apply_dtypes() and instead it is done on the entire daf array for efficiency.
added in daf_utils.py unflatten_val(), json_decode(), validate_json_with_error_details, and safe_convert_json_to_obj.
Added .copy(deep:bool) method to match pandas syntax.
Added reference to 1994 workshop in flatten() method docstr.
Changed packaging code from setup.py approach to pyproject, but still not able to correctly import in Lambdas container.
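
As an illustration of the garbage-collection change noted for daf_benchmarks.py, the following is a minimal sketch of timing a call with the collector paused. It uses only the standard library; time_call() is a hypothetical helper and is not taken from daf_benchmarks.py.

```python
import gc
import time

def time_call(func, *args, repeat=5):
    """Return the best wall-clock time of func(*args) with the collector paused."""
    was_enabled = gc.isenabled()
    gc.disable()                      # avoid collection pauses inside the timed region
    try:
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            func(*args)
            best = min(best, time.perf_counter() - start)
        return best
    finally:
        if was_enabled:
            gc.enable()               # restore the collector to its prior state

elapsed = time_call(sum, range(100_000))
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; it is a design choice of this sketch, not necessarily what the benchmark script does.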

v0.4.1 (2024-04-30)
fixed tests to reflect changes to type conversion paradigm.
Changed apply_dtypes parameter 'initially_all_str' to 'from_str'
fixed set_dict_dtypes() in the case of dtypes = {}; Changed parameter to 'dtypes' for uniformity.
set_dict_dtypes() now also modifies types in-place.

Fixes: https://github.com/raylutz/daffodil/issues/10

0.3.0

v0.2.2 Changed the file structure to comply with Python conventions.
users can use 'from daffodil.daf import Daf' and then Daf()
Moved daf.py containing class Daf to the top level.
put supporting functions in lib.
added narrow_to_wide and wide_to_narrow methods.

v0.3.0 (2024-04-14)
Added fmt parameter so files are saved with proper Content-Type metadata.
Added 'from .Daf import Daf' to __init__.py to reduce the import level. Eventually removed this.
Added CODE_OF_CONDUCT.md
Improved performance of daf_sum(), reduce() and sum_da() by avoiding explicit comparisons and leveraging try/except.
Improved daf_benchmarks.py to help diagnose the nonperformant design of sum_da().
Added basic indexed manipulation to daf_demo.py
Changes due to the suggestions by Trey Hunner.
Missing:
changes to make rowkeys supported without an existing keyfield column, thus making the structure symmetrical and easier to fully transpose.
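
The performance note above for daf_sum(), reduce() and sum_da() refers to replacing explicit checks with try/except. A minimal sketch of that general idea follows; sum_checked() and sum_try() are hypothetical functions written for this note and are not daffodil's implementations.

```python
def sum_checked(values):
    """Look before you leap: test each value's type before adding."""
    total = 0
    for v in values:
        if isinstance(v, (int, float)):
            total += v
    return total

def sum_try(values):
    """Ask forgiveness: add directly and skip values that cannot be added."""
    total = 0
    for v in values:
        try:
            total += v
        except TypeError:
            continue                  # non-numeric cell such as '' or None
    return total

values = [1, 2.5, "", None, 4]
checked_total = sum_checked(values)
try_total = sum_try(values)
```

When most cells are numeric, the try/except form avoids a per-value check on the common path, which is the usual motivation for this style in Python.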

0.2.0

v0.2.0 (2024-02-03)
Copied related code to Pydf repo and resolved all imports. All tests running.
Added option of appending a plain list to daf instance using .append()
Added 'omit_nulls' option to col(), col_to_la(), icol_to_la(), valuecounts_for_colname()
Added ability to groupby multiple cols
in select_records_daf(self, keys_ls: T_ls, inverse:bool=False), added inverse boolean.
Started to add selcols for zero-copy support.
Added _num_cols()
Added unit tests.
Add groupby_cols_reduce() and sum_np()
Fixed bug in item setter for str row.
Added demo of making list of files in a folder.
groupby_cols_reduce() added, unit tests added. Demo added.
Fix demo to run on windows, mac or linux.
Add produced test files to gitignore.
changed _num_cols() to num_cols()
removed selcols_ls from class.
Pulled in from_csv_file()
Added buff_to_file()
improved is_d1_in_d2() by using a Python 3 idiom.
moved sanitize_cols to set_cols() method.
1. read with no headers.
2. pull off first row using indexing (could add pop_row())
3. call set_cols() with sanitize_cols=True.
Removed:
unflatten_dirname() <-- remove?
flatten_dirname() <-- remove?
cols_to_strbool() <-- remove?
insert_icol()
keyfield <-- remove! use set_keyfield instead
insert_col() (removed keyfield)
from_selected_cols <-- remove! but check for usage! (use my_daf[:, colnames_ls])
Refactor get_item and set_item
started this but complexity not that much better.
Redid it again after discussion with Jeremy which was helpful.

tests added
initialization from dtypes and no cols.
set_cols()
set_keyfield()
daf_utils.is_d1_in_d2()
improved set_cols() to test sanitize_cols flag.
.len(), num_cols(), shape(), len()
row_idx_of()
remove_key -- keyfield not set. (remove)
get_existing_keys
select_record_da -- row_idx >= len(self.lol)
_basic_get_record_da -- no hd, include_cols
select_first_row_by_dict
select_where_idxs
select_cols
calc_cols
no hd defined.
include_cols > 10
exclude_cols > 10
exclude_types
insert_col()
colname already exists.
from_lod() with empty records_lod
to_cols_dol()
set_lol()
from_pandas_df()
to_pandas_df()
from_dod()
to_dod()
from_excel_buff()

Added:
to_cols_dol()
Moved code for __getitem__ and __setitem__ to 'indexing' and
used method equating to introduce them in the class.
Moved to_pandas and from_pandas to file daf_pandas.py
Changed constructors from_... from staticmethods to classmethods
move toward deprecating remove_key() and remove_keys()
add silent_error in gkeys_to_idxs and trigger error if not selected.
Handle missing keyfield, hd, kd, in inverse mode.
Added to_value(), to_list(), to_dict() and tests.
Tested negative indexes in []
retmode attribute and constructor parameter to set 'val' vs. 'obj' return value.
moved __getitem__ and __setitem__ back to main class now that they are vastly reduced in complexity.
Name change from Pydf to Daffodil and resolve issues.
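
The change of the from_... constructors from staticmethods to classmethods, noted above, can be sketched as follows. Table and KeyedTable are hypothetical stand-ins written for this note, not daffodil's Daf class; only the from_lod name is borrowed from the changelog, and its body here is an assumption.

```python
class Table:
    """Hypothetical stand-in for a table class; not daffodil's Daf."""

    def __init__(self, cols=None, lol=None):
        self.cols = list(cols or [])
        self.lol = [list(row) for row in (lol or [])]

    @classmethod
    def from_lod(cls, records_lod):
        """Build an instance from a list of dicts, using the first record's keys."""
        cols = list(records_lod[0].keys()) if records_lod else []
        lol = [[rec.get(c) for c in cols] for rec in records_lod]
        # cls is the class the method was called on, so subclasses get
        # instances of themselves rather than of the base class.
        return cls(cols=cols, lol=lol)

class KeyedTable(Table):
    """Subclass used only to show that the constructor respects cls."""

t = KeyedTable.from_lod([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
```

With a staticmethod, the constructor would have to name the base class explicitly, so a subclass calling it would receive a base-class instance.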
