daffodil Changelog

0.5.5

Largest improvements:
- Introduced the PYON format.
- Added indirect_col, with support in apply and reduce, to handle embedded PYON.
- Added indexing with range and T_lor (list of range) types, for both column and row indexing.
- Added __contains__ method to allow " if key in my_daf: " to test if a given key exists.
- Added daf_sql.py mainly for testing, but will be the basis for extension to sqlite backing with same syntax.
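
As an illustration of the range / list-of-range indexing and the "in" test listed above, here is a minimal standalone sketch. It does not import Daffodil and is not its implementation: MiniTable, expand_index() and select() are hypothetical names, and treating T_lor as a list of range objects and kd as a key-to-row-index dict are assumptions drawn from the wording of this changelog.

```python
from typing import Dict, List, Union

# Assumed meaning of T_lor: a plain list of range objects.
T_lor = List[range]


def expand_index(sel: Union[int, range, T_lor]) -> List[int]:
    """Turn an int, a range, or a list of ranges into a flat list of indices."""
    if isinstance(sel, int):
        return [sel]
    if isinstance(sel, range):
        return list(sel)
    indices: List[int] = []
    for r in sel:
        indices.extend(r)
    return indices


class MiniTable:
    """Hypothetical stand-in for a keyed list-of-lists table."""

    def __init__(self, cols: List[str], lol: List[list], keyfield: str):
        self.cols = cols
        self.lol = lol
        kcol = cols.index(keyfield)
        # kd maps each key value to its row index (assumed structure).
        self.kd: Dict[object, int] = {row[kcol]: irow for irow, row in enumerate(lol)}

    def __contains__(self, key) -> bool:
        # Supports "if key in table:"; only meaningful when kd exists.
        return key in self.kd

    def select(self, rows, cols) -> List[list]:
        """Select rows and columns, each given as int, range, or list of ranges."""
        return [[self.lol[r][c] for c in expand_index(cols)]
                for r in expand_index(rows)]


table = MiniTable(
    ['id', 'a', 'b', 'c'],
    [['k0', 1, 2, 3], ['k1', 4, 5, 6], ['k2', 7, 8, 9], ['k3', 10, 11, 12]],
    keyfield='id',
)
picked = table.select([range(0, 1), range(2, 4)], range(1, 3))
```

Here picked holds columns 1 and 2 of rows 0, 2 and 3, and 'k1' in table evaluates to True.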

v0.5.5 (2024-12-27)
add daf_crm.py as demonstration.
add op_import()
add get_csv_column_names() as refactoring in daf_utils for reading csv.
precheck_csv_cols()
compare_lists() -- imported from utils
Introduce ops_daf for running operations, can also use in audit-engine.
operation descriptions taken from docstring.
added 'default_type' to apply_dtypes for any cols not specified in passed dtypes.
Improved preprocessing of csv file when line is commented out and embedded newlines exist in the line.
Improved Daf.from_lod() by using columns in dtypes dict if provided instead of relying only on first record of lod.

Added indexing with range and T_lor (list of range) types, for both column and row indexing.
Added __contains__ method to allow "if key in my_daf:" to test if a given key exists. Requires that kd exists.
revised .sum_da() based on feedback from user group.
Improve formatting of README.md to include tables of examples.
improve daf_benchmarks.py to use objsize instead of pympler to evaluate memory use.
Corrected set_keyfield in daffodil to do nothing if daf is empty.
Added 'sparse_rows' to reduction 'by' type using an indirect_col.
Improve daf_sum() to support indirect_col.
Revised apply_in_place to support by='row_klist'. Func will modify row_klist and that will modify the array.
Changed name of keyword parameter in apply_in_place() from keylist to rowkeys to avoid confusion.
added astype parameter for to_list() and to_value()
Introduced standardization around PYON instead of JSON:
- Easier to convert esp. during serialization using csv.writer().
- Compatible with more Python data types.
- Still easy to convert to JSON.
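
A rough sketch of the round trip the points above describe, assuming PYON means Python literal notation (the text repr() produces and ast.literal_eval() parses). That reading is an assumption, and to_pyon() / from_pyon() are hypothetical helper names, not Daffodil functions.

```python
import ast
import csv
import io


def to_pyon(value) -> str:
    """Serialize a value as a Python literal string (assumed PYON form)."""
    return repr(value)


def from_pyon(text: str):
    """Parse a Python literal string safely, without eval()."""
    return ast.literal_eval(text)


# A record with types JSON cannot represent directly: a tuple and int dict keys.
record = {'id': 1, 'tags': ('a', 'b'), 'scores': {1: 0.5, 2: None}}

# Write the embedded structure into a single csv cell with csv.writer().
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['id', 'payload'])
writer.writerow([record['id'], to_pyon(record)])

# Read it back and restore the original Python object.
buf.seek(0)
rows = list(csv.reader(buf))
restored = from_pyon(rows[1][1])
```

In this sketch the tuple and the integer keys survive the round trip, which would not be the case with a plain JSON encoding.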
Copied function create_index_at_cursor() for sql tables in daf_benchmarks.py
Added daf_sql.py mainly to support benchmarks at this point.
This will be the last release before sql enhancements.

0.5.4

Add sort_by_colnames(self, colnames:T_ls, reverse: bool=False, length_priority: bool=False)
Add daf_utils.sort_lol_by_cols()
Add argument 'omit_nulls' to .to_list() method.
Change references to klist.values to ._values to avoid ambiguity with property getters and setters.
Add annotate_daf(self, other_daf: 'Daf', my_to_other_dict: T_ds) to effectively join two tables.
Fix value_counts_daf() by adding .to_list for total.
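
For orientation, a small sketch of multi-column sorting on a list of lists. sort_lol_by_cols_sketch() is a hypothetical name, not the daf_utils function, and the reading of length_priority (shorter strings sort first, then lexical order, so that '9' precedes '10') is an assumption, since this changelog does not define it.

```python
def sort_lol_by_cols_sketch(lol, icols, reverse=False, length_priority=False):
    """Return a new list of rows sorted by the given column indices.

    length_priority (assumed meaning): compare values by string length
    first, then by string content.
    """
    def sort_key(row):
        key = []
        for icol in icols:
            val = row[icol]
            if length_priority:
                text = str(val)
                key.append((len(text), text))
            else:
                key.append(val)
        return tuple(key)

    return sorted(lol, key=sort_key, reverse=reverse)


rows = [['10', 'b'], ['9', 'a'], ['100', 'c']]
plain = sort_lol_by_cols_sketch(rows, [0])
by_length = sort_lol_by_cols_sketch(rows, [0], length_priority=True)
```

With plain string comparison '10' and '100' sort before '9'; with the assumed length_priority rule the order becomes '9', '10', '100'.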

0.5.3

added tests:
flatten()
to_json() not completely working.
from_json() not completely working.
added __format__ to allow use of {:,} and other f-string formatting. Invokes .to_value()
added alias for valuecounts_for_colname() to value_counts() to match Pandas syntax.
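
A minimal sketch of how a __format__ hook of this kind can work in plain Python. OneCellSketch is a hypothetical class, not Daffodil's Daf; returning a summary for an empty format spec is an assumption about the intended behavior.

```python
class OneCellSketch:
    """Hypothetical table wrapper holding a list of lists."""

    def __init__(self, lol):
        self.lol = lol

    def to_value(self):
        # Return the single value in the top-left cell.
        return self.lol[0][0]

    def __format__(self, spec: str) -> str:
        if not spec:
            # Empty spec, as in f"{cell}" or f"{cell:}": print a summary.
            return f"<OneCellSketch rows={len(self.lol)}>"
        # Non-empty spec: delegate to the value's own formatting.
        return format(self.to_value(), spec)


cell = OneCellSketch([[1234567]])
with_commas = f"{cell:,}"
summary = f"{cell}"
```

The f-string passes everything after the colon to __format__, so {:,} reaches the underlying integer unchanged.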

extend .iloc to support klist and list rtypes.
Added .to_klist() to return a record as KeyedList type.
extended .assign_col() to insert a column if the colname does not exist.
Enhanced KeyedList() to allow both args to be None, and thus initialize to empty KeyedList.
insert_col_in_lol_at_icol():
fix bug if icol resolves to add a column. --> '>' changed to '>='
allow empty lol and create a lol with one column if col_la exists.
Add .iter_list() to allow iteration over just lol without cols defined.
fixed __format__ so unadorned daf name prints summary. It takes more than {daf:} in an f-string to trigger formatting.
Improved robustness of num_cols() to check first few rows.
TODO: It will probably be better to keep a value of the num cols and not calculate it every time.
changed name of values in KeyedList to _values and created accessors.
added support for Iterables passed for row and col selection.
Added method "remove_dups()" which returns unique records and duplicated records based on keyfield.
Changed operation of assign_col to append col to right if colname does not exist.
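
To make the remove_dups() entry above concrete, here is a sketch of splitting records into unique and duplicated sets by a key field. remove_dups_sketch() is a hypothetical function operating on a list of dicts; keeping the first occurrence as the unique one is an assumption, not confirmed behavior of the Daffodil method.

```python
def remove_dups_sketch(records, keyfield):
    """Split records into (unique, dups) based on the value of keyfield.

    Assumption: the first record seen for a key is kept as unique,
    and later records with the same key are reported as duplicates.
    """
    seen = set()
    unique, dups = [], []
    for rec in records:
        key = rec[keyfield]
        if key in seen:
            dups.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, dups


records = [
    {'id': 'a', 'n': 1},
    {'id': 'b', 'n': 2},
    {'id': 'a', 'n': 3},
]
unique, dups = remove_dups_sketch(records, 'id')
```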

worked around error in Pympler.asizeof.asizeof() function, used in daf_benchmarks.
this appears to be resolved in future updates of pympler.

0.5.2

v0.5.1 (2024-05-25)
changed dependencies in pyproject.toml so they would allow newer versions.
Upgraded to Python 3.11 and upgraded all libraries to the latest.
Using venv311

v0.5.2 (2024-05-30)
Added .iter_dict() and .iter_klist() to force iteration to produce either dicts or KeyedLists.
Producing KeyedLists means the list is not copied into a dict but can be mutated and the lol will be mutated.
Corrected calculation of slice_len to fix column assignment from another column.
This may still have some ambiguity if a nested list structure is meant to be assigned to an array cell.
    collist = my_daf[:, 'colname'].to_list()   # this will return a list, but sometimes of only one value.
    my_daf[:, 'colname2'] = collist            # there is ambiguity here as to whether the list with one
                                               # item should be placed in the cell or just the value.

0.5.0

v0.5.0 (2024-05-23)
Added split_where(self, where: Callable) which makes a single pass and splits the daf array in two:
true_daf and false_daf.
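
The single-pass split can be sketched on a plain list of rows as follows. split_where_sketch() is a hypothetical helper, not the Daffodil method, and it returns lists rather than Daf instances.

```python
from typing import Callable, List, Tuple


def split_where_sketch(rows: List[dict], where: Callable[[dict], bool]) -> Tuple[List[dict], List[dict]]:
    """Partition rows in one pass into (true_rows, false_rows)."""
    true_rows: List[dict] = []
    false_rows: List[dict] = []
    for row in rows:
        # Each row is tested exactly once and routed to one of the two outputs.
        if where(row):
            true_rows.append(row)
        else:
            false_rows.append(row)
    return true_rows, false_rows


rows = [{'n': 1}, {'n': 5}, {'n': 2}, {'n': 9}]
true_rows, false_rows = split_where_sketch(rows, lambda r: r['n'] > 3)
```

Compared with two separate filter passes, the predicate runs once per row and the original order is preserved in both outputs.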
Added to Daffodil multi_groupby(), reduce_dodaf_to_daf() and multi_groupby_reduce()
Added class KeyedList() to provide a new data item that functions like a dict but is an hd (header dict) plus a list.
This can result in much better performance by not redistributing values in the dict structure.
This is not yet integrated into daffodil fully.
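
The idea of a dict-like view built from a shared header dict plus a list can be sketched like this. KeyedListSketch is a hypothetical, much reduced class and not the actual KeyedList; it only shows why writing through the view changes the underlying row without copying values into a new dict.

```python
class KeyedListSketch:
    """Dict-like access over a list, using a shared header dict (hd).

    hd maps each key to its position in the values list.
    """

    def __init__(self, hd, values):
        self.hd = hd            # shared across all rows, never copied
        self._values = values   # the row itself, not a copy

    def __getitem__(self, key):
        return self._values[self.hd[key]]

    def __setitem__(self, key, value):
        # Writes go straight into the underlying list.
        self._values[self.hd[key]] = value

    def to_dict(self):
        return {key: self._values[idx] for key, idx in self.hd.items()}


hd = {'a': 0, 'b': 1}
lol = [[1, 2], [3, 4]]

# Wrap each row without copying it, then mutate through the wrapper.
for row in lol:
    klist = KeyedListSketch(hd, row)
    klist['b'] = klist['a'] * 10
```

After the loop lol itself has changed, because each wrapper holds a reference to the original row list.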

Removed '_da' from many Daffodil methods and for keyword parameters, to allow future upgrade to KeyedList.
select_record_da() -> select_record()
record_append()
_basic_get_record_da -> _basic_get_record
assign_record_da() -> assign_record()
assign_record_da_irow -> assign_record_irow
update_by_keylist()
update_record_da_irow -> update_record_irow
changed test_daf accordingly.

Added _build_hd() to consistently build header dict structure.
Added to_json() and from_json() methods to allow generation of custom JSONEncoder.
Changed nomenclature in KeyedList class from dex to hd.
Added from_json and to_json to KeyedList class to allow custom JSONEncoder to be developed.

select_record() silently returns {} if self is empty.

fixed _itermode vs. itermode.
Added .strip() method.
correct icols when providing a single str column name, and when column names have more than one character each.
Added 'flatten' in '.to_list' method which will combine lol to a single list.
Added .num_rows() which will more robustly calculate the number of rows in edge cases.
Fix unflattening issue discovered when running edge_test_utils.py.
Updated documentation to reflect new approach to dtypes and flattening.
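
As a rough illustration of what flattening a list of lists into a single list means, here is a standalone sketch. flatten_lol() is a hypothetical helper, and flattening exactly one level of nesting is an assumption about the 'flatten' option.

```python
from itertools import chain


def flatten_lol(lol):
    """Combine a list of lists into one flat list (one level only)."""
    return list(chain.from_iterable(lol))


flat = flatten_lol([[1, 2], [3], [], [4, 5]])
```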

0.4.2

v0.4.2 (2024-05-01)
Modified packaging for package distribution on PyPI to hopefully make it compatible with installing into AWS Lambdas.
Tried to use pyproject.toml and flit, but flit has poor toml parsing it seems, and could not find a suitable toml file.
Went back to setup.py and setuptools, but reorganized files into daffodil/src folder which will be included in the distro.
To use --editable mode for local development, must set PYTHONPATH to refer to the daffodil/src folder.
In that folder is daffodil/daf.py and daffodil/lib/daf_(name).py
To import supporting py files from lib, must use import daffodil.lib.daf_utils as utils, for example.
