Pygrametl

Latest version: v2.8


2.8
-----------
**Added**
pygrametl's existing set of unit tests. By default, the unit tests are executed
against an in-memory SQLite database so no configuration is needed.

``SQLTransformingSource`` a new class supporting transformation of rows by loading
them into a temporary table in an RDBMS and then retrieving them using an SQL
query.
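
The load-then-query idea can be sketched with only the standard library. The helper ``sql_transform``, its signature, and the table name ``tmp_rows`` are hypothetical and are not pygrametl's API; the sketch only illustrates the technique of transforming rows through a temporary table.

```python
import sqlite3


def sql_transform(rows, columns, query):
    """Load dict rows into a temporary table and yield the query result as dicts.

    Hypothetical helper: the name, signature, and the table name 'tmp_rows'
    are assumptions made for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TEMPORARY TABLE tmp_rows ({})".format(", ".join(columns)))
        # Named placeholders let each dict row be passed as-is.
        placeholders = ", ".join(":" + column for column in columns)
        conn.executemany(
            "INSERT INTO tmp_rows VALUES ({})".format(placeholders), rows)
        cursor = conn.execute(query)
        names = [description[0] for description in cursor.description]
        for values in cursor:
            yield dict(zip(names, values))
    finally:
        conn.close()


sales = [
    {"city": "Aalborg", "amount": 10},
    {"city": "Aalborg", "amount": 5},
    {"city": "Aarhus", "amount": 7},
]
totals = list(sql_transform(
    sales,
    ["city", "amount"],
    "SELECT city, SUM(amount) AS total FROM tmp_rows "
    "GROUP BY city ORDER BY city"))
```

Here the aggregation is expressed in SQL instead of Python, which is the kind of transformation the temporary-table approach makes convenient.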

``SlowlyChangingDimension.lookupasof`` which can be used to look up the version of a
member that was valid at a given time.
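
What an "as of" lookup means can be shown on a plain list of versions. The function ``lookup_as_of``, the attribute names, and the half-open validity interval are assumptions made for this sketch and do not reflect the actual signature of ``lookupasof``.

```python
from datetime import date

# Two versions of the same member; validto=None marks the current version.
versions = [
    {"key": 1, "name": "Widget", "price": 10,
     "validfrom": date(2020, 1, 1), "validto": date(2021, 1, 1)},
    {"key": 2, "name": "Widget", "price": 12,
     "validfrom": date(2021, 1, 1), "validto": None},
]


def lookup_as_of(versions, when):
    """Return the key of the version valid at 'when', or None if there is none.

    Assumes a half-open interval: validfrom <= when < validto.
    """
    for version in versions:
        started = version["validfrom"] <= when
        not_ended = version["validto"] is None or when < version["validto"]
        if started and not_ended:
            return version["key"]
    return None
```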

**Changed**
Beginner guide updated and dataset added to it.

If a ``rowexpander`` does not return a row in the form of a ``dict``,
``Dimension.ensure`` now explicitly raises a ``TypeError`` with the name of
the function set as the ``rowexpander``.

``ymdhmsparser`` can now handle ``datetime.datetime`` as input. Any other
input is cast to a string. (GitHub issue 40)

``ymdparser`` can now handle ``datetime.datetime`` and ``datetime.date`` as
input. Any other input is cast to a string. (GitHub issue 40)
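
The input handling described for ``ymdparser`` can be sketched as follows. ``parse_ymd`` is a hypothetical stand-in, not the library's code, and returning ``None`` for ``None`` input is an assumption.

```python
from datetime import date, datetime


def parse_ymd(value):
    """Convert a 'YYYY-MM-DD' value, a date, or a datetime into a date."""
    if value is None:
        return None  # assumption: None is passed through unchanged
    # datetime must be tested first because it is a subclass of date.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    # Any other input is cast to a string before parsing.
    year, month, day = str(value).split("-")
    return date(int(year), int(month), int(day))
```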

If ``orderingatt`` is not specified for a ``SlowlyChangingDimension``,
``fromatt`` will now be used if ``versionatt`` and ``toatt`` are not specified.

When using ``fromatt`` or ``toatt`` as ``orderingatt``, the generated SQL
will specify NULLS FIRST or NULLS LAST. Before this change, NULLS FIRST was
assumed for ORDER BY DESC, but this is not guaranteed to hold for all
DBMSs. The change thus requires the used DBMS to support that NULLS FIRST or
NULLS LAST is specified in the generated SQL.
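
Why the explicit clause matters can be seen with SQLite, which sorts ``NULL`` as the smallest value and therefore places it last for ``ORDER BY ... DESC``. The table and column names below are made up for the illustration, and the sketch assumes SQLite 3.30.0 or newer, the first version to accept ``NULLS FIRST``.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, name TEXT, validto TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "Widget", "2020-12-31"),
        (2, "Widget", "2021-12-31"),
        (3, "Widget", None),  # current version: no end date yet
    ],
)

# Without the clause, SQLite puts the NULL row last when sorting descending.
implicit = conn.execute(
    "SELECT id FROM product ORDER BY validto DESC").fetchall()

# With the clause, the row without an end date reliably comes first.
explicit = conn.execute(
    "SELECT id FROM product ORDER BY validto DESC NULLS FIRST").fetchall()
```

With the explicit clause, the first row is the current version regardless of the DBMS's default placement of ``NULL`` values.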

**Fixed**
``BulkFactTable.__init__`` now sets the attributes ``keyrefs``, ``measures``,
and ``all``. These attributes are required by the ``FactTablePartitioner``.

``BulkFactTable`` constructed with ``usemultirow=True`` (the default is
``False``) can now load rows containing ``NULL`` values. (GitHub issue 50)

Incorrect quotation of identifiers in ``SlowlyChangingDimension`` fixed.

Missing key value of root when calling ``getbykey`` of ``SnowflakedDimension`` fixed.

Added explicit commit and rollback to fix problems with ``drawntabletesting`` (DTT) hanging.

2.7
-----------
**Note**
This is the last version to actively support Python 2. Support for it will
slowly be reduced as we continue to develop pygrametl.

**Added**
``drawntabletesting`` a new module for testing ETL flows. The module makes it
easy to define the preconditions and postconditions for the database as part
of each test. This is done simply by "drawing" the tables and their contents
using strings.
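
To give a feel for "drawing" a table with a string, the sketch below parses such a drawing into rows. The function ``parse_drawn_table`` and the exact drawing syntax are assumptions made for this illustration; they are not the module's parser, and all values are kept as strings.

```python
def parse_drawn_table(drawing):
    """Turn a table drawn with '|' and '-' into a list of dict rows."""
    lines = [line.strip() for line in drawing.strip().splitlines()]

    def cells(line):
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    # Header cells are assumed to look like 'name:type'; keep only the name.
    names = [header.split(":")[0] for header in cells(lines[0])]
    rows = []
    for line in lines[1:]:
        if set(line) <= set("|- "):
            continue  # skip the delimiter line below the header
        rows.append(dict(zip(names, cells(line))))
    return rows


book = parse_drawn_table("""
    | bid:int | title:text        | genre:text |
    | ------- | ----------------- | ---------- |
    | 1       | Unknown           | Unknown    |
    | 2       | Calvin and Hobbes | Comic      |
""")
```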

``AccumulatingSnapshotFactTable`` a new class supporting accumulating snapshot
fact tables where facts can be updated as a process progresses.

``BatchFactTable.__init__`` now optionally takes the argument ``usemultirow``.
When this argument is ``True`` (the default is ``False``), batches are loaded
using ``execute`` with a single ``INSERT INTO name VALUES`` statement instead
of ``executemany()``. (GitHub issue 19).
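
The difference from ``executemany()`` is that the whole batch travels in one statement. A minimal sketch of that idea with ``sqlite3`` follows; the helper ``insert_multirow`` is hypothetical, and using ``?`` placeholders is a choice made for this sketch, not necessarily how pygrametl builds the statement.

```python
import sqlite3


def insert_multirow(conn, table, columns, rows):
    """Insert all rows with a single multi-row INSERT statement."""
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join(one_row for _ in rows))
    # Flatten the batch into one parameter list, row by row.
    params = [row[column] for row in rows for column in columns]
    conn.execute(sql, params)
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (storeid INTEGER, amount INTEGER)")
batch = [
    {"storeid": 1, "amount": 10},
    {"storeid": 2, "amount": None},  # stored as NULL
    {"storeid": 3, "amount": 30},
]
statement = insert_multirow(conn, "sales", ["storeid", "amount"], batch)
loaded = conn.execute(
    "SELECT storeid, amount FROM sales ORDER BY storeid").fetchall()
```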

``closecurrent`` method added to ``SlowlyChangingDimension`` to make it
possible to set an end date for the most current version without adding a new
version.

A (read-only) property ``awaitingrows`` added to ``BatchFactTable`` and
``_BaseBulkloadable`` to get the number of inserted rows waiting to be loaded
into the database table. (GitHub issue 23)

**Changed**
``SlowlyChangingDimension.scdensure`` now checks if the newest version has its
``toatt`` set to a value different from ``maxto`` (if ``toatt`` is defined).
This can happen from a call to ``closecurrent`` or a manual update. If it is
the case, a new version will be added when ``scdensure`` is called even if no
other differences are present.

Generators in ``datasources`` don't raise ``StopIteration`` anymore as
required by PEP 479.
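
Under PEP 479, a ``StopIteration`` that escapes a generator body is turned into a ``RuntimeError``, so a generator should end with ``return`` instead. The generator below is a made-up example of the pattern and is not taken from ``datasources``.

```python
def first_n(source, n):
    """Yield at most n rows from source."""
    iterator = iter(source)
    for _ in range(n):
        try:
            row = next(iterator)
        except StopIteration:
            # Catch the exception and return; letting it propagate would
            # raise RuntimeError under PEP 479.
            return
        yield row
```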

``__author__`` and ``__maintainer__`` removed from all .py files.

``__version__`` removed from all .py files except ``pygrametl/__init__.py``.
The version of pygrametl is thus now available as ``pygrametl.__version__``
and will be updated for every release.

**Fixed**
Outdated information stating that type 1 slowly changing dimensions are not
supported has been removed from the documentation. In addition, minor errors
and inconsistencies have been corrected throughout the documentation. (GitHub
issue 27)

Wrong use of paramstyle in ``ConnectionWrapper.executemany`` fixed.

A call to an incorrect method in ``aggregators.Avg.finish()`` fixed.

The ``datespan()`` function now checks whether ``fromdate`` and ``todate`` are
strings before calling ``.split()``. In addition, the function now uses
``dict.items()`` instead of ``dict.iteritems()`` which is not supported in
Python 3.

2.6
-----------
**Added**
``PandasSource`` a new class, that given a Pandas ``DataFrame`` acts as a data
source. Each row of the ``DataFrame`` is returned as a ``dict`` that can be
loaded into a data warehouse using ``tables``.

``MappingSource`` a new class, that given a data source and a dictionary of
columns to callables, maps the callables over each element of the specified
column before returning the row.
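
The behaviour described for ``MappingSource`` can be approximated with a short generator. ``map_columns`` is a hypothetical stand-in written for this illustration; note that this sketch updates each row in place.

```python
def map_columns(source, callables):
    """Apply callables[column] to that column in every row of source."""
    for row in source:
        for column, func in callables.items():
            row[column] = func(row[column])
        yield row


raw = [{"id": "1", "name": " ann "}, {"id": "2", "name": "bob"}]
cleaned = list(map_columns(raw, {"id": int, "name": str.strip}))
```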

**Changed**
``SlowlyChangingDimension`` improved to make ``versionatt`` optional. (GitHub
issue 12. Thanks to HereticSK)

``ConnectionWrapper.__init__`` now optionally takes the argument
``copyintonew``. When this argument is ``True`` (the default is ``False``), a
new ``dict`` with parameters is created when a statement is executed. The new
``dict`` only holds the k/v pairs needed by the statement. This is to avoid
``DatabaseError: ORA-01036: illegal variable name/number`` with cx_Oracle.
(GitHub issue 9).
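
Building a ``dict`` with only the needed pairs can be sketched as below. The helper ``needed_arguments`` is hypothetical, and the sketch assumes statements written with ``%(name)s`` placeholders.

```python
import re


def needed_arguments(statement, arguments):
    """Return a new dict holding only the parameters the statement names."""
    names = re.findall(r"%\((\w+)\)s", statement)
    return {name: arguments[name] for name in names}


row = {"a": 1, "b": 2, "c": 3}
subset = needed_arguments("INSERT INTO t(a, b) VALUES (%(a)s, %(b)s)", row)
```

The original ``dict`` is left untouched; only the copy handed to the driver is reduced, so a strict driver never sees the unused key ``c``.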

First argument to ``TypedCSVSource.__init__`` renamed from ``csvfile`` to
``f`` to be consistent with documentation and ``CSVSource``.

**Fixed**
``ConnectionWrapper.execute`` does not pass the argument ``arguments`` to the
underlying cursor's execute method if ``arguments`` is ``None``. Some drivers
raise an ``Error`` if ``None`` is passed, some don't.

2.5
-----------
**Added**
``TypedCSVSource`` a new class that reads a CSV file (by means of
``csv.DictReader``) and performs user-specified casts (or other function
calls) on the values before returning the rows.
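
The combination of ``csv.DictReader`` and per-column casts can be sketched in a few lines. ``typed_csv_rows`` is a hypothetical function for illustration and does not mirror the signature of ``TypedCSVSource``.

```python
import csv
import io


def typed_csv_rows(f, casts):
    """Read CSV rows as dicts and apply casts[column] to the named columns."""
    for row in csv.DictReader(f):
        for column, cast in casts.items():
            row[column] = cast(row[column])
        yield row


data = io.StringIO("id,price,name\n1,9.5,tea\n2,12.0,coffee\n")
rows = list(typed_csv_rows(data, {"id": int, "price": float}))
```

Columns without a cast, such as ``name`` here, stay as the strings the CSV reader produced.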

Added ``definequote`` function to enable quoting of SQL identifiers in all
tables.

Added ``getdbfriendlystr`` function to enable conversion of values into
strings that are accepted by an RDBMS. Boolean values become ``0`` or ``1``,
``None`` values can be replaced by another value.

All Bulkloadables now accept the argument ``strconverter`` to their
``__init__`` methods. This should be a function that converts values into
strings that are written to a temporary file and eventually bulkloaded. The
default value is the new ``getdbfriendlystr``.
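
A converter with the behaviour described for ``getdbfriendlystr`` could look roughly like this. ``db_friendly_str`` and its ``nullvalue`` parameter (including its default) are assumptions made for this sketch, not the library's definition.

```python
def db_friendly_str(value, nullvalue="None"):
    """Convert a value into a string suitable for a bulk-load file."""
    if isinstance(value, str):
        return value
    # bool must be handled before the generic case so that True becomes
    # '1' and not 'True'.
    if isinstance(value, bool):
        return "1" if value else "0"
    if value is None:
        return nullvalue
    return str(value)
```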

``SlowlyChangingDimension`` can now optionally be given the argument
``useorderby`` when instantiated. If ``True`` (the default), the SQL used by
``lookup`` uses ``ORDER BY`` (this is the same behaviour as before). If
``False``, ``ORDER BY`` is not used and the SQL used by ``lookup`` will fetch
all versions of the member and then find the key value for the newest version
with Python code. For some systems, this can lead to significant performance
improvements.
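
The two lookup strategies can be compared on a toy table. The schema and both helper functions are hypothetical and only illustrate sorting in the DBMS versus picking the newest version in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", 1), (2, "Ann", 2), (3, "Bob", 1), (4, "Ann", 3)],
)


def lookup_with_order_by(conn, name):
    """Let the DBMS sort the versions and return the newest key."""
    row = conn.execute(
        "SELECT id FROM customer WHERE name = ? "
        "ORDER BY version DESC LIMIT 1", (name,)).fetchone()
    return row[0] if row else None


def lookup_without_order_by(conn, name):
    """Fetch all versions and find the newest one with Python code."""
    rows = conn.execute(
        "SELECT id, version FROM customer WHERE name = ?",
        (name,)).fetchall()
    if not rows:
        return None
    return max(rows, key=lambda row: row[1])[0]
```

Both functions return the same key; which one is faster depends on the DBMS and on how many versions each member has.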

**Changed**
Generator used in ``ConnectionWrapper.fetchalltuples`` to reduce memory
consumption. (Thanks to Alexey Kuzmenko)

``SlowlyChangingDimension`` can sometimes avoid deleting from the cache on
updates. This is now checked in the same way as in ``CachedDimension``.

``rowfactory`` now tries to use ``fetchmany``. (Suggested by Alexey Kuzmenko).
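
Both the generator-based ``fetchalltuples`` and the use of ``fetchmany`` aim at not holding the full result in memory. A generic sketch of that pattern over a DB-API cursor is given below; ``iter_rows`` and its ``batchsize`` parameter are hypothetical names chosen for this example.

```python
import sqlite3


def iter_rows(cursor, names, batchsize=500):
    """Yield dict rows while fetching only batchsize tuples at a time."""
    while True:
        batch = cursor.fetchmany(batchsize)
        if not batch:
            return
        for values in batch:
            yield dict(zip(names, values))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(i, "row%d" % i) for i in range(5)])
cursor = conn.execute("SELECT id, label FROM t ORDER BY id")
result = list(iter_rows(cursor, ["id", "label"], batchsize=2))
```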

``_BaseBulkloadable`` now has the method ``insert`` while the methods
``_insertwithnull`` and ``_insertwithoutnull`` have been removed (and
subclasses do thus not pick one of them at runtime). The ``insert`` method
will always call ``strconverter`` (see above) no matter if a ``nullsubst`` has
been specified or not.

``_BaseBulkloadable`` will now raise a ``TypeError`` if no ``nullsubst`` is
specified and a ``None`` value is present. Before this change, the ``None``
value would silently be converted into the string ``'None'``. Users must now
give a ``nullsubst`` argument when instantiating a subclass of
``_BaseBulkloadable`` that should be able to handle ``None`` values.

``SubprocessFactTable`` has been changed similarly to ``_BaseBulkloadable``
and does now define ``insert`` which uses ``strconverter``. Thus
``_insertwithnull`` and ``_insertwithoutnull`` have been removed.

``getunderlyingmodule`` has been changed and now tries different possible
module names and looks for ``'paramstyle'`` and ``'connect'``.
``ConnectionWrapper`` now uses ``getunderlyingmodule`` in ``__init__`` when
trying to determine the paramstyle to use.

**Fixed**
Using ``cachesize=0`` with ``SlowlyChangingDimension`` no longer causes a
crash.

Problem with double use of namemappings in ``_before_update`` in
``CachedDimension`` and ``SlowlyChangingDimension`` fixed. (Thanks to Alexey
Kuzmenko).

Problem with ``rowfactory`` only returning one row fixed. (Thanks to Alexey
Kuzmenko).

Problem with ``JDBCConnectionWrapper.rowfactory`` returning dictionaries with
incorrect keys fixed. (GitHub issue 5).

Problem with ``TypeOneSlowlyChangingDimension`` caching ``None`` after an
update if a namemapping mapped to an attribute not in the update row fixed.

Problem in ``__init__.copy`` fixed.

Namemapping is now used when comparing measure values in ``FactTable.ensure``
with ``compare=True``.

2.4
-----------
**Note**
This is the last version to support versions of Python 2 older than 2.7.

**Added**
``TypeOneSlowlyChangingDimension`` a new class that adds support for efficient
loading and updating of a type 1 exclusive slowly changing dimension.

``CachedBulkLoadingDimension`` a new class that supports bulk loading a
dimension without requiring the caching of all rows that are loaded.

Alternative implementation of ``FIFODict`` based on an ``OrderedDict``.
(Thanks to Alexey Kuzmenko).
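
A first-in, first-out cache on top of ``OrderedDict`` can be sketched as follows. The class ``FIFOCache`` is written for this illustration, is not pygrametl's ``FIFODict``, and assumes a size of at least 1.

```python
from collections import OrderedDict


class FIFOCache(OrderedDict):
    """A dict that evicts its oldest entry when a new key exceeds the size."""

    def __init__(self, size):
        super().__init__()
        self.size = size  # assumed to be >= 1

    def __setitem__(self, key, value):
        # Only a new key can trigger an eviction; updates keep their position.
        if key not in self and len(self) >= self.size:
            self.popitem(last=False)  # remove the oldest inserted entry
        super().__setitem__(key, value)


cache = FIFOCache(2)
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3   # evicts "a", the first key inserted
cache["b"] = 20  # updating an existing key evicts nothing
```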

Dimension classes with finite caches can now be prefilled more efficiently
using the ``FETCH FIRST`` SQL clause.

Examples on how to perform bulk loading in MySQL, Oracle Database, and
Microsoft SQL Server. (Thanks to Alexey Kuzmenko).

**Changed**
It is now verified that ``lookupatts`` is a subset of all attributes.

All method calls to a superclass constructor now use named parameters.

Made cosmetic changes, and added additional information about how to ensure
cache coherency between pygrametl and the database to existing docstrings.

The entire codebase was updated to adhere more closely to PEP 8 using
autopep8.

**Fixed**
Using ``dependson`` no longer causes crashes due to multiple loads of a table.
(Thanks to Alexey Kuzmenko).

Using ``defaultidvalue`` no longer causes ``Dimension.ensure`` to fail to
insert correctly, or make ``CachedDimension.ensure`` produce duplicates.
(Thanks to Alexey Kuzmenko).

Using ``SlowlyChangingDimension`` with the cache disabled no longer causes a
crash in ``SlowlyChangingDimension.scdensure``.

Using ``BulkDimension``, ``CachedBulkDimension`` or ``BulkFactTable`` with
``tempdest`` and ``usefilename`` no longer causes a crash in
``_BaseBulkloadable._bulkloadnow``.

2.3.2
-------------
**Fixed**
``SnowflakedDimension`` no longer crashes due to ``levellist`` not being a
list before the length of it is computed.

``FactTable`` now inserts the correct number of commas to the SQL statements
used for inserting rows, independent of the value of ``keyrefs``.
