==========
RelStorage
==========


3.0a9
=====

- Several minor logging improvements.

- Allow many internal constants to be set with environment variables
at startup for experimentation. These are presently undocumented; if
they prove useful to adjust in different environments they may be
promoted to full configuration options.

- Fix importing RelStorage when ``zope.schema`` is not installed.
``zope.schema`` is intended to be a test dependency and optional for
production deployments. Reported in :issue:`334` by Jonathan Lung.

- Make the gevent MySQL driver more efficient at avoiding needless waits.

- Due to a bug in MySQL (incorrectly rounding the 'minute' value of a
timestamp up), TIDs generated in the last half second of a minute
would suddenly jump ahead by 4,266,903,756 integers (a full minute).

- Fix leaking an internal value for ``innodb_lock_timeout`` across
commits on MySQL. This could lead to ``tpc_vote`` blocking longer
than desired. See :issue:`331`.

- Fix ``undo`` to purge from the cache the objects whose transaction
was revoked.

- Make historical storages read-only, raising
``ReadOnlyHistoryError``, during the commit process. Previously this
was only enforced at the ``Connection`` level.

- Rewrite the cache to understand the MVCC nature of the connections
that use it.

This eliminates the use of "checkpoints." Checkpoints established a
sort of index for objects to allow them to be found in the cache
without necessarily knowing their ``_p_serial`` value. To achieve
good hit rates in large databases, large values for the
``cache-delta-size-limit`` were needed, but if there were lots of
writes, polling to update those large checkpoints could become very
expensive. Because checkpoints were separate in each ZODB connection
in a process, and because when one connection changed its
checkpoints every other connection would also change its checkpoints
on the next access, this could quickly become a problem in highly
concurrent environments (many connections making many large database
queries at the same time). See :issue:`311`.

The new system uses a series of chained maps representing polling
points to build the same index data. All connections can share all
the maps for their view of the database and earlier. New polls add
new maps to the front of the list as needed, and old maps are
removed once they are no longer needed by any active transaction.
This simulates the underlying database's MVCC approach (a conceptual
sketch follows this entry).

Other benefits of this approach include:

- No more large polls. While each connection still polls for each
transaction it enters, they now share state and only poll against
the last time a poll occurred, not the last time they were used.
The result should be smaller, more predictable polling.

- Having a model of object visibility allows the cache to use more
efficient data structures: it can now use the smaller LOBTree to
reduce the memory occupied by the cache. It also requires
fewer cache entries overall to store multiple revisions of an
object, reducing the overhead. And there are no more key copies
required after a checkpoint change, again reducing overhead and
making the LRU algorithm more efficient.

- The cache's LRU algorithm is now at the object level, not the
object/serial pair.

- Objects that are known to have been changed but whose old revision
is still in the cache are preemptively removed when no references
to them are possible, reducing cache memory usage.

- The persistent cache can now guarantee not to write out data that
it knows to be stale.

Dropping checkpoints probably makes memcache less effective, but
memcache hasn't been recommended for a while.
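
The chained-map index can be pictured roughly as follows. This is a
conceptual sketch only, not RelStorage's actual classes; the name
``PollMap`` and its layout are invented here for illustration::

    # Conceptual sketch: one shared map per polling point, chained from
    # newest to oldest, answering "which TID last changed this OID?"
    class PollMap(object):
        def __init__(self, highest_visible_tid, changes, older=None):
            self.highest_visible_tid = highest_visible_tid
            self.changes = dict(changes)  # {oid: tid that last changed it}
            self.older = older            # next (older) polling point, or None

        def visible_tid(self, oid):
            # Walk from this polling point toward older shared maps.
            node = self
            while node is not None:
                tid = node.changes.get(oid)
                if tid is not None:
                    return tid
                node = node.older
            return None  # unknown here; the cache falls back to the database

    # Connections at different polling points share the same older maps;
    # maps no longer needed by any active transaction are simply dropped.
    older = PollMap(100, {1: 99, 2: 100})
    newer = PollMap(105, {1: 105}, older=older)
    assert newer.visible_tid(2) == 100 and older.visible_tid(1) == 99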

3.0a8
=====

- Improve the safety of the persistent local cache in high-concurrency
environments using older versions of SQLite. Perform a quick
integrity check on startup and refuse to use the cache files if they
are reported corrupt.

- Switch the order in which object locks are taken: try shared locks
first and only then attempt exclusive locks. Shared locks do not
have to block, so a quick lock timeout here means that a
``ReadConflictError`` is inevitable. This works best on PostgreSQL
and MySQL 8, which support true non-blocking locks. On MySQL 5.7,
non-blocking locks are emulated with a 1s timeout. See :issue:`310`.

.. note:: The transaction machinery will retry read conflict errors
by default. The more rapid detection of them may lead to
extra retries if there was a process still finishing its
commit. Consider adding small sleep backoffs to retry
logic. (A sketch of retry logic with a sleep backoff
appears at the end of this release's notes.)

- Fix MySQL to immediately rollback its transaction when it gets a
lock timeout, while still in the stored procedure on the database.
Previously it would have required a round trip to the Python
process, which could take an arbitrary amount of time while the
transaction may have still been holding some locks. (After
:issue:`310` they would only be shared locks, but before they would
have been exclusive locks.) This should make for faster recovery in
heavily loaded environments with lots of conflicts. See :issue:`313`.

- Make MySQL clear its temp tables using a single round trip.
Truncation is optional and disabled by default. See :issue:`319`.

- Fix PostgreSQL to not send the definition of the temporary tables
for every transaction. This is only necessary for the first
transaction.

- Improve handling of commit and rollback, especially on PostgreSQL.
We now generate many fewer unneeded rollbacks. See :issue:`289`.

- Stop checking the status of ``readCurrent`` OIDs twice.

- Make the gevent MySQL driver yield more frequently while getting
large result sets. Previously it would block in C to read the entire
result set. Now it yields according to the cursor's ``arraysize``.
See :issue:`315`.

- Polling for changes now iterates the cursor instead of using
``fetchall()``. This can reduce memory usage and provide better
behaviour in a concurrent environment, depending on the cursor
implementation.

- Add three environment variables to control the odds of whether any
given poll actually suggests shifted checkpoints. These are all
floating point numbers between 0 and 1. They are
``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_FULL`` (default 0.7,
i.e., 70%), ``RELSTORAGE_CP_REPLACEMENT_BEGIN_CONSIDERING_PERCENT``
(default 0.8) and ``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_CLOSE``
(default 0.2). (There are corresponding class variables on the
storage cache that could also be set.) Use values of ``1``, ``1``
and ``0`` to restore the old completely deterministic behaviour.
It's not clear whether these will be useful, so they are not
officially options yet but they may become so. Feedback is
appreciated! See :issue:`323`.

.. note::

These were removed in 3.0a9.
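
As a follow-up to the read-conflict note above: a minimal sketch of
application-level retry logic with a small sleep backoff. It assumes
``db`` is a ``ZODB.DB`` backed by RelStorage and ``apply_changes``
mutates the root; none of this is RelStorage API::

    import time

    import transaction
    from ZODB.POSException import ReadConflictError

    def commit_with_backoff(db, apply_changes, attempts=3, delay=0.1):
        """Apply ``apply_changes(root)`` and commit, backing off on conflicts."""
        for attempt in range(attempts):
            conn = db.open()
            try:
                apply_changes(conn.root())
                transaction.commit()
                return
            except ReadConflictError:
                if attempt + 1 == attempts:
                    raise
                time.sleep(delay * (attempt + 1))  # small, growing backoff
            finally:
                transaction.abort()  # harmless after a successful commit
                conn.close()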

3.0a7
=====

- Eliminate runtime dependency on ZEO. See :issue:`293`.

- Fix a rare race condition allocating OIDs on MySQL. See
:issue:`283`.

- Optimize the ``loadBefore`` method. It appears to be mostly used in
the tests.

- Fix the blob cache cleanup thread to use a real native thread if
we're monkey-patched by gevent, using gevent's thread pool.
Previously, cleaning up the blob cache would block the event loop
for the duration. See :issue:`296`.

- Improve the thread safety and resource usage of blob cache cleanup.
Previously it could spawn many useless threads.

- When caching a newly uploaded blob for a history free storage, if
there's an older revision of the blob in the cache, and it is not in
use, go ahead and preemptively remove it from disk. This can help
prevent the cache size from growing out of hand and limit the number
of expensive full cache checks required. See :issue:`297`.

- Change the default value of the configuration setting
``shared-blob-dir`` to false, meaning that the default is now to use
a blob cache. If you were using shared blobs before, you'll need to
explicitly set ``shared-blob-dir`` to ``true`` before starting
RelStorage. (A programmatic configuration sketch appears at the end
of this release's notes.)

- Add an option, ``blob-cache-size-check-external``, that causes the
blob cache cleanup process to run in a subprocess instead of a
thread. This can free up the storage process to handle requests.
This is not recommended on Windows. (``python -m
relstorage.blobhelper.cached /path/to/cache size_in_bytes`` can be
used to run a manual cleanup at any time. This is currently an
internal implementation detail.)

- Abort storage transactions immediately when an exception occurs.
Previously this could be specified by setting the environment
variable ``RELSTORAGE_ABORT_EARLY``. Aborting early releases
database locks to allow other transactions to make progress
immediately. See :issue:`50`.

- Reduce the strength of locks taken by ``Connection.readCurrent`` so
that they don't conflict with other connections that just want to
verify they haven't changed. This also lets us immediately detect a
conflict error with an in-progress transaction that is trying to
alter those objects. See :issue:`302`.

- Make databases that use row-level locks (MySQL and PostgreSQL) raise
specific exceptions on failures to acquire those locks. A different
exception is raised for rows a transaction needs to modify compared
to rows it only needs to read. Both are considered transient to
encourage transaction middleware to retry. See :issue:`303`.

- Move more of the vote phase of transaction commit into a database
stored procedure on MySQL and PostgreSQL, beginning with taking the
row-level locks. This eliminates several more database round trips
and the need for the Python thread (or greenlet) to repeatedly
release and then acquire the GIL while holding global locks. See
:issue:`304`.

- Make conflict resolution require fewer database round trips,
especially on PostgreSQL and MySQL, at the expense of using more
memory. In the ideal case it now only needs one (MySQL) or two
(PostgreSQL) queries. Previously it needed at least twice the number
of trips as there were conflicting objects. On both databases, the
benchmarks are 40% to 80% faster (depending on cache configuration).
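
Regarding the ``shared-blob-dir`` change above, here is a programmatic
configuration sketch mirroring RelStorage's documented setup. The DSN
and paths are placeholders, and the option keywords are assumed to be
the option names with dashes replaced by underscores::

    from relstorage.adapters.postgresql import PostgreSQLAdapter
    from relstorage.options import Options
    from relstorage.storage import RelStorage
    from ZODB.DB import DB

    # Keep the pre-3.0a7 behaviour: blobs live only on a shared filesystem.
    options = Options(shared_blob_dir=True, blob_dir='/var/zodb/blobs')
    adapter = PostgreSQLAdapter(dsn="dbname='zodb'", options=options)
    db = DB(RelStorage(adapter, options=options))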

3.0a6
=====

Enhancements
------------

- Eliminate a few extra round trips to the database on transaction
completion: One extra ``ROLLBACK`` in all databases, and one query
against the ``transaction`` table in history-preserving databases.
See :issue:`159`.

- Prepare more statements used during regular polling.

- Gracefully handle certain disconnected exceptions when rolling back
connections in between transactions. See :issue:`280`.

- Fix a cache error ("TypeError: NoneType object is not
subscriptable") when an object had been deleted (such as through
undoing its creation transaction, or with ``multi-zodb-gc``).

- Implement ``IExternalGC`` for history-preserving databases. This
lets them be used with `zc.zodbdgc
<https://pypi.org/project/zc.zodbdgc/>`_, allowing for
multi-database garbage collection (see :issue:`76`). Note that you
must pack the database after running ``multi-zodb-gc`` in order to
reclaim space.

.. caution::

It is critical that ``pack-gc`` be turned off (set to false) in a
multi-database and that only ``multi-zodb-gc`` be used to perform
garbage collection.

Packing
~~~~~~~

- Make ``RelStorage.pack()`` also accept a TID from the RelStorage
database to pack to. The usual Unix timestamp form for choosing a
pack time can be ambiguous in the event of multiple transactions
within a very short period of time. This is mostly a concern for
automated tests.

Similarly, it will accept a value less than 0 to mean the most
recent transaction in the database. This is useful when machine
clocks may not be well synchronized, or from automated tests.
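
For example, a minimal sketch of packing to the most recent
transaction with a negative value, assuming ``storage`` is an open
RelStorage instance::

    from ZODB.serialize import referencesf

    def pack_to_latest(storage):
        # Any value less than 0 selects the most recent transaction in the
        # database, sidestepping ambiguity from unsynchronized clocks.
        storage.pack(-1, referencesf)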

Implementation
--------------

- Remove vestigial top-level thread locks. No instance of RelStorage
is thread safe.

RelStorage is an ``IMVCCStorage``, which means that each ZODB
``Connection`` gets its own new storage object. No visible storage
state is shared among Connections. Connections are explicitly
documented as not being thread safe. Since 2.0, RelStorage's
Connection instances have taken advantage of that fact to be a
little lighter weight through not being thread safe. However, they
still paid the overhead of locking method calls and code complexity.

The top-level storage (the one belonging to a ``ZODB.DB``) still
used heavyweight locks in earlier releases. ``ZODB.DB.storage`` is
documented as being only useful for tests, and the ``DB`` object
itself does not expose any operations that use the storage in a way
that would require thread safety.

The remaining thread safety support has been removed. This
simplifies the code and reduces overhead.

If you were previously using the ``ZODB.DB.storage`` object, or a
``RelStorage`` instance you constructed manually, from multiple
threads, instead make sure each thread has a distinct
``RelStorage.new_instance()`` object (see the sketch after this
list).

- A ``RelStorage`` instance now only implements the appropriate subset
of ZODB storage interfaces according to its configuration. For
example, if there is no configured ``blob-dir``, it won't implement
``IBlobStorage``, and if ``keep-history`` is false, it won't
implement ``IStorageUndoable``.

- Refactor RelStorage internals for a cleaner separation of concerns.
This includes how (some) queries are written and managed, making it
easier to prepare statements, but only those actually used.
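
A minimal sketch of the per-thread pattern mentioned above;
``base_storage`` is assumed to be a manually constructed RelStorage
instance, ``work`` is application code, and ``release()`` is the
``IMVCCStorage`` cleanup call::

    import threading

    def run_in_threads(base_storage, work, thread_count=4):
        def worker():
            inst = base_storage.new_instance()  # this thread's own storage
            try:
                work(inst)  # never share ``inst`` across threads
            finally:
                inst.release()  # free the instance's resources
        threads = [threading.Thread(target=worker) for _ in range(thread_count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()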


MySQL
-----

- On MySQL, move allocating a TID into the database. On benchmarks
of a local machine this can be a scant few percent faster, but it's
primarily intended to reduce the number of round-trips to the
database. This is a step towards :issue:`281`. See :pr:`286`.

- On MySQL, set the connection timezone to be UTC. This is necessary
to get values consistent between ``UTC_TIMESTAMP``,
``UNIX_TIMESTAMP``, ``FROM_UNIXTIME``, and Python's ``time.gmtime``,
as used for comparing TIDs.

- On MySQL, move most steps of finishing a transaction into a stored
procedure. Together with the TID allocation changes, this reduces
the number of database queries from::

      1 to lock
    + 1 to get TID
    + 1 to store transaction (0 in history free)
    + 1 to move states
    + 1 for blobs (2 in history free)
    + 1 to set current (0 in history free)
    + 1 to commit
    = 7 or 6 (in history free)

down to 1. This is expected to be especially helpful for gevent
deployments, as the database lock is held, the transaction finalized
and committed, and the database lock released, all without involving
greenlets or greenlet switches. By allowing the GIL to be released
longer it may also be helpful for threaded environments. See
:issue:`281` and :pr:`287` for benchmarks and specifics.

.. caution::

MySQL 5.7.18 and earlier contain a severe bug that causes the
server to crash when the stored procedure is executed.


- Make PyMySQL use the same precision as mysqlclient when sending
floating point parameters.

- Automatically detect when MySQL stored procedures in the database
are out of date with the current source in this package and replace
them.

PostgreSQL
----------

- As for MySQL, move allocating a TID into the database.

- As for MySQL, move most steps of finishing a transaction into a
stored procedure. On psycopg2 and psycopg2cffi this is done in a
single database call. With pg8000, however, it still takes two, with
the second call being the COMMIT call that releases locks.

- Speed up getting the approximate number of objects
(``len(storage)``) in a database by using the estimates collected by
the autovacuum process or analyzing tables, instead of asking for a
full table scan.
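
The estimate comes from PostgreSQL's planner statistics rather than a
full scan. A sketch of the general technique (not RelStorage's exact
query), assuming ``psycopg2`` and RelStorage's ``object_state``
table::

    import psycopg2

    def approximate_object_count(dsn):
        # pg_class.reltuples is maintained by autovacuum/ANALYZE, so this
        # is only an estimate, but it avoids the full scan a COUNT(*) does.
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT reltuples::bigint FROM pg_class WHERE relname = %s",
                    ('object_state',))
                row = cur.fetchone()
            return int(row[0]) if row else 0
        finally:
            conn.close()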

3.0a5
=====

- Reduce the time that MySQL will wait to perform OID garbage
collection on startup. See :issue:`271`.

- Fix several instances where RelStorage could attempt to perform
operations on a database connection with outstanding results on a
cursor. Some database drivers can react badly to this, depending on
the exact circumstances. For example, mysqlclient can raise
``ProgrammingError: (2014, "Commands out of sync; you can't run this
command now")``. See :issue:`270`.

- Fix the "gevent MySQLdb" driver to be cooperative during ``commit``
and ``rollback`` operations. Previously, it would block the event
loop for the entire time it took to send the commit or rollback
request, the server to perform the request, and the result to be
returned. Now, it frees the event loop after sending the request.
See :issue:`272`.

- Call ``set_min_oid`` less often if a storage is just updating
existing objects, not creating new ones.

- Fix an occasional possible deadlock in MySQL's ``set_min_oid``. See
:pr:`276`.

3.0a4
=====

- Add support for the ZODB 5 ``connection.prefetch(*args)`` API. This
takes either OIDs (``obj._p_oid``) or persistent ghost objects, or
an iterator of those things, and asks the storage to load them into
its cache for use in the future. In RelStorage, this uses the shared
cache and so may be useful for more than one thread. This can be
3x or more faster than loading objects on-demand. (A usage sketch
appears at the end of this release's notes.) See :issue:`239`.

- Stop chunking blob uploads on PostgreSQL. All supported PostgreSQL
versions natively handle blobs greater than 2GB in size, and the
server was already chunking the blobs for storage, so our layer of
extra chunking has become unnecessary.

.. important::

The first time a storage is opened with this version,
blobs that have multiple chunks will be collapsed into a single
chunk. If there are many blobs larger than 2GB, this could take
some time.

It is recommended you have a backup before installing this
version.

To verify that the blobs were correctly migrated, you should
clean or remove your configured blob-cache directory, forcing new
blobs to be downloaded.

- Fix a bug that left large objects behind if a PostgreSQL database
containing any blobs was ever zapped (with ``storage.zap_all()``).
The ``zodbconvert`` command, the ``zodbshootout`` command, and the
RelStorage test suite could all zap databases. Running the
``vacuumlo`` command included with PostgreSQL will free such
orphaned large objects, after which a regular ``vacuumdb`` command
can be used to reclaim space. See :issue:`260`.

- Conflict resolution can use data from the cache, thus potentially
eliminating a database hit during a very time-sensitive process.
Please file issues if you encounter any strange behaviour when
concurrently packing to the present time and also resolving
conflicts, in case there are corner cases.

- Packing a storage now invalidates the cached values that were packed
away. For the global caches this helps reduce memory pressure; for
the local cache this helps reduce memory pressure and ensure a more
useful persistent cache (this probably matters most when running on
a single machine).

- Make MySQL use ``ON DUPLICATE KEY UPDATE`` rather than ``REPLACE``.
This can be friendlier to the storage engine as it performs an
in-place ``UPDATE`` rather than a ``DELETE`` followed by an
``INSERT``. See :issue:`189`.

- Make PostgreSQL use an upsert query for moving rows into place on
history-preserving databases.

- Support ZODB 5's parallel commit feature. This means that the
database-wide commit lock is taken much later in the process, and
held for a much shorter time than before.

Previously, the commit lock was taken during the ``tpc_vote`` phase,
and held while we checked ``Connection.readCurrent`` values, and
checked for (and hopefully resolved) conflicts. Other transaction
resources (such as other ZODB databases in a multi-db setup) then
got to vote while we held this lock. Finally, in ``tpc_finish``,
objects were moved into place and the lock was released. This
prevented any other storage instances from checking for
``readCurrent`` or conflicts while we were doing that.

Now, ``tpc_vote`` is (usually) able to check
``Connection.readCurrent`` and check and resolve conflicts without
taking the commit lock. Only in ``tpc_finish``, when we need to
finally allocate the transaction ID, is the commit lock taken, and
only held for the duration needed to finally move objects into
place. This allows other storages for this database, and other
transaction resources for this transaction, to proceed with voting,
conflict resolution, etc, in parallel.

Consistent results are maintained by use of object-level row
locking. Thus, two transactions that attempt to modify the same
object will now only block each other.

There are two exceptions. First, if the ``storage.restore()`` method
is used, the commit lock must be taken very early (before
``tpc_vote``). This is usually only done as part of copying one
database to another. Second, if the storage is configured with a
shared blob directory instead of a blob cache (meaning that blobs
are *only* stored on the filesystem) and the transaction has added
or mutated blobs, the commit lock must be taken somewhat early to
ensure blobs can be saved (after conflict resolution, etc, but
before the end of ``tpc_vote``). It is recommended to store blobs on
the RDBMS server and use a blob cache. The shared blob layout can be
considered deprecated for this reason.

In addition, the new locking scheme means that packing no longer
needs to acquire a commit lock and more work can proceed in parallel
with regular commits. (Though there may have been some regressions in
the speed of the deletion phase of packing on MySQL; this has not
been benchmarked.)

.. note::

If the environment variable ``RELSTORAGE_LOCK_EARLY`` is
set when RelStorage is imported, then parallel commit will not be
enabled, and the commit lock will be taken at the beginning of
the tpc_vote phase, just like before: conflict resolution and
readCurrent will all be handled with the lock held.

This is intended for use in diagnosing and temporarily working
around bugs, such as the database driver reporting a deadlock
error. If you find it necessary to use this setting, please
report an issue at https://github.com/zodb/relstorage/issues.

See :issue:`125`.

- Deprecate the option ``shared-blob-dir``. Shared blob dirs prevent
using parallel commits when blobs are part of a transaction.

- Remove the 'umysqldb' driver option. This driver exhibited failures
with row-level locking used for parallel commits. See :issue:`264`.

- Migrate all remaining MySQL tables to InnoDB. This is primarily the
tables used during packing, but also the table used for allocating
new OIDs.

Tables will be converted the first time a storage is opened that is
allowed to create the schema (``create-schema`` in the
configuration; default is true). For large tables, this may take
some time, so it is recommended to finish any outstanding packs
before upgrading RelStorage.

If schema creation is not allowed, and required tables are not using
InnoDB, an exception will be raised. Please contact the RelStorage
maintainers on GitHub if you have a need to use a storage engine
besides InnoDB.

This allows for better error detection during packing with parallel
commits. It is also required for `MySQL Group Replication
<https://dev.mysql.com/doc/refman/8.0/en/group-replication-requirements.html>`_.
Benchmarking also shows that creating new objects can be up to 15%
faster due to faster OID allocation.

Things to be aware of:

- MySQL's `general conversion notes
<https://dev.mysql.com/doc/refman/8.0/en/converting-tables-to-innodb.html>`_
suggest that if you had tuned certain server parameters for
MyISAM tables (which RelStorage only used during packing) it
might be good to evaluate those parameters again.
- InnoDB tables may take more disk space than MyISAM tables.
- The ``new_oid`` table may temporarily have more rows in it at one
time than before. They will still be garbage collected
eventually. The change in strategy was necessary to handle
concurrent transactions better.

See :issue:`188`.

- Fix an ``OperationalError: database is locked`` that could occur on
startup if multiple processes were reading or writing the cache
database. See :issue:`266`.
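
A brief usage sketch of the ``prefetch`` API described at the top of
this release's notes; ``conn`` is assumed to be an open ZODB 5
``Connection`` backed by RelStorage, and ``process`` is a placeholder
for application code::

    def process_all(conn, ghosts, process):
        # ``prefetch`` accepts OIDs (``obj._p_oid``), persistent (ghost)
        # objects, or an iterable of them. It asks RelStorage to load the
        # states into its shared cache up front, so the loop below mostly
        # avoids per-object database round trips.
        ghosts = list(ghosts)
        conn.prefetch(ghosts)
        for obj in ghosts:
            process(obj)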
