-------------
Released 2019-11-15
**New features**
- Added ``PATRONICTL_CONFIG_FILE`` environment variable (msvechla)
It allows configuring the ``--config-file`` argument for ``patronictl`` from the environment.
- Implement ``patronictl history`` (Alexander Kukushkin)
It shows the history of failovers/switchovers.
- Pass ``-c statement_timeout=0`` in ``PGOPTIONS`` when doing ``pg_rewind`` (Alexander Kukushkin)
It protects from the case when ``statement_timeout`` on the server is set to some small value and one of the statements executed by pg_rewind is canceled.
- Allow lower values for PostgreSQL configuration (Soulou)
Patroni didn't allow some of the PostgreSQL configuration parameters be set smaller than some hardcoded values. Now the minimal allowed values are smaller, default values have not been changed.
- Allow for certificate-based authentication (Jonathan S. Katz)
This feature enables certificate-based authentication for superuser, replication, rewind accounts and allows the user to specify the ``sslmode`` they wish to connect with.
- Use the ``passfile`` in the ``primary_conninfo`` instead of password (Alexander Kukushkin)
It allows to avoid setting ``600`` permissions on postgresql.conf
- Perform ``pg_ctl reload`` regardless of config changes (Alexander Kukushkin)
It is possible that some config files are not controlled by Patroni. When somebody is doing a reload via the REST API or by sending SIGHUP to the Patroni process, the usual expectation is that Postgres will also be reloaded. Previously it didn't happen when there were no changes in the ``postgresql`` section of Patroni config.
- Compare all recovery parameters, not only ``primary_conninfo`` (Alexander Kukushkin)
Previously the ``check_recovery_conf()`` method was only checking whether ``primary_conninfo`` has changed, never taking into account all other recovery parameters.
- Make it possible to apply some recovery parameters without restart (Alexander Kukushkin)
Starting from PostgreSQL 12 the following recovery parameters could be changed without restart: ``archive_cleanup_command``, ``promote_trigger_file``, ``recovery_end_command``, and ``recovery_min_apply_delay``. In future Postgres releases this list will be extended and Patroni will support it automatically.
- Make it possible to change ``use_slots`` online (Alexander Kukushkin)
Previously it required restarting Patroni and removing slots manually.
- Remove only ``PATRONI_`` prefixed environment variables when starting up Postgres (Cody Coons)
It will solve a lot of problems with running different Foreign Data Wrappers.
**Stability improvements**
- Use LIST + WATCH when working with K8s API (Alexander Kukushkin)
It allows to efficiently receive object changes (pods, endpoints/configmaps) and makes less stress on K8s master nodes.
- Improve the workflow when PGDATA is not empty during bootstrap (Alexander Kukushkin)
According to the ``initdb`` source code it might consider a PGDATA empty when there are only ``lost+found`` and ``.dotfiles`` in it. Now Patroni does the same. If ``PGDATA`` happens to be non-empty, and at the same time not valid from the ``pg_controldata`` point of view, Patroni will complain and exit.
- Avoid calling expensive ``os.listdir()`` on every HA loop (Alexander Kukushkin)
When the system is under IO stress, ``os.listdir()`` could take a few seconds (or even minutes) to execute, badly affecting the HA loop of Patroni. This could even cause the leader key to disappear from DCS due to the lack of updates. There is a better and less expensive way to check that the PGDATA is not empty. Now we check the presence of the ``global/pg_control`` file in the PGDATA.
- Some improvements in logging infrastructure (Alexander Kukushkin)
Previously there was a possibility to loose the last few log lines on shutdown because the logging thread was a ``daemon`` thread.
- Use ``spawn`` multiprocessing start method on python 3.4+ (Maciej Kowalczyk)
It is a known `issue <https://bugs.python.org/issue6721>`__ in Python that threading and multiprocessing do not mix well. Switching from the default method ``fork`` to the ``spawn`` is a recommended workaround. Not doing so might result in the Postmaster starting process hanging and Patroni indefinitely reporting ``INFO: restarting after failure in progress``, while Postgres is actually up and running.
**Improvements in REST API**
- Make it possible to check client certificates in the REST API (Alexander Kukushkin)
If the ``verify_client`` is set to ``required``, Patroni will check client certificates for all REST API calls. When it is set to ``optional``, client certificates are checked for all unsafe REST API endpoints.
- Return the response code 503 for the ``GET /replica`` health check request if Postgres is not running (Alexander Anikin)
Postgres might spend significant time in recovery before it starts accepting client connections.
- Implement ``/history`` and ``/cluster`` endpoints (Alexander Kukushkin)
The ``/history`` endpoint shows the content of the ``history`` key in DCS. The ``/cluster`` endpoint shows all cluster members and some service info like pending and scheduled restarts or switchovers.
**Improvements in Etcd support**
- Retry on Etcd RAFT internal error (Alexander Kukushkin)
When the Etcd node is being shut down, it sends ``response code=300, data='etcdserver: server stopped'``, which was causing Patroni to demote the primary.
- Don't give up on Etcd request retry too early (Alexander Kukushkin)
When there were some network problems, Patroni was quickly exhausting the list of Etcd nodes and giving up without using the whole ``retry_timeout``, potentially resulting in demoting the primary.
**Bugfixes**
- Disable ``synchronous_commit`` when granting execute permissions to the ``pg_rewind`` user (kremius)
If the bootstrap is done with ``synchronous_mode_strict: true`` the `GRANT EXECUTE` statement was waiting indefinitely due to the non-synchronous nodes being available.
- Fix memory leak on python 3.7 (Alexander Kukushkin)
Patroni is using ``ThreadingMixIn`` to process REST API requests and python 3.7 made threads spawn for every request non-daemon by default.
- Fix race conditions in asynchronous actions (Alexander Kukushkin)
There was a chance that ``patronictl reinit --force`` could be overwritten by the attempt to recover stopped Postgres. This ended up in a situation when Patroni was trying to start Postgres while basebackup was running.
- Fix race condition in ``postmaster_start_time()`` method (Alexander Kukushkin)
If the method is executed from the REST API thread, it requires a separate cursor object to be created.
- Fix the problem of not promoting the sync standby that had a name containing upper case letters (Alexander Kukushkin)
We converted the name to the lower case because Postgres was doing the same while comparing the ``application_name`` with the value in ``synchronous_standby_names``.
- Kill all children along with the callback process before starting the new one (Alexander Kukushkin)
Not doing so makes it hard to implement callbacks in bash and eventually can lead to the situation when two callbacks are running at the same time.
- Fix 'start failed' issue (Alexander Kukushkin)
Under certain conditions the Postgres state might be set to 'start failed' despite Postgres being up and running.