This release improves the management of Patroni clusters by introducing pause mode, improves maintenance with scheduled and conditional restarts, makes Patroni's interaction with Etcd and Zookeeper more resilient, and greatly enhances patronictl.
**Upgrade notice**
When upgrading from a release below 1.0, read about the changes to credentials and the configuration format in the 1.0 release notes.
**Pause mode**
- Introduce pause mode to temporarily detach Patroni from managing the PostgreSQL instance (Murat Kabilov, Alexander Kukushkin, Oleksii Kliukin)
Previously, one had to send a SIGKILL signal to Patroni to stop it without terminating PostgreSQL. The new pause mode detaches Patroni from PostgreSQL cluster-wide without terminating Patroni. It is similar to the maintenance mode in Pacemaker. Patroni is still responsible for updating member and leader keys in DCS, but it will not start, stop or restart the PostgreSQL server in the process. There are a few exceptions: for instance, manual failovers, reinitializations and restarts are still allowed. You can read a [detailed description of this feature](https://github.com/zalando/patroni/blob/master/docs/pause.rst).
In addition, patronictl supports new `pause` and `resume` commands to toggle the pause mode.
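
The rules described above can be summarised as a small decision function. This is only an illustrative sketch, not Patroni's implementation: the action names and the `is_action_allowed` helper are hypothetical and exist purely to show which kinds of actions remain possible while the cluster is paused.

```python
# Hypothetical action names, chosen for illustration only.
# DCS bookkeeping continues regardless of pause mode.
DCS_ACTIONS = {"update_member_key", "update_leader_key"}
# Actions an operator may still request explicitly while paused.
MANUAL_ACTIONS_ALLOWED_IN_PAUSE = {"failover", "reinitialize", "restart"}


def is_action_allowed(action: str, paused: bool, requested_by_operator: bool = False) -> bool:
    """Return True if the action may be performed in the given state."""
    if not paused:
        # Outside pause mode Patroni manages PostgreSQL as usual.
        return True
    if action in DCS_ACTIONS:
        # Member and leader keys are still updated while paused.
        return True
    # Automatic start/stop/restart is suppressed; only explicit
    # operator requests from the allowed set go through.
    return requested_by_operator and action in MANUAL_ACTIONS_ALLOWED_IN_PAUSE
```

Under these assumptions, `is_action_allowed("restart", paused=True)` is `False`, while the same call with `requested_by_operator=True` is `True`.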
**Scheduled and conditional restarts**
- Add conditions to the restart API command (Oleksii)
This change enhances Patroni restarts by adding conditions that must hold for the restart to take place. The conditions include restarting only when the PostgreSQL role is master or replica, checking the PostgreSQL version number, and restarting only when a restart is necessary to apply configuration changes.
- Add scheduled restarts (Oleksii)
It is now possible to schedule a restart in the future. Only one scheduled restart per node is supported. A scheduled restart can be cleared if it is no longer needed. Scheduled and conditional restarts can be combined, making it possible, for instance, to schedule minor PostgreSQL upgrades at night, restarting only the instances that run the outdated minor version, without adding postgres-specific logic to administration scripts.
- Add support for conditional and scheduled restarts to patronictl (Murat)
`patronictl restart` supports several new options. There is also a `patronictl flush` command to clear the scheduled actions.
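
To make the combination of conditions and scheduling concrete, here is a minimal sketch of how such checks could be evaluated. It is not Patroni's code: `parse_version`, `should_restart`, `is_due` and their parameter names are assumptions made for this example.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '9.6.1' or '10.1' into a tuple of integers that compares correctly."""
    return tuple(int(part) for part in version.split("."))


def should_restart(
    node_role: str,
    node_version: str,
    pending_restart: bool,
    *,
    role: Optional[str] = None,
    version_below: Optional[str] = None,
    only_if_pending: bool = False,
) -> bool:
    """Return True only if every condition that was supplied is satisfied."""
    if role is not None and node_role != role:
        return False
    if version_below is not None and parse_version(node_version) >= parse_version(version_below):
        return False
    if only_if_pending and not pending_restart:
        return False
    return True


def is_due(scheduled_at: datetime, now: datetime) -> bool:
    """A scheduled restart becomes due once its timestamp has been reached."""
    return now >= scheduled_at
```

For example, `should_restart("replica", "9.6.1", False, version_below="9.6.3")` returns `True`, whereas a node already running `9.6.3` is skipped.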
**Robust DCS interaction**
- Set Kazoo timeouts depending on the loop_wait (Alexander)
Originally, ping_timeout and connect_timeout values were calculated from the negotiated session timeout. Patroni's loop_wait was not taken into account. As a result, a single retry could take more time than the session timeout, forcing Patroni to release the lock and demote.
This change sets the ping and connect timeouts to half of the value of loop_wait, speeding up detection of connection issues and leaving enough time to retry the connection attempt before losing the lock.
- Update Etcd topology only after the original request succeeds (Alexander)
Postpone updating the Etcd topology known to the client until after the original request. When retrieving the cluster topology, implement retry timeouts that depend on the known number of nodes in the Etcd cluster. This makes our client prefer obtaining the result of the request over having an up-to-date list of nodes.
Both changes make Patroni connections to DCS more robust in the face of network issues.
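
The Kazoo timeout rule described above comes down to simple arithmetic. The following sketch only illustrates that calculation; the `dcs_timeouts` helper is hypothetical and is not part of Patroni or Kazoo.

```python
from typing import Dict


def dcs_timeouts(loop_wait: float) -> Dict[str, float]:
    """Derive ping and connect timeouts as half of loop_wait.

    With both timeouts at loop_wait / 2, one failed attempt plus one
    retry take no longer than a single loop_wait interval.
    """
    half = loop_wait / 2.0
    return {"ping_timeout": half, "connect_timeout": half}
```

With `loop_wait` set to 10 seconds, both timeouts come out at 5 seconds.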
**Patronictl, monitoring and configuration**
- Return information about streaming replicas via the API (Feike Steenbergen)
Previously, there was no reliable way to query Patroni about PostgreSQL instances that fail to stream changes (for instance, due to connection issues). This change exposes the contents of pg_stat_replication via the /patroni endpoint.
- Add patronictl scaffold command (Oleksii)
Add a command to create the cluster structure in Etcd. The cluster is created with a user-specified sysid and leader, and both leader and member keys are made persistent. This command is useful for creating so-called master-less configurations, where a Patroni cluster consisting only of replicas replicates from an external master node that is unaware of Patroni. Subsequently, one may remove the leader key, promoting one of the Patroni nodes and replacing the original master with the Patroni-based HA cluster.
- Add configuration option `bin_dir` to locate PostgreSQL binaries (Ants Aasma)
It is useful to be able to specify the location of PostgreSQL binaries explicitly on Linux distributions that support installing multiple PostgreSQL versions at the same time.
- Allow configuration file path to be overridden using `custom_conf` (Alejandro Martínez)
Allows for custom configuration file paths, which will be unmanaged by Patroni. See the [PostgreSQL settings documentation](https://github.com/zalando/patroni/blob/master/docs/SETTINGS.rst#postgresql) for details.
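
As an illustration of how the replication information exposed via the `/patroni` endpoint might be used for monitoring, the sketch below filters an already decoded JSON payload. The payload shape is an assumption: the `replication` key and the sample data are hypothetical, while `application_name` and `state` are column names from `pg_stat_replication`.

```python
from typing import Any, Dict, Iterable, List


def streaming_replicas(status: Dict[str, Any]) -> List[str]:
    """Return application names of replicas reported in the 'streaming' state."""
    rows = status.get("replication", [])  # assumed key name, not confirmed by these notes
    return [row.get("application_name") for row in rows if row.get("state") == "streaming"]


def missing_replicas(expected: Iterable[str], status: Dict[str, Any]) -> List[str]:
    """Return the expected replicas that are not currently streaming."""
    streaming = set(streaming_replicas(status))
    return sorted(name for name in expected if name not in streaming)


# Hypothetical payload, for illustration only.
sample_status = {
    "role": "master",
    "replication": [
        {"application_name": "node2", "state": "streaming"},
        {"application_name": "node3", "state": "catchup"},
    ],
}
```

With this sample, `missing_replicas(["node2", "node3", "node4"], sample_status)` returns `["node3", "node4"]`.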
**Bug fixes and code improvements**
- Make Patroni compatible with new version schema in PostgreSQL 10 and above (Feike)
Make sure that Patroni understands 2-digit version numbers when doing conditional restarts based on the PostgreSQL version.
- Use pkgutil to find DCS modules (Alexander)
Use the dedicated Python module instead of traversing directories manually in order to find DCS modules.
- Always call on_start callback when starting Patroni (Alexander)
Previously, Patroni did not call any callbacks when attaching to an already running node with the correct role. Since callbacks are often used to route client connections, that could result in a failure to register the running node in the connection routing scheme. With this fix, Patroni calls the on_start callback even when attaching to an already running node.
- Do not drop active replication slots (Murat, Oleksii)
Avoid dropping active physical replication slots on the master. PostgreSQL cannot drop such slots anyway. This change makes it possible to run non-Patroni managed replicas/consumers on the master.
- Close Patroni connections during start of the PostgreSQL instance (Alexander)
Forces Patroni to close all former connections when the PostgreSQL node is started. Avoids the trap of reusing former connections if the postmaster was killed with SIGKILL.
- Replace invalid characters when constructing slot names from member names (Ants)
Make sure that standby names that do not comply with the slot naming rules don't cause slot creation and standby startup to fail. Replace the dashes in the slot names with underscores and all other characters not allowed in slot names with their Unicode code points.
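
The slot name replacement described above could look roughly like the following. This is a sketch under stated assumptions rather than Patroni's actual code: the function name, the lowercasing step and the `u` plus four hex digits encoding of code points are choices made for this example. PostgreSQL itself only accepts lowercase letters, digits and underscores in replication slot names.

```python
import re


def slot_name_from_member_name(member_name: str) -> str:
    """Map a member name to a string that is valid as a replication slot name."""

    def replace_char(match: "re.Match[str]") -> str:
        char = match.group(0)
        if char == "-":
            # Dashes become underscores.
            return "_"
        # Any other disallowed character becomes its code point
        # (assumed format: 'u' followed by four hex digits).
        return "u{:04x}".format(ord(char))

    # Lowercasing first is an assumption of this sketch.
    return re.sub(r"[^a-z0-9_]", replace_char, member_name.lower())
```

For instance, `slot_name_from_member_name("pg-node-1")` yields `pg_node_1`, and `"node.1"` becomes `nodeu002e1`.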