Patroni

Latest version: v3.3.1




2.1.2
-------------

Released 2021-12-03

**New features**

- Compatibility with ``psycopg>=3.0`` (Alexander Kukushkin)

By default ``psycopg2`` is preferred. ``psycopg>=3.0`` will be used only if ``psycopg2`` is not available or its version is too old.

- Add ``dcs_last_seen`` field to the REST API (Michael Banck)

This field records the last time (as a Unix epoch timestamp) a cluster member successfully communicated with the DCS. This is useful for identifying and analyzing network partitions.

- Release the leader lock when ``pg_controldata`` reports "shut down" (Alexander Kukushkin)

To solve the problem of slow switchover/shutdown when ``archive_command`` is slow or failing, Patroni removes the leader key immediately after ``pg_controldata`` starts reporting PGDATA as cleanly ``shut down`` and Patroni has verified that at least one replica has received all changes. If no replica fulfills this condition, the leader key is not removed and the old behavior is retained, i.e. Patroni keeps updating the lock.
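The release decision described above can be sketched as a small predicate. This is a simplified illustration, not Patroni's code: the ``Replica`` class, the integer LSNs, and the ``shutdown_lsn`` argument (a stand-in for the position Patroni actually compares against, which is not stated here) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    received_lsn: int  # hypothetical: last WAL position received, as an integer


def can_release_leader_key(cluster_state: str, shutdown_lsn: int,
                           replicas: List[Replica]) -> bool:
    """Decide whether the leader key may be dropped while Postgres shuts down.

    cluster_state mirrors the "Database cluster state" reported by pg_controldata;
    shutdown_lsn is a stand-in for the position of the shutdown checkpoint.
    """
    if cluster_state != "shut down":
        # Shutdown is not complete yet: keep updating the leader lock.
        return False
    # Only release the key if at least one replica already has every change.
    return any(r.received_lsn >= shutdown_lsn for r in replicas)


print(can_release_leader_key("shut down", 100, [Replica("r1", 100)]))  # True
print(can_release_leader_key("shut down", 100, [Replica("r1", 99)]))   # False
```

With no qualifying replica the function returns ``False``, which corresponds to the fallback of continuing to update the lock.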

- Add ``sslcrldir`` connection parameter support (Kostiantyn Nemchenko)

This connection parameter was introduced in PostgreSQL v14.

- Allow setting ACLs for ZNodes in Zookeeper (Alwyn Davis)

Introduces a new configuration option, ``zookeeper.set_acls``, so that Kazoo applies a default ACL to each ZNode it creates.


**Stability improvements**

- Delay the next attempt of recovery till next HA loop (Alexander Kukushkin)

If Postgres crashed (for example, because it ran out of disk space) and then failed to start for the same reason, Patroni was trying to recover it too eagerly, flooding the logs.

- Add log before demoting, which can take some time (Michael Banck)

It can take some time for the demote to finish and it might not be obvious from looking at the logs what exactly is going on.

- Improve "I am" status messages (Michael Banck)

``no action. I am a secondary ({0})`` vs ``no action. I am ({0}), a secondary``

- Cast to int ``wal_keep_segments`` when converting to ``wal_keep_size`` (Jorge Solórzano)

It is possible to specify ``wal_keep_segments`` as a string in the global :ref:`dynamic configuration <dynamic_configuration>`, and because Python's ``*`` operator repeats a string instead of multiplying a number, the value was duplicated rather than converted. Example: ``wal_keep_segments: "100"`` was converted to ``100100100100100100100100100100100100100100100100MB``.
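The bug and the fix can be reproduced in a few lines. This sketch assumes the default 16 MB WAL segment size and uses hypothetical function names; it is not Patroni's conversion code.

```python
WAL_SEGMENT_SIZE_MB = 16  # assumption: the default 16 MB WAL segment size


def to_wal_keep_size_buggy(wal_keep_segments):
    # With a str value, "*" repeats the string instead of multiplying a number.
    return f"{wal_keep_segments * WAL_SEGMENT_SIZE_MB}MB"


def to_wal_keep_size(wal_keep_segments):
    # Casting to int first makes "100" and 100 produce the same result.
    return f"{int(wal_keep_segments) * WAL_SEGMENT_SIZE_MB}MB"


print(to_wal_keep_size_buggy("100"))  # "100" repeated 16 times, followed by "MB"
print(to_wal_keep_size("100"))        # 1600MB
```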

- Allow switchover only to sync nodes when synchronous replication is enabled (Alexander Kukushkin)

In addition, the leader race is run only against known synchronous nodes.

- Use cached role as a fallback when Postgres is slow (Alexander Kukushkin)

In some extreme cases Postgres could be so slow that the normal monitoring query did not finish within a few seconds. Because the resulting ``statement_timeout`` exception was not handled properly, Postgres might not be demoted in time when the leader key expired or its update failed. When such an exception occurs, Patroni now uses the cached ``role`` to determine whether Postgres is running as a primary.

- Avoid unnecessary updates of the member ZNode (Alexander Kukushkin)

If no values have changed in the member's data, the update should not happen.

- Optimize checkpoint after promote (Alexander Kukushkin)

Avoid doing ``CHECKPOINT`` if the latest timeline is already stored in ``pg_control``. This avoids an unnecessary ``CHECKPOINT`` right after initializing a new cluster with ``initdb``.

- Prefer members without ``nofailover`` when picking sync nodes (Alexander Kukushkin)

Previously, sync nodes were selected based only on replication lag, so a node with the ``nofailover`` tag had the same chance of becoming synchronous as any other node. That behavior was both confusing and dangerous, because if the primary failed, the failover could not happen automatically.
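One way to express "prefer members without ``nofailover``, then lowest lag" is a composite sort key. This is a hedged sketch with a hypothetical ``Member`` type; Patroni's real selection logic has more inputs than shown here.

```python
from typing import NamedTuple


class Member(NamedTuple):
    name: str
    lag: int          # replication lag in bytes
    nofailover: bool  # value of the nofailover tag


def pick_sync_candidates(members, count):
    """Rank members: those without nofailover first, then by replication lag."""
    # False sorts before True, so nofailover=False members come first.
    ranked = sorted(members, key=lambda m: (m.nofailover, m.lag))
    return [m.name for m in ranked[:count]]


members = [Member("a", 0, True), Member("b", 100, False), Member("c", 50, False)]
print(pick_sync_candidates(members, 1))  # ['c']
```

Member ``a`` has the lowest lag but is ranked last because of its ``nofailover`` tag.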

- Remove duplicate hosts from the etcd machine cache (Michael Banck)

Advertised client URLs in the etcd cluster could be misconfigured. Removing duplicates in Patroni in this case is a low-hanging fruit.
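Removing duplicates while keeping the original order is a one-liner in Python. The helper name below is hypothetical and only illustrates the idea.

```python
def dedupe_urls(urls):
    """Drop duplicate client URLs while keeping the first-seen order."""
    # dict preserves insertion order (Python 3.7+), so this acts as an ordered set.
    return list(dict.fromkeys(urls))


print(dedupe_urls(["http://10.0.0.1:2379", "http://10.0.0.2:2379", "http://10.0.0.1:2379"]))
# ['http://10.0.0.1:2379', 'http://10.0.0.2:2379']
```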


**Bugfixes**

- Skip temporary replication slots while doing slot management (Alexander Kukushkin)

Starting from v10, ``pg_basebackup`` creates a temporary replication slot for WAL streaming, and Patroni was trying to drop it because the slot name looked unknown. To fix this, Patroni skips all temporary slots when querying the ``pg_replication_slots`` view.

- Ensure ``pg_replication_slot_advance()`` doesn't timeout (Alexander Kukushkin)

Patroni was using the default ``statement_timeout`` for this call, and once the call failed there was a very high chance that it would never recover, resulting in a growing ``pg_wal`` and ``pg_catalog`` bloat.

- The ``/status`` key wasn't updated on demote (Alexander Kukushkin)

After demoting PostgreSQL, the old leader updates the last LSN in the DCS. Version ``2.1.0`` introduced the new ``/status`` key, but on demote the optime was still written to ``/optime/leader``.

- Handle DCS exceptions when demoting (Alexander Kukushkin)

While demoting the master after failing to update the leader lock, the DCS could go completely down, causing the ``get_cluster()`` call to raise an exception. Because this exception was not handled properly, Postgres remained stopped until the DCS recovered.

- The ``use_unix_socket_repl`` option didn't work in some cases (Alexander Kukushkin)

Specifically, if ``postgresql.unix_socket_directories`` is not set. In this case Patroni is supposed to use the default value from ``libpq``.

- Fix a few issues with Patroni REST API (Alexander Kukushkin)

The ``clusters_unlocked`` value was sometimes undefined, which resulted in exceptions in the ``GET /metrics`` endpoint. In addition, the error handling method assumed that the ``connect_address`` tuple always has two elements, while for IPv6 it can have more.
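Python socket addresses are ``(host, port)`` for IPv4 but ``(host, port, flowinfo, scope_id)`` for IPv6, so unpacking exactly two values fails for IPv6 peers. A minimal sketch of a safe formatter, with a hypothetical function name:

```python
def format_client_address(addr):
    """Render a peer address for logging, whatever the address family."""
    # "host, port = addr" would raise ValueError on a 4-element IPv6 tuple,
    # so only the first two elements are taken.
    host, port = addr[:2]
    if ":" in host:
        return f"[{host}]:{port}"  # bracket IPv6 hosts so the port stays unambiguous
    return f"{host}:{port}"


print(format_client_address(("::1", 8008, 0, 0)))  # [::1]:8008
```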

- Wait for newly promoted node to finish recovery before deciding to rewind (Alexander Kukushkin)

It can take some time before the actual promotion happens and the new timeline is created. Without waiting, replicas could wrongly conclude that a rewind isn't required.

- Handle missing timelines in a history file when deciding to rewind (Alexander Kukushkin)

If the replica's current timeline was missing from the history file on the primary, the replica falsely assumed that a rewind wasn't required.
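A PostgreSQL timeline history file lists one ancestor timeline per line: parent timeline, switchpoint LSN, and a free-text reason, separated by tabs. The sketch below parses that format and returns ``None`` for a timeline that is not listed. The function names are hypothetical, and the point is only that a missing entry is "unknown", not "no rewind needed".

```python
def parse_history(content):
    """Parse a timeline history file into {timeline: switchpoint} entries."""
    history = {}
    for line in content.splitlines():
        fields = line.strip().split("\t")
        # Data lines are: parent timeline, switchpoint LSN, reason.
        if len(fields) >= 2 and fields[0].isdigit():
            history[int(fields[0])] = fields[1]
    return history


def find_switchpoint(history, replica_timeline):
    """Return the switchpoint for the replica's timeline, or None if it is missing.

    None must not be read as "rewind not required": the history gives no
    evidence either way, so a rewind has to remain possible.
    """
    return history.get(replica_timeline)


sample = "1\t0/3000158\tno recovery target specified\n2\t0/5000060\tno recovery target specified\n"
print(find_switchpoint(parse_history(sample), 2))  # 0/5000060
print(find_switchpoint(parse_history(sample), 7))  # None
```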

2.1.1
-------------

Released 2021-08-19

**New features**

- Support for Etcd SRV name suffix (David Pavlicek)

Etcd allows differentiating between multiple Etcd clusters under the same domain, and Patroni now supports this as well.

- Enrich history with the new leader (huiyalin525)

It adds a new column to the ``patronictl history`` output.

- Make the CA bundle configurable for in-cluster Kubernetes config (Aron Parsons)

By default, Patroni uses ``/var/run/secrets/kubernetes.io/serviceaccount/ca.crt``; this new feature allows specifying a custom CA bundle via ``kubernetes.cacert``.

- Support dynamically registering/deregistering as a Consul service and changing tags (Tommy Li)

Previously this required a Patroni restart.

**Bugfixes**

- Avoid unnecessary reload of REST API (Alexander Kukushkin)

The previous release added the ability to reload REST API certificates when they change on disk. Unfortunately, the reload was also happening unconditionally right after startup.
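A common way to avoid the unconditional reload at startup is to record the file's state when the watcher is created and only report a change when that state differs. This is a hypothetical helper based on ``os.stat`` (mtime, size, inode). How Patroni actually detects certificate changes is not described here.

```python
import os
import tempfile


class FileChangeWatcher:
    """Report whether a file changed since the last check (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        # Record the state at startup so the first check does not trigger a reload.
        self._signature = self._stat()

    def _stat(self):
        st = os.stat(self.path)
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    def changed(self):
        current = self._stat()
        if current != self._signature:
            self._signature = current
            return True
        return False


with tempfile.NamedTemporaryFile("w", suffix=".pem", delete=False) as f:
    f.write("cert-v1")
watcher = FileChangeWatcher(f.name)
print(watcher.changed())  # False: nothing changed since startup
with open(f.name, "w") as fh:
    fh.write("cert-v2-rotated")
print(watcher.changed())  # True: the size differs, so a reload is due
os.unlink(f.name)
```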

- Don't resolve cluster members when ``etcd.use_proxies`` is set (Alexander Kukushkin)

On startup, Patroni checks the health of the Etcd cluster by querying the list of members. It also tried to resolve their hostnames, which is unnecessary when working with Etcd via a proxy and was causing spurious warnings.

- Skip rows with NULL values in the ``pg_stat_replication`` (Alexander Kukushkin)

It seems that the ``pg_stat_replication`` view could contain NULL values in the ``replay_lsn``, ``flush_lsn``, or ``write_lsn`` fields even when ``state = 'streaming'``.
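Skipping such rows amounts to a filter on the three LSN columns. This sketch assumes the rows arrive as dictionaries keyed by column name (for example from a dict-style cursor). The function name is hypothetical.

```python
def usable_replication_rows(rows):
    """Keep pg_stat_replication rows whose LSN columns are all populated."""
    lsn_columns = ("write_lsn", "flush_lsn", "replay_lsn")
    # A NULL column arrives as None; such rows are skipped even if state='streaming'.
    return [row for row in rows if all(row.get(col) is not None for col in lsn_columns)]


rows = [
    {"application_name": "node1", "state": "streaming",
     "write_lsn": None, "flush_lsn": None, "replay_lsn": None},
    {"application_name": "node2", "state": "streaming",
     "write_lsn": "0/5000060", "flush_lsn": "0/5000060", "replay_lsn": "0/5000000"},
]
print([r["application_name"] for r in usable_replication_rows(rows)])  # ['node2']
```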

2.1.0
-------------

Released 2021-07-06

This version adds compatibility with PostgreSQL v14, makes logical replication slots survive failover/switchover, implements support for an allowlist for the REST API, and reduces logging to one line per heartbeat.

**New features**

- Compatibility with PostgreSQL v14 (Alexander Kukushkin)

Unpause WAL replay if Patroni itself is not in "pause" mode. WAL replay could be paused due to a change of certain parameters, for example ``max_connections``, on the primary.

- Failover logical slots (Alexander Kukushkin)

Make logical replication slots survive failover/switchover on PostgreSQL v11+. The replication slot is copied from the primary to the replica with a restart, and later the `pg_replication_slot_advance() <https://www.postgresql.org/docs/11/functions-admin.html#id-1.5.8.31.8.5.2.2.8.1.1>`__ function is used to move it forward. As a result, the slot already exists before the failover and no events should be lost, but there is a chance that some events could be delivered more than once.

- Implemented allowlist for Patroni REST API (Alexander Kukushkin)

If configured, only IPs matching the rules are allowed to call unsafe endpoints. In addition, the IPs of cluster members can be included in the list automatically.
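An IP allowlist check maps naturally onto Python's ``ipaddress`` module. The sketch below is a hypothetical illustration: entries may be networks or single addresses, and ``member_ips`` stands in for the optional automatic inclusion of cluster members. The configuration keys Patroni actually uses are not given in this entry.

```python
import ipaddress


def build_allowlist(entries, member_ips=()):
    """Parse allowlist entries; single IPs become /32 (IPv4) or /128 (IPv6) networks."""
    return [ipaddress.ip_network(e, strict=False) for e in (*entries, *member_ips)]


def is_allowed(client_ip, allowlist):
    addr = ipaddress.ip_address(client_ip)
    # A network of the other IP version never contains the address, so mixing is safe.
    return any(addr in net for net in allowlist)


allowlist = build_allowlist(["10.0.0.0/24", "::1"], member_ips=["192.168.1.5"])
print(is_allowed("10.0.0.42", allowlist))  # True
print(is_allowed("172.16.0.1", allowlist))  # False
```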

- Added support of replication connections via unix socket (Mohamad El-Rifai)

Previously Patroni always used TCP for the replication connection, which could cause issues with SSL verification. Using unix sockets allows exempting the replication user from SSL verification.

- Health check on user-defined tags (Arman Jafari Tehrani)

Along with :ref:`predefined tags <tags_settings>`, it is possible to specify any number of custom tags that become visible in the ``patronictl list`` output and in the REST API. Custom tags can now also be used in health checks.
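A tag-aware health check boils down to comparing request parameters with the node's tags. This sketch assumes tag filters are passed as ``tag_<key>=<value>`` query parameters and that non-tag parameters (such as ``lag``) are ignored. Treat the parameter convention and the function name as assumptions, not as Patroni's documented interface.

```python
def tags_match(query, node_tags):
    """Check tag_<key>=<value> query parameters against a node's tags."""
    for param, expected in query.items():
        if not param.startswith("tag_"):
            continue  # not a tag filter, e.g. lag=...
        key = param[len("tag_"):]
        # Tag values from YAML may be non-strings, so compare their text forms.
        if key not in node_tags or str(node_tags[key]) != expected:
            return False
    return True


print(tags_match({"tag_dc": "eu", "lag": "16MB"}, {"dc": "eu", "nofailover": False}))  # True
```

A health endpoint would then answer 200 only when both the role check and ``tags_match`` succeed.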

- Added Prometheus ``/metrics`` endpoint (Mark Mercado, Michael Banck)

The endpoint exposes the same metrics as ``/patroni``.

- Reduced chattiness of Patroni logs (Alexander Kukushkin)

When everything is normal, only one line is written for every run of the HA loop.


**Breaking changes**

- The old ``permanent logical replication slots`` feature will no longer work with PostgreSQL v10 and older (Alexander Kukushkin)

The strategy of creating logical slots after performing a promotion can't guarantee that no logical events are lost, and is therefore disabled.

- The ``/leader`` endpoint always returns 200 if the node holds the lock (Alexander Kukushkin)

Promoting a standby cluster required updating load-balancer health checks, which is inconvenient and easy to forget. To solve this, the behavior of the ``/leader`` health check endpoint has changed: it returns 200 regardless of whether the cluster is a normal one or a ``standby_cluster``.


**Improvements in Raft support**

- Reliable support of Raft traffic encryption (Alexander Kukushkin)

Due to various issues in ``PySyncObj``, encryption support was very unstable.

- Handle DNS issues in Raft implementation (Alexander Kukushkin)

If ``self_addr`` and/or ``partner_addrs`` were configured with DNS names instead of IPs, ``PySyncObj`` effectively resolved them only once, when the object was created. This caused problems when the same node came back online with a different IP.


**Stability improvements**

- Compatibility with ``psycopg2-2.9+`` (Alexander Kukushkin)

Starting from ``psycopg2`` 2.9, ``autocommit = True`` is ignored inside a ``with connection`` block, which breaks replication protocol connections.

- Fix excessive HA loop runs with Zookeeper (Alexander Kukushkin)

Update of member ZNodes was causing a chain reaction and resulted in running the HA loops multiple times in a row.

- Reload if REST API certificate is changed on disk (Michael Todorovic)

If the REST API certificate file was updated in place, Patroni didn't perform a reload.

- Don't create pgpass dir if kerberos auth is used (Kostiantyn Nemchenko)

Kerberos and password authentication are mutually exclusive.

- Fixed little issues with custom bootstrap (Alexander Kukushkin)

Start Postgres with ``hot_standby=off`` only when we do a PITR and restart it after PITR is done.


**Bugfixes**

- Compatibility with ``kazoo-2.7+`` (Alexander Kukushkin)

Since Patroni handles retries on its own, it relies on the old ``kazoo`` behavior, where requests to a Zookeeper cluster are immediately discarded when no connections are available.

- Explicitly request the version of Etcd v3 cluster when it is known that we are connecting via proxy (Alexander Kukushkin)

Patroni works with an Etcd v3 cluster via the gRPC gateway, and depending on the cluster version a different endpoint (``/v3``, ``/v3beta``, or ``/v3alpha``) must be used. The version was resolved only together with the cluster topology, which is never discovered when connecting via a proxy, so the version remained unknown.
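Once the cluster version is known, choosing the gateway prefix is a simple comparison. The thresholds below follow the Etcd gateway history as commonly documented (``/v3alpha`` up to 3.2, ``/v3beta`` in 3.3, ``/v3`` from 3.4). They and the helper names are assumptions for this sketch rather than Patroni's exact code.

```python
def parse_version(text):
    """Turn a version string such as "3.4.16" into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def etcd3_api_prefix(version):
    """Pick the gRPC-gateway prefix for an Etcd cluster version tuple."""
    if version >= (3, 4):
        return "/v3"
    if version >= (3, 3):
        return "/v3beta"
    return "/v3alpha"


print(etcd3_api_prefix(parse_version("3.3.25")))  # /v3beta
```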

2.0.2
-------------

Released 2021-02-22

**New features**

- Ability to ignore externally managed replication slots (James Coleman)

Patroni tries to remove any replication slot that is unknown to it, but there are certainly cases when replication slots should be managed externally. It is now possible to configure slots that should not be removed.
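A natural way to express "slots that should not be removed" is a list of rules, where a slot is ignored if every property named in some rule matches it. The rule keys below (``name``, ``type``, ``database``, ``plugin``) and both function names are assumptions made for this sketch.

```python
def is_ignored(slot, ignore_rules):
    """A slot is ignored if all properties set in at least one rule match it."""
    return any(
        all(slot.get(key) == value for key, value in rule.items())
        for rule in ignore_rules
    )


def slots_to_drop(existing_slots, expected_names, ignore_rules):
    """Unknown slots are dropped unless an ignore rule protects them."""
    return [
        s["name"] for s in existing_slots
        if s["name"] not in expected_names and not is_ignored(s, ignore_rules)
    ]


existing = [
    {"name": "patroni_b", "type": "physical"},
    {"name": "debezium", "type": "logical", "database": "app", "plugin": "pgoutput"},
    {"name": "stale", "type": "physical"},
]
print(slots_to_drop(existing, {"patroni_b"}, [{"name": "debezium", "type": "logical"}]))
# ['stale']
```

Note that an empty rule would match every slot, so real configuration validation would likely reject it.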

- Added support for cipher suite limitation for REST API (Gunnar "Nick" Bluth)

It can be configured via ``restapi.ciphers`` or the ``PATRONI_RESTAPI_CIPHERS`` environment variable.

- Added support for encrypted TLS keys for REST API (Jonathan S. Katz)

It can be configured via ``restapi.keyfile_password`` or the ``PATRONI_RESTAPI_KEYFILE_PASSWORD`` environment variable.

- Constant time comparison of REST API authentication credentials (Alex Brasetvik)

Use ``hmac.compare_digest()`` instead of ``==``, which is vulnerable to timing attack.
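A minimal sketch of a constant-time HTTP Basic credential check using ``hmac.compare_digest()``. The function name and header handling are illustrative, not Patroni's REST API code.

```python
import base64
import hmac


def check_basic_auth(header, username, password):
    """Validate an HTTP Basic Authorization header without a timing side channel."""
    expected = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    prefix = "Basic "
    if not header.startswith(prefix):
        return False
    supplied = header[len(prefix):].encode("utf-8")
    # Unlike ==, compare_digest does not return early at the first mismatching byte.
    return hmac.compare_digest(supplied, expected)


header = "Basic " + base64.b64encode(b"admin:secret").decode("ascii")
print(check_basic_auth(header, "admin", "secret"))  # True
```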

- Choose synchronous nodes based on replication lag (Krishna Sarabu)

If the replication lag on the synchronous node starts exceeding the configured threshold it could be demoted to asynchronous and/or replaced by the other node. Behaviour is controlled with ``maximum_lag_on_syncnode``.
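The effect of a lag threshold on synchronous node selection can be sketched as "filter, then sort by lag". Here ``max_lag`` stands in for ``maximum_lag_on_syncnode`` and ``count`` for the number of synchronous standbys (``synchronous_node_count`` from 2.0.0). Both the function and its simplified inputs are hypothetical.

```python
def choose_sync_nodes(lags, max_lag, count):
    """Pick up to count sync standbys among replicas lagging at most max_lag bytes.

    lags maps member name to replication lag in bytes.
    """
    eligible = [(lag, name) for name, lag in lags.items() if lag <= max_lag]
    # Lowest lag first; ties are broken by name for a deterministic result.
    return [name for _, name in sorted(eligible)[:count]]


print(choose_sync_nodes({"a": 10, "b": 5000, "c": 0}, max_lag=1024, count=2))  # ['c', 'a']
```

Member ``b`` exceeds the threshold, so it cannot stay or become synchronous until it catches up.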


**Stability improvements**

- Start postgres with ``hot_standby = off`` when doing custom bootstrap (Igor Yanchenko)

During custom bootstrap Patroni is restoring the basebackup, starting Postgres up, and waiting until recovery finishes. Some PostgreSQL parameters on the standby can't be smaller than on the primary and if the new value (restored from WAL) is higher than the configured one, Postgres panics and stops. In order to avoid such behavior we will do custom bootstrap without ``hot_standby`` mode.

- Warn the user if the required watchdog is not healthy (Nicolas Thauvin)

When the watchdog device is not writable or missing in required mode, the member cannot be promoted. Added a warning to show the user where to search for this misconfiguration.

- Better verbosity for single-user mode recovery (Alexander Kukushkin)

If Patroni notices that PostgreSQL wasn't shut down cleanly, in certain cases crash recovery is executed by starting Postgres in single-user mode. The recovery could fail (for example, due to lack of disk space) while the errors were swallowed.

- Added compatibility with ``python-consul2`` module (Alexander Kukushkin, Wilfried Roset)

The good old ``python-consul`` module has not been maintained for a few years, so a fork with new features and bug fixes was created.

- Don't use ``bypass_api_service`` when running ``patronictl`` (Alexander Kukushkin)

When a K8s pod is running in a non-``default`` namespace it does not necessarily have enough permissions to query the ``kubernetes`` endpoint. In this case Patroni shows the warning and ignores the ``bypass_api_service`` setting. In case of ``patronictl`` the warning was a bit annoying.

- Create ``raft.data_dir`` if it doesn't exist or make sure that it is writable (Mark Mercado)

Improves user-friendliness and usability.


**Bugfixes**

- Don't interrupt restart or promote if lost leader lock in pause (Alexander Kukushkin)

In pause mode, Postgres is allowed to run as a primary without holding the leader lock.

- Fixed issue with ``shutdown_request()`` in the REST API (Nicolas Limage)

To improve handling of SSL connections and delay the handshake until the worker thread has started, Patroni overrides a few methods of ``HTTPServer``. The ``shutdown_request()`` method was forgotten.

- Fixed issue with sleep time when using Zookeeper (Alexander Kukushkin)

Patroni could sleep up to twice as long as intended between runs of the HA code.

- Fixed invalid ``os.symlink()`` calls when moving data directory after failed bootstrap (Andrew L'Ecuyer)

If the bootstrap failed Patroni is renaming data directory, pg_wal, and all tablespaces. After that it updates symlinks so filesystem remains consistent. The symlink creation was failing due to the ``src`` and ``dst`` arguments being swapped.
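``os.symlink(src, dst)`` creates a link named ``dst`` that points at ``src``, which is easy to get backwards. A small POSIX sketch of repointing a link after the target directory was renamed; the directory names and the ``_failed`` suffix are hypothetical.

```python
import os
import tempfile


def relink(link_path, new_target):
    """Repoint an existing symlink at a directory that has been renamed."""
    os.unlink(link_path)  # drop the now-dangling link
    # os.symlink(src, dst): src is what the link points to, dst is the link itself.
    os.symlink(new_target, link_path)


base = tempfile.mkdtemp()
wal_dir = os.path.join(base, "wal")
link = os.path.join(base, "pg_wal")  # stands in for PGDATA/pg_wal
os.mkdir(wal_dir)
os.symlink(wal_dir, link)
moved = wal_dir + "_failed"          # hypothetical rename after a failed bootstrap
os.rename(wal_dir, moved)
relink(link, moved)
print(os.readlink(link) == moved)    # True
```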

- Fixed bug in the ``post_bootstrap()`` method (Alexander Kukushkin)

If the superuser password wasn't configured Patroni was failing to call the ``post_init`` script and therefore the whole bootstrap was failing.

- Fixed an issue with ``pg_rewind`` in the standby cluster (Alexander Kukushkin)

If the superuser name was different from ``postgres``, ``pg_rewind`` in the standby cluster failed because the connection string didn't contain the database name.

- Exit only if authentication with Etcd v3 explicitly failed (Alexander Kukushkin)

On startup, Patroni discovers the Etcd cluster topology and authenticates if necessary. If one of the Etcd servers was not accessible, Patroni tried to authenticate against it and failed instead of retrying with the next node.

- Handle case with psutil cmdline() returning empty list (Alexander Kukushkin)

Zombie processes are still children of the postmaster, but ``cmdline()`` returns an empty list for them.

- Treat ``PATRONI_KUBERNETES_USE_ENDPOINTS`` environment variable as boolean (Alexander Kukushkin)

Without this, it was impossible to disable ``kubernetes.use_endpoints`` via the environment.
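A likely reason a string-typed flag cannot be disabled is that any non-empty string, including ``"false"``, is truthy in Python. A hypothetical helper that parses the value explicitly:

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name, default=None):
    """Read an environment variable as a boolean (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a boolean")


os.environ["PATRONI_KUBERNETES_USE_ENDPOINTS"] = "false"
print(bool(os.environ["PATRONI_KUBERNETES_USE_ENDPOINTS"]))  # True: a non-empty string
print(env_bool("PATRONI_KUBERNETES_USE_ENDPOINTS"))          # False
```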

- Improve handling of concurrent endpoint update errors (Alexander Kukushkin)

Patroni will explicitly query the current endpoint object, verify that the current pod still holds the leader lock and repeat the update.

2.0.1
-------------

Released 2020-10-01

**New features**

- Use ``more`` as pager in ``patronictl edit-config`` if ``less`` is not available (Pavel Golub)

On Windows, ``more.com`` is used. In addition, ``cdiff`` was replaced by ``ydiff`` in ``requirements.txt``, but ``patronictl`` still supports both for compatibility.

- Added support of ``raft`` ``bind_addr`` and ``password`` (Alexander Kukushkin)

``raft.bind_addr`` might be useful when running behind NAT. ``raft.password`` enables traffic encryption (requires the ``cryptography`` module).

- Added ``sslpassword`` connection parameter support (Kostiantyn Nemchenko)

The connection parameter was introduced in PostgreSQL 13.

**Stability improvements**

- Changed the behavior in pause (Alexander Kukushkin)

1. Patroni will not call the ``bootstrap`` method if the ``PGDATA`` directory is missing/empty.
2. Patroni will not exit on sysid mismatch in pause, only log a warning.
3. The node will not try to grab the leader key in pause mode if Postgres is not running in recovery (i.e. is accepting writes) but the sysid doesn't match the initialize key.

- Apply ``master_start_timeout`` when executing crash recovery (Alexander Kukushkin)

If Postgres crashed on the leader node, Patroni does a crash-recovery by starting Postgres in single-user mode. During the crash-recovery the leader lock is being updated. If the crash-recovery didn't finish in ``master_start_timeout`` seconds, Patroni will stop it forcefully and release the leader lock.

- Removed the ``secure`` extra from the ``urllib3`` requirements (Alexander Kukushkin)

The only reason for adding it was the ``ipaddress`` dependency for Python 2.7.

**Bugfixes**

- Fixed a bug in the ``Kubernetes.update_leader()`` (Alexander Kukushkin)

An unhandled exception was preventing demoting the primary when the update of the leader object failed.

- Fixed hanging ``patronictl`` when RAFT is being used (Alexander Kukushkin)

When using ``patronictl`` with Patroni config, ``self_addr`` should be added to the ``partner_addrs``.

- Fixed bug in ``get_guc_value()`` (Alexander Kukushkin)

Patroni was failing to get the value of ``restore_command`` on PostgreSQL 12, therefore fetching missing WALs for ``pg_rewind`` didn't work.

2.0.0
-------------

Released 2020-09-02

This version enhances compatibility with PostgreSQL 13, adds support for multiple synchronous standbys, significantly improves the handling of ``pg_rewind``, adds support for Etcd v3 and for running Patroni on pure RAFT (without Etcd, Consul, or Zookeeper), and makes it possible to optionally call the ``pre_promote`` (fencing) script.

**PostgreSQL 13 support**

- Don't fire ``on_reload`` when promoting to ``standby_leader`` on PostgreSQL 13+ (Alexander Kukushkin)

When promoting to ``standby_leader`` we change ``primary_conninfo``, update the role and reload Postgres. Since ``on_role_change`` and ``on_reload`` effectively duplicate each other, Patroni will call only ``on_role_change``.

- Added support for ``gssencmode`` and ``channel_binding`` connection parameters (Alexander Kukushkin)

PostgreSQL 12 introduced the ``gssencmode`` connection parameter and PostgreSQL 13 introduced ``channel_binding``; both can now be used if defined in the ``postgresql.authentication`` section.

- Handle renaming of ``wal_keep_segments`` to ``wal_keep_size`` (Alexander Kukushkin)

In case of misconfiguration (``wal_keep_segments`` on 13 and ``wal_keep_size`` on older versions) Patroni will automatically adjust the configuration.

- Use ``pg_rewind`` with ``--restore-target-wal`` on 13 if possible (Alexander Kukushkin)

On PostgreSQL 13 Patroni checks if ``restore_command`` is configured and tells ``pg_rewind`` to use it.


**New features**

- [BETA] Implemented support of Patroni on pure RAFT (Alexander Kukushkin)

This makes it possible to run Patroni without 3rd party dependencies, like Etcd, Consul, or Zookeeper. For HA you will have to run either three Patroni nodes or two nodes with Patroni and one node with ``patroni_raft_controller``. For more information please check the :ref:`documentation <raft_settings>`.

- [BETA] Implemented support for Etcd v3 protocol via gRPC-gateway (Alexander Kukushkin)

Etcd 3.0 was released more than four years ago, and Etcd 3.4 has v2 disabled by default. There is also a chance that v2 will be removed from Etcd completely, so we implemented support for Etcd v3 in Patroni. To start using it, you have to explicitly create the ``etcd3`` section in the Patroni configuration file.

- Supporting multiple synchronous standbys (Krishna Sarabu)

It allows running a cluster with more than one synchronous replica. The maximum number of synchronous replicas is controlled by the new parameter ``synchronous_node_count``. It is set to 1 by default and has no effect when ``synchronous_mode`` is set to ``off``.

- Added possibility to call the ``pre_promote`` script (Sergey Dudoladov)

Unlike callbacks, the ``pre_promote`` script is called synchronously after acquiring the leader lock, but before promoting Postgres. If the script fails or exits with a non-zero exitcode, the current node will release the leader lock.
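The synchronous, exit-code-driven contract can be sketched with ``subprocess.run``. This is an illustration only: the function name is hypothetical, and treating "could not start the script" as a failure is an assumption.

```python
import subprocess
import sys


def run_pre_promote(cmd):
    """Run a pre_promote command synchronously; True means promotion may proceed."""
    try:
        # Blocks until the script exits, unlike asynchronous callbacks.
        result = subprocess.run(cmd)
    except OSError:
        return False  # assumption: a script that cannot be started counts as a failure
    return result.returncode == 0


if not run_pre_promote([sys.executable, "-c", "import sys; sys.exit(1)"]):
    print("pre_promote failed: release the leader lock instead of promoting")
```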

- Added support for configuration directories (Floris van Nee)

YAML files in the directory are loaded and applied in alphabetical order.

- Advanced validation of PostgreSQL parameters (Alexander Kukushkin)

In case the specific parameter is not supported by the current PostgreSQL version or when its value is incorrect, Patroni will remove the parameter completely or try to fix the value.

- Wake up the main thread when the forced checkpoint after promote completed (Alexander Kukushkin)

Replicas are waiting for checkpoint indication via member key of the leader in DCS. The key is normally updated only once per HA loop. Without waking the main thread up, replicas will have to wait up to ``loop_wait`` seconds longer than necessary.

- Use of ``pg_stat_wal_receiver`` view on 9.6+ (Alexander Kukushkin)

The view contains up-to-date values of ``primary_conninfo`` and ``primary_slot_name``, while the contents of ``recovery.conf`` could be stale.

- Improved handing of IPv6 addresses in the Patroni config file (Mateusz Kowalski)

The IPv6 address is supposed to be enclosed in square brackets, but Patroni expected it without them. Now both formats are supported.
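Accepting both forms is straightforward if the port is always taken from the last colon. This is a hypothetical helper. Note that a bare IPv6 address without a port is ambiguous in this scheme and is not handled.

```python
def split_host_port(value, default_port):
    """Split "host:port", accepting IPv6 hosts with or without square brackets."""
    if ":" in value:
        # The port is whatever follows the last colon, so "::1:5432" also works.
        host, port = value.rsplit(":", 1)
    else:
        host, port = value, default_port
    # "[::1]" and "::1" both end up as "::1".
    return host.strip("[]"), int(port)


print(split_host_port("[::1]:5432", 5432))  # ('::1', 5432)
```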

- Added Consul ``service_tags`` configuration parameter (Robert Edström)

They are useful for dynamic service discovery, for example by load balancers.

- Implemented SSL support for Zookeeper (Kostiantyn Nemchenko)

It requires ``kazoo>=2.6.0``.

- Implemented ``no_params`` option for custom bootstrap method (Kostiantyn Nemchenko)

It allows calling ``wal-g``, ``pgBackRest`` and other backup tools without wrapping them into shell scripts.

- Move WAL and tablespaces after a failed init (Feike Steenbergen)

When doing ``reinit``, Patroni was already removing not only ``PGDATA`` but also the symlinked WAL directory and tablespaces. Now the ``move_data_directory()`` method will do a similar job, i.e. rename WAL directory and tablespaces and update symlinks in PGDATA.


**Improvements in pg_rewind support**

- Improved timeline divergence check (Alexander Kukushkin)

We don't need to rewind when the replayed location on the replica is not ahead of the switchpoint or the end of the checkpoint record on the former primary is the same as the switchpoint. In order to get the end of the checkpoint record we use ``pg_waldump`` and parse its output.
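The two "no rewind" conditions compare LSNs, which PostgreSQL prints as two hexadecimal halves (``high/low``). A sketch of the comparison under those stated rules. The function names are hypothetical, and extracting the checkpoint end from ``pg_waldump`` output is not shown.

```python
def parse_lsn(lsn):
    """Convert a textual LSN such as "0/3000060" into an integer position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def rewind_needed(replica_replayed, switchpoint, checkpoint_end=None):
    """Apply the two conditions under which no rewind is required."""
    replayed, switch = parse_lsn(replica_replayed), parse_lsn(switchpoint)
    if replayed <= switch:
        return False  # replayed location is not ahead of the switchpoint
    if checkpoint_end is not None and parse_lsn(checkpoint_end) == switch:
        return False  # former primary's checkpoint record ends exactly at the switchpoint
    return True


print(parse_lsn("0/3000060"))                      # 50331744
print(rewind_needed("0/4000000", "0/3000060"))     # True
```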

- Try to fetch missing WAL if ``pg_rewind`` complains about it (Alexander Kukushkin)

It could happen that the WAL segment required for ``pg_rewind`` no longer exists in the ``pg_wal`` directory, and therefore ``pg_rewind`` can't find the checkpoint location before the divergence point. Starting from PostgreSQL 13, ``pg_rewind`` can use ``restore_command`` to fetch missing WALs. For older PostgreSQL versions, Patroni parses the errors of a failed rewind attempt and tries to fetch the missing WAL by calling ``restore_command`` on its own.

- Detect a new timeline in the standby cluster and trigger rewind/reinitialize if necessary (Alexander Kukushkin)

The ``standby_cluster`` is decoupled from the primary cluster and therefore doesn't immediately know about leader elections and timeline switches. In order to detect the fact, the ``standby_leader`` periodically checks for new history files in ``pg_wal``.

- Shorten and beautify history log output (Alexander Kukushkin)

When Patroni is trying to figure out the necessity of ``pg_rewind``, it could write the content of the history file from the primary into the log. The history file is growing with every failover/switchover and eventually starts taking up too many lines, most of which are not so useful. Instead of showing the raw data, Patroni will show only 3 lines before the current replica timeline and 2 lines after.


**Improvements on K8s**

- Get rid of ``kubernetes`` python module (Alexander Kukushkin)

The official python kubernetes client contains a lot of auto-generated code and is therefore very heavy. Patroni uses only a small fraction of K8s API endpoints, and implementing support for them wasn't hard.

- Make it possible to bypass the ``kubernetes`` service (Alexander Kukushkin)

When running on K8s, Patroni is usually communicating with the K8s API via the ``kubernetes`` service, the address of which is exposed in the ``KUBERNETES_SERVICE_HOST`` environment variable. Like any other service, the ``kubernetes`` service is handled by ``kube-proxy``, which in turn, depending on the configuration, is either relying on a userspace program or ``iptables`` for traffic routing. Skipping the intermediate component and connecting directly to the K8s master nodes allows us to implement a better retry strategy and mitigate risks of demoting Postgres when K8s master nodes are upgraded.

- Sync HA loops of all pods of a Patroni cluster (Alexander Kukushkin)

Without this, failure detection time could grow from ``ttl`` to ``ttl + loop_wait``.

- Populate ``references`` and ``nodename`` in the subsets addresses on K8s (Alexander Kukushkin)

Some load-balancers are relying on this information.

- Fix possible race conditions in the ``update_leader()`` (Alexander Kukushkin)

The concurrent update of the leader configmap or endpoint happening outside of Patroni might cause the ``update_leader()`` call to fail. In this case Patroni rechecks that the current node is still owning the leader lock and repeats the update.

- Explicitly disallow patching non-existent config (Alexander Kukushkin)

For DCS other than ``kubernetes`` the PATCH call is failing with an exception due to ``cluster.config`` being ``None``, but on Kubernetes it was happily creating the config annotation and preventing writing bootstrap configuration after the bootstrap finished.

- Fix bug in ``pause`` (Alexander Kukushkin)

Replicas were removing ``primary_conninfo`` and restarting Postgres when the leader key was absent, but they should do nothing.


**Improvements in REST API**

- Defer TLS handshake until worker thread has started (Alexander Kukushkin, Ben Harris)

If the TLS handshake was done in the API thread and the client-side didn't send any data, the API thread was blocked (risking DoS).

- Check ``basic-auth`` independently from client certificate in REST API (Alexander Kukushkin)

Previously only the client certificate was validated. Doing two checks independently is an absolutely valid use-case.

- Write double ``CRLF`` after HTTP headers of the ``OPTIONS`` request (Sergey Burladyan)

HAProxy was happy with a single ``CRLF``, while Consul health-check complained about broken connection and unexpected EOF.

- ``GET /cluster`` was showing stale members info for Zookeeper (Alexander Kukushkin)

The endpoint was using the Patroni internal cluster view. For Patroni itself it didn't cause any issues, but when exposed to the outside world we need to show up-to-date information, especially replication lag.

- Fixed health-checks for standby cluster (Alexander Kukushkin)

The ``GET /standby-leader`` for a master and ``GET /master`` for a ``standby_leader`` were incorrectly responding with 200.

- Implemented ``DELETE /switchover`` (Alexander Kukushkin)

The REST API call deletes the scheduled switchover.

- Created ``/readiness`` and ``/liveness`` endpoints (Alexander Kukushkin)

They could be useful to eliminate "unhealthy" pods from subsets addresses when the K8s service is used with label selectors.

- Enhanced ``GET /replica`` and ``GET /async`` REST API health-checks (Krishna Sarabu, Alexander Kukushkin)

Checks now support optional keyword ``?lag=<max-lag>`` and will respond with 200 only if the lag is smaller than the supplied value. If relying on this feature please keep in mind that information about WAL position on the leader is updated only every ``loop_wait`` seconds!

- Added support for user defined HTTP headers in the REST API response (Yogesh Sharma)

This feature might be useful if requests are made from a browser.


**Improvements in patronictl**

- Don't try to call non-existing leader in ``patronictl pause`` (Alexander Kukushkin)

While pausing a cluster without a leader on K8s, ``patronictl`` was showing warnings that member "None" could not be accessed.

- Handle the case when member ``conn_url`` is missing (Alexander Kukushkin)

On K8s it is possible that the pod doesn't have the necessary annotations because Patroni is not yet running. This made ``patronictl`` fail.

- Added ability to print ASCII cluster topology (Maxim Fedotov, Alexander Kukushkin)

It is very useful for getting an overview of a cluster with cascading replication.

- Implement ``patronictl flush switchover`` (Alexander Kukushkin)

Before that ``patronictl flush`` only supported cancelling scheduled restarts.


**Bugfixes**

- Attribute error during bootstrap of the cluster with existing PGDATA (Krishna Sarabu)

When trying to create/update the ``/history`` key, Patroni was accessing the ``ClusterConfig`` object which wasn't created in DCS yet.

- Improved exception handling in Consul (Alexander Kukushkin)

Unhandled exception in the ``touch_member()`` method caused the whole Patroni process to crash.

- Enforce ``synchronous_commit=local`` for the ``post_init`` script (Alexander Kukushkin)

Patroni was already doing that when creating users (``replication``, ``rewind``), but missing it for ``post_init`` was an oversight. As a result, if the script didn't do it internally on its own, the bootstrap in ``synchronous_mode`` wasn't able to finish.

- Increased ``maxsize`` in the Consul pool manager (ponvenkates)

With the default ``size=1`` some warnings were generated.

- Patroni was wrongly reporting Postgres as running (Alexander Kukushkin)

The state wasn't updated when for example Postgres crashed due to an out-of-disk error.

- Put ``*`` into ``pgpass`` instead of missing or empty values (Alexander Kukushkin)

If for example the ``standby_cluster.port`` is not specified, the ``pgpass`` file was incorrectly generated.

- Skip physical replication slot creation on the leader node with special characters (Krishna Sarabu)

Patroni appeared to be creating a dormant slot (when ``slots`` were defined) for the leader node when its name contained special characters such as '-' (e.g. "abc-us-1").

- Avoid removing non-existent ``pg_hba.conf`` in the custom bootstrap (Krishna Sarabu)

Patroni was failing if ``pg_hba.conf`` happened to be located outside of the ``pgdata`` dir after custom bootstrap.

