Release Notes
This patch release introduces support for emitting run heartbeats for the entire duration of a run's lifecycle on Argo Workflows. Metaflow's UI relies on a complex state machine to ascertain the correct state of any run, step, task, and attempt. For runs executed on [workflow orchestrators like AWS Step Functions, Argo Workflows, and Airflow](https://docs.metaflow.org/production/scheduling-metaflow-flows/introduction), there is, by design, no execution-wide supervisor process that can monitor and keep track of the state of the execution for Metaflow. Instead, a task-local process reports this state to Metaflow, which helps Metaflow scale to millions of tasks. However, if no task is scheduled for a while, no heartbeat is emitted during that gap, so Metaflow's UI might temporarily show the run as failed (red) before correcting it (green) once the next task is scheduled. This can be confusing.
With the latest release, executions on Argo Workflows can start a daemon process that stays alive for the entire duration of the execution. This daemon process ensures that a liveness signal is emitted reliably throughout the execution, even when no task is running. You can enable this by adding the `--enable-heartbeat-daemon` flag:
```
python flow.py argo-workflows create --enable-heartbeat-daemon
```
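To illustrate the idea of an execution-long liveness signal, here is a minimal, generic sketch in Python. It is not Metaflow's implementation: `HeartbeatDaemon`, `emit`, `is_stale`, and the interval and timeout values are all hypothetical names chosen for this example, and a background thread stands in for the daemon process.

```python
import threading
import time


class HeartbeatDaemon:
    """Illustrative only: calls `emit(timestamp)` every `interval` seconds until stopped."""

    def __init__(self, emit, interval=1.0):
        self._emit = emit            # hypothetical callback that records a heartbeat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Emit right away, then once per interval.
        # Event.wait returns True as soon as stop() is called, which ends the loop.
        while True:
            self._emit(time.time())
            if self._stop.wait(self._interval):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def is_stale(last_heartbeat, now, timeout):
    """Hypothetical check: a run looks dead if no heartbeat arrived within `timeout` seconds."""
    return (now - last_heartbeat) > timeout
```

Because the emitter runs independently of any task, heartbeats keep arriving during the gaps between tasks, so a staleness check like `is_stale` would not trip while the execution is merely waiting for the next task to be scheduled.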
The next release will enable this functionality by default. If this turns out to be useful for you, or if you have any feedback, ping us at chat.metaflow.org!