Changelog
List of Aperture PRs merged since the 0.7.0 release. For the full list of changes, see the [list of changes][changes].
Revamp workload and flux meter metrics and labels (843)
Description of change
* New label `attribute_found` in FluxMeter to denote if the attribute on
which the flux meter is based was found in the access log/span
* Removed label `decision_type` on summary `workload_latency_ms` since
it is now emitted only if response was received.
* New counter `workload_requests_total` to count workload decisions,
since the summary does not cover the cases where a response is not
received, e.g. rejects or connection resets.
* A new column `response_received` on OLAP Flow events to denote the
case when response is not received.
Ignore negative workload latency (839)
Issue
* Workload latency in case of Envoy is calculated as:
`workload_latency = response_latency - aperture_latency`
* Workload latency can become negative in case of a connection reset.
* If the connection is aborted by the client or the server, Envoy
immediately terminates the connection for the other endpoint.
* In the Access Log, status code is set as 0 and `response_latency` is
set as zero.
* If the Authz call to the Aperture Agent had succeeded for this request,
then `aperture_latency` is greater than zero.
* This leads to `workload_latency` being computed as negative.
![Screenshot from 2022-10-28
19-24-44](https://user-images.githubusercontent.com/3925292/198775984-661e8293-e2eb-4357-a616-8552946572a0.png)
Fix
* Ignore negative workload latency, i.e. don't populate the workload
latency column
* Publish Prometheus metrics for flux-meter or workload latency only if
the metric column is found
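
The fix above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `workloadLatency` and its signature are hypothetical, and latencies are assumed to be in milliseconds.

```go
package main

import "fmt"

// workloadLatency is a hypothetical helper, not Aperture's actual code.
// It returns the workload latency in milliseconds, and ok=false when the
// computed value is negative so the caller can leave the column unset.
func workloadLatency(responseLatencyMs, apertureLatencyMs float64) (latencyMs float64, ok bool) {
	latencyMs = responseLatencyMs - apertureLatencyMs
	if latencyMs < 0 {
		return 0, false
	}
	return latencyMs, true
}

func main() {
	// Normal request: the response took 120 ms, 5 ms of which was the Authz call.
	if v, ok := workloadLatency(120, 5); ok {
		fmt.Printf("workload latency: %.0f ms\n", v)
	}
	// Connection reset: response_latency is logged as 0, so the value is skipped.
	if _, ok := workloadLatency(0, 5); !ok {
		fmt.Println("workload latency column not populated")
	}
}
```

Returning a separate `ok` flag, rather than a sentinel such as `-1`, keeps an ignored value from ever being published as a metric by mistake.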
TickInfo in LoadDecision (836)
Description of change
* Put `TickInfo` in `LoadDecision` to re-trigger fill-rate evaluation at
the Agent.
Re-structure protos (831)
Fix telemetry labels propagation (835)
Description of change
This fixes a regression introduced in
https://github.com/fluxninja/aperture/pull/828.
Dynamic Telemetry Flow Labels were added before labels filtering, which
led to them being incorrectly filtered out.
Bump OTEL to 0.63.0 (834)
Description of change
Bumps OTEL and FN OTEL to 0.63.0. This removes the Istio 1.15 compat hack,
as it is now included in upstream OTEL.
Response status in telemetry (828)
Description of change
This introduces the `aperture.response_status` column in telemetry. It
mirrors the implementation of the `response_status` label for metrics.
This also extends the above logic to treat `1xx`, `2xx`, and `3xx` codes
as OK instead of only `2xx` codes.
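
As a rough Go sketch of the extended classification: the function name `responseStatus` and the label values `"OK"` and `"Error"` are assumptions made for illustration, and the sketch treats any code outside `1xx`-`3xx` (including `0`) as an error.

```go
package main

import "fmt"

// responseStatus is a hypothetical helper, not Aperture's actual code.
// The returned label values "OK" and "Error" are assumed for this sketch.
func responseStatus(code int) string {
	// 1xx, 2xx, and 3xx codes all count as OK.
	if code >= 100 && code < 400 {
		return "OK"
	}
	return "Error"
}

func main() {
	for _, code := range []int{101, 200, 304, 404, 503} {
		fmt.Printf("%d -> %s\n", code, responseStatus(code))
	}
}
```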
Besides this, some cleanup is done:
1. The above logic is moved from `FluxMeter` to the OTEL package. **This changes
the FluxMeter interface!**
2. A lot of logic is moved from `metricsprocessor` to
`metricsprocessor/internal` for better visibility and easier separation
of functions which are called directly in metricsprocessor from helpers.
3. The above made creating unit tests much easier, so this PR also includes
some.
Ref: https://github.com/fluxninja/cloud/issues/6788
Dry run mode for Load Actuator (826)
Description of change
* Dry run mode for Load Actuator. No traffic can get dropped by the
Load Actuator in this mode. Useful for observing the behavior of the Load
Actuator without any disruptions.
* Load Actuator has a new Pass through mode
* Default to Pass through mode when the multiplier is invalid, and also
when there is no decision available at the Agent, including during
initialization
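
The fallback rules above might look roughly like this in Go. This is only a sketch under stated assumptions: the `loadDecision` type and `resolveLoadDecision` function are hypothetical names, a `nil` multiplier stands in for "no decision available", and "invalid" is assumed to mean NaN, infinite, or negative.

```go
package main

import (
	"fmt"
	"math"
)

// loadDecision is a hypothetical type, not Aperture's actual LoadDecision.
type loadDecision struct {
	PassThrough bool    // true means no traffic is dropped
	Multiplier  float64 // only meaningful when PassThrough is false
}

// resolveLoadDecision picks the effective mode for the Load Actuator.
// multiplier is nil when no decision has reached the Agent yet.
func resolveLoadDecision(multiplier *float64, dryRun bool) loadDecision {
	// Dry run and "no decision yet" both fall back to pass through.
	if dryRun || multiplier == nil {
		return loadDecision{PassThrough: true}
	}
	m := *multiplier
	// Assumption: NaN, infinite, or negative multipliers count as invalid.
	if math.IsNaN(m) || math.IsInf(m, 0) || m < 0 {
		return loadDecision{PassThrough: true}
	}
	return loadDecision{Multiplier: m}
}

func main() {
	half := 0.5
	fmt.Printf("%+v\n", resolveLoadDecision(&half, false)) // decision applied
	fmt.Printf("%+v\n", resolveLoadDecision(&half, true))  // dry run
	fmt.Printf("%+v\n", resolveLoadDecision(nil, false))   // no decision yet
}
```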
Rollup based on metrics (821)
Closes: GH-515
docs: playground doc updates (819)
Description of change
- Moved demo_app to playground
- Added more details to playground documentation
- Bumped Istio and other tools
[changes]: https://github.com/fluxninja/aperture/compare/releases/aperture-controller/v0.7.0...releases/aperture-controller/v0.8.0