Release Notes
[v0.38.0](https://github.com/determined-ai/determined/blob/v0.38.0/docs/release-notes.rst)
Changelog
* 715442484 chore: release notes 0.38.0 (10231)
* 13e49a7e3 [AUTO-BACKPORT release-0.38.0] 10226: chore: eliminate use of fury repo (10229)
* a554cd09e [AUTO-BACKPORT release-0.38.0] 10224: fix: make some k8s tests pass (10228)
* 0d373b162 [AUTO-BACKPORT release-0.38.0] 10221: fix: use new migration gist (10222)
* c93b84826 [AUTO-BACKPORT release-0.38.0] 10213: fix: port k8s perf fix (10220)
* 0cc57dfa7 chore: backport 10208 to release 0.38.0 (10219)
* 7d9c5ed71 [AUTO-BACKPORT release-0.38.0] 10216: fix: license check tests (10217)
* e2d8f4737 [AUTO-BACKPORT release-0.38.0] 10206: ci: remove datadog from ci (10214)
* 9619dcf1f [AUTO-BACKPORT release-0.38.0] 10211: chore: fix license check (10215)
* 332cefca4 [AUTO-BACKPORT release-0.38.0] 10207: fix: revert: fix: resolve indefinitely queued (STOPPING_COMPLETED) trials (10210)
* e693655b4 [AUTO-BACKPORT release-0.38.0] 10203: revert: log search (10205)
* 50b769048 chore: 0.38.0 environment images (10197)
* bb6f14057 [AUTO-BACKPORT 10160] fix: maxPoolSlotCapacity bug (10195)
* 7db183ef2 [AUTO-BACKPORT 10182] docs: docs changes for searcher context removal (10194)
* 23f97932c [AUTO-BACKPORT 10192] fix: keras continue from cloud checkpoint (10193)
* 508d400f0 [AUTO-BACKPORT 10174] docs: update docs for non-Trial-centric world (10186)
* 87f5ff853 [AUTO-BACKPORT 10188] fix: include max_length in continue expconf (10190)
* e72591837 [AUTO-BACKPORT 10183] docs: fix typos in the release note (10185)
* 23687dbaa [AUTO-BACKPORT 10178] docs: known issue of tb_plugin (10181)
* 5427a68be [AUTO-BACKPORT 10172] fix: ban archive columns in filter for experiment/search search (10176)
* 88c8887c0 [AUTO-BACKPORT 10173] fix: client.logout() re-enables client.login() (10177)
* 42f74e61b [AUTO-BACKPORT 10168] chore: ignore test_e2e_longrunning tests when merging auto-backports (10179)
* 020fc4369 [AUTO-BACKPORT 10161] fix: fix diffusion example [DET-10470] (10169)
* c69aa6888 [AUTO-BACKPORT 10140] fix: set max slots and checkpoint gc policy should comply with config policies (10167)
* b5e6315bb fix: set max slots and checkpoint gc policy should comply with config policies (10140)
* 8e6a65853 [AUTO-BACKPORT 10105] chore: change det deploy aws's default deployment type to simple-rds (10162)
* 6fc6710b4 [AUTO-BACKPORT 10153] docs: checkpoint storage note for config policies (10165)
* b366f80da [AUTO-BACKPORT 10138] feat: determined_master_host and friends helm support, better defaults (10159)
* d8afc5773 [AUTO-BACKPORT 10155] fix: fix iris example to use reported metric name (10156)
* 38ae54b67 [AUTO-BACKPORT 10149] fix: error message fix for duplicate model name (10154)
* 47ba6a934 build: INFENG-943: GoReleaser configure prerelease (10146)
* aad58c179 build: INFENG-942: Conditionally bypass build-react job checks (10145)
* d7f0bbfe3 chore: lock published urls to preserve redirects
* e3c31f0a1 Temporarily disable GitHub Actions credentials.
* 3be954b27 build: INFENG-938: Update version format in Makefiles (10142)
* 69b93b0d1 build: INFENG-940: Fix logic error in CircleCI config make-component job (10143)
* 00870f53c build: INFENG-937: Publish Helm chart release candidates (10141)
* 3910426dc feat: remove searcher context from harness and master [MD-498] (10131)
* 27bebdd49 build: INFENG-938: Tweak version string format (10139)
* 30ad3c078 feat: add master configurations for access token max and default lifespans [DET-10464] (10101)
* 782f7a09f revert: "chore: determined_master_host and friends helm support, better defaults" (10134)
* 233e095e0 chore: add checkpoint and max slots config policy enforcements in PATCH experiment (10125)
* b3f928bab chore: determined_master_host and friends helm support, better defaults (10092)
* 67554679b chore: bump Go version used by CI builds to 1.22.8 (10127)
* 834eeda6a feat: add actual select all to glide tables [ET-238] (10081)
* c7e0fb5e2 docs: add log signal release note and update docs (10126)
* 02fcc7402 test: Add test for filtering user by Role Id (10095)
* f97fb5a41 build: INFENG-933: add GitHub action to start a minor release (10112)
* 685918dad docs: Add aurora postgres release note (10115)
* a84f8c65b chore: SSO improvement feature requires Enterprise Edition. (10124)
* c71617c0a feat: Log Signal Exp Config and Monitoring (9947)
* 06b0b31b8 chore: fix merge exp flake (10122)
* 962810ab8 chore: improve messaging when workspace configs conflict with global … (10121)
* 6158ef7bb docs: Update postgres aurora info (10116)
* 4b0c0657e docs: log policies restore exp config (10120)
* 186962cd4 chore: add config policies to CLI reference docs (10118)
* 11ea6f46a chore: clarify version overrides during helm installs (10094)
* 4394f297e chore: standardize status api errors for task config policies (10119)
* e8343023b fix: Add on delete cascade to system_metrics (10113)
* 3c59233db chore: populate final merged config with defaults when merging invariant configs (10107)
* deb3772c7 feat: additional APIs to support "actual select all" functions [ET-238] (10102)
* fd9cd8a3f feat: Allow master configuration for ssh key type (10072)
* 5e9df7c43 docs: Update release notes (10114)
* c655f339a docs: fix internal link in multi-rm docs page. (10074)
* e7186fe00 docs: Update log policies (10098)
* 993296b11 fix: update copy in experiment and trial headers (10111)
* d74a462a4 docs: Describe sso improvements (10110)
* 24d3390e5 chore: conditionally create VolumeSnapshotClass (10103)
* f45ebb9e8 chore: improve documentation surrounding slot caps helm configuration (10090)
* 0013fd04e ci: shorten test_pending_hpc.py (10104)
* 22ad4572a fix: version upgrade notification bug [CM-411] (10069)
* 935fa664d fix: Log search feedbacks (10088)
* 29a08ece7 Revert "docs: Describe arbitrary metadata logging" (10099)
* c6c476c18 chore: remove e2e_slurm_preemption test series (10053)
* e6182edc0 docs: Describe arbitrary metadata logging (10073)
* 539df5e8b chore: update CLI commands to work with global APIs (10089)
* 1f2bea0b5 feat: update ConfigPolicies with docs link [CM-558] (10055)
* 4afc15f5c build: INFENG-926: Fix version.sh version string output (10085)
* 04861ddeb chore: return error if workspace config violates global constraints (10076)
* 912f91ed7 docs: task config policies release note (10087)
* 6d5610154 fix: remove flake-inducing logretention global singleton (10016)
* b70a622e2 fix: correct token creation CLI to ensure it works with default expiry (10084)
* b15533237 docs: Describe task config policies (9969)
* 27a014b44 fix: Tensorboard broken on unified install [CM-578] (10080)
* bdb56a40f chore: INFENG-922: use correct gh_team tag for infrastructure (10077)
* 91e358aa7 INFENG-382: Release redesign (10002)
* 34e47490f chore: remove redundant rm.ExternalPreemptionPending interface (10071)
* 28bc072b8 feat: SSO Improvement - alter `user_sessions` table to include access token, implement CRUD ops, GET, POST, PATCH APIs and `det token` CLIs (9867)
* 472baf9bc feat: Add copy task id to task list (10058)
* 2e822b7f5 chore: fix update invariant config and constraints (10078)
* d69f7cc28 chore(deps): bump google.golang.org/grpc from 1.64.0 to 1.64.1 (9910)
* e796b921e fix: run checkpoint GC more aggressively to ensure tensorboards are GC'd (10017)
* a14525f14 fix: nil deref in usage of incomplete experiment config policies (10068)
* 6c46a465a refactor: remove annotations requiring search ids in bulk action js (ET-241) (10062)
* 3ca3418c4 Docs: describe data files apptainer (10020)
* 315f65d64 chore: ntsc config not supported (10056)
* 2e8de9bfe test: User Management test updates [CM-468] (10051)
* 3fc9fed86 chore: experiment config slots to comply with constraint max slots (10054)
* 1d5c984b7 chore: fix slices and maps merge test (10063)
* 219409b96 chore: fix helptext for det user (10060)
* 7d6a1a77f docs: add k8s RP example to the helm `values.yaml`. (10027)
* 9efd96df0 fix: apply config policy constraints to PATCH /experiments/:id (10048)
* dd6aedadf chore: change error code back (10042)
* 5a39ecb6e chore: check config policies on 'det notebook set priority' (10047)
* 2ef2f122f feat: bulk actions matching filters (ET-241) (9895)
* ac82b3ca8 chore: default priority earlier to ensure constraints are satisfied [CM-553] (10043)
* 34557ef98 feat: Extend LogViewer to support scrollable search (10005)
* dadf75ead chore: take invariant_config priority into account with manage job workflow (10025)
* 2356f9153 chore: remove e2e_slurm_misconfigured series tests (10023)
* b243c269f ci: deflake test_disable_agent_zero_slots (10040)
* 4e0f1c4f2 chore: validate global, admin input against task config policies & constraints (10028)
* 3c1630f3c test: add e2e tests to the "move project" functionality on the "List View" (10037)
* 0613cc646 docs: revise postgres permission setup instructions. (10039)
* 2594d9042 chore: remove e2e_slurm_gpu series tests (10021)
* 1f7ccad13 chore: exp invariant config silent override during add or update (10019)
* 30b197d59 feat: Global Config Policies UI [CM-522] (10022)
* c27054deb feat: add e2e tests for multi-sort filter on experiments list (9992)
* 9faa0cbe2 chore: wait_for_task_state shows logs on failure (10029)
* a16682611 fix: Workspace Projects and Tasks test flakes [CM-554] (10026)
* 33dfdafea test: Workspace Models tests [CM-538] (9998)
* 7e8dbac5b fix: Update action bar row layout in UserManagement page (9862)
* 5b1380cf6 chore: check experiment constraints (10018)
* f609a2d06 fix: remove `formatDatetime` (10011)
* 9b6f0ac4c docs: Update release notes date (9999)
* f5400eadc feat: Add regex search to task logs API (9994)
* ddca76682 fix: correct expToWebhookConfig cache locking (10014)
* 80b29fa1c feat: Config Policies UI, Workspaces Experiments [CM-521] (10009)
* 262b4a9b1 chore: check task config policies against slots and max_slots (10015)
* a0cc81827 ci: replace no_op fixture with a noop api (9997)
* 987b2a508 test: add e2e experiment list pagination test (9993)
* 129789958 fix: use UID not username to set HOME dir (10010)
* 49e72a89c chore: reword jsonschema extension docs (9965)
* 63d728cf0 fix: display archived column for runs and searches (9987)
* 83a779eba feat: check task config policy constraints before scheduling NTSC wor… (9991)
* 0083d7e8d feat: add CLI commands for config policies [CM-423] (9911)
* ac54cf85f ci: delete pointless test (10004)
* 7f88390d4 fix: reset settings not working properly due to url encoding (10000)
* 25ca6d057 fix: import missing time module (9985)
* 8ab2145e1 chore: bump version: 0.37.0-dev0 -> 0.37.1-dev0
* 0760f7436 chore: add docs dropdown link for new version
* 23f1f30a2 docs: add release notes for 0.37.0 (9995)
* 99894756a test: Workspace Task tests [CM-476] (9982)
* ad66d3f47 chore: implement PUT APIs for task config policies (9983)
* 036336b0e docs: fix broken links (9996)
* ac8fbf6d8 chore: check task config policy priority limit for [CM-490] (9958)
* 8bc08e5f8 feat: Read and display log signal from DB (9959)
* c8b1910e2 ci: increase datagrid rightclick timeout/ reduce worker count (9951)
* e92c47460 fix: fix default id search for runs (9988)
* 3ca3d3089 test: increase Reactivate test step timeout (9986)
* bc3b2a6cd fix: Reactivate User test flake (9979)
* f2277f192 fix: fix hf on_save raise exception (9977)
* dbeea9984 fix: Cluster page height (9975)
* d02495b69 fix: Deactivate User test flake (9974)
* a8effe8a9 fix: show search progress in run table (9976)
* cf9bdc8e3 feat: workspace task config policies UI [CM-478] (9950)
* 924f66375 ci: remove default arg from utils.run_command() (9973)
* a96c5afa4 docs: add docstring for PyTorchContext.current_train_epoch (9972)
* 66f7a70b5 fix: grid hp sampling ignored empty nests (9966)
* 8c4f7a0ad fix: correct `dataPath` for hyperparameters (9971)
* 5c4be96bd feat: add database snapshot functionality to Helm chart (9956)
* 31d9573cc fix: show `-` for empty data in searches table [ET-749] (9963)
* f758303ad chore: lock published urls to preserve redirects
* 2a8e7ddca chore: lock api state for backward compatibility check
* 3f54d073b chore: bump version: 0.36.1-dev0 -> 0.37.0-dev0
* baf451f20 chore: do not log error for resource pools with zero agents (9960)
* 6a8606e63 docs: Add hpc installation guide (9945)
* 3241edb1d fix: fix flaky generic task pause test (9962)
* 43556e99b fix: Remove CSS rule for hiding the Form.Item error message (9872)
* 590600172 perf: improve the initial page load speed (9939)
* eb1b0de39 docs: Add workload alerting (9938)
* cedfcfe01 chore: refactor and test RBAC config policies work [CM-530] (9943)
* 2d884b9ce docs: Add cluster overview (9936)
* e17d12c4a feat: release notes and improvements for workload alerting (9944)
* 0db2e3bbd ci: deflake make slurmcluster, hopefully (9957)
* 95f079d4e feat: add GET global config policies API (9952)
* d943d852e chore: fix global PUT for task config policies (9941)
* 410edf6a8 fix: broken MNIST download in e2e tests (9937)
* 004c194fe ci: fix flaky test_allocation_csv tests (9953)
* 88a4c679b feat: add Config Policies GET API and modify CRUD functions to accept both Workload types (9946)
* a73c8db9a test: debug auth [TESTENG-95] (9942)
* 13db674b5 test: experiment list show archived filter [ET-753] (9932)
* 02e302fc8 chore: remove unused languages from code editor (9898)
* f6d874da1 docs: Replace slack links (9919)
* 26b0954dc chore: implement Delete config policies API handlers (9927)
* 2d12be1b8 test: add projects tests [CM-467] (9928)
* 062cb52a0 fix: use different modules for Trial and Cluster topology (9917)
* 092895818 chore: change log level for log retention policies (9935)
* b559467f6 chore: bump coverage target (9920)
* 3a2ea5629 fix: do not filter slots for mixed-slot-type pools (9902)
* a58ed7c3d chore: reassign RM code to CM in CODEOWNERS (9926)
* cb3515e02 fix: update LogRetentionDays from master config when master starts/upgrades (9930)
* 13b7b3f02 ci: increase timeout for k8s intg tests (9929)
* 6f36969c7 fix: flaky workspace test (9931)
* 867eb3162 fix: update huggingface example (9925)
* 5b2275fe0 fix: Refactor sorting logic in WorkspaceProjects for filtering projects (9903)
* fd7f77abf fix: move validation dataloader check in PyTorchTrial [MD-515] (9923)
* db2881f31 chore: fix config policy unmarshal tests (9924)
* 3900742d4 chore: update test log pattern webhook cache (9922)
* f44687dd7 chore: create config policies table and add NTSC CRUD operations (9915)
* de89f6891 feat: support updating web hook url [MD-482] (9890)
* 02fbdbbe5 fix: huggingface callback raise process preempted exception (9913)
* 8c799b84e chore: prune cruft out of no_op fixture (9912)
* 11de11984 chore(deps): bump path-to-regexp and express in /webui/react (9909)
* 03961b50e test: add workspace tests (9905)
* c877383bb fix: GetTrialRemainingLogRetentionDays should take global log retention days into account [CM-518] (9914)
* fb0d5f910 fix: change workspace name and set resource quota simultaneously (9847)
* 8fb9f6b65 docs: Update ROCM support (9893)
* 481bddb04 chore(deps): bump github.com/docker/docker from 24.0.9+incompatible to 25.0.6+incompatible (9780)
* c1499ac3b chore: removing model_hub references from Makefile (9901)
* c961dbd8f feat: new run object for Run Centric API (9897)
* bfeb418f1 feat: Implement custom trigger for webhooks (9879)
* b6eb05e33 chore: Remove model hub (9869)
* 4a28c10f2 chore: add unmarshal functions for task config policies (9896)
* d842383e5 fix: timezone handling error in queued allocation time update (9892)
* 55b3f9b6b test: cover project id filtering on bulk actions [ET-138] (9870)
* 036477be3 chore: stub new APIs for task config policies [CM-485] (9880)
* be2622a51 test: Delete workspace after webhook test (9891)
* a30bc2562 feat: Add rbac for config policies (9873)
* 8c83d311f chore: create WorkloadType enum and Go config + constraints structs (9885)
* 0a18c5ae6 fix: add backwards compatibility for Pods to Jobs for k8s <v1.27 [CM-461] (9878)
* 8e6bba818 ci: fix master-config syntax (9889)
* d5d647ae8 fix: inconsistent timezone handling in daily allocation aggregation (9888)
* b4209efbc test: login redirect with nested route (9881)
* 8cacba635 ci: add e2e bulk kill test (9868)
* 590c3625b fix: Hf callback metric naming (9887)
* 61fd26b9a fix: reset Model Registry page number on pageload [ET-640] (9876)
* ce27f81cd fix: show `-` for empty data in run table (9871)
* b1c08145b fix: prevent `hyperparameter search modal` submitting the same request multiple times (9883)
* d54713c3d fix: use new ruamel yaml APIs (9882)
* ad5fe5a40 fix: prevent out of bounds navigation on new list views (9875)
* a605f006c fix: reject reconnecting agents with different resource pool configuration (9815)
* db92bad1e feat: Support RBAC in webhook (9859)
* 0ef81aa2f fix: sorting by arbitrary metadata (9874)
* c1b776778 feat: Auto-Populate POSIX Information on sign in using SSO [CM-399] (9755)
* 54b61653f feat: Logic of different modes for webhook (9865)
* a77355127 fix: allow for objects inside array metadata to be typed properly (9864)
* ee269c896 test: successful login with weak or strong password (9858)
* e21fc6f8c ci: pin chromadb version to avoid incompatibility (9849)
* a1234a12c chore: bump version: 0.36.0-dev0 -> 0.36.1-dev0
* d79c90dc5 chore: add docs dropdown link for new version
* ce6da7409 docs: add release notes for 0.36.0 (9854)
* a55af7418 fix: use task sessions in Core API [MD-509] (9860)
* 3ee88bb00 fix: replace tree with code mirror for metadata view (9853)
* 8dd46d5bc chore: Improve CompareTrials performance (9807)
* 6e0830394 fix: fix error toast popping up in Workspace Creator view (9855)
* fb95df8c5 chore: add backport github action (9835)
* a37e6e720 fix: prevent loading issues with ipynb files (9850)
* 9de4f72b7 feat: configurable preemption timeout [MD-500] (9833)
* 640126ba7 feat: Add workspaceId, mode, name to webhook (9820)
* d436c2373 fix: reset pinned column state when resetting columns (9852)
* 3a91552ac fix: fix fallback logic for partially provided custom logos (9842)
* 707ad0772 Revert "chore: add tracing info to some backend APIs" (9843)
* 73a756adf fix: update broken tensorflow & certbot links (9846)
* 771bbe4d1 ci: sequential metric count sweep test [Scale-35] (9791)
* 32fafdd89 perf: remove duplicate ids in `ExpMetricNames` api (9848)
* a8fa0155f docs: Fix broken links (9845)
* 2b1856a83 fix: model version name overflow on mobile [ET-384] (9827)
* e13de2042 docs: Document rbac editorprojectrestricted role (9844)
* 2838af41c chore: add tracing info to some backend APIs (9841)
* e3dfb0a70 fix: change filter form to say "Show runs" in flat runs view [ET-740] (9840)
* 52f2b9ff2 chore: add release notes for PR 9822 (9837)
* a37d48216 fix: experiment single trial tabs don't scroll on load (9831)
* aff486c14 feat: Rocm bumpenvs (9830)
* 13622adc2 feat: Add `report_progress` to `TrainContext` (9826)
* d8314611b fix: replace rawsource attribute with node directly, due to removal of rawsource in Docutil 2.0 (9838)
* 7ed9e8309 feat: add EOL notice regarding Aurora V1 & Postgres 12 along with Master Log warnings for Postgres <=12 [CM-413] [CM-416] (9832)
* 5c5f107dd docs: Minor docs enhancements (9836)
* e11629be5 chore: lock published urls to preserve redirects
* 6e0b9d1d3 chore: lock api state for backward compatibility check
* e1a227382 chore: bump version: 0.35.1-dev0 -> 0.36.0-dev0
* 42c2efae4 docs: Docs cleanup (9834)
* 3ed0a3973 docs: Make docs consistent with run centric ux (9824)
* a367cd0f0 chore: deprecate Custom Searcher [MD-504] (9829)
* f7846cb9b feat: allow users with role Viewer and above to view resource quotas (9822)
* 97353c95a fix: Group and User management (CM-436) (9825)
* 358ed28a4 fix: hide metadata section if there's no metadata (9823)
* 287f3be36 chore: unskip flaky test (9819)
* e85ac893a Clarify basic data lineage to mldm (9828)
* c0ca6590b fix: checkpoint table action menu shouldn't vanish on polling [ET-277] (9812)
* 740b0e748 docs: Describe basic lineage steps (9813)
* e5d4b7f43 chore: initial k8s rocm support [CM-367] (9794)
* 9548790e7 chore: fix torch version to 2.2.2 for intel mac (9821)
* b2a82e896 chore: deprecate kubernetes priority w/ preemption scheduler (9763)
* 2002bf02d docs: Getting a list of files in a checkpoint (9818)
* 91d0b6779 docs: Fix broken links (9816)
* e3578490b fix: don't ignore failures during experiment shutdown (9693)
* 9b9641631 test: add go unit tests for experiment bulk actions [ET-138] (9658)
* 92a7ff5b2 feat: support filter by metadata with string type (9810)
* 9da5620ed feat: exclude `Array` type columns (9808)
* 79ffa5255 chore: bump version: 0.35.0-dev0 -> 0.35.1-dev0
* 9949ab0b6 chore: add docs dropdown link for new version
* 261e2e780 docs: add release notes for 0.35.0 (9786)
* a11e9e83b chore(deps): bump torch from 1.11.0 to 2.3.0 (9726)
* bebaf17b7 fix: make navigation sidebar scrollable [ET-633] (9803)
* f7e18fc74 fix: prevent multiple calls to time-series on compare view select (9805)
* db98c4fd9 ci: Add a portable testing framework and scalability tests [SCALE-29] (9762)
* 9702d2283 fix: prevent extra initial calls to search endpoints (9782)
* 4e47a1ee9 chore: change the comment for defaultNamespace in values.yaml (9793)
* d3f3e76e5 test: datagrid action pause flake (9802)
* 1f7473c85 fix: return proper error message when moving a project with a matching name (9795)
* 15d1a6085 ci: fix scripting for `make slurmcluster` job (9801)
* 8173cabb2 fix: forked from link (9798)
* c3400dff0 feat: add editor project restricted role and testing [DET-10428] (9796)
* 2cb102271 test: base model package dependency update [TESTENG-59] (9777)
* 4f319422f test: omnibar tree-extension tests [ET-203] (9783)
* cdbbedda0 fix: don't filter single runs in the comparison view (9789)
* 80822ebe4 ci: label `make slurmcluster` instances for cloud spend [CM-405] (9792)
* ea589d860 chore: fix readme typo (9797)
* 7b4f01cbd fix: Add loading indicator when creating HP search (9774)
* a4d74af3b chore: readme should include codecov (9787)
* 786f25896 fix: uncomment helm values (9790)
* a0349640d fix: fixed helm chart values and master-config.yaml (9788)
* fe14062e0 feat: show metadata in run table (9776)
* 2b589c434 feat: add array column type for arbitrary metadata (9759)
* 094c58be8 test: skip flaky test (9784)
* 49c3fa081 chore: add a utility for connecting devcluster to remote k8s clusters (9739)
* 13ebf47d8 chore: add Cluster Name title and change helm value (9775)
* 61aad7838 fix: fix contains filter for hyperparameters and metadata (9779)
* 15226b756 feat: add master config option to provide custom logo (9664)
* f42daca26 feat: make groups scope optional to support azure with OIDC (9773)
* 6105b3f24 docs: fix insecure link to systemd docs (9772)
* 068b9595a feat: checkpoint view for flat runs [ET-658] (9769)
* dab697886 feat: add code tab to run page [ET-657] (9771)
* 2c9109896 test: use previously created experiment for pause test (9727)
* 935799d2f fix: use run checkpoint data instead of experiment for run table filter (9767)
* 30d6e7902 fix: extract searcher metric from experiment payload (9768)
* b8c677370 fix: fix missing task_stats start_time on restored allocation (9745)
* a094ea1b3 chore: pin numpy version and upgrade sphinx [MD-468] (9736)
* 08065978b feat: add Metadata section to TrialDetailsOverview (ET-224) (9639)
* 287faf78c chore: bumpenv pin numpy to 1.x [MD-470] (9748)
* becd8b6ae chore: remove RM Name from RP descriptions (9758)
* fc8ac0baa chore: undo test skip after fix was merged (9754)
* de898c9dd Revert "chore: add configurable posix claims fields to master config [RM-398]" (9753)
* 623c945b3 fix: load trial data for single run searches in search view 9742 (9752)
* 41a512e10 fix: debounce searches column width settings 9700 (9751)
* bc721bf53 refactor: change 'close' to 'save' on button in ManageJob modal [DET-10446] (9750)
* 0ce2ff149 fix: change external_run_id to string type in FlatRun proto (9749)
* 20ed1268d fix: reduce the number of api calls from Workspace Create Modal (9735)
* 61bc7bbf5 chore: add configurable posix claims fields to master config [RM-398] (9690)
* 2cdfdf9f4 fix: change external_run_id to string type in FlatRun proto (9744)
* 36aaed77a chore: fine-tune error and help messages of CLI commands for slot caps (9743)
* 0df7ad346 test: workspace and project tests [TESTENG-60] (9740)
* e00d9f4b2 chore: add release note for ComparisonView bugfix (9741)
* 35ec0773e chore: add 'masterService.annotations' to Helm (9697)
* 5f8dae383 chore: fix exp delete log msg (9716)
* 9dc0afada fix: deadlock issue (9728)
* f8067ba21 chore: skip failing Deactivate and reactivate user test (9723)
* 9efb2162c feat: CLI command to list the members of a Workspace [RM-388] (9686)
* dc1233685 chore: lengthen abbreviation to avoid ambiguity (9733)
* e3524b7d9 chore: add release notes for metrics fetching UI bug (9737)
* 719f8beb0 chore: update copy when f_flat_runs is on (9642)
* 6c4f69b4e test: workspace and project api [TESTENG-46] (9731)
* 4aa6ffa67 docs: Add release docs for continue trial, edit hp search, resource a… (9729)
* d46d77613 fix: use before/after search params for historic allocation CSV download endpoint [DET-10442] (9730)
* a32b0104d fix: show selections in ComparisonView on any page (ET-189) (9694)
* 7260f046c chore: default flat runs to on (9709)
* 202ab6212 fix: Endless fetching for cancelled experiment without metrics (9714)
* 4466c3322 feat: change search-experiments from GET to POST [ET-602] (9717)
* 787a2f377 docs: Fix workspace cli doc (9720)
* c3ca1d4b3 docs: Describe link to mldm data (9718)
v0.38.0-ee
Release Notes
[v0.38.0-ee](https://github.com/determined-ai/determined/blob/v0.38.0-ee/docs/release-notes.rst)
Changelog
* 715442484 chore: release notes 0.38.0 (10231)
* 13e49a7e3 [AUTO-BACKPORT release-0.38.0] 10226: chore: eliminate use of fury repo (10229)
* a554cd09e [AUTO-BACKPORT release-0.38.0] 10224: fix: make some k8s tests pass (10228)
* 0d373b162 [AUTO-BACKPORT release-0.38.0] 10221: fix: use new migration gist (10222)
* c93b84826 [AUTO-BACKPORT release-0.38.0] 10213: fix: port k8s perf fix (10220)
* 0cc57dfa7 chore: backport 10208 to release 0.38.0 (10219)
* 7d9c5ed71 [AUTO-BACKPORT release-0.38.0] 10216: fix: license check tests (10217)
* e2d8f4737 [AUTO-BACKPORT release-0.38.0] 10206: ci: remove datadog from ci (10214)
* 9619dcf1f [AUTO-BACKPORT release-0.38.0] 10211: chore: fix license check (10215)
* 332cefca4 [AUTO-BACKPORT release-0.38.0] 10207: fix: revert: fix: resolve indefinitely queued (STOPPING_COMPLETED) trials (10210)
* e693655b4 [AUTO-BACKPORT release-0.38.0] 10203: revert: log search (10205)
* 50b769048 chore: 0.38.0 environment images (10197)
* bb6f14057 [AUTO-BACKPORT 10160] fix: maxPoolSlotCapacity bug (10195)
* 7db183ef2 [AUTO-BACKPORT 10182] docs: docs changes for searcher context removal (10194)
* 23f97932c [AUTO-BACKPORT 10192] fix: keras continue from cloud checkpoint (10193)
* 508d400f0 [AUTO-BACKPORT 10174] docs: update docs for non-Trial-centric world (10186)
* 87f5ff853 [AUTO-BACKPORT 10188] fix: include max_length in continue expconf (10190)
* e72591837 [AUTO-BACKPORT 10183] docs: fix typos in the release note (10185)
* 23687dbaa [AUTO-BACKPORT 10178] docs: known issue of tb_plugin (10181)
* 5427a68be [AUTO-BACKPORT 10172] fix: ban archive columns in filter for experiment/search search (10176)
* 88c8887c0 [AUTO-BACKPORT 10173] fix: client.logout() re-enables client.login() (10177)
* 42f74e61b [AUTO-BACKPORT 10168] chore: ignore test_e2e_longrunning tests when merging auto-backports (10179)
* 020fc4369 [AUTO-BACKPORT 10161] fix: fix diffusion example [DET-10470] (10169)
* c69aa6888 [AUTO-BACKPORT 10140] fix: set max slots and checkpoint gc policy should comply with config policies (10167)
* b5e6315bb fix: set max slots and checkpoint gc policy should comply with config policies (10140)
* 8e6a65853 [AUTO-BACKPORT 10105] chore: change det deploy aws's default deployment type to simple-rds (10162)
* 6fc6710b4 [AUTO-BACKPORT 10153] docs: checkpoint storage note for config policies (10165)
* b366f80da [AUTO-BACKPORT 10138] feat: determined_master_host and friends helm support, better defaults (10159)
* d8afc5773 [AUTO-BACKPORT 10155] fix: fix iris example to use reported metric name (10156)
* 38ae54b67 [AUTO-BACKPORT 10149] fix: error message fix for duplicate model name (10154)
* 47ba6a934 build: INFENG-943: GoReleaser configure prerelease (10146)
* aad58c179 build: INFENG-942: Conditionally bypass build-react job checks (10145)
* d7f0bbfe3 chore: lock published urls to preserve redirects
* e3c31f0a1 Temporarily disable GitHub Actions credentials.
* 3be954b27 build: INFENG-938: Update version format in Makefiles (10142)
* 69b93b0d1 build: INFENG-940: Fix logic error in CircleCI config make-component job (10143)
* 00870f53c build: INFENG-937: Publish Helm chart release candidates (10141)
* 3910426dc feat: remove searcher context from harness and master [MD-498] (10131)
* 27bebdd49 build: INFENG-938: Tweak version string format (10139)
* 30ad3c078 feat: add master configurations for access token max and default lifespans [DET-10464] (10101)
* 782f7a09f revert: "chore: determined_master_host and friends helm support, better defaults" (10134)
* 233e095e0 chore: add checkpoint and max slots config policy enforcements in PATCH experiment (10125)
* b3f928bab chore: determined_master_host and friends helm support, better defaults (10092)
* 67554679b chore: bump Go version used by CI builds to 1.22.8 (10127)
* 834eeda6a feat: add actual select all to glide tables [ET-238] (10081)
* c7e0fb5e2 docs: add log signal release note and update docs (10126)
* 02fcc7402 test: Add test for filtering user by Role Id (10095)
* f97fb5a41 build: INFENG-933: add GitHub action to start a minor release (10112)
* 685918dad docs: Add aurora postgres release note (10115)
* a84f8c65b chore: SSO improvement feature requires Enterprise Edition. (10124)
* c71617c0a feat: Log Signal Exp Config and Monitoring (9947)
* 06b0b31b8 chore: fix merge exp flake (10122)
* 962810ab8 chore: improve messaging when workspace configs conflict with global … (10121)
* 6158ef7bb docs: Update postgres aurora info (10116)
* 4b0c0657e docs: log policies restore exp config (10120)
* 186962cd4 chore: add config policies to CLI reference docs (10118)
* 11ea6f46a chore: clarify version overrides during helm installs (10094)
* 4394f297e chore: standardize status api errors for task config policies (10119)
* e8343023b fix: Add on delete cascade to system_metrics (10113)
* 3c59233db chore: populate final merged config with defaults when merging invariant configs (10107)
* deb3772c7 feat: additional APIs to support "actual select all" functions [ET-238] (10102)
* fd9cd8a3f feat: Allow master configuration for ssh key type (10072)
* 5e9df7c43 docs: Update release notes (10114)
* c655f339a docs: fix internal link in multi-rm docs page. (10074)
* e7186fe00 docs: Update log policies (10098)
* 993296b11 fix: update copy in experiment and trial headers (10111)
* d74a462a4 docs: Describe sso improvements (10110)
* 24d3390e5 chore: conditionally create VolumeSnapshotClass (10103)
* f45ebb9e8 chore: improve documentation surrounding slot caps helm configuration (10090)
* 0013fd04e ci: shorten test_pending_hpc.py (10104)
* 22ad4572a fix: version upgrade notification bug [CM-411] (10069)
* 935fa664d fix: Log search feedbacks (10088)
* 29a08ece7 Revert "docs: Describe arbitrary metadata logging" (10099)
* c6c476c18 chore: remove e2e_slurm_preemption test series (10053)
* e6182edc0 docs: Describe arbitrary metadata logging (10073)
* 539df5e8b chore: update CLI commands to work with global APIs (10089)
* 1f2bea0b5 feat: update ConfigPolicies with docs link [CM-558] (10055)
* 4afc15f5c build: INFENG-926: Fix version.sh version string output (10085)
* 04861ddeb chore: return error if workspace config violates global constraints (10076)
* 912f91ed7 docs: task config policies release note (10087)
* 6d5610154 fix: remove flake-inducing logretention global singleton (10016)
* b70a622e2 fix: correct token creation CLI to ensure it works with default expiry (10084)
* b15533237 docs: Describe task config policies (9969)
* 27a014b44 fix: Tensorboard broken on unified install [CM-578] (10080)
* bdb56a40f chore: INFENG-922: use correct gh_team tag for infrastructure (10077)
* 91e358aa7 INFENG-382: Release redesign (10002)
* 34e47490f chore: remove redundant rm.ExternalPreemptionPending interface (10071)
* 28bc072b8 feat: SSO Improvement - alter `user_sessions` table to include access token, implement CRUD ops, GET, POST, PATCH APIs and `det token` CLIs (9867)
* 472baf9bc feat: Add copy task id to task list (10058)
* 2e822b7f5 chore: fix update invariant config and constraints (10078)
* d69f7cc28 chore(deps): bump google.golang.org/grpc from 1.64.0 to 1.64.1 (9910)
* e796b921e fix: run checkpoint GC more aggressively to ensure tensorboards are GC'd (10017)
* a14525f14 fix: nil deref in usage of incomplete experiment config policies (10068)
* 6c46a465a refactor: remove annotations requiring search ids in bulk action js (ET-241) (10062)
* 3ca3418c4 Docs: describe data files apptainer (10020)
* 315f65d64 chore: ntsc config not supported (10056)
* 2e8de9bfe test: User Management test updates [CM-468] (10051)
* 3fc9fed86 chore: experiment config slots to comply with constraint max slots (10054)
* 1d5c984b7 chore: fix slices and maps merge test (10063)
* 219409b96 chore: fix helptext for det user (10060)
* 7d6a1a77f docs: add k8s RP example to the helm `values.yaml`. (10027)
* 9efd96df0 fix: apply config policy constraints to PATCH /experiments/:id (10048)
* dd6aedadf chore: change error code back (10042)
* 5a39ecb6e chore: check config policies on 'det notebook set priority' (10047)
* 2ef2f122f feat: bulk actions matching filters (ET-241) (9895)
* ac82b3ca8 chore: default priority earlier to ensure constraints are satisfied [CM-553] (10043)
* 34557ef98 feat: Extend LogViewer to support scrollable search (10005)
* dadf75ead chore: take invariant_config priority into account with manage job workflow (10025)
* 2356f9153 chore: remove e2e_slurm_misconfigured series tests (10023)
* b243c269f ci: deflake test_disable_agent_zero_slots (10040)
* 4e0f1c4f2 chore: validate global, admin input against task config policies & constraints (10028)
* 3c1630f3c test: add e2e tests to the "move project" functionality on the "List View" (10037)
* 0613cc646 docs: revise postgres permission setup instructions. (10039)
* 2594d9042 chore: remove e2e_slurm_gpu series tests (10021)
* 1f7ccad13 chore: exp invariant config silent override during add or update (10019)
* 30b197d59 feat: Global Config Policies UI [CM-522] (10022)
* c27054deb feat: add e2e tests for multi-sort filter on experiments list (9992)
* 9faa0cbe2 chore: wait_for_task_state shows logs on failure (10029)
* a16682611 fix: Workspace Projects and Tasks test flakes [CM-554] (10026)
* 33dfdafea test: Workspace Models tests [CM-538] (9998)
* 7e8dbac5b fix: Update action bar row layout in UserManagement page (9862)
* 5b1380cf6 chore: check experiment constraints (10018)
* f609a2d06 fix: remove `formatDatetime` (10011)
* 9b6f0ac4c docs: Update release notes date (9999)
* f5400eadc feat: Add regex search to task logs API (9994)
* ddca76682 fix: correct expToWebhookConfig cache locking (10014)
* 80b29fa1c feat: Config Policies UI, Workspaces Experiments [CM-521] (10009)
* 262b4a9b1 chore: check task config policies against slots and max_slots (10015)
* a0cc81827 ci: replace no_op fixture with a noop api (9997)
* 987b2a508 test: add e2e experiment list pagination test (9993)
* 129789958 fix: use UID not username to set HOME dir (10010)
* 49e72a89c chore: reword jsonschema extension docs (9965)
* 63d728cf0 fix: display archived column for runs and searches (9987)
* 83a779eba feat: check task config policy constraints before scheduling NTSC wor… (9991)
* 0083d7e8d feat: add CLI commands for config policies [CM-423] (9911)
* ac54cf85f ci: delete pointless test (10004)
* 7f88390d4 fix: reset settings not working properly due to url encoding (10000)
* 25ca6d057 fix: import missing time module (9985)
* 8ab2145e1 chore: bump version: 0.37.0-dev0 -> 0.37.1-dev0
* 0760f7436 chore: add docs dropdown link for new version
* 23f1f30a2 docs: add release notes for 0.37.0 (9995)
* 99894756a test: Workspace Task tests [CM-476] (9982)
* ad66d3f47 chore: implement PUT APIs for task config policies (9983)
* 036336b0e docs: fix broken links (9996)
* ac8fbf6d8 chore: check task config policy priority limit for [CM-490] (9958)
* 8bc08e5f8 feat: Read and display log signal from DB (9959)
* c8b1910e2 ci: increase datagrid rightclick timeout/ reduce worker count (9951)
* e92c47460 fix: fix default id search for runs (9988)
* 3ca3d3089 test: increase Reactivate test step timeout (9986)
* bc3b2a6cd fix: Reactivate User test flake (9979)
* f2277f192 fix: fix hf on_save raise exception (9977)
* dbeea9984 fix: Cluster page height (9975)
* d02495b69 fix: Deactivate User test flake (9974)
* a8effe8a9 fix: show search progress in run table (9976)
* cf9bdc8e3 feat: workspace task config policies UI [CM-478] (9950)
* 924f66375 ci: remove default arg from utils.run_command() (9973)
* a96c5afa4 docs: add docstring for PyTorchContext.current_train_epoch (9972)
* 66f7a70b5 fix: grid hp sampling ignored empty nests (9966)
* 8c4f7a0ad fix: correct `dataPath` for hyperparameters (9971)
* 5c4be96bd feat: add database snapshot functionality to Helm chart (9956)
* 31d9573cc fix: show `-` for empty data in searches table [ET-749] (9963)
* f758303ad chore: lock published urls to preserve redirects
* 2a8e7ddca chore: lock api state for backward compatibility check
* 3f54d073b chore: bump version: 0.36.1-dev0 -> 0.37.0-dev0
* baf451f20 chore: do not log error for resource pools with zero agents (9960)
* 6a8606e63 docs: Add hpc installation guide (9945)
* 3241edb1d fix: fix flaky generic task pause test (9962)
* 43556e99b fix: Remove CSS rule for hiding the Form.Item error message (9872)
* 590600172 perf: improve the initial page load speed (9939)
* eb1b0de39 docs: Add workload alerting (9938)
* cedfcfe01 chore: refactor and test RBAC config policies work [CM-530] (9943)
* 2d884b9ce docs: Add cluster overview (9936)
* e17d12c4a feat: release notes and improvements for workload alerting (9944)
* 0db2e3bbd ci: deflake make slurmcluster, hopefully (9957)
* 95f079d4e feat: add GET global config policies API (9952)
* d943d852e chore: fix global PUT for task config policies (9941)
* 410edf6a8 fix: broken MNIST download in e2e tests (9937)
* 004c194fe ci: fix flaky test_allocation_csv tests (9953)
* 88a4c679b feat: add Config Policies GET API and modify CRUD functions to accept both Workload types (9946)
* a73c8db9a test: debug auth [TESTENG-95] (9942)
* 13db674b5 test: experiment list show archived filter [ET-753] (9932)
* 02e302fc8 chore: remove unused languages from code editor (9898)
* f6d874da1 docs: Replace slack links (9919)
* 26b0954dc chore: implement Delete config policies API handlers (9927)
* 2d12be1b8 test: add projects tests [CM-467] (9928)
* 062cb52a0 fix: use different modules for Trial and Cluster topology (9917)
* 092895818 chore: change log level for log retention policies (9935)
* b559467f6 chore: bump coverage target (9920)
* 3a2ea5629 fix: do not filter slots for mixed-slot-type pools (9902)
* a58ed7c3d chore: reassign RM code to CM in CODEOWNERS (9926)
* cb3515e02 fix: update LogRetentionDays from master config when master starts/upgrades (9930)
* 13b7b3f02 ci: increase timeout for k8s intg tests (9929)
* 6f36969c7 fix: flaky workspace test (9931)
* 867eb3162 fix: update huggingface example (9925)
* 5b2275fe0 fix: Refactor sorting logic in WorkspaceProjects for filtering projects (9903)
* fd7f77abf fix: move validation dataloader check in PyTorchTrial [MD-515] (9923)
* db2881f31 chore: fix config policy unmarshal tests (9924)
* 3900742d4 chore: update test log pattern webhook cache (9922)
* f44687dd7 chore: create config policies table and add NTSC CRUD operations (9915)
* de89f6891 feat: support updating web hook url [MD-482] (9890)
* 02fbdbbe5 fix: huggingface callback raise process preempted exception (9913)
* 8c799b84e chore: prune cruft out of no_op fixture (9912)
* 11de11984 chore(deps): bump path-to-regexp and express in /webui/react (9909)
* 03961b50e test: add workspace tests (9905)
* c877383bb fix: GetTrialRemainingLogRetentionDays should take global log retention days into account [CM-518] (9914)
* fb0d5f910 fix: change workspace name and set resource quota simultaneously (9847)
* 8fb9f6b65 docs: Update ROCM support (9893)
* 481bddb04 chore(deps): bump github.com/docker/docker from 24.0.9+incompatible to 25.0.6+incompatible (9780)
* c1499ac3b chore: removing model_hub references from Makefile (9901)
* c961dbd8f feat: new run object for Run Centric API (9897)
* bfeb418f1 feat: Implement custom trigger for webhooks (9879)
* b6eb05e33 chore: Remove model hub (9869)
* 4a28c10f2 chore: add unmarshal functions for task config policies (9896)
* d842383e5 fix: timezone handling error in queued allocation time update (9892)
* 55b3f9b6b test: cover project id filtering on bulk actions [ET-138] (9870)
* 036477be3 chore: stub new APIs for task config policies [CM-485] (9880)
* be2622a51 test: Delete workspace after webhook test (9891)
* a30bc2562 feat: Add rbac for config policies (9873)
* 8c83d311f chore: create WorkloadType enum and Go config + constraints structs (9885)
* 0a18c5ae6 fix: add backwards compatibility for Pods to Jobs for k8s <v1.27 [CM-461] (9878)
* 8e6bba818 ci: fix master-config syntax (9889)
* d5d647ae8 fix: inconsistent timezone handling in daily allocation aggregation (9888)
* b4209efbc test: login redirect with nested route (9881)
* 8cacba635 ci: add e2e bulk kill test (9868)
* 590c3625b fix: Hf callback metric naming (9887)
* 61fd26b9a fix: reset Model Registry page number on pageload [ET-640] (9876)
* ce27f81cd fix: show `-` for empty data in run table (9871)
* b1c08145b fix: prevent `hyperparameter search modal` submitting the same request multiple times (9883)
* d54713c3d fix: use new ruamel yaml APIs (9882)
* ad5fe5a40 fix: prevent out of bounds navigation on new list views (9875)
* a605f006c fix: reject reconnecting agents with different resource pool configuration (9815)
* db92bad1e feat: Support RBAC in webhook (9859)
* 0ef81aa2f fix: sorting by arbitrary metadata (9874)
* c1b776778 feat: Auto-Populate POSIX Information on sign in using SSO [CM-399] (9755)
* 54b61653f feat: Logic of different modes for webhook (9865)
* a77355127 fix: allow for objects inside array metadata to be typed properly (9864)
* ee269c896 test: successful login with weak or strong password (9858)
* e21fc6f8c ci: pin chromadb version to avoid incompatibility (9849)
* a1234a12c chore: bump version: 0.36.0-dev0 -> 0.36.1-dev0
* d79c90dc5 chore: add docs dropdown link for new version
* ce6da7409 docs: add release notes for 0.36.0 (9854)
* a55af7418 fix: use task sessions in Core API [MD-509] (9860)
* 3ee88bb00 fix: replace tree with code mirror for metadata view (9853)
* 8dd46d5bc chore: Improve CompareTrials performance (9807)
* 6e0830394 fix: fix error toast popping up in Workspace Creator view (9855)
* fb95df8c5 chore: add backport github action (9835)
* a37e6e720 fix: prevent loading issues with ipynb files (9850)
* 9de4f72b7 feat: configurable preemption timeout [MD-500] (9833)
* 640126ba7 feat: Add workspaceId, mode, name to webhook (9820)
* d436c2373 fix: reset pinned column state when resetting columns (9852)
* 3a91552ac fix: fix fallback logic for partially provided custom logos (9842)
* 707ad0772 Revert "chore: add tracing info to some backend APIs" (9843)
* 73a756adf fix: update broken tensorflow & certbot links (9846)
* 771bbe4d1 ci: sequential metric count sweep test [Scale-35] (9791)
* 32fafdd89 perf: remove duplicate ids in `ExpMetricNames` api (9848)
* a8fa0155f docs: Fix broken links (9845)
* 2b1856a83 fix: model version name overflow on mobile [ET-384] (9827)
* e13de2042 docs: Document rbac editorprojectrestricted role (9844)
* 2838af41c chore: add tracing info to some backend APIs (9841)
* e3dfb0a70 fix: change filter form to say "Show runs" in flat runs view [ET-740] (9840)
* 52f2b9ff2 chore: add release notes for PR 9822 (9837)
* a37d48216 fix: experiment single trial tabs don't scroll on load (9831)
* aff486c14 feat: Rocm bumpenvs (9830)
* 13622adc2 feat: Add `report_progress` to `TrainContext` (9826)
* d8314611b fix: replace rawsource attribute with node directly, due to removal of rawsource in Docutil 2.0 (9838)
* 7ed9e8309 feat: add EOL notice regarding Aurora V1 & Postgres 12 along with Master Log warnings for Postgres <=12 [CM-413] [CM-416] (9832)
* 5c5f107dd docs: Minor docs enhancements (9836)
* e11629be5 chore: lock published urls to preserve redirects
* 6e0b9d1d3 chore: lock api state for backward compatibility check
* e1a227382 chore: bump version: 0.35.1-dev0 -> 0.36.0-dev0
* 42c2efae4 docs: Docs cleanup (9834)
* 3ed0a3973 docs: Make docs consistent with run centric ux (9824)
* a367cd0f0 chore: deprecate Custom Searcher [MD-504] (9829)
* f7846cb9b feat: allow users with role Viewer and above to view resource quotas (9822)
* 97353c95a fix: Group and User management (CM-436) (9825)
* 358ed28a4 fix: hide metadata section if there's no metadata (9823)
* 287f3be36 chore: unskip flaky test (9819)
* e85ac893a Clarify basic data lineage to mldm (9828)
* c0ca6590b fix: checkpoint table action menu shouldn't vanish on polling [ET-277] (9812)
* 740b0e748 docs: Describe basic lineage steps (9813)
* e5d4b7f43 chore: initial k8s rocm support [CM-367] (9794)
* 9548790e7 chore: fix torch version to 2.2.2 for intel mac (9821)
* b2a82e896 chore: deprecate kubernetes priority w/ preemption scheduler (9763)
* 2002bf02d docs: Getting a list of files in a checkpoint (9818)
* 91d0b6779 docs: Fix broken links (9816)
* e3578490b fix: don't ignore failures during experiment shutdown (9693)
* 9b9641631 test: add go unit tests for experiment bulk actions [ET-138] (9658)
* 92a7ff5b2 feat: support filter by metadata with string type (9810)
* 9da5620ed feat: exclude `Array` type columns (9808)
* 79ffa5255 chore: bump version: 0.35.0-dev0 -> 0.35.1-dev0
* 9949ab0b6 chore: add docs dropdown link for new version
* 261e2e780 docs: add release notes for 0.35.0 (9786)
* a11e9e83b chore(deps): bump torch from 1.11.0 to 2.3.0 (9726)
* bebaf17b7 fix: make navigation sidebar scrollable [ET-633] (9803)
* f7e18fc74 fix: prevent multiple calls to time-series on compare view select (9805)
* db98c4fd9 ci: Add a portable testing framework and scalability tests [SCALE-29] (9762)
* 9702d2283 fix: prevent extra initial calls to search endpoints (9782)
* 4e47a1ee9 chore: change the comment for defaultNamespace in values.yaml (9793)
* d3f3e76e5 test: datagrid action pause flake (9802)
* 1f7473c85 fix: return proper error message when moving a project with a matching name (9795)
* 15d1a6085 ci: fix scripting for `make slurmcluster` job (9801)
* 8173cabb2 fix: forked from link (9798)
* c3400dff0 feat: add editor project restricted role and testing [DET-10428] (9796)
* 2cb102271 test: base model package dependency update [TESTENG-59] (9777)
* 4f319422f test: omnibar tree-extension tests [ET-203] (9783)
* cdbbedda0 fix: don't filter single runs in the comparison view (9789)
* 80822ebe4 ci: label `make slurmcluster` instances for cloud spend [CM-405] (9792)
* ea589d860 chore: fix readme typo (9797)
* 7b4f01cbd fix: Add loading indicator when creating HP search (9774)
* a4d74af3b chore: readme should include codecov (9787)
* 786f25896 fix: uncomment helm values (9790)
* a0349640d fix: fixed helm chart values and master-config.yaml (9788)
* fe14062e0 feat: show metadata in run table (9776)
* 2b589c434 feat: add array column type for arbitrary metadata (9759)
* 094c58be8 test: skip flaky test (9784)
* 49c3fa081 chore: add a utility for connecting devcluster to remote k8s clusters (9739)
* 13ebf47d8 chore: add Cluster Name title and change helm value (9775)
* 61aad7838 fix: fix contains filter for hyperparameters and metadata (9779)
* 15226b756 feat: add master config option to provide custom logo (9664)
* f42daca26 feat: make groups scope optional to support azure with OIDC (9773)
* 6105b3f24 docs: fix insecure link to systemd docs (9772)
* 068b9595a feat: checkpoint view for flat runs [ET-658] (9769)
* dab697886 feat: add code tab to run page [ET-657] (9771)
* 2c9109896 test: use previously created experiment for pause test (9727)
* 935799d2f fix: use run checkpoint data instead of experiment for run table filter (9767)
* 30d6e7902 fix: extract searcher metric from experiment payload (9768)
* b8c677370 fix: fix missing task_stats start_time on restored allocation (9745)
* a094ea1b3 chore: pin numpy version and upgrade sphinx [MD-468] (9736)
* 08065978b feat: add Metadata section to TrialDetailsOverview (ET-224) (9639)
* 287faf78c chore: bumpenv pin numpy to 1.x [MD-470] (9748)
* becd8b6ae chore: remove RM Name from RP descriptions (9758)
* fc8ac0baa chore: undo test skip after fix was merged (9754)
* de898c9dd Revert "chore: add configurable posix claims fields to master config [RM-398]" (9753)
* 623c945b3 fix: load trial data for single run searches in search view 9742 (9752)
* 41a512e10 fix: debounce searches column width settings 9700 (9751)
* bc721bf53 refactor: change 'close' to 'save' on button in ManageJob modal [DET-10446] (9750)
* 0ce2ff149 fix: change external_run_id to string type in FlatRun proto (9749)
* 20ed1268d fix: reduce the number of api calls from Workspace Create Modal (9735)
* 61bc7bbf5 chore: add configurable posix claims fields to master config [RM-398] (9690)
* 2cdfdf9f4 fix: change external_run_id to string type in FlatRun proto (9744)
* 36aaed77a chore: fine-tune error and help messages of CLI commands for slot caps (9743)
* 0df7ad346 test: workspace and project tests [TESTENG-60] (9740)
* e00d9f4b2 chore: add release note for ComparisonView bugfix (9741)
* 35ec0773e chore: add 'masterService.annotations' to Helm (9697)
* 5f8dae383 chore: fix exp delete log msg (9716)
* 9dc0afada fix: deadlock issue (9728)
* f8067ba21 chore: skip failing Deactivate and reactivate user test (9723)
* 9efb2162c feat: CLI command to list the members of a Workspace [RM-388] (9686)
* dc1233685 chore: lengthen abbreviation to avoid ambiguity (9733)
* e3524b7d9 chore: add release notes for metrics fetching UI bug (9737)
* 719f8beb0 chore: update copy when f_flat_runs is on (9642)
* 6c4f69b4e test: workspace and project api [TESTENG-46] (9731)
* 4aa6ffa67 docs: Add release docs for continue trial, edit hp search, resource a… (9729)
* d46d77613 fix: use before/after search params for historic allocation CSV download endpoint [DET-10442] (9730)
* a32b0104d fix: show selections in ComparisonView on any page (ET-189) (9694)
* 7260f046c chore: default flat runs to on (9709)
* 202ab6212 fix: Endless fetching for cancelled experiment without metrics (9714)
* 4466c3322 feat: change search-experiments from GET to POST [ET-602] (9717)
* 787a2f377 docs: Fix workspace cli doc (9720)
* c3ca1d4b3 docs: Describe link to mldm data (9718)