Changelog
* d214a34df chore: bump version: 0.18.2-rc2 -> 0.18.2
* 10b172bcc docs: add release notes for 0.18.2 (4318)
* 38e0efb15 chore: bump version: 0.18.2-rc1 -> 0.18.2-rc2
* c5cb23fc9 fix: harness azure dependencies (4319)
* f623c1c1e chore: bump version: 0.18.2-rc0 -> 0.18.2-rc1
* 350de3838 chore: cleanup container terminations resent on agent exit (4309) [DET-7533]
* f02f11bc5 fix: skip allocation check on checkpoint save to fix pulling preemption (4308)
* cfc0f8108 fix: remove limit of length 4 for task id (4304)
* 0645ba968 fix: harness dependency class for azure (4302)
* 9991528a2 chore: bump version: 0.18.2-dev0 -> 0.18.2-rc0
* 86b6c28a4 chore: lock api state for backward compatibility check
* 271ae41c7 chore: ensure container terminations aren't lost in the event of network failures [DET-7440, DET-7441] (4272)
* 2d5a11865 fix: check for errors from actor asks in createResourcePoolSummary [DET-7494] (4294)
* daee03661 fix: stale agent state crashes resource pool [DET-7493] (4295)
* 4722caa95 fix: limit waiting for logger to 30 seconds (4289)
* b8f9463c6 fix: jupyterlab modal stale values, not sending values to api
* 42615b4b1 feat: add cli and api for delete checkpoints [DET-7119], [DET-7120] (4246)
* ad9bcd809 chore: restore determined_version in Trial checkpoint metadata (4288)
* 0b7dcdbd2 chore: widen node version detection to 16.29 (4290)
* 118a850da fix: pass signals through wrapper processes (4286)
* 0089ede8c fix: enable Full Configuration in appropriate modals [DET-7495] (4280)
* 45c62f20c feat: show only selected trials in learning curve
* 922b11349 chore: reduce use of any in InteractiveTable
* 5b03672a4 chore: update docs to accurate node support (4279)
* 59f686c96 fix: make trials table sort properly (4275)
* 695b30117 feat: incrementally release resources (4278)
* c0f53fa26 perf: avoid Seq Scanning raw steps (4244)
* b36596f13 fix: backslashes in show_ssh_command for windows (4260)
* ec503e06d fix: correct error message when command list fails (4235)
* 15b0525fc feat: add master-side verification for agent mTLS (4220)
* c6e824937 fix: don't hardcode /bin/which in entrypoints (4257)
* 236c24117 refactor: migrating tables to use InteractiveTable component [DET-7382] (4229)
* 156ff504b fix: reshow drop targets for customize columns modal (4199)
* 5e9962686 docs: delete mnist_tf_layers example (4263)
* 9d8223113 Revert "add autosync action"
* 8288461f5 add autosync action
* 70e534f9e fix: det task logs unable to use trial task IDs and checkpoint GC task IDs [DET-7424] (4258)
* 12d21c2e0 fix: gcs storage upstream test failure (4253)
* b09ec407e feat: improve trial logs to have some system events [DET-5885] (4215)
* c20ad4334 ci: Lints migrations to ensure new migrations have higher timestamps than old ones [DET-7146] (4250)
* da83e6509 chore: small cleanups for slurm (4211)
* b71f179aa feat: det deploy now can use --yes to skip prompts [DET-7408] (4255)
* 7b3fee44c chore: better slurm option override support (4254)
* 7af3cfe48 feat: add google cloud storage (gcs) prefix support [DET-6883] (4238)
* b5c2b14f8 docs: add enhanced launcher user guide (4248)
* eb1d9aaac chore: add dev server support for embedded tasks view (4243)
* 9d19a8704 perf: don't repeatedly reprocess profiler data
* b6582c02b ci: add check to prevent ssh git url (4240)
* 50bdbc3a8 chore: bumpenvs (4239)
* 1065314ea chore: add node_modules to eslintignore (4237)
* c74dee646 feat: Add theme toggle to user settings [DET-7321] (4204)
* 3016fc103 chore: remove legacy code/docs for NCCL/Gloo port range config (4187)
* f1e8d3cdc ci: check ulimit before 4x4 distributed test on macs (4234)
* 7d9e999cd fix: adjust page to preserve props.children (4231)
* 35258efd7 feat: Enable sending empty string for displayname with fallback to username [DET-7031] (4140)
* 2e77ec58c fix: det shell start/open, in windows (4227)
* 7217cfd94 feat: k8s detect non-det tasks (4154)
* a88f5c450 chore: share webui base page (4218)
* 15bd75895 fix: use custom image for tensor board [DET-7242] (4123)
* de926b1eb chore: fix rendezvous timeout logic (4226)
* c29e97a45 chore: base Dockerfile TensorFlow 2.6, 2.7, 2.8 security patches [DET-7325] (4223)
* 7882381cd fix: authenticate pprof endpoints [DET-7402]
* 2b0ccb830 chore: bump version: 0.18.1-dev0 -> 0.18.2-dev0
* bcbab4d92 docs: add release notes for 0.18.1 (4216)
* 034b9570c chore: revert rename of `RestoreResourcesFailure -> ResourcesFailure`. (4210)
* a7c4c2afe feat: enable agent-side mTLS for connection to master (4212)
* c9e13b64a feat: save connection in context (4213)
* 15f65ab37 feat: pix2pix example (4125)
* db987de7d chore: delete "conditional" json-schema extension (4177)
* 9b33f825c fix: use bigint for checkpoint size in `proto_get_trials_plus` (4208)
* 2c4c847d7 docs: update release note instructions with important admonition (4207)
* 138caf8f3 fix: pool detail page tab count when loading (4200)
* c5f685f18 feat: move task logs to embedded view [DET-7169] (4179)
* 54982c9be perf: tweak proto_get_trials_plus plan (4206)
* e585ebbb1 refactor: cleanup task logging shell scripts (4113)
* 4e2913f4a chore: update entrypoint in expconf docs (4198)
* fcff1c2aa fix: agent panic on commands with unusually formatted environment variables [DET-6649] (4202)
* 4172a4690 refactor: pull in user service code changes from EE (4183)
* 1c48fa618 docs: improve OpenTelemetry docs slightly (4182)
* 7f508e462 fix: allow `internal: null` for pre-0.15.6 experiments (4197)
* 55957fe06 fix: add restarts back to get_trial_ids for sorting
* dd8d3f377 feat: add det experiment logs <EXP_ID> [DET-7145] (4190)
* 3ef36d56b chore: refactor action dropdown comp to be reused [DET-7171] (4164)
* 623e60d37 ci: bust circleci cache (4189)
* af56e0103 docs: document using AWS Load Balancer on EKS [DET-6669] (4174)
* 36e56670d feat: allow enabling Prometheus monitoring through helm [DET-6993] (4158)
* 3bb7bb1d9 style: minor theme fixes and style adjustments [DET-7349] (4161)
* 2166dfc88 docs: update screen shots for cluster UI (4188)
* b7a327819 style: address new `flake8-comprehensions`, `pyzmq==23.0.0`. (4185)
* 5e1a81cb2 feat: allow setting of checkpointStorage.prefix through helm [DET-7152] (4152)
* 526e1dcc1 feat: display trial restarts [DET-7347] (4160)
* 2fdc6d710 fix: agent can now be control-C while connecting to master [DET-6287] (4178)
* d339f7be2 chore: migrate det a list to new api and bindings (4186)
* ed257d38e refactor: rip out `UseFluentLogging`. (4184)
* 21b85907b docs: update fluent-bit version. (4181)
* 4b325a558 docs: document database SSL options (4169)
* 5d228f8a5 chore: make core-api tutorial Windows-friendly (4176)
* 187505148 fix: sync slot usage for k8s [DET-7350] (4172)
* 92a944f96 chore: add .dccache to .gitignore (4173)
* 5e7a30c43 docs: fix typo in release note (4170)
* a52210f9e feat: chart sync provider [DET-7309] (4139)
* a7fafbb9a fix: enable currently active side nav item (4167)
* 0a5d54d45 chore: fix hardcoded url in schema logic (4171)
* 7df89b293 chore: allow deleting delete failed experiments (4141) [DET-7070]
* 4ef2b6771 perf: fixup query for latest training per trial (4166) [DET-7352]
* 3d3fe1cd0 fix: include both old and new checkpoints in total checkpoint size (4165)
* 4d98cee3d fix: replace carriage returns with newlines in task output [DET-5302] (3945)
* a2f878aaf chore: only warn on invalid calls to daemonize resources for slurm (4108)
* c00ce0a4c chore: check git state in lock-api-state.sh (4163)
* ec743d754 ci: turn off github annotations. (4146)
* 0efd44d08 doc: fix a broken file reference (4131)
* b911d85ed fix: avoid potential race between AllocationReady and Running state (4159)
* cc59985dc revert: partial revert of 96e0e584 (4162)
* f07633b4e fix: port collisions for multiple shared-non distributed jobs (4120) [HAL-2894]
* 80d1bb152 feat: Add embedded experience for JupyterLab and TensorBoard [DET-7162] (4134)
* 83cc1ec8f fix: prevent experiment name in header from flowing entire vertical space of screen during resize (4157)
Docker images
- `docker pull determinedai/determined-master:0.18.2`
- `docker pull determinedai/determined-master:d214a34df`
- `docker pull determinedai/determined-master:d214a34df0c0eb2e5e38ae63d1359862fd2af8f1`
- `docker pull determinedai/determined-dev:determined-master-d214a34df`
- `docker pull determinedai/determined-dev:determined-master-d214a34df0c0eb2e5e38ae63d1359862fd2af8f1`
- `docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.18.2`
- `docker pull nvcr.io/isv-ngc-partner/determined/determined-master:d214a34df`
- `docker pull nvcr.io/isv-ngc-partner/determined/determined-master:d214a34df0c0eb2e5e38ae63d1359862fd2af8f1`