Torchx

Latest version: v0.6.0

0.6.0

* Breaking changes
* Drop support for python 3.7.
* Upgrade docker base image PyTorch version to 2.0

* `torchx.schedulers`
* Add support for options in the create_schedulers factory method, which allows schedulers to be configured in the runner
* Kubernetes MCAD Scheduler
* Add support for retrying
* Test, formatting and documentation updates
* AWS Batch Scheduler
* Fix logging rank attribution
* Ray Scheduler
* Add ability to programmatically define ray job client

* `torchx.tracker`
* Fix adding artifacts to MLFlowTracker by multiple ranks

* `torchx.components`
* dist.ddp
* Add ability to specify rendezvous backend and use c10d as a default mechanism
* Add node_rank parameter value for static rank setup

* `torchx.runner`
* Resolve run_opts when passing to `torchx.workspace` and for dry-run to correctly populate the values

* `torchx.runner.events`
* Add support to log CPU and wall times
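The distinction between the two timings can be illustrated with a small standard-library sketch. The `timed` helper and the `wall_time_sec` / `cpu_time_sec` keys are hypothetical names chosen for this example; they are not taken from `torchx.runner.events`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(record: dict):
    """Store wall-clock and CPU time spent inside the `with` block in `record`."""
    wall_start = time.perf_counter()  # monotonic wall clock
    cpu_start = time.process_time()   # CPU time (user + system) of this process
    try:
        yield record
    finally:
        record["wall_time_sec"] = time.perf_counter() - wall_start
        record["cpu_time_sec"] = time.process_time() - cpu_start


event = {}
with timed(event):
    time.sleep(0.05)  # sleeping costs wall time but almost no CPU time

print(event)
```

Recording both values makes it possible to tell a call that was busy computing apart from one that was mostly waiting, for example on a scheduler's API.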

* `torchx.cli`
* Wait for app to start when logging

* `torchx.specs`
* Role.resources uses default_factory method to initialize its value
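Why a `default_factory` matters can be shown with plain dataclasses. The `Resource` and `Role` classes below are simplified, hypothetical stand-ins written for this sketch, not the real `torchx.specs` definitions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Resource:
    # Hypothetical, simplified stand-in for a resource spec.
    cpu: int = 1
    gpu: int = 0
    capabilities: Dict[str, str] = field(default_factory=dict)


@dataclass
class Role:
    # Hypothetical, simplified stand-in for a role.
    name: str
    # default_factory is called once per instance, so roles never share a Resource.
    resource: Resource = field(default_factory=Resource)


trainer = Role(name="trainer")
reader = Role(name="reader")
trainer.resource.gpu = 8

print(reader.resource.gpu)  # prints 0: the reader has its own Resource
```

With a single shared default object, the assignment to `trainer.resource.gpu` would also have been visible through `reader`.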

0.5.0

* Milestone: https://github.com/pytorch/torchx/milestone/7

* `torchx.schedulers`
* Kubernetes MCAD Scheduler (Prototype)
* Newly added integration for easily scheduling jobs on Multi-Cluster-Application-Dispatcher (MCAD).
* Features include:
* scheduling different types of components including DDP components
* scheduling on different compute resources (CPU, GPU)
* support for docker workspace
* support for bind, volume and device mounts
* getting logs for jobs
* describing, listing and cancelling jobs
* can be used with a secondary scheduler on Kubernetes
* AWS Batch
* Add privileged option to enable running containers on EFA enabled instances with elevated networking permissions

* `torchx.tracker`
* MLflow backend (Prototype)
* New support for MLFlow backend for torchx tracker
* Add ability for fsspec tracker to read nested kwargs
* Support for tracking apps not launched by torchx
* Load tracker config from .torchxconfig

* `torchx.components`
* Add dist.spmd component to support Single-Process-Multiple-Data style applications

* `torchx.workspace`
* Add ability to access image and workspace path from Dockerfile while building docker workspace

* Usability improvements
* Fix entrypoint loading to deal with deferred loading of modules to enable component registration to work properly

* Changes to ease maintenance
* Add ability to run integration tests for AWS Batch, Slurm, and Kubernetes without relying on dedicated remote clusters. This makes the environment reproducible, reduces maintenance, and makes it easier for more users to contribute.

* Additional changes
* Bug fixes: Make it possible to launch jobs with more than 5 nodes on AWS Batch

0.4.0

* Milestone: https://github.com/pytorch/torchx/milestone/6

* `torchx.schedulers`
* GCP Batch (Prototype)
* Newly added integration for easily scheduling jobs on GCP Batch.
* Features include:
* scheduling different types of components including DDP components
* scheduling on different compute resources (CPU, GPU)
* describing jobs including getting job status
* getting logs for jobs
* listing jobs
* cancelling jobs
* AWS Batch
* Listing jobs now returns just jobs launched on AWS Batch by TorchX and uses pagination to enable listing all jobs in all queues.
* Named resources now account for ECS and EC2 memtax, and suggest the closest match when a resource is not found.
* Named resources expanded to include all instance types for g4d, g5, p4d, p3 and trn1.
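The "closest match" suggestion for named resources can be sketched with `difflib` from the standard library. The registry contents and the `lookup` function are assumptions made for illustration; the release notes do not say how TorchX computes the suggestion.

```python
import difflib

# Hypothetical subset of a named-resource registry; values are illustrative.
NAMED_RESOURCES = {
    "aws_p3.2xlarge": {"cpu": 8, "gpu": 1},
    "aws_p3.8xlarge": {"cpu": 32, "gpu": 4},
    "aws_g5.xlarge": {"cpu": 4, "gpu": 1},
}


def lookup(name: str) -> dict:
    """Return the resource for `name`, or raise KeyError with a suggestion."""
    if name in NAMED_RESOURCES:
        return NAMED_RESOURCES[name]
    # Ask for the single best candidate above difflib's default similarity cutoff.
    close = difflib.get_close_matches(name, list(NAMED_RESOURCES), n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    raise KeyError(f"Unknown named resource '{name}'.{hint}")


try:
    lookup("aws_p3.2xlarg")  # typo: missing trailing "e"
except KeyError as err:
    message = err.args[0]
    print(message)
```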

* `torchx.workspace`
* Improve docker push logging to prevent log spamming when pushing for the first time

* Additional Changes
* Remove classyvision from examples since it's no longer supported in OSS. Uses torchvision/torch dataset APIs instead of ClassyDataset.

0.3.0

* Milestone: https://github.com/pytorch/torchx/milestone/5

* `torchx.schedulers`
* List API (Prototype)
* New list API to list jobs and their statuses for all schedulers which removes the need to use secondary tools to list jobs
* AWS Batch (promoted to Beta)
* Get logs for running jobs
* Added configs for job priorities and queue policies
* Easily access job UI via ui_url
* Ray
* Add elasticity to jobs launched on ray cluster to automatically scale jobs up as resources become available
* Kubernetes
* Add elasticity to jobs launched on Kubernetes
* LSF Scheduler (Prototype)
* Newly added support for scheduling on IBM Spectrum LSF scheduler
* Local Scheduler
* Better formatting when using pdb

* `torchx.tracker` (Prototype)
* TorchX Tracker is a new lightweight experiment and artifact tracking tool
* Add tracker API that can track any inputs and outputs to your model in any infrastructure
* fsspec-based TorchX tracking implementation and sample app

* `torchx.runner`
* Allow overriding TORCHX_IMAGE via entrypoints
* Capture the image used when logging schedule calls

* `torchx.components`
* Add debug flag to dist component

* `torchx.cli`
* New list feature also available as a subcommand to list jobs and their statuses on a given scheduler
* New tracker feature also available as a subcommand to track experiments and artifacts
* Defer loading schedulers until used

* `torchx.workspace`
* Preserve Unix file mode when patching files into docker image.
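Docker build contexts are shipped as tar archives, so one way to preserve a file's mode is to copy its permission bits into the tar header. The sketch below shows that idea with the standard library only; `add_with_mode` is a hypothetical helper, and this is not necessarily how `torchx.workspace` implements the fix.

```python
import io
import os
import stat
import tarfile
import tempfile


def add_with_mode(tar: tarfile.TarFile, path: str, arcname: str) -> None:
    """Add `path` to `tar`, copying its permission bits into the tar header."""
    info = tarfile.TarInfo(name=arcname)
    st = os.stat(path)
    info.size = st.st_size
    info.mode = stat.S_IMODE(st.st_mode)  # e.g. 0o755 for an executable script
    with open(path, "rb") as f:
        tar.addfile(info, f)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "run.sh")
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho hello\n")
    os.chmod(script, 0o755)

    # Build the archive in memory, then read the header back.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        add_with_mode(tar, script, "run.sh")

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        archived_mode = tar.getmember("run.sh").mode

print(oct(archived_mode))
```

Without the explicit `info.mode` assignment, a hand-built `TarInfo` falls back to its default mode, and an executable script would lose its execute bit inside the image.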

* Docs
* Add airflow example

* Additional changes
* Bug fixes for Python 3.10 support

0.2.0

* Milestone: https://github.com/pytorch/torchx/milestone/4

* `torchx.schedulers`
* DeviceMounts
* New mount type 'DeviceMount' that allows mounting a host device into a container in the supported schedulers (Docker, AWS Batch, K8). Custom accelerators and network devices such as Infiniband or Amazon EFA are now supported.
* Slurm
* Scheduler integration now supports "max_retries" the same way that our other schedulers do. This only handles whole job level retries and doesn't support per replica retries.
* Autodetects "nomem" setting by using `sinfo` to get the "Memory" setting for the specified partition
* More robust slurmint script
* Kubernetes
* Support for k8s device plugins/resource limits
* Added "devices" list of (str, int) tuples to role/resource
* Added devices.py to map from named devices to DeviceMounts
* Added logic in kubernetes_scheduler to add devices from resource to resource limits
* Added logic in aws_batch_scheduler and docker_scheduler to add DeviceMounts for any devices from resource
* Added "priority_class" argument to kubernetes scheduler to set the priorityClassName of the volcano job.
* Ray
* fixes for distributed training, now supported in Beta

* `torchx.specs`
* Moved factory/builder methods from datastruct specific "specs.api" to "specs.factory" module

* `torchx.runner`
* Renamed "stop" method to "cancel" for consistency. `Runner.stop` is now deprecated
* Added warning message when "name" parameter is specified. It is used as part of the Session name, which is deprecated, so "name" is now obsolete.
* New env variable TORCHXCONFIG for specified config
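A rename such as "stop" to "cancel" is commonly handled by keeping the old name as a thin alias that emits a `DeprecationWarning`. The `Runner` class below is a stripped-down, hypothetical stand-in used only to show that pattern; it is not the real `torchx.runner.Runner`, and the app handle is made up.

```python
import warnings


class Runner:
    # Hypothetical, stripped-down stand-in for a job runner.
    def __init__(self) -> None:
        self.cancelled = []

    def cancel(self, app_handle: str) -> None:
        """Preferred method name."""
        self.cancelled.append(app_handle)

    def stop(self, app_handle: str) -> None:
        """Deprecated alias kept for backwards compatibility."""
        warnings.warn(
            "stop() is deprecated, use cancel() instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this method
        )
        self.cancel(app_handle)


runner = Runner()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    runner.stop("local://session/app_1")  # made-up handle

print(runner.cancelled, [str(w.message) for w in caught])
```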

* `torchx.components`
* Removed "base" + "torch_dist_role" since users should prefer to use the `dist.ddp` components instead
* Removed custom components for example apps in favor of using builtins.
* Added "env", "max_retries" and "mounts" arguments to utils.sh

* `torchx.cli`
* Better parsing of configs from a string literal
* Added support to delimit kv-pairs and list values with "," and ";" interchangeably
* Allow the default scheduler to be specified via .torchxconfig
* Better invalid scheduler messaging
* Log message about how to disable workspaces
* Job cancellation support via `torchx cancel <job>`
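Treating "," and ";" as interchangeable delimiters for both kv-pairs and list values can be sketched as follows. `parse_cfg` is a hypothetical stand-in written for this example; the real CLI parser may differ, for instance by consulting each scheduler option's declared type.

```python
import re
from typing import Dict, List, Union

Value = Union[str, List[str]]


def parse_cfg(text: str) -> Dict[str, Value]:
    """Parse 'k1=v1,k2=a;b' style strings; ',' and ';' are equivalent."""
    cfg: Dict[str, Value] = {}
    key = None
    for token in re.split(r"[,;]", text):
        token = token.strip()
        if not token:
            continue
        if "=" in token:
            # A token containing '=' starts a new key.
            key, value = token.split("=", 1)
            cfg[key] = value
        elif key is not None:
            # A bare token extends the previous key's value into a list.
            current = cfg[key]
            cfg[key] = current + [token] if isinstance(current, list) else [current, token]
        else:
            raise ValueError(f"value '{token}' appears before any key")
    return cfg


# Both spellings yield the same result.
print(parse_cfg("queue=default;tags=a,b,c;priority=5"))
print(parse_cfg("queue=default,tags=a;b;c,priority=5"))
```

The sketch relies on one rule: a token with "=" opens a new key, and any token without it belongs to the most recent key.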

* `torchx.workspace`
* Support for .dockerignore files used as include lists to fix some behavioral differences between how .dockerignore files are interpreted by torchx and docker

* Testing
* Component tests now run sequentially
* Components can be tested with a runner using the `components.components_test_base.ComponentTestCase.run_component()` method.

* Additional Changes
* Updated Pyre configuration to preemptively guard against upcoming semantic changes
* Formatting changes from black 22.3.0
* Now using pyfmt with usort 1.0 and the new import merging behavior.
* Added script to automatically get system diagnostics for reporting purposes

0.1.2

Milestone: https://github.com/pytorch/torchx/milestones/3

* PyTorch 1.11 Support
* Python 3.10 Support
* `torchx.workspace`
* TorchX now supports a concept of workspaces. This enables seamless launching of jobs using changes present in your local workspace. For Docker based schedulers, we automatically build a new docker container on job launch making it easier than ever to run experiments. 333
* `torchx.schedulers`
* Ray 329
* Newly added Ray scheduler makes it easy to launch jobs on Ray.
* https://pytorch.medium.com/large-scale-distributed-training-with-torchx-and-ray-1d09a329aacb
* AWS Batch 381
* Newly added AWS Batch scheduler makes it easy to launch jobs in AWS with minimal infrastructure setup.
* Slurm
* Slurm jobs will by default launch in the current working directory to match `local_cwd` and workspace behavior. 372
* Replicas now have their own log files and can be accessed programmatically. 373
* Support for `comment`, `mail-user` and `constraint` fields. 391
* WorkspaceMixin support (prototype) - Slurm jobs can now be launched in isolated experiment directories. 416
* Kubernetes
* Support for running jobs under service accounts. 408
* Support for specifying instance types. 433
* All Docker-based Schedulers (Kubernetes, Batch, Docker)
* Added bind mount and volume support 420, 426
* Bug fix: Better shm support for large dataloader 429
* Support for `.dockerignore` and custom Dockerfiles 401
* Local Scheduler
* Automatically set `CUDA_VISIBLE_DEVICES` 383
* Improved log ordering 366
* `torchx.components`
* `dist.ddp`
* Rendezvous works out of the box on all schedulers 400
* Logs are now prefixed with local ranks 412
* Can specify resources via the CLI 395
* Can specify environment variables via the CLI 399
* HPO
* Ax runner now lives in the Ax repo https://github.com/facebook/Ax/commit/8e2e68f21155e918996bda0b7d97b5b9ef4e0cba
* `torchx.cli`
* `.torchxconfig`
* You can now specify component argument defaults in `.torchxconfig` https://github.com/pytorch/torchx/commit/c37cfd7846d5a0cb527dd19c8c95e881858f8f0a
* `~/.torchxconfig` can now be used to set user level defaults. 378
* `--workspace` can be configured 397
* Color change and bug fixes 419
* `torchx.runner`
* Now supports workspace interfaces. 360
* Returned lines now preserve whitespace to provide support for progress bars 425
* Events are now logged to `torch.monitor` when available. 379
* `torchx.notebook` (prototype)
* Added new workspace interface for developing models and launching jobs via a Jupyter Notebook. 356
* Docs
* Improvements to clarify TorchX usage w/ workspaces and general cleanups.
* 374, 402, 404, 407, 434
