Torchx

Latest version: v0.7.0

0.7.0

* `torchx.schedulers`
  * AWS Batch Scheduler
    * Add job_role_arn and execution_role_arn as run options for AWS permission
    * Add instance type for aws_batch_scheduler multinode jobs
    * Add neuron device mount for aws trn instances
    * Update EFA_DEVICE details for AWS resources
  * Add aws_sagemaker_scheduler
    * Torchx scheduler for AWS SageMaker
  * Docker Scheduler
    * Add support for setting environment variables
  * GCP Batch Scheduler
    * Fix log

* `torchx.tracker`
  * Add module lookup for building trackers

* `torchx.cli`
  * Throw error if there are multiple scheduler arguments

* `torchx.specs.api`
  * Add macro support to metadata variables
  * Add NamedTuple for Tuple from parsing AppHandle
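
For illustration, the idea of returning a NamedTuple from handle parsing might look like the sketch below. This is a standalone sketch, not the `torchx.specs.api` implementation: the names `ParsedHandle` and `parse_handle` are hypothetical, and the `scheduler://session/app_id` handle layout is an assumption made for the example.

```python
import re
from typing import NamedTuple


class ParsedHandle(NamedTuple):
    """Hypothetical result type; field names are assumptions for this sketch."""

    scheduler_backend: str
    session_name: str
    app_id: str


# Assumed handle layout: <scheduler>://<session>/<app_id>
_HANDLE_RE = re.compile(
    r"^(?P<scheduler_backend>[^:/]+)://(?P<session_name>[^/]*)/(?P<app_id>.+)$"
)


def parse_handle(handle: str) -> ParsedHandle:
    """Split a handle string into its three named parts."""
    match = _HANDLE_RE.match(handle)
    if match is None:
        raise ValueError(f"not a valid app handle: {handle!r}")
    return ParsedHandle(**match.groupdict())
```

Named fields keep call sites readable (`parsed.app_id`), while code that unpacks the result as a plain three-element tuple keeps working.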

* Changes to ease maintenance
  * Migrate all active AWS resources to the us-west-1 region with Terraform
  * Fix all GitHub Actions
  * Better linter error message.
  * Migrate Kubernetes Integration test to MiniKube
  * Deprecate KFP Integration Tests

* Additional changes
  * Add verbose flag to docker mixin

0.6.0

* Breaking changes
  * Drop support for Python 3.7.
  * Upgrade docker base image PyTorch version to 2.0

* `torchx.schedulers`
  * Add support for options in create_schedulers factory method that allows scheduler configuration in runner
  * Kubernetes MCAD Scheduler
    * Add support for retrying
    * Test, formatting and documentation updates
  * AWS Batch Scheduler
    * Fix logging rank attribution
  * Ray Scheduler
    * Add ability to programmatically define ray job client

* `torchx.tracker`
  * Fix adding artifacts to MLFlowTracker by multiple ranks

* `torchx.components`
  * dist.ddp
    * Add ability to specify rendezvous backend and use c10d as a default mechanism
    * Add node_rank parameter value for static rank setup

* `torchx.runner`
  * Resolve run_opts when passing to `torchx.workspace` and for dry-run to correctly populate the values

* `torchx.runner.events`
  * Add support to log CPU and wall times

* `torchx.cli`
  * Wait for app to start when logging

* `torchx.specs`
  * Role.resources uses default_factory method to initialize its value

0.5.0

* Milestone: https://github.com/pytorch/torchx/milestone/7

* `torchx.schedulers`
  * Kubernetes MCAD Scheduler (Prototype)
    * Newly added integration for easily scheduling jobs on Multi-Cluster-Application-Dispatcher (MCAD).
    * Features include:
      * scheduling different types of components including DDP components
      * scheduling on different compute resources (CPU, GPU)
      * support for docker workspace
      * support for bind, volume and device mounts
      * getting logs for jobs
      * describing, listing and cancelling jobs
      * can be used with a secondary scheduler on Kubernetes
  * AWS Batch
    * Add privileged option to enable running containers on EFA enabled instances with elevated networking permissions

* `torchx.tracker`
  * MLflow backend (Prototype)
    * New support for MLFlow backend for torchx tracker
  * Add ability for fsspec tracker to read nested kwargs
  * Support for tracking apps not launched by torchx
  * Load tracker config from .torchxconfig

* `torchx.components`
  * Add dist.spmd component to support Single-Process-Multiple-Data style applications

* `torchx.workspace`
  * Add ability to access image and workspace path from Dockerfile while building docker workspace

* Usability improvements
  * Fix entrypoint loading to handle deferred loading of modules so that component registration works properly

* Changes to ease maintenance
  * Add ability to run integration tests for AWS Batch, Slurm, and Kubernetes, instead of running them in remote dedicated clusters. This makes the environment reproducible, reduces maintenance, and makes it easier for more users to contribute.

* Additional changes
  * Bug fixes: Make it possible to launch jobs with more than 5 nodes on AWS Batch

0.4.0

* Milestone: https://github.com/pytorch/torchx/milestone/6

* `torchx.schedulers`
  * GCP Batch (Prototype)
    * Newly added integration for easily scheduling jobs on GCP Batch.
    * Features include:
      * scheduling different types of components including DDP components
      * scheduling on different compute resources (CPU, GPU)
      * describing jobs including getting job status
      * getting logs for jobs
      * listing jobs
      * cancelling jobs
  * AWS Batch
    * Listing jobs now returns just jobs launched on AWS Batch by TorchX and uses pagination to enable listing all jobs in all queues.
    * Named resources now account for ECS and EC2 memtax, and suggest the closest match when a resource is not found.
    * Named resources expanded to include all instance types for g4d, g5, p4d, p3 and trn1.
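
The closest-match suggestion for named resources could work along the lines of the sketch below, which uses the standard library's `difflib`. This is an assumption about the general technique, not TorchX's actual lookup code; the resource names in `KNOWN_RESOURCES` and the function `suggest_resource` are illustrative only.

```python
import difflib
from typing import Optional, Sequence

# Illustrative names only; the real registry of named resources lives in TorchX.
KNOWN_RESOURCES = [
    "aws_p3.2xlarge",
    "aws_p4d.24xlarge",
    "aws_g5.xlarge",
    "aws_trn1.2xlarge",
]


def suggest_resource(
    name: str, known: Sequence[str] = KNOWN_RESOURCES
) -> Optional[str]:
    """Return the known resource name most similar to `name`, or None."""
    matches = difflib.get_close_matches(name, known, n=1)
    return matches[0] if matches else None
```

A lookup that misses can then report something like "did you mean `aws_p3.2xlarge`?" instead of only failing.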

* `torchx.workspace`
  * Improve docker push logging to prevent log spamming when pushing for the first time

* Additional Changes
  * Remove classyvision from examples since it's no longer supported in OSS. Uses torchvision/torch dataset APIs instead of ClassyDataset.

0.3.0

* Milestone: https://github.com/pytorch/torchx/milestone/5

* `torchx.schedulers`
  * List API (Prototype)
    * New list API to list jobs and their statuses for all schedulers which removes the need to use secondary tools to list jobs
  * AWS Batch (promoted to Beta)
    * Get logs for running jobs
    * Added configs for job priorities and queue policies
    * Easily access job UI via ui_url
  * Ray
    * Add elasticity to jobs launched on ray cluster to automatically scale jobs up as resources become available
  * Kubernetes
    * Add elasticity to jobs launched on Kubernetes
  * LSF Scheduler (Prototype)
    * Newly added support for scheduling on IBM Spectrum LSF scheduler
  * Local Scheduler
    * Better formatting when using pdb

* `torchx.tracker` (Prototype)
  * TorchX Tracker is a new lightweight experiment and artifact tracking tool
  * Add tracker API that can track any inputs and outputs to your model in any infrastructure
  * FSSpec based Torchx tracking implementation and sample app

* `torchx.runner`
  * Allow overriding TORCHX_IMAGE via entrypoints
  * Capture the image used when logging schedule calls

* `torchx.components`
  * Add debug flag to dist component

* `torchx.cli`
  * New list feature also available as a subcommand to list jobs and their statuses on a given scheduler
  * New tracker feature also available as a subcommand to track experiments and artifacts
  * Defer loading schedulers until used

* `torchx.workspace`
  * Preserve Unix file mode when patching files into docker image.

* Docs
  * Add airflow example

* Additional changes
  * Bug fixes for Python 3.10 support

0.2.0

* Milestone: https://github.com/pytorch/torchx/milestone/4

* `torchx.schedulers`
  * DeviceMounts
    * New mount type 'DeviceMount' that allows mounting a host device into a container in the supported schedulers (Docker, AWS Batch, K8). Custom accelerators and network devices such as Infiniband or Amazon EFA are now supported.
  * Slurm
    * Scheduler integration now supports "max_retries" the same way that our other schedulers do. This only handles whole job level retries and doesn't support per replica retries.
    * Autodetects "nomem" setting by using `sinfo` to get the "Memory" setting for the specified partition
    * More robust slurmint script
  * Kubernetes
    * Support for k8s device plugins/resource limits
      * Added "devices" list of (str, int) tuples to role/resource
      * Added devices.py to map from named devices to DeviceMounts
      * Added logic in kubernetes_scheduler to add devices from resource to resource limits
      * Added logic in aws_batch_scheduler and docker_scheduler to add DeviceMounts for any devices from resource
    * Added "priority_class" argument to kubernetes scheduler to set the priorityClassName of the volcano job.
  * Ray
    * Fixes for distributed training, now supported in Beta

* `torchx.specs`
  * Moved factory/builder methods from datastruct specific "specs.api" to "specs.factory" module

* `torchx.runner`
  * Renamed "stop" method to "cancel" for consistency. `Runner.stop` is now deprecated
  * Added a warning message when the "name" parameter is specified. It is used as part of the Session name, which is deprecated, so "name" is now obsolete.
  * New env variable TORCHXCONFIG for specifying the config file

* `torchx.components`
  * Removed "base" + "torch_dist_role" since users should prefer to use the `dist.ddp` components instead
  * Removed custom components for example apps in favor of using builtins.
  * Added "env", "max_retries" and "mounts" arguments to utils.sh

* `torchx.cli`
  * Better parsing of configs from a string literal
  * Added support to delimit kv-pairs and list values with "," and ";" interchangeably
  * Allow the default scheduler to be specified via .torchxconfig
  * Better invalid scheduler messaging
  * Log message about how to disable workspaces
  * Job cancellation support via `torchx cancel <job>`
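
To make the interchangeable "," and ";" delimiters concrete, here is a minimal sketch of one way such a config string could be parsed. It is not the TorchX parser: the function name `parse_cfg` is hypothetical, and returning every value as a list of strings is a simplification chosen for the example.

```python
import re
from typing import Dict, List, Optional


def parse_cfg(cfg: str) -> Dict[str, List[str]]:
    """Parse 'k1=v1,k2=a;b' style strings; ',' and ';' are treated alike.

    A token containing '=' starts a new key. A token without '=' is treated
    as one more list value for the most recently seen key.
    """
    result: Dict[str, List[str]] = {}
    current_key: Optional[str] = None
    for token in re.split(r"[,;]", cfg):
        token = token.strip()
        if not token:
            continue  # tolerate empty segments such as trailing delimiters
        if "=" in token:
            key, value = token.split("=", 1)
            current_key = key.strip()
            result[current_key] = [value.strip()]
        elif current_key is None:
            raise ValueError(f"value {token!r} appears before any key")
        else:
            result[current_key].append(token)
    return result
```

With this approach, `cluster=c1,tags=a;b` and `cluster=c1;tags=a,b` parse to the same mapping, which is the interchangeable behaviour the entry describes.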

* `torchx.workspace`
  * Support for .dockerignore files used as include lists to fix some behavioral differences between how .dockerignore files are interpreted by torchx and docker

* Testing
  * Component tests now run sequentially
  * Components can be tested with a runner using the `components.components_test_base.ComponentTestCase.run_component()` method.

* Additional Changes
  * Updated Pyre configuration to preemptively guard against upcoming semantic changes
  * Formatting changes from black 22.3.0
  * Now using pyfmt with usort 1.0 and the new import merging behavior.
  * Added script to automatically get system diagnostics for reporting purposes
