Torch-xla

Latest version: v2.5.1

2.0.0

Cloud TPUs now support the [PyTorch 2.0 release](https://github.com/pytorch/pytorch/releases), via PyTorch/XLA integration. On top of the underlying improvements and bug fixes in PyTorch's 2.0 release, this release introduces several features and PyTorch/XLA-specific bug fixes.

Beta Features
PJRT runtime
* Check out our newest [document](https://github.com/pytorch/xla/blob/r2.0/docs/pjrt.md); PJRT is the default runtime in 2.0, and a brief usage sketch follows this list
* New implementation of xm.rendezvous with XLA collective communication, which scales better ([4181](https://github.com/pytorch/xla/pull/4181))
* New PJRT TPU backend through the C-API ([4077](https://github.com/pytorch/xla/pull/4077))
* Default to PJRT if no runtime is configured ([4599](https://github.com/pytorch/xla/pull/4599))
* Experimental support for torch.distributed and DDP on TPU v2 and v3 ([4520](https://github.com/pytorch/xla/pull/4520))
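
As a quick orientation, here is a minimal sketch of running a function on each local device under PJRT. `PJRT_DEVICE`, `xmp.spawn`, and `xm.rendezvous` are existing PyTorch/XLA interfaces; the tensor work inside `_mp_fn` is a placeholder, so treat the details as illustrative rather than canonical.

```python
import os

import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

os.environ.setdefault("PJRT_DEVICE", "TPU")  # select the PJRT runtime/backend


def _mp_fn(index):
    device = xm.xla_device()
    t = torch.randn(2, 2, device=device)
    xm.rendezvous("init")  # now backed by an XLA collective (see 4181)
    print(index, t.device)


if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=())
```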

FSDP
* Add `auto_wrap_policy` to XLA FSDP for automatic wrapping ([4318](https://github.com/pytorch/xla/pull/4318)); a short example follows
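
A minimal sketch of automatic wrapping with XLA FSDP, assuming the upstream `size_based_auto_wrap_policy` from `torch.distributed.fsdp.wrap` is an acceptable policy for your model; the model and size threshold below are placeholders.

```python
import functools

import torch.nn as nn
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
auto_wrap_policy = functools.partial(
    size_based_auto_wrap_policy, min_num_params=1_000_000)

# Submodules above the size threshold get wrapped in their own FSDP unit.
fsdp_model = FSDP(model.to(xm.xla_device()), auto_wrap_policy=auto_wrap_policy)
```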

Stable Features
Lazy Tensor Core Migration
* Migration is complete; check out this [dev discussion](https://dev-discuss.pytorch.org/t/pytorch-xla-2022-q4-dev-update/961) for more detail.
* Naively inherits LazyTensor ([4271](https://github.com/pytorch/xla/pull/4271))
* Adopt even more LazyTensor interfaces ([4317](https://github.com/pytorch/xla/pull/4317))
* Introduce XLAGraphExecutor ([4270](https://github.com/pytorch/xla/pull/4270))
* Inherits LazyGraphExecutor ([4296](https://github.com/pytorch/xla/pull/4296))
* Adopt more LazyGraphExecutor virtual interfaces ([4314](https://github.com/pytorch/xla/pull/4314))
* Rollback to use xla::Shape instead of torch::lazy::Shape ([4111](https://github.com/pytorch/xla/pull/4111))
* Use TORCH_LAZY_COUNTER/METRIC ([4208](https://github.com/pytorch/xla/pull/4208))

Improvements & Additions
* Add an option to increase the worker thread efficiency for data loading ([4727](https://github.com/pytorch/xla/pull/4727))
* Improve numerical stability of torch.sigmoid ([4311](https://github.com/pytorch/xla/pull/4311))
* Add an API to clear counters and metrics ([4109](https://github.com/pytorch/xla/pull/4109))
* Add met.short_metrics_report to display a more concise metrics report ([4148](https://github.com/pytorch/xla/pull/4148)); see the short example after this list
* Document environment variables ([4273](https://github.com/pytorch/xla/pull/4273))
* Op Lowering
  * _linalg_svd ([4537](https://github.com/pytorch/xla/pull/4537))
  * upsample_bilinear2d with scale ([4464](https://github.com/pytorch/xla/pull/4464))
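
A minimal sketch of the debug-metrics helpers referenced above. `met.metrics_report()` and `xm.mark_step()` are long-standing interfaces and `met.short_metrics_report()` is the new concise variant; the exact name of the clearing helper from PR 4109 is assumed here, so check the linked PR if it differs.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
x = torch.randn(4, 4, device=device)
y = (x @ x).sum()
xm.mark_step()  # materialize the pending graph so counters get populated

print(met.short_metrics_report())  # concise summary, new in 2.0
met.clear_all()                    # clearing API from PR 4109 (name assumed)
```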

Experimental Features
TorchDynamo (torch.compile) support
* Check out our newest [doc](https://github.com/pytorch/xla/blob/r2.0/docs/dynamo.md); a brief usage sketch follows this list.
* Dynamo bridge python binding ([4119](https://github.com/pytorch/xla/pull/4119))
* Dynamo bridge backend implementation ([4523](https://github.com/pytorch/xla/pull/4523))
* Training optimization: make execution async ([4425](https://github.com/pytorch/xla/pull/4425))
* Training optimization: reduce graph execution per step ([4523](https://github.com/pytorch/xla/pull/4523))
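
A minimal inference sketch under torch.compile on an XLA device. The backend string `"torchxla_trace_once"` is the bridge name used around the 2.0 release (it was renamed later), so treat it as an assumption and defer to the linked dynamo doc; the linear model is a placeholder.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(8, 2).to(device)

# Dynamo traces the model once and hands the FX graph to the XLA bridge.
compiled = torch.compile(model, backend="torchxla_trace_once")
out = compiled(torch.randn(4, 8, device=device))
```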

PyTorch/XLA GSPMD on single host
* Preserve parameter sharding with sharded data placeholder ([4721](https://github.com/pytorch/xla/pull/4721))
* Transfer shards from server to host ([4508](https://github.com/pytorch/xla/pull/4508))
* Store the sharding annotation within XLATensor ([4390](https://github.com/pytorch/xla/pull/4390))
* Use d2d replication for more efficient input sharding ([4336](https://github.com/pytorch/xla/pull/4336))
* Mesh to support custom device order ([4162](https://github.com/pytorch/xla/pull/4162))
* Introduce virtual SPMD device to avoid unpartitioned data transfer ([4091](https://github.com/pytorch/xla/pull/4091))
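
A minimal single-host sharding sketch using the experimental API as it shipped around 2.0 (`torch_xla.experimental.xla_sharding`); the module path, `Mesh` arguments, and partition-spec format have evolved since, so treat this as an assumption-laden illustration rather than a reference.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.experimental.xla_sharding as xs

num_devices = len(xm.get_xla_supported_devices())
device_ids = np.arange(num_devices)
mesh = xs.Mesh(device_ids, (1, num_devices))  # 1 x N logical device mesh

t = torch.randn(8, 16, device=xm.xla_device())
# Shard dim 1 of `t` along mesh axis 1; dim 0 maps to the size-1 axis (replicated).
xs.mark_sharding(t, mesh, (0, 1))
```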

Ongoing development
Ongoing Dynamic Shape implementation
* Implement missing `XLASymNodeImpl::Sub` ([4551](https://github.com/pytorch/xla/pull/4551))
* Make empty_symint support dynamism. ([4550](https://github.com/pytorch/xla/pull/4550))
* Add dynamic shape support to SigmoidBackward ([4322](https://github.com/pytorch/xla/pull/4322))
* Add a forward pass NN model with dynamism test ([4256](https://github.com/pytorch/xla/pull/4256))
Ongoing SPMD multi-host execution ([4573](https://github.com/pytorch/xla/pull/4573))

Bug fixes & improvements
* Support int as index type ([4602](https://github.com/pytorch/xla/pull/4602))
* Only alias inputs and outputs when force_ltc_sync == True ([4575](https://github.com/pytorch/xla/pull/4575))
* Fix race condition between execution and buffer tear down on GPU when using bfc_allocator ([4542](https://github.com/pytorch/xla/pull/4542))
* Release the GIL during TransferFromServer ([4504](https://github.com/pytorch/xla/pull/4504))
* Fix type annotations in FSDP ([4371](https://github.com/pytorch/xla/pull/4371))

1.13.0

Cloud TPUs now support the [PyTorch 1.13 release](https://github.com/pytorch/pytorch/releases), via PyTorch/XLA integration. The release has daily automated testing for the supported models: [Torchvision ResNet](https://cloud.google.com/tpu/docs/tutorials/resnet-pytorch), [FairSeq Transformer](https://cloud.google.com/tpu/docs/tutorials/transformer-pytorch) and [RoBERTa](https://cloud.google.com/tpu/docs/tutorials/roberta-pytorch), [HuggingFace GLUE and LM](https://github.com/huggingface/transformers), and [Facebook Research DLRM](https://cloud.google.com/tpu/docs/tutorials/pytorch-dlrm).

On top of the underlying improvements and bug fixes in PyTorch's 1.13 release, this release adds several features and PyTorch/XLA-specific bug fixes.

New Features
- GPU enhancements
  - Add upsample_nearest/bilinear implementation for CPU and GPU ([3990](https://github.com/pytorch/xla/pull/3990))
  - Set three_fry as the default RNG for GPU ([3951](https://github.com/pytorch/xla/pull/3951))
- FSDP enhancements
  - Allow FSDP wrapping and sharding over modules on CPU devices ([3992](https://github.com/pytorch/xla/pull/3992))
  - Support param sharding dim and pinning memory ([3830](https://github.com/pytorch/xla/pull/3830))
- Lower torch::einsum using xla::einsum, which provides a significant speedup ([3843](https://github.com/pytorch/xla/pull/3843)); see the short example after this list
- Support large models with >3200 graph inputs on TPU + PJRT ([3920](https://github.com/pytorch/xla/pull/3920))
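
For illustration, a tiny sketch of an einsum contraction on an XLA device; with this release the call lowers to a single xla::einsum instead of being decomposed. The tensor shapes below are arbitrary.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
a = torch.randn(16, 32, device=device)
b = torch.randn(32, 64, device=device)
c = torch.einsum("ik,kj->ij", a, b)  # lowered to xla::einsum
xm.mark_step()
```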

Experimental Features
- PJRT experimental support on Cloud TPU v4
  - Check the instructions and example code [here](https://github.com/pytorch/xla/blob/r1.13/docs/pjrt.md)
- DDP experimental support on Cloud TPU and GPU
  - Check the instructions, analysis, and example code [here](https://github.com/pytorch/xla/blob/r1.13/docs/ddp.md); a minimal DDP sketch follows this list
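
A minimal DDP sketch with the xla backend, assuming the environment is set up as described in the linked ddp.md; the exact `init_process_group` arguments depend on the launcher, and the model and data are placeholders.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the "xla" process group backend
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    dist.init_process_group(
        "xla", rank=xm.get_ordinal(), world_size=xm.xrt_world_size())
    device = xm.xla_device()
    model = DDP(nn.Linear(8, 2).to(device), gradient_as_bucket_view=True)
    loss = model(torch.randn(4, 8, device=device)).sum()
    loss.backward()
    xm.mark_step()


if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=())
```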

Ongoing development
- Ongoing Dynamic Shape implementation (POC completed)
- Ongoing SPMD implementation (POC completed)
- Ongoing LTC migration

Bug fixes and improvements
- Make XLA_HLO_DEBUG populate the scope metadata ([3985](https://github.com/pytorch/xla/pull/3985))

1.12.0

Cloud TPUs now support the [PyTorch 1.12 release](https://github.com/pytorch/pytorch/releases), via PyTorch/XLA integration. The release has daily automated testing for the supported models: [Torchvision ResNet](https://cloud.google.com/tpu/docs/tutorials/resnet-pytorch), [FairSeq Transformer](https://cloud.google.com/tpu/docs/tutorials/transformer-pytorch) and [RoBERTa](https://cloud.google.com/tpu/docs/tutorials/roberta-pytorch), [HuggingFace GLUE and LM](https://github.com/huggingface/transformers), and [Facebook Research DLRM](https://cloud.google.com/tpu/docs/tutorials/pytorch-dlrm).

On top of the underlying improvements and bug fixes in PyTorch's 1.12 release, this release adds several features and PyTorch/XLA-specific bug fixes.

New Features
- FSDP
  - Check the instructions and example code [here](https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md)
  - FSDP support for PyTorch/XLA (https://github.com/pytorch/xla/pull/3431)
  - Bfloat16 and float16 support in FSDP (https://github.com/pytorch/xla/pull/3617)
- PyTorch/XLA gradient checkpointing API (https://github.com/pytorch/xla/pull/3524); a short sketch follows this list
- optimization_barrier, which enables gradient checkpointing (https://github.com/pytorch/xla/pull/3482)
- Ongoing LTC migration
- Device lock position optimization to speed up tracing (https://github.com/pytorch/xla/pull/3457)
- Experimental support for the PJRT TPU client (https://github.com/pytorch/xla/pull/3550)
- Send/Recv CC op support (https://github.com/pytorch/xla/pull/3494)
- Performance profiling tool enhancements (https://github.com/pytorch/xla/pull/3498)
- Official TPU v4 Pod support (https://github.com/pytorch/xla/pull/3440)
- Roll lowering (https://github.com/pytorch/xla/pull/3505)
- celu, celu_, selu, selu_ lowering (https://github.com/pytorch/xla/pull/3547)
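
A short sketch of the gradient checkpointing API mentioned above, assuming the wrapper lives at `torch_xla.utils.checkpoint.checkpoint` and mirrors `torch.utils.checkpoint`; the tiny block and input are placeholders.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.utils.checkpoint import checkpoint  # path assumed, see PR 3524

device = xm.xla_device()
block = nn.Sequential(nn.Linear(128, 128), nn.ReLU()).to(device)
x = torch.randn(16, 128, device=device, requires_grad=True)

y = checkpoint(block, x)  # activations are recomputed during backward
y.sum().backward()
xm.mark_step()
```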


Bug fixes and improvements
- Fixed a view bug that created unnecessary IR graphs (https://github.com/pytorch/xla/pull/3411)

1.11.0

Cloud TPUs now support the [PyTorch 1.11 release](https://github.com/pytorch/pytorch/releases), via PyTorch/XLA integration. The release has daily automated testing for the supported models: [Torchvision ResNet](https://cloud.google.com/tpu/docs/tutorials/resnet-pytorch), [FairSeq Transformer](https://cloud.google.com/tpu/docs/tutorials/transformer-pytorch) and [RoBERTa](https://cloud.google.com/tpu/docs/tutorials/roberta-pytorch), [HuggingFace GLUE and LM](https://github.com/huggingface/transformers), and [Facebook Research DLRM](https://cloud.google.com/tpu/docs/tutorials/pytorch-dlrm).

On top of the underlying improvements and bug fixes in PyTorch's 1.11 release, this release adds several features and PyTorch/XLA-specific bug fixes.

New Features

- Enable [asynchronous RNG seed sending](https://github.com/pytorch/xla/pull/3292) via the environment variable `XLA_TRANSFER_SEED_ASYNC`
- Add a [native torch.distributed backend](https://github.com/pytorch/xla/pull/3339)
- Introduce an [eager debug mode](https://github.com/pytorch/xla/pull/3306) via the environment variable `XLA_USE_EAGER_DEBUG_MODE`
- Add sync-free [Adam and AdamW optimizers](https://github.com/pytorch/xla/pull/3294) for PyTorch/XLA:GPU AMP (see the AMP sketch after this list)
- Add a sync-free [SGD](https://github.com/pytorch/xla/pull/3145) optimizer for PyTorch/XLA:GPU AMP
- [linspace](https://github.com/pytorch/xla/pull/3335) lowering
- [mish](https://github.com/pytorch/xla/pull/3236) lowering
- [prelu](https://github.com/pytorch/xla/pull/3222) lowering
- [slogdet](https://github.com/pytorch/xla/pull/3183) lowering
- [stable sort](https://github.com/pytorch/xla/pull/3246) lowering
- [index_add with alpha scaling](https://github.com/pytorch/xla/pull/3227) lowering
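
A minimal PyTorch/XLA:GPU AMP sketch with a sync-free optimizer. `torch_xla.amp.autocast`, `GradScaler`, and the `syncfree` optimizers are the interfaces referenced above; the model, data, and hyperparameters are placeholders, and exact import paths and signatures should be checked against the release you use.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.amp import GradScaler, autocast, syncfree

device = xm.xla_device()
model = nn.Linear(32, 4).to(device)
optimizer = syncfree.SGD(model.parameters(), lr=0.01)  # avoids host syncs for inf checks
scaler = GradScaler()

x = torch.randn(8, 32, device=device)
with autocast():  # newer releases take the XLA device as an argument
    loss = model(x).sum()
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
xm.mark_step()
```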

Bug fixes & improvements
- Improve [`torch.var`](https://github.com/pytorch/xla/pull/3262) performance and numerical stability on TPU
- Improve [`torch.pow`](https://github.com/pytorch/xla/pull/3251) performance
- Fix the [incorrect output dtype](https://github.com/pytorch/xla/pull/3229) when dividing an f32 by an f64
- Fix the [incorrect result](https://github.com/pytorch/xla/pull/3144) of `nll_loss` when reduction = "mean" and the whole target is equal to `ignore_index`

1.10.0

Cloud TPUs now support the [PyTorch 1.10 release](https://github.com/pytorch/pytorch/releases), via PyTorch/XLA integration. The release has daily automated testing for the supported models: [Torchvision ResNet](https://cloud.google.com/tpu/docs/tutorials/resnet-pytorch), [FairSeq Transformer](https://cloud.google.com/tpu/docs/tutorials/transformer-pytorch) and [RoBERTa](https://cloud.google.com/tpu/docs/tutorials/roberta-pytorch), [HuggingFace GLUE and LM](https://github.com/huggingface/transformers), and [Facebook Research DLRM](https://cloud.google.com/tpu/docs/tutorials/pytorch-dlrm).

On top of the underlying improvements and bug fixes in PyTorch's 1.10 release, this release adds several PyTorch/XLA-specific features and bug fixes:

- Add support for [reduce_scatter](https://github.com/pytorch/xla/pull/3075)
- Introduce the [AMP Zero gradients optimization](https://github.com/pytorch/xla/pull/3119) for XLA:GPU
- Introduce the environment variables [XLA_DOWNCAST_BF16](https://github.com/pytorch/xla/pull/2999) and [XLA_DOWNCAST_FP16](https://github.com/pytorch/xla/pull/2999) to downcast input tensors (a short sketch follows this list)
- [adaptive_max_pool2d](https://github.com/pytorch/xla/pull/3083) lowering
- [nan_to_num](https://github.com/pytorch/xla/pull/3093) lowering
- [sgn](https://github.com/pytorch/xla/pull/3045) lowering
- [logical_not](https://github.com/pytorch/xla/pull/3084)/[logical_xor](https://github.com/pytorch/xla/pull/3084)/[logical_or](https://github.com/pytorch/xla/pull/3084)/[logical_and](https://github.com/pytorch/xla/pull/3054) lowering
- [amax](https://github.com/pytorch/xla/pull/3027) lowering
- [amin](https://github.com/pytorch/xla/pull/3034) lowering
- [std_mean](https://github.com/pytorch/xla/pull/3004) lowering
- [var_mean](https://github.com/pytorch/xla/pull/3014) lowering
- [lerp](https://github.com/pytorch/xla/pull/2972) lowering
- [isnan](https://github.com/pytorch/xla/pull/2969) lowering
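
A minimal sketch of the downcast switch, assuming the usual pattern of setting the variable before torch_xla is imported; the on-device dtype mapping (f32 stored as bf16, f64 as f32) is the intent of `XLA_DOWNCAST_BF16`, but verify against the release you run.

```python
import os

os.environ["XLA_DOWNCAST_BF16"] = "1"  # set before importing torch_xla

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(4, 4, device=device)
# PyTorch still reports float32; the XLA device stores the data as bfloat16.
print(x.dtype)
```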

1.9.0

Cloud TPUs now support the [PyTorch 1.9 release](https://github.com/pytorch/pytorch/releases/tag/v1.9), via PyTorch/XLA integration. The release has daily automated testing for the supported models: [Torchvision ResNet](https://cloud.google.com/tpu/docs/tutorials/resnet-pytorch), [FairSeq Transformer](https://cloud.google.com/tpu/docs/tutorials/transformer-pytorch) and [RoBERTa](https://cloud.google.com/tpu/docs/tutorials/roberta-pytorch), [HuggingFace GLUE and LM](https://github.com/huggingface/transformers), and [Facebook Research DLRM](https://cloud.google.com/tpu/docs/tutorials/pytorch-dlrm).

On top of the underlying improvements and bug fixes in PyTorch's 1.9 release, this release adds several PyTorch/XLA-specific bug fixes:

* [Floor division fix](https://github.com/pytorch/xla/pull/2950)
* [Clamp fix](https://github.com/pytorch/xla/pull/2929)
* [Softplus fix](https://github.com/pytorch/xla/pull/2902)
* [as_strided_](https://github.com/pytorch/xla/pull/2912)
* [std and var lowering](https://github.com/pytorch/xla/pull/2891)
