Triformer

Latest version: v3.0.7

3.0.2

Triformer
`pip install -U triformer`

TritonCrossEntropyLoss (New! 🎉)
The Triton implementation of Cross Entropy Loss is optimized for both performance and memory efficiency. It combines the forward and backward passes into a single CUDA kernel, which reduces memory overhead. A key feature is its in-place gradient computation, reusing the logits tensor instead of allocating new memory, resulting in about 2x memory savings compared to standard implementations. The code also supports chunked processing, allowing it to handle large batches by processing data in smaller pieces. For numerical stability, it implements the log-sum-exp trick, and it properly handles padding tokens through an `ignore_index` parameter. This makes it particularly efficient for large vocabulary sizes (30k-50k tokens) commonly found in language models.

```python
from triformer import TritonCrossEntropyLoss
```
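
A minimal usage sketch under assumptions: the constructor argument (`ignore_index`) and the `(logits, targets)` call convention below are inferred from the description above, not a confirmed signature.

```python
import torch
from triformer import TritonCrossEntropyLoss

# Hypothetical example; constructor arguments are assumptions, not the confirmed triformer API.
criterion = TritonCrossEntropyLoss(ignore_index=-100)

vocab_size = 50_000  # large vocabularies are the intended use case
logits = torch.randn(16, vocab_size, device="cuda", requires_grad=True)
targets = torch.randint(0, vocab_size, (16,), device="cuda")
targets[0] = -100    # padding positions are skipped via ignore_index

loss = criterion(logits, targets)
loss.backward()      # gradients are computed in place, reusing the logits buffer
```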

3.0.1

Triformer

`pip install -U triformer`



TritonDropout (New! 🎉)
A fast and memory-efficient dropout implementation using Triton's parallel processing capabilities. Features deterministic dropout patterns with seed control and optimized memory usage through block processing. Benchmarks show comparable or better training convergence compared to PyTorch's native dropout implementation.

Example usage:
```python
import torch
from triformer import TritonDropout

x = torch.randn(16, 1024, device="cuda")  # example input on the GPU

# Basic usage
output = TritonDropout.apply(x, p=0.5)

# With a deterministic seed
output = TritonDropout.apply(x, p=0.5, seed=42)
```
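
Continuing the example above (same assumptions about the `apply` signature), the seed makes the dropout pattern reproducible:

```python
a = TritonDropout.apply(x, p=0.5, seed=42)
b = TritonDropout.apply(x, p=0.5, seed=42)
assert torch.equal(a, b)  # identical seed -> identical dropout mask
```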

3.0.0

TritonLayerNorm:
Implemented Layer Normalization in Triton to improve the stability and performance of transformer models. The kernel uses Triton's optimized GPU computation for faster training and inference.

TritonSoftmax:
Introduced an efficient Softmax implementation in Triton, enabling faster processing of neural-network output layers, particularly in tasks that require probabilistic outputs.

Usage
```python
from triformer import TritonLayerNorm
```

```python
from triformer import TritonSoftmax
```
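
A minimal usage sketch under assumptions: the constructor arguments below (an `nn.LayerNorm`-style normalized shape and a `dim` argument for the softmax) are illustrative guesses, not the confirmed API.

```python
import torch
from triformer import TritonLayerNorm, TritonSoftmax

hidden = torch.randn(8, 128, 768, device="cuda")  # (batch, seq, features), illustrative shapes

layer_norm = TritonLayerNorm(768).cuda()          # assumed nn.LayerNorm-style constructor
normed = layer_norm(hidden)

softmax = TritonSoftmax(dim=-1)                   # assumed module form and dim argument
scores = torch.randn(8, 128, 128, device="cuda")
probs = softmax(scores)                           # rows sum to 1 along the last dimension
```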

1.3.4

Changed the data type to float32.

1.1.0

This release completes the implementation of our custom linear layer by adding a fully-functional backward pass, enabling end-to-end training.

What's New
- **Complete Backward Pass Implementation**: Added three specialized Triton kernels for efficient backpropagation:
  - `backward_input_kernel`: Computes gradients with respect to the input
  - `backward_weight_kernel`: Computes gradients with respect to the weights
  - `fused_relu_bias_backward_kernel`: Fused computation of the bias gradients and the ReLU backward pass (see the sketch after this list)

- **Performance Optimizations**:
  - Kernel fusion to minimize memory operations
  - Autotuned configurations for optimal performance
  - Block-based computation patterns for efficient GPU utilization
  - Mixed precision (float32 for computation, float16 for storage)
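
To make the fusion idea concrete, here is an illustrative sketch of a fused ReLU-backward + bias-gradient kernel in Triton. The name, signature, and block layout are assumptions for exposition; this is not the kernel shipped in triformer.

```python
import triton
import triton.language as tl


@triton.jit
def fused_relu_bias_backward_sketch(
    grad_out_ptr, act_ptr,           # upstream gradient and the saved ReLU output
    grad_pre_ptr, bias_grad_ptr,     # outputs: grad wrt pre-activation, grad wrt bias
    M, N,                            # rows (batch) and columns (features)
    stride_m,                        # row stride of the (M, N) matrices
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
):
    # Each program owns one block of columns and sweeps over the rows, so the
    # ReLU backward and the bias-gradient reduction share a single pass over
    # memory (kernel fusion, block-based computation).
    pid_n = tl.program_id(0)
    cols = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    col_mask = cols < N

    # Accumulate the bias gradient in float32 (compute precision).
    bias_acc = tl.zeros((BLOCK_N,), dtype=tl.float32)

    for row_start in range(0, M, BLOCK_M):
        rows = row_start + tl.arange(0, BLOCK_M)
        mask = (rows[:, None] < M) & col_mask[None, :]
        offs = rows[:, None] * stride_m + cols[None, :]

        g = tl.load(grad_out_ptr + offs, mask=mask, other=0.0).to(tl.float32)
        a = tl.load(act_ptr + offs, mask=mask, other=0.0).to(tl.float32)

        # ReLU backward: gradient flows only where the activation was positive.
        g = tl.where(a > 0, g, 0.0)

        # Store the gradient wrt the pre-activation in float16 (storage precision).
        tl.store(grad_pre_ptr + offs, g.to(tl.float16), mask=mask)

        # This block's contribution to the bias gradient (reduce over the row axis).
        bias_acc += tl.sum(g, axis=0)

    tl.store(bias_grad_ptr + cols, bias_acc, mask=col_mask)


# Illustrative launch (assumed float16 (M, N) tensors already on the GPU):
# grid = (triton.cdiv(N, 64),)
# fused_relu_bias_backward_sketch[grid](grad_out, act, grad_pre, bias_grad,
#                                       M, N, grad_out.stride(0),
#                                       BLOCK_M=64, BLOCK_N=64)
```

Fusing the two steps avoids writing the masked gradient out and reading it back in a separate bias-reduction kernel, which is the memory-traffic saving the bullets above refer to.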

Previous Release
- Forward pass implementation with fused linear transformation and ReLU activation

Technical Details
The backward pass maintains the same performance philosophy as the forward pass:
- Leverages Triton for GPU acceleration
- Uses autotuning to optimize kernel configurations
- Implements efficient memory access patterns
- Maintains numerical stability through careful handling of data types

Usage
The layer can now be used as a drop-in replacement for `nn.Linear` in training scenarios:

```python
import torch
from triformer import TritonLinear

layer = TritonLinear(in_features=512, out_features=256).cuda()  # Triton kernels execute on the GPU
optimizer = torch.optim.Adam(layer.parameters())
criterion = torch.nn.MSELoss()  # any criterion works; MSE is used here for illustration

# Example data with illustrative shapes
input_tensor = torch.randn(32, 512, device="cuda")
target = torch.randn(32, 256, device="cuda")

# Forward pass
output = layer(input_tensor)
loss = criterion(output, target)

# Backward pass (now supported!)
loss.backward()
optimizer.step()
```
