Longnet

Latest version: v0.5.7

0.4.8

Changelog for `DilatedAttention` with `ParallelWrapper`:

**1. Added `ParallelWrapper` Class**
- Introduced a `ParallelWrapper` class to simplify the usage of data parallelism.
- The `ParallelWrapper` class:
  - Takes a neural network model as input.
  - Allows the user to specify a device ("cuda" or "cpu").
  - Contains a flag `use_data_parallel` to enable or disable data parallelism.
  - Checks if multiple GPUs are available and applies `nn.DataParallel` to the model accordingly.
  - Redirects attribute accesses to the internal model for seamless usage (see the sketch below).
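
The class itself ships with the package; the following is only a minimal sketch of the behavior described above, assuming a `(model, device, use_data_parallel)` constructor, not the package's verbatim source:

```python
import torch
import torch.nn as nn

class ParallelWrapper:
    """Minimal sketch: wraps a model and optionally applies nn.DataParallel."""

    def __init__(self, model, device="cuda", use_data_parallel=True):
        self.device = device
        self.model = model.to(device)
        # Replicate across all visible GPUs only when requested and when
        # more than one GPU is actually available.
        if use_data_parallel and torch.cuda.device_count() > 1:
            self.model = nn.DataParallel(self.model)

    def forward(self, *args, **kwargs):
        return self.model(*args, **kwargs)

    def __getattr__(self, name):
        # Redirect attribute access to the wrapped model for seamless usage.
        return getattr(self.model, name)
```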

**2. Modified Usage of `DilatedAttention` Model**
- Wrapped the `DilatedAttention` model using the `ParallelWrapper` class.
- Enabled the model to be run on multiple GPUs if available.

**3. Device Assignment**
- Explicitly defined a device and used it to specify where the `DilatedAttention` model should be loaded.
- The device defaults to GPU (`cuda:0`) if CUDA is available; otherwise, it defaults to CPU.

**4. Example Usage**
- Provided an example of how to initialize and use the `ParallelWrapper` with the `DilatedAttention` model.
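
A hedged usage sketch; the import path and the `DilatedAttention` constructor arguments below are illustrative assumptions, not the package's confirmed API:

```python
import torch
from long_net import DilatedAttention  # import path is an assumption

# Defaults to the first GPU when CUDA is available, otherwise the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical constructor arguments; check the package for the real signature.
attention = DilatedAttention(d_model=512, num_heads=8)

model = ParallelWrapper(attention, device=device, use_data_parallel=True)

x = torch.randn(2, 1024, 512, device=device)  # (batch, seq_len, d_model)
out = model.forward(x)
```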

Summary:
The key addition is the `ParallelWrapper` class, which makes data parallelism with the provided `DilatedAttention` model easy to enable and configure. It scales across multiple GPUs without significant changes to the existing workflow, and data parallelism can be toggled with a single flag.

0.4.3

Changelog:

1. **Tensor Shape Adjustments**:
   - Ensured tensors keep consistent shapes across all operations.
   - Squeezed `a_indices` to 2D to match the dimensions of `att_denom_sums`.

   ```python
   # Drop the index channel and any trailing singleton dims so `a_indices` is 2-D.
   a_indices = a_indices[:, :, 0].squeeze(-1).squeeze(-1)
   ```

   - Sliced `a_indices` to the unpadded sequence length before scattering.

   ```python
   # Keep only the unpadded positions so indices line up with the sums tensor.
   a_indices = a_indices[:, :unpadded_seq_len]
   ```

2. **Scatter and Gather Operations**:
   - Scattered with the squeezed 2D `a_indices` and gathered the sparse sums with the same indices.

   ```python
   # Accumulate per-position denominators, then read them back at the same indices.
   att_denom_sums.scatter_add_(1, a_indices, a_denoms)
   sparse_att_denom_sum = torch.gather(att_denom_sums, 1, a_indices)
   ```

3. **DataType Handling**:
   - Converted the 'sparse indices' tensors to `torch.int64` (or `torch.long`) to ensure compatibility with PyTorch's indexing operations.
   - Retained the `torch.float16` dtype for the `X` tensor to keep it memory-efficient.
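
   A minimal sketch of the conversion, using the tensor names from this changelog:

   ```python
   # gather/scatter_add_ require int64 (long) indices.
   a_indices = a_indices.to(torch.int64)  # equivalently: a_indices.long()
   # Keep X in half precision to save memory.
   X = X.to(torch.float16)
   ```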

4. **Code Cleaning**:
   - Removed repeated lines that print the shape and datatype of "sparse indices" to declutter the code.
   - Standardized debug print statements to a consistent format.
   - Printed tensor shapes before scattering to verify that dimensions match.
   - Added comments explaining dimension squeezing, slicing, and other adjustments for clarity.

5. **Validation Checks**:
   - Added checks to ensure tensors are on the same device (either all on CPU or all on CUDA).
   - Checked that the size of the tensor `X` matches the expected shape before operations.
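
   An illustrative sketch of such checks; `expected_shape`, its component names, and the exact messages are assumptions:

   ```python
   # All tensors must live on the same device before scatter/gather.
   assert X.device == a_indices.device, (
       f"Device mismatch: X on {X.device}, a_indices on {a_indices.device}"
   )

   # Verify X matches the shape downstream ops expect (hypothetical values).
   expected_shape = (batch_size, seq_len, d_model)
   assert tuple(X.shape) == expected_shape, (
       f"X has shape {tuple(X.shape)}, expected {expected_shape}"
   )
   ```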

6. **Enhanced Error Messages**:
   - Improved the debug error messages to be more descriptive.

7. **Optimizations**:
   - Removed unnecessary tensor operations that do not contribute to the final result.
   - Optimized tensor slicing and indexing operations to be more memory-efficient.

8. **Edge Case Handling**:
   - Handled the edge case of a negative `head_idx`.

9. **Other Minor Fixes**:
   - Ensured that the math- or memory-efficient attention backends are used only if the input tensor is on CUDA and a non-A100 GPU is detected (see the sketch after this list).
   - Made sure tensor operations are consistent with PyTorch best practices.

10. **Documentation**:
    - Added comments to highlight important changes and to explain certain decisions in the code.
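
A hedged sketch of the attention-backend gating in item 9, using PyTorch's `torch.backends.cuda.sdp_kernel` context manager; the A100 detection heuristic here is an assumption:

```python
import torch
import torch.nn.functional as F

def attend(q, k, v):
    on_cuda = q.is_cuda
    is_a100 = on_cuda and "A100" in torch.cuda.get_device_name(q.device)
    if on_cuda and not is_a100:
        # Restrict SDPA to the math and memory-efficient backends.
        with torch.backends.cuda.sdp_kernel(
            enable_flash=False, enable_math=True, enable_mem_efficient=True
        ):
            return F.scaled_dot_product_attention(q, k, v)
    return F.scaled_dot_product_attention(q, k, v)
```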

0.4.2

* Added a new `sparsify` function
* Improved tensor concatenation ops

0.4.1

Changelog

Bug Fixes

1. **Bug:** Size mismatch in tensor operations in the forward method of the `DilatedAttentionLLAMA` class.
   - **Root Cause:** The tensors being operated on did not have matching dimensions due to incorrect striding operations.
   - **Resolution:** Modified the dilation process by introducing an inner loop over split tensors to handle each part separately, which resolved the dimension mismatch (see the schematic sketch after this list).

2. **Bug:** Index out of range error while transposing tensors.
   - **Root Cause:** The index provided to the transpose operation was larger than the total number of dimensions in the tensor.
   - **Resolution:** Corrected the index passed to the transpose operation to fit within the tensor's number of dimensions.
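
A schematic sketch of the first resolution; `segment_size`, `dilation_rate`, and `attention` are assumed names, and the real code in `DilatedAttentionLLAMA` may differ:

```python
# Split the sequence, dilate each part separately, and attend per part,
# so every strided slice keeps a matching shape.
parts = torch.split(x, segment_size, dim=1)
outputs = []
for part in parts:
    dilated = part[:, ::dilation_rate, :]   # stride within the segment
    outputs.append(attention(dilated))
out = torch.cat(outputs, dim=1)
```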

Improvements

1. **Optimized Tensor Operations:** The tensor operations in the forward method were optimized to ensure they all operate on tensors with matching dimensions, improving the efficiency of the model.

2. **Added Error Handling:** We added checks for dimension mismatches in tensor operations that raise useful error messages when the input data does not match the expected shape.

Features

1. **DilatedAttentionLLAMA Class:** Introduced a new `DilatedAttentionLLAMA` class whose forward method uses a dilated attention mechanism. This implementation is designed to be more efficient for longer sequence lengths.

2. **Performance Testing:** Added a simple performance test to benchmark the speed of the forward method in the `DilatedAttentionLLAMA` class.

0.4.0

Changelog

Bug Fixes

1. **Issue:** `ValueError: too many values to unpack (expected 3)`
   - **Root Cause:** The attention function was returning more than three values, but the code tried to unpack its return values into only three variables.
   - **Resolution:** Modified the line where the attention function is called to collect the additional return values into a list using the `*` operator (see the sketch after this list).

2. **Issue:** `RuntimeError: The size of tensor a (64) must match the size of tensor b (2) at non-singleton dimension 1`
   - **Root Cause:** The forward method of the `DynamicDilatedAttention` class was trying to add two tensors of different sizes.
   - **Resolution:** Modified the addition so that `attn_output` has the same size as the corresponding slice of `outputs` before the two are added.

3. **Issue:** `ValueError: not enough values to unpack (expected 7, got 6)`
   - **Root Cause:** The `flash_attn` function in the `FlashAttention` class was trying to unpack the shape of the `q` tensor into seven variables, but `q` only had six dimensions.
   - **Resolution:** Modified the forward method of the `DilatedAttention` class to reshape the `x` tensor correctly before passing it to the attention function.
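
A minimal illustration of the first fix; the function and variable names here are schematic, not the package's actual ones:

```python
# Collect any extra return values instead of forcing exactly three.
attn_output, attn_weights, *extras = attention(q, k, v)
```
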
Improvements

1. **Improvement:** Added assertions to the `__init__` method of the `DilatedAttention` class to check the types and values of the parameters and prevent incorrect usage.

2. **Improvement:** Added a check for the `Distributed` parameter in the `__init__` method of the `DilatedAttention` class to decide whether to use the `DataParallel` wrapper for the `FlashAttention` modules.

3. **Improvement:** Modified the forward method of the `DilatedAttention` class to process each segment of the input separately for each attention head, allowing the attention heads to share information between different segments.

4. **Improvement:** Modified the forward method of the `DilatedAttention` class to reuse a buffer for the `attn_output_resized` tensor instead of creating a new tensor of zeros on every forward pass, improving efficiency (a schematic sketch follows).
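
A schematic sketch of the buffer-reuse improvement; the shapes and names are assumptions:

```python
import torch
import torch.nn as nn

class AttnOutputBuffer(nn.Module):
    """Illustrative only: allocate the resize buffer once and reuse it."""

    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        # A registered buffer moves with the module across devices/dtypes.
        self.register_buffer("attn_output_resized", torch.zeros(max_len, d_model))

    def resize(self, attn_output: torch.Tensor) -> torch.Tensor:
        # Reuse the preallocated buffer instead of torch.zeros(...) per call.
        out = self.attn_output_resized[: attn_output.size(0)]
        out.copy_(attn_output)
        return out
```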

0.3.9
