Longnet

Latest version: v0.5.7

0.0.7

* Added a training file
* Fixed a number of file path errors related to torchscale
* Added causal masking to `DilatedAttention` (see the sketch below)
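
A minimal sketch of how causal masking is typically applied inside an attention module such as `DilatedAttention`. The helper name and tensor layout below are illustrative assumptions, not the exact implementation:

```python
import torch


def apply_causal_mask(attn_scores: torch.Tensor) -> torch.Tensor:
    # attn_scores: raw attention logits of shape (batch, heads, seq_len, seq_len).
    seq_len = attn_scores.size(-1)
    # Boolean mask that is True strictly above the diagonal, i.e. "future" positions.
    future = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=attn_scores.device),
        diagonal=1,
    )
    # Filling future positions with -inf zeroes them out after the softmax.
    return attn_scores.masked_fill(future, float("-inf"))
```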

0.0.6

0.0.5

0.0.4

Changelog:

**2023-07-07:**

- **Bug 1:** The initial `LongNet` model could not differentiate between multimodal and language-only models.

- **Solution:** Introduced an additional class, `LongNetSelector`, which uses a factory method `get_model` to create either a `LongNet` (multimodal) or `LongNetLanguage` (language-only) model based on the user's input (see the sketch after this list).

- **Bug 2:** The original model architecture lacked the capability to handle language-only scenarios effectively.

- **Solution:** Introduced a new `LongNetLanguage` class, derived from `LongNet` and designed specifically for language-only tasks.

- **Bug 3:** The `LongNet` and `LongNetLanguage` classes were not thoroughly tested, which could let bugs and failures go unnoticed.

- **Solution:** Developed a `LongNetTest` class that simulates a forward pass through both model types using dummy data, validating their architectures.
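
For illustration, a rough sketch of the factory pattern described above. The class bodies and the `modality` argument are assumptions; only the class and method names come from the changelog:

```python
import torch.nn as nn


class LongNet(nn.Module):
    """Multimodal model (stub used only for this sketch)."""


class LongNetLanguage(LongNet):
    """Language-only variant derived from LongNet (stub)."""


class LongNetSelector:
    """Factory that returns the requested model variant."""

    @staticmethod
    def get_model(modality: str = "multimodal") -> nn.Module:
        if modality == "multimodal":
            return LongNet()
        if modality == "language":
            return LongNetLanguage()
        raise ValueError(f"Unknown modality: {modality!r}")


# Usage: pick the language-only model based on user input.
model = LongNetSelector.get_model("language")
```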

0.0.3

Added flash attention module

0.0.2

Changelog

1. **Flash Multi-head Attention Integration**
   - Previously, the code used `torch.nn.MultiheadAttention`; it has been changed to use `FlashMultiHeadAttention` from the `flash_attn` library. This change is expected to make the model's attention mechanism more efficient (see the sketch below).
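
A hedged sketch of what the switch to a FlashAttention kernel can look like. It uses `flash_attn_func` from the `flash_attn` package rather than the wrapper class named above, and it assumes a CUDA device with half-precision inputs, which FlashAttention requires:

```python
import torch
from flash_attn import flash_attn_func  # assumes flash-attn 2.x is installed

batch, seq_len, n_heads, head_dim = 2, 1024, 8, 64
device = torch.device("cuda")

# FlashAttention expects (batch, seq_len, n_heads, head_dim) tensors in fp16/bf16 on GPU.
q = torch.randn(batch, seq_len, n_heads, head_dim, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused attention kernel; causal=True applies the same causal mask as in DilatedAttention.
out = flash_attn_func(q, k, v, dropout_p=0.1, causal=True)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```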

2. **GPU Support**
- All computations are now done on a specified GPU device, defined at the beginning of the script. This significantly improves the model's speed and efficiency.

3. **Use of 16-bit Floating Point Precision**
   - The data type for computations was changed from the default 32-bit floating point to 16-bit floating point (`torch.float16`). This reduces memory usage and speeds up computation on modern GPUs (see the sketch below, which also covers the GPU placement from item 2).
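
A small sketch of the device and dtype setup described in items 2 and 3. `nn.MultiheadAttention` stands in for the project's attention module here; the dimensions are arbitrary:

```python
import torch
import torch.nn as nn

# Choose the device once at the top of the script, as the changelog describes.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Half precision halves memory use and speeds up modern GPUs; fall back to fp32 on CPU.
dtype = torch.float16 if device.type == "cuda" else torch.float32

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True).to(device, dtype)

x = torch.randn(2, 1024, 512, device=device, dtype=dtype)
out, _ = attn(x, x, x)  # computation runs on the selected device in the selected precision
```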

4. **Added Dropout**
   - A dropout layer has been added after the attention operation in the `DilatedAttention` class. Dropout is a regularization technique that helps prevent overfitting by randomly zeroing a fraction of the units at each update during training.

The changes were made in the following lines of code:

```python
# Initialize the dropout layer in the constructor
self.dropout = nn.Dropout(dropout)

# Apply dropout after performing attention in the forward function
attn_output = self.dropout(attn_output)
```

5. **Added Unit Tests and Benchmarks**
   - Unit tests and benchmarking code have been added to verify the correctness and efficiency of the `DilatedAttention` class (a minimal example is sketched below).
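
As an illustration of the kind of test and benchmark added here, a minimal shape check and timing loop. The import path and the `DilatedAttention` constructor arguments are assumptions; the real tests live in the repository:

```python
import time

import torch

from long_net import DilatedAttention  # import path assumed for this sketch


def test_forward_shape():
    # Dummy input of shape (batch, seq_len, dim) in half precision on the GPU.
    attn = DilatedAttention(dim=512, heads=8).cuda().half()
    x = torch.randn(2, 1024, 512, device="cuda", dtype=torch.float16)
    out = attn(x)
    assert out.shape == x.shape  # attention is expected to preserve the input shape


def benchmark_forward(iters: int = 10) -> None:
    attn = DilatedAttention(dim=512, heads=8).cuda().half()
    x = torch.randn(2, 1024, 512, device="cuda", dtype=torch.float16)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        attn(x)
    torch.cuda.synchronize()
    print(f"{(time.time() - start) / iters * 1000:.2f} ms per forward pass")
```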

6. **Documentation and Example Updates**
   - Updated the documentation and usage examples for the `DilatedAttention` class to reflect the changes above.

7. **Twitter Thread**
- Created a Twitter thread in the style of Richard Feynman to promote the project and its new updates.
