MarianNMT

1.1.3

Added
- Added back gradient dropping

Fixed
- Fixed parameter initialization for `--tied-embeddings` during translation

1.1.2

Fixed
- Fixed ensembling with language model and batched decoding
- Fixed attention reduction kernel with large matrices (added a missing
`__syncthreads()`), which should fix stability issues with large batches and
beam sizes during batched decoding (see the sketch below)
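
A shared-memory tree reduction needs a barrier between rounds: without it, a
thread can read a partial sum that another thread has not finished writing,
which is exactly the kind of instability described above. A minimal sketch of
the pattern (illustrative names, not Marian's actual kernel):

```cpp
// Minimal block-wide sum reduction in shared memory; assumes blockDim.x is
// a power of two. Each round halves the number of active threads, and
// __syncthreads() makes every partial sum from the previous round visible
// before it is read. Omitting the barrier races for large blocks even if
// small ones happen to work.
__global__ void blockSum(const float* in, float* out, int n) {
  extern __shared__ float buf[];  // launch with blockDim.x * sizeof(float)
  int tid = threadIdx.x;
  int i = blockIdx.x * blockDim.x + tid;
  buf[tid] = (i < n) ? in[i] : 0.0f;
  __syncthreads();  // shared memory fully initialized before reducing
  for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
    if (tid < stride)
      buf[tid] += buf[tid + stride];
    __syncthreads();  // the barrier whose absence causes the race
  }
  if (tid == 0) out[blockIdx.x] = buf[0];
}
```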

1.1.1

Added
- Option `--max-length-crop` to be used together with `--max-length N` to crop
sentences to length N rather than omitting them (see the sketch after this
list)
- Experimental model with convolution over input characters
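
The crop-vs-omit distinction is small enough to show directly. A hypothetical
sketch of the behaviour (illustrative names, not Marian's API):

```cpp
#include <optional>
#include <vector>

// Hypothetical illustration of --max-length N vs. --max-length-crop:
// without cropping, an over-long sentence is omitted entirely; with
// cropping, it is truncated to its first N tokens instead.
std::optional<std::vector<int>> filterOrCrop(std::vector<int> tokens,
                                             size_t maxLength, bool crop) {
  if (tokens.size() <= maxLength) return tokens;
  if (!crop) return std::nullopt;  // --max-length alone: omit the sentence
  tokens.resize(maxLength);        // --max-length-crop: keep the first N
  return tokens;
}
```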

Fixed
- Fixed a number of bugs for vocabulary and directory handling

1.1.0

Added
- Batched translation for all model types, significant translation speed-up
- Batched translation when validating with translation
- `--maxi-batch-sort` option for `marian-decoder`
- Support for `CUBLAS_TENSOR_OP_MATH` mode for cuBLAS in CUDA 9.0 (see the
sketch after this list)
- The `marian-vocab` tool to create vocabularies

1.0.0

Added
- Multi-GPU validation, scoring, and in-training translation
- Summary mode for the scorer
- New "transformer" model based on [Attention is all you
need](https://arxiv.org/abs/1706.03762)
- Options specific for the transformer model
- Linear learning rate warmup with and without initial value
- Cyclic learning rate warmup
- More options for learning rate decay, including: optimizer history reset,
repeated warmup
- Continuous inverted square root decay of the learning rate
(`--lr-decay-inv-sqrt`) based on the number of updates (see the schedule
sketch after this list)
- Exposed optimizer parameters (e.g. momentum for Adam)
- Version of deep RNN-based models compatible with Nematus (`--type nematus`)
- Synchronous SGD training for multi-GPU (enable with `--sync-sgd`)
- Dynamic construction of complex models with different encoders and decoders,
currently only available through the C++ API
- Option `--quiet` to suppress output to stderr
- Option to choose different variants of the optimization criterion: mean
cross-entropy, perplexity, cross-entropy sum
- In-process translation for validation, which uses the same memory as training
- Label Smoothing
- CHANGELOG.md
- CONTRIBUTING.md
- Swish activation function as default for the transformer model
(https://arxiv.org/pdf/1710.05941.pdf)
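
The warmup and decay options above compose into a schedule over the update
count. A sketch of the common shape, linear warmup followed by
inverse-square-root decay (illustrative only; Marian's exact formula may
differ):

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical schedule: ramp linearly for warmupSteps updates, then decay
// proportionally to 1/sqrt(step), i.e.
//   lr(t) = lr0 * min(t / warmup, 1) * sqrt(warmup / max(t, warmup))
float learningRate(float lr0, int step, int warmupSteps) {
  float t = (float)std::max(step, 1);
  float warmup = (float)warmupSteps;
  float ramp = std::min(t / warmup, 1.0f);                // linear warmup
  float decay = std::sqrt(warmup / std::max(t, warmup));  // inv-sqrt decay
  return lr0 * ramp * decay;
}
```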

Changed
- Changed shape organization to follow NumPy
- Changed option `--moving-average` to `--exponential-smoothing` and inverted
the formula to `s_t = (1 - \alpha) * s_{t-1} + \alpha * x_t`; `\alpha` is now
`1e-4` by default (see the sketch after this list)
- Got rid of thrust for compile-time mathematical expressions
- Changed boolean option `--normalize` to `--normalize [arg=1] (=0)`. New
behaviour is backwards-compatible and can also be specified as
`--normalize=0.6`
- Renamed "s2s" binary to "marian-decoder"
- Renamed "rescorer" binary to "marian-scorer"
- Renamed "server" binary to "marian-server"
- Renamed option name `--dynamic-batching` to `--mini-batch-fit`
- Unified cross-entropy-based validation, which now supports perplexity and
other cross-entropy variants
- Changed `--normalize (bool)` to `--normalize (float)arg`, allowing the
length normalization weight to be changed as `score / pow(length, arg)`
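
Both formulas in this list are small enough to state directly. A minimal
sketch (illustrative names, not Marian's API):

```cpp
#include <cmath>

// Exponential smoothing of a parameter, per the entry above:
//   s_t = (1 - alpha) * s_{t-1} + alpha * x_t, alpha = 1e-4 by default.
float smooth(float sPrev, float x, float alpha = 1e-4f) {
  return (1.0f - alpha) * sPrev + alpha * x;
}

// Length-normalized hypothesis score, score / pow(length, arg), where arg
// is the value given to --normalize (1 when the flag is passed bare).
float normalizedScore(float score, size_t length, float arg = 1.0f) {
  return score / std::pow((float)length, arg);
}
```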

Removed
- Temporarily removed gradient dropping (`--drop-rate X`) until refactoring.
