aws-parallelcluster

Latest version: v3.11.1


2.9.0
-----

Not secure

**ENHANCEMENTS**
- Add support for multiple queues and multiple instance types with the Slurm scheduler (a configuration sketch follows this list).
- Extend NICE DCV support to ARM instances.
- Extend support to disable hyperthreading on instances (like \*.metal) that don't support CpuOptions in
LaunchTemplate.
- Enable support for NFS 4 for the filesystems shared from the head node.
- Add CLI utility to convert configuration files with Slurm scheduler to new format to support multiple queues
configuration.
- Add script wrapper to support Torque-like commands with the Slurm scheduler.
- Remove dependency on cfn-init in compute nodes bootstrap in order to avoid throttling and delays caused by CloudFormation when a large number of compute nodes join the cluster.
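
The multiple-queue support above is configured through new ``[queue]`` and ``[compute_resource]`` sections referenced from the ``[cluster]`` section. A minimal, illustrative sketch (the queue and compute resource names are placeholders; check the version 2.9 configuration documentation for the full parameter set)::

    [cluster default]
    scheduler = slurm
    queue_settings = compute

    [queue compute]
    compute_resource_settings = small, large

    [compute_resource small]
    instance_type = c5.large
    min_count = 0
    max_count = 10

    [compute_resource large]
    instance_type = c5.4xlarge
    max_count = 5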

**CHANGES**
- Introduce new configuration sections and parameters to support multiple queues and multiple instance types.
- Optimize scaling logic with Slurm scheduler, no longer based on Auto Scaling groups.
- A Route53 private hosted zone is now created together with the cluster and used in DNS resolution inside cluster nodes
when using Slurm scheduler.
- Upgrade EFA installer to version 1.9.5:
  - EFA configuration: ``efa-config-1.4`` (updated from ``efa-config-1.3``)
  - EFA profile: ``efa-profile-1.0.0``
  - EFA kernel module: ``efa-1.6.0`` (no change)
  - RDMA core: ``rdma-core-28.amzn0`` (no change)
  - Libfabric: ``libfabric-1.10.1amzn1.1`` (no change)
  - Open MPI: ``openmpi40-aws-4.0.3`` (no change)
- Upgrade Slurm to version 20.02.4.
- Apply the following changes to the Slurm configuration (shown as file excerpts after this list):
  - Assign a range of 10 ports to Slurmctld in order to better perform with large cluster settings
  - Configure cloud scheduling logic
  - Set ``ReconfigFlags=KeepPartState``
  - Set ``MessageTimeout=60``
  - Set ``TaskPlugin=task/affinity,task/cgroup`` together with ``TaskAffinity=no`` and ``ConstrainCores=yes`` in cgroup.conf
- Upgrade NICE DCV to version 2020.1-9012.
- Use private IP instead of master node hostname when mounting shared NFS drives.
- Add new log streams to CloudWatch: chef-client, clustermgtd, computemgtd, slurm_resume, slurm_suspend.
- Add support for queue names in pre/post install scripts.
- Use PAY_PER_REQUEST billing mode for the DynamoDB table in GovCloud regions.
- Limit section name length to 30 characters in the configuration file.
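
Expressed as file excerpts, the Slurm configuration changes listed above correspond to entries of this form::

    # slurm.conf (managed by ParallelCluster)
    ReconfigFlags=KeepPartState
    MessageTimeout=60
    TaskPlugin=task/affinity,task/cgroup

    # cgroup.conf
    TaskAffinity=no
    ConstrainCores=yes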

**BUG FIXES**
- Solve dpkg lock issue with Ubuntu that prevented custom AMI creation in some cases.
- Add/improve sanity checks for some configuration parameters.
- Prevent ignored changes from being reported in ``pcluster update`` output.
- Fix incompatibility issues with Python 2.7 for ``pcluster update``.
- Fix SNS Topic Subscriptions not being deleted with cluster's CloudFormation stack.

2.8.1
-----

Not secure

**CHANGES**

- Disable screen lock for DCV desktop sessions to prevent users from being locked out.

**BUG FIXES**

- Fix ``pcluster configure`` command to avoid writing unexpected configuration parameters.

2.8.0
-----

Not secure

**ENHANCEMENTS**

- Enable support for ARM instances on Ubuntu 18.04 and Amazon Linux 2.
- Add support for the automatic backup features of FSx file systems.
- Renewed user experience and robustness of cluster update functionality.
- Support DCV and EFS in China regions.
- Use DescribeInstanceTypes API to validate whether an instance type is EFA-enabled so that new EFA instances can
be used without requiring an update to the ParallelCluster configuration files.
- Enable Slurm to directly launch tasks and initialize communications through PMIx v3.1.5 on all supported
operating systems except for CentOS 6.
- Print a warning when using NICE DCV on micro or nano instances.

**CHANGES**

- Remove the client requirement to have Berkshelf to build a custom AMI.
- Upgrade EFA installer to version 1.9.4:
  - Kernel module: ``efa-1.6.0`` (updated from ``efa-1.5.1``)
  - RDMA core: ``rdma-core-28.amzn0`` (updated from ``rdma-core-25.0``)
  - Libfabric: ``libfabric-1.10.1amzn1.1`` (updated from ``libfabric-aws-1.9.0amzn1.1``)
  - Open MPI: ``openmpi40-aws-4.0.3`` (no change)
- Avoid unnecessary validation of IAM policies.
- Removed unused dependency on supervisor from the Batch Dockerfile.
- Move all LogGroup definitions in the CloudFormation templates into the CloudWatch substack.
- Disable the libvirtd service on CentOS 7. Virtual bridge interfaces are incorrectly detected by Open MPI and
cause MPI applications to hang; see https://www.open-mpi.org/faq/?category=tcp#tcp-selection for details.
- Use CINC instead of Chef for provisioning instances. See https://cinc.sh/about/ for details.
- Retry when mounting an NFS mount fails.
- Install the ``pyenv`` virtual environments used by ParallelCluster cookbook and node daemon code under
/opt/parallelcluster instead of under /usr/local.
- Use the new official CentOS 7 AMI as the base image for the ParallelCluster AMI.
- Upgrade NVIDIA driver to Tesla version 440.95.01 on CentOS 6 and version 450.51.05 on all other distros.
- Upgrade CUDA library to version 11.0 on all distros besides CentOS 6.
- Install third-party cookbook dependencies via local source, rather than using the Chef supermarket.
- Use https wherever possible in download URLs.
- Install glibc-static, which is required to support certain options for the Intel MPI compiler.
- Require an initial cluster size greater than zero when the option to maintain the initial cluster size is used.

**BUG FIXES**

- Fix validator for CIDR-formatted IP range parameters.
- Fix issue that was preventing concurrent use of custom node and pcluster CLI packages.
- Use the correct domain name when contacting AWS services from the China partition.

2.7.0
-----

Not secure

**ENHANCEMENTS**

- ``sqswatcher``: The daemon is now compatible with VPC Endpoints so that SQS messages can be passed without traversing
the public internet.

**CHANGES**

- Upgrade NICE DCV to version 2020.0-8428.
- Upgrade Intel MPI to version U7.
- Upgrade NVIDIA driver to version 440.64.00.
- Upgrade EFA installer to version 1.8.4:
  - Kernel module: ``efa-1.5.1`` (no change)
  - RDMA core: ``rdma-core-25.0`` (no change)
  - Libfabric: ``libfabric-aws-1.9.0amzn1.1`` (no change)
  - Open MPI: ``openmpi40-aws-4.0.3`` (updated from ``openmpi40-aws-4.0.2``)
- Upgrade CentOS 7 AMI to version 7.8.
- Configuration: the ``base_os`` and ``scheduler`` parameters are now mandatory and no longer have a default value (see the snippet below).
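
Since ``base_os`` and ``scheduler`` no longer have defaults, both must be set explicitly in the ``[cluster]`` section; a minimal sketch (the values shown are examples only)::

    [cluster default]
    base_os = alinux2
    scheduler = slurm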

**BUG FIXES**

- Fix recipes installation at runtime by adding the bootstrapped file at the end of the last Chef run.
- Fix installation of the FSx Lustre client on CentOS 7.
- FSx Lustre: exit with an error when failing to retrieve the FSx mount point.
- Fix sanity_check behavior when ``max_queue_size`` > 1000.

2.6.1
-----

Not secure

**ENHANCEMENTS**

- Improved management of S3 bucket that gets created when ``awsbatch`` scheduler is selected.
- Add validation for supported OSes when using FSx Lustre.
- Change ProctrackType from proctrack/gpid to proctrack/cgroup in Slurm in order to better handle termination of
stray processes when running MPI applications. This also includes the creation of a cgroup Slurm configuration in
order to enable the cgroup plugin (see the excerpt after this list).
- Skip execution, at node bootstrap time, of all those install recipes that are already applied at AMI creation time.
- Start CloudWatch agent earlier in the node bootstrapping phase so that cookbook execution failures are correctly
uploaded and are available for troubleshooting.
- Improved the management of SQS messages and retries to speed-up recovery times when failures occur.
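
The ProctrackType change above maps to entries along these lines in the generated Slurm configuration; the cgroup.conf line shown is illustrative rather than the exact content ParallelCluster writes::

    # slurm.conf
    ProctrackType=proctrack/cgroup

    # cgroup.conf (must exist for the cgroup plugin to load; illustrative content)
    CgroupAutomount=yes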

**CHANGES**

- FSx Lustre: remove ``x-systemd.requires=lnet.service`` from mount options in order to rely on default lnet setup
provided by Lustre.
- Enforce Packer version to be >= 1.4.0 when building an AMI. This is also required for customers using the
``pcluster createami`` command.
- Do not launch a replacement for an unhealthy or unresponsive node until it is terminated. This makes the cluster
slower at provisioning new nodes when failures occur but prevents any temporary over-scaling with respect to the
expected capacity.
- Increase from 10 to 30 the parallelism used when starting ``slurmd`` on compute nodes that join the cluster.
- Reduce the verbosity of messages logged by the node daemons.
- Do not dump logs to ``/home/logs`` when nodewatcher encounters a failure and terminates the node. CloudWatch can be
used to debug such failures.
- Reduce the number of retries for failed REMOVE events in sqswatcher.
- Omit cfn-init-cmd and cfn-wire from the files stored in CloudWatch logs.

**BUG FIXES**

- Configure proxy during cloud-init boothook in order for the proxy to be configured for all bootstrap actions.
- Fix installation of Intel Parallel Studio XE Runtime that requires yum4 since version 2019.5.
- Fix compilation of Torque scheduler on Ubuntu 18.04.
- Fixed a bug in the ordering and retrying of SQS messages that was causing, under certain circumstances of heavy load,
the scheduler configuration to be left in an inconsistent state.
- Delete from queue the REMOVE events that are discarded due to hostname collision with another event fetched as part
of the same ``sqswatcher`` iteration.

2.6.0
-----

Not secure

**ENHANCEMENTS**

- Add support for Amazon Linux 2
- Add support for NICE DCV on Ubuntu 18.04
- Add support for FSx Lustre on Ubuntu 18.04 and Ubuntu 16.04
- New CloudWatch logging capability to collect cluster and job scheduler logs to CloudWatch for cluster monitoring and inspection
- Add ``--keep-logs`` flag to ``pcluster delete`` command to preserve logs at cluster deletion
- Install and setup Amazon Time Sync on all OSs
- Enable the accounting plugin in Slurm for all OSes. Note: accounting is neither enabled nor configured by default
- Add retry on throttling from CloudFormation API, happening when several compute nodes are being bootstrapped
concurrently
- Display detailed substack failures when ``pcluster create`` fails due to a substack error
- Create additional EFS mount target in the AZ of compute subnet, if needed
- Add validator for FSx Lustre Weekly Maintenance Start Time parameter
- Add validator to the KMS key provided for EBS, FSx, and EFS
- Add validator for S3 external resource
- Support two new FSx Lustre features, Scratch 2 and Persistent filesystems (see the example after this list):
  - Add two new parameters ``deployment_type`` and ``per_unit_storage_throughput`` to the ``fsx`` section
  - Add new ``storage_capacity`` options: 1,200 GiB, 2,400 GiB, and multiples of 2,400 GiB are supported with ``SCRATCH_2``
  - In-transit encryption is available via the ``fsx_kms_key_id`` parameter when ``deployment_type = PERSISTENT_1``
  - New parameter ``per_unit_storage_throughput`` is available when ``deployment_type = PERSISTENT_1``
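
An illustrative ``fsx`` section using the new Persistent deployment type; the parameter names follow the bullets above, while the values and the KMS key placeholder are examples only::

    [fsx myfs]
    shared_dir = /fsx
    deployment_type = PERSISTENT_1
    per_unit_storage_throughput = 200
    storage_capacity = 2400
    # optional, per the in-transit encryption note above
    fsx_kms_key_id = <your-kms-key-id>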


**CHANGES**

- Upgrade Slurm to version 19.05.5
- Upgrade Intel MPI to version U6
- Upgrade EFA installer to version 1.8.3:
  - Kernel module: ``efa-1.5.1`` (updated from ``efa-1.4.1``)
  - RDMA core: ``rdma-core-25.0`` (distributed only) (no change)
  - Libfabric: ``libfabric-aws-1.9.0amzn1.1`` (updated from ``libfabric-aws-1.8.1amzn1.3``)
  - Open MPI: ``openmpi40-aws-4.0.2`` (no change)
- Install Python 2.7.17 on CentOS 6 and set it as default through pyenv
- Install Ganglia from repository on Amazon Linux, Amazon Linux 2, CentOS 6 and CentOS 7
- Disable StrictHostKeyChecking for SSH client when target host is inside cluster VPC for all OSs except CentOS 6
- Pin Intel Python 2 and Intel Python 3 to version 2019.4
- Automatically disable ptrace protection on Ubuntu 18.04 and Ubuntu 16.04 compute nodes when EFA is enabled
(see the sysctl excerpt after this list). This is required in order to use local memory for interprocess
communications in the Libfabric provider, as mentioned here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-ptrace
- Packer version >= 1.4.0 is required for AMI creation
- Use version 5.2 of PyYAML for Python 3 versions 3.4 or earlier.
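
For reference, the ptrace protection change above corresponds to the kernel setting described in the linked EFA documentation. ParallelCluster applies it automatically on the affected Ubuntu compute nodes, so this excerpt is informational only (the file path is illustrative)::

    # /etc/sysctl.d/10-ptrace.conf
    kernel.yama.ptrace_scope = 0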

**BUG FIXES**

- Fix issue with slurmd daemon not being restarted correctly when a compute node is rebooted
- Fix errors preventing Torque from locating jobs by setting server_name to the FQDN on the master node
- Fix Torque issue that was limiting the max number of running jobs to the max size of the cluster
- Fix OS validation depending on the configured scheduler
