Changes
- **Updated `amd-smi metric --clock`**.
Output for `amd-smi metric --clock` now lists each engine separately and includes bug fixes for the clock lock status and deep sleep status.
shell
$ amd-smi metric --clock
GPU: 0
CLOCK:
GFX_0:
CLK: 113 MHz
MIN_CLK: 500 MHz
MAX_CLK: 1800 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: ENABLED
GFX_1:
CLK: 113 MHz
MIN_CLK: 500 MHz
MAX_CLK: 1800 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: ENABLED
GFX_2:
CLK: 112 MHz
MIN_CLK: 500 MHz
MAX_CLK: 1800 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: ENABLED
GFX_3:
CLK: 113 MHz
MIN_CLK: 500 MHz
MAX_CLK: 1800 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: ENABLED
GFX_4:
CLK: 113 MHz
MIN_CLK: 500 MHz
MAX_CLK: 1800 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: ENABLED
GFX_5:
CLK: 113 MHz
MIN_CLK: 500 MHz
MAX_CLK: 1800 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: ENABLED
GFX_6:
CLK: 113 MHz
MIN_CLK: 500 MHz
MAX_CLK: 1800 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: ENABLED
GFX_7:
CLK: 113 MHz
MIN_CLK: 500 MHz
MAX_CLK: 1800 MHz
CLK_LOCKED: DISABLED
DEEP_SLEEP: ENABLED
MEM_0:
CLK: 900 MHz
MIN_CLK: 900 MHz
MAX_CLK: 1200 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: DISABLED
VCLK_0:
CLK: 29 MHz
MIN_CLK: 914 MHz
MAX_CLK: 1480 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: ENABLED
VCLK_1:
CLK: 29 MHz
MIN_CLK: 914 MHz
MAX_CLK: 1480 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: ENABLED
VCLK_2:
CLK: 29 MHz
MIN_CLK: 914 MHz
MAX_CLK: 1480 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: ENABLED
VCLK_3:
CLK: 29 MHz
MIN_CLK: 914 MHz
MAX_CLK: 1480 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: ENABLED
DCLK_0:
CLK: 22 MHz
MIN_CLK: 711 MHz
MAX_CLK: 1233 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: ENABLED
DCLK_1:
CLK: 22 MHz
MIN_CLK: 711 MHz
MAX_CLK: 1233 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: ENABLED
DCLK_2:
CLK: 22 MHz
MIN_CLK: 711 MHz
MAX_CLK: 1233 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: ENABLED
DCLK_3:
CLK: 22 MHz
MIN_CLK: 711 MHz
MAX_CLK: 1233 MHz
CLK_LOCKED: N/A
DEEP_SLEEP: ENABLED
- **Added deferred ECC counts**.
Added deferred error counts (`TOTAL_DEFERRED_COUNT` and a per-block `DEFERRED_COUNT`) to `amd-smi metric --ecc --ecc-blocks`.
shell
$ amd-smi metric --ecc --ecc-blocks
GPU: 0
ECC:
TOTAL_CORRECTABLE_COUNT: 0
TOTAL_UNCORRECTABLE_COUNT: 0
TOTAL_DEFERRED_COUNT: 0
CACHE_CORRECTABLE_COUNT: 0
CACHE_UNCORRECTABLE_COUNT: 0
ECC_BLOCKS:
UMC:
CORRECTABLE_COUNT: 0
UNCORRECTABLE_COUNT: 0
DEFERRED_COUNT: 0
SDMA:
CORRECTABLE_COUNT: 0
UNCORRECTABLE_COUNT: 0
DEFERRED_COUNT: 0
...
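For scripts that consume these counters, the following is a minimal Python sketch of how the new per-block `DEFERRED_COUNT` field might be checked. It assumes the `ECC_BLOCKS` section has already been loaded into a dictionary that mirrors the text layout above. That dictionary shape and the helper name `blocks_with_deferred_errors` are illustrative assumptions, not part of amd-smi.

```python
from typing import Any, Dict


def blocks_with_deferred_errors(ecc_blocks: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Return {block: DEFERRED_COUNT} for blocks reporting a nonzero integer count."""
    flagged = {}
    for block, counts in ecc_blocks.items():
        deferred = counts.get("DEFERRED_COUNT")
        # Skip blocks where the field is missing or not a number (e.g. "N/A").
        if isinstance(deferred, int) and deferred > 0:
            flagged[block] = deferred
    return flagged


# Values copied from the sample output above; the dict layout is an assumption.
sample_blocks = {
    "UMC": {"CORRECTABLE_COUNT": 0, "UNCORRECTABLE_COUNT": 0, "DEFERRED_COUNT": 0},
    "SDMA": {"CORRECTABLE_COUNT": 0, "UNCORRECTABLE_COUNT": 0, "DEFERRED_COUNT": 0},
}

print(blocks_with_deferred_errors(sample_blocks))  # {}
```

Using `.get()` keeps the helper tolerant of output from older versions that do not report `DEFERRED_COUNT`.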
- **Updated `amd-smi topology --json` to align with host/guest**.
The `--json` output of `amd-smi topology` now matches the format used on host and guest systems. Users can also filter for specific topology details (refer to `amd-smi topology -h` for the full list). Examples are shown below.
*Previous format:*
shell
$ amd-smi topology --json
[
{
"gpu": 0,
"link_accessibility": {
"gpu_0": "ENABLED",
"gpu_1": "DISABLED"
},
"weight": {
"gpu_0": 0,
"gpu_1": 40
},
"hops": {
"gpu_0": 0,
"gpu_1": 2
},
"link_type": {
"gpu_0": "SELF",
"gpu_1": "PCIE"
},
"numa_bandwidth": {
"gpu_0": "N/A",
"gpu_1": "N/A"
}
},
{
"gpu": 1,
"link_accessibility": {
"gpu_0": "DISABLED",
"gpu_1": "ENABLED"
},
"weight": {
"gpu_0": 40,
"gpu_1": 0
},
"hops": {
"gpu_0": 2,
"gpu_1": 0
},
"link_type": {
"gpu_0": "PCIE",
"gpu_1": "SELF"
},
"numa_bandwidth": {
"gpu_0": "N/A",
"gpu_1": "N/A"
}
}
]
*New format:*
shell
$ amd-smi topology --json
[
{
"gpu": 0,
"bdf": "0000:01:00.0",
"links": [
{
"gpu": 0,
"bdf": "0000:01:00.0",
"weight": 0,
"link_status": "ENABLED",
"link_type": "SELF",
"num_hops": 0,
"bandwidth": "N/A"
},
{
"gpu": 1,
"bdf": "0001:01:00.0",
"weight": 15,
"link_status": "ENABLED",
"link_type": "XGMI",
"num_hops": 1,
"bandwidth": "50000-100000"
},
...
]
},
...
]
shell
$ /opt/rocm/bin/amd-smi topology -a -t --json
[
{
"gpu": 0,
"bdf": "0000:08:00.0",
"links": [
{
"gpu": 0,
"bdf": "0000:08:00.0",
"link_status": "ENABLED",
"link_type": "SELF"
},
{
"gpu": 1,
"bdf": "0000:44:00.0",
"link_status": "DISABLED",
"link_type": "PCIE"
}
]
},
{
"gpu": 1,
"bdf": "0000:44:00.0",
"links": [
{
"gpu": 0,
"bdf": "0000:08:00.0",
"link_status": "DISABLED",
"link_type": "PCIE"
},
{
"gpu": 1,
"bdf": "0000:44:00.0",
"link_status": "ENABLED",
"link_type": "SELF"
}
]
}
]
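Tools that parsed the previous topology JSON need to change how they look up per-peer values. The Python sketch below shows one way to read the link type between two GPUs from either format. The helper name `link_type_between` is hypothetical, and the sample data is copied from the filtered example above. Fields are read with `.get()` because filtered output omits the details that were not requested.

```python
import json


def link_type_between(topology, src, dst):
    """Return the link type from GPU `src` to GPU `dst`, or None if absent."""
    for entry in topology:
        if entry.get("gpu") != src:
            continue
        if "links" in entry:
            # New format: one object per peer inside the "links" list.
            for link in entry["links"]:
                if link.get("gpu") == dst:
                    return link.get("link_type")
            return None
        # Previous format: one dict per attribute, keyed by "gpu_<index>".
        return entry.get("link_type", {}).get(f"gpu_{dst}")
    return None


# New-format sample, copied from the filtered example above.
NEW_SAMPLE = json.loads("""
[
  {"gpu": 0, "bdf": "0000:08:00.0", "links": [
    {"gpu": 0, "bdf": "0000:08:00.0", "link_status": "ENABLED", "link_type": "SELF"},
    {"gpu": 1, "bdf": "0000:44:00.0", "link_status": "DISABLED", "link_type": "PCIE"}
  ]},
  {"gpu": 1, "bdf": "0000:44:00.0", "links": [
    {"gpu": 0, "bdf": "0000:08:00.0", "link_status": "DISABLED", "link_type": "PCIE"},
    {"gpu": 1, "bdf": "0000:44:00.0", "link_status": "ENABLED", "link_type": "SELF"}
  ]}
]
""")

print(link_type_between(NEW_SAMPLE, 0, 1))  # PCIE
```

In a real script the list would come from parsing the output of `amd-smi topology --json` rather than from an embedded string.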
Fixes
- **Fix for GPU reset error on non-amdgpu cards**.
Previously, the reset command could attempt to reset non-AMD GPUs, resulting in an "Unable to reset non-amd GPU" error. The fix
updates the CLI to target only AMD ASICs.
- **Fix for `amd-smi static --pcie` and `amdsmi_get_pcie_info()` on Navi32/31 cards**.
Updated the API to include `amdsmi_card_form_factor_t.AMDSMI_CARD_FORM_FACTOR_CEM`. Previously, this would report "UNKNOWN". This fix
provides the correct board `SLOT_TYPE` associated with these ASICs (and other Navi cards).
- **Fix for `amd-smi process`**.
Fixed output results when getting processes running on a device.
- **Improved error handling for `amd-smi process`**.
Fixed an Attribute Error raised when getting processes in CSV format.
Known issues
- `amd-smi bad-pages` can result in "ValueError: NULL pointer access" with certain PM FW versions.