databricks-labs-ucx

Latest version: v0.57.0


0.34.0

Not secure
* Added a check for No isolation shared clusters and MLR ([2484](https://github.com/databrickslabs/ucx/issues/2484)). This commit introduces a check for `No isolation shared clusters` running the Machine Learning Runtime (MLR) as part of the assessment workflow and cluster crawler, addressing issue [#846](https://github.com/databrickslabs/ucx/issues/846). A new function, `is_mlr`, determines whether a cluster's Spark version corresponds to an MLR image. If a cluster has no isolation and uses MLR, an appropriate error message is appended to its assessment failure list. The change has been verified with unit tests and manual testing, including a new test covering MLR clusters without isolation, which improves the assessment workflow's accuracy in flagging unsupported configurations.
* Added a section in migration dashboard to list the failed tables, etc ([2406](https://github.com/databrickslabs/ucx/issues/2406)). In this release, we have introduced a new logging message format for failed table migrations in the `TableMigrate` class, specifically impacting the `_migrate_external_table`, `_migrate_external_table_hiveserde_in_place`, `_migrate_dbfs_root_table`, `_migrate_table_create_ctas`, `_migrate_table_in_mount`, and `_migrate_acl` methods within the `table_migrate.py` file. This update employs the `failed-to-migrate` prefix in log messages for improved failure reason identification during table migrations, enhancing debugging capabilities. As part of this release, we have also developed a new SQL file, `05_1_failed_table_migration.sql`, which retrieves a list of failed table migrations by extracting messages with the 'failed-to-migrate:' prefix from the inventory.logs table and returning the corresponding message text. While this release does not include new methods or user documentation, it resolves issue [#1754](https://github.com/databrickslabs/ucx/issues/1754) and has been manually tested with positive results in the staging environment, demonstrating its functionality.
* Added clean up activities when `migrate-credentials` cmd fails intermittently ([2479](https://github.com/databrickslabs/ucx/issues/2479)). This pull request enhances the robustness of the `migrate-credentials` command for Azure in the event of intermittent failures during the creation of access connectors and storage credentials. It introduces new methods, `delete_storage_credential` and `delete_access_connectors`, which are responsible for removing incomplete resources when errors occur. The `_migrate_service_principals` and `_create_storage_credentials_for_storage_accounts` methods now handle `PermissionDenied`, `NotFound`, and `BadRequest` exceptions, deleting created storage credentials and access connectors if exceptions occur. Additionally, error messages have been updated to guide users in resolving issues before attempting the operation again. The PR also modifies the `sp_migration` fixture in the `tests/unit/azure/test_credentials.py` file, simplifying the deletion process for access connectors and improving the testing of the `ServicePrincipalMigration` class. These changes address issue [#2362](https://github.com/databrickslabs/ucx/issues/2362), ensuring clean-up activities in case of intermittent failures and improving the overall reliability of the system.
* Added standalone migrate ACLs ([2284](https://github.com/databrickslabs/ucx/issues/2284)). A new `migrate-acls` command has been introduced to facilitate the migration of Access Control Lists (ACLs) from a legacy metastore to a Unity Catalog (UC) metastore. The command, designed to work with HMS federation and other table migration scenarios, can be executed with optional flags `target-catalog` and `hms-fed` to specify the target catalog and migrate HMS-FED ACLs, respectively. The release also includes modifications to the `labs.yml` file, adding the new command and its details to the `commands` section. In addition, a new `ACLMigrator` class has been added to the `databricks.labs.ucx.contexts.application` module to handle ACL migration for tables in a standalone manner. A new test file, `test_migrate_acls.py`, contains unit tests for ACL migration in a Hive metastore, covering various scenarios and ensuring proper query generation. These features streamline and improve the functionality of ACL migration, offering better access control management for users.
* Appends metastore_id or location_name to roles for uniqueness ([2471](https://github.com/databrickslabs/ucx/issues/2471)). A new method, `_generate_role_name`, has been added to the `Access` class in the `aws/access.py` file of the `databricks/labs/ucx` module to generate unique names for AWS roles using a consistent naming convention. The `list_uc_roles` method has been updated to utilize this new method for creating role names. In response to issue [#2336](https://github.com/databrickslabs/ucx/issues/2336), the `create_missing_principals` change enforces role uniqueness on AWS by modifying the `ExternalLocation` table to include `metastore_id` or `location_name` for uniqueness. To ensure proper cleanup, the `create_uber_principal` method has been updated to delete the instance profile if creating the cluster policy fails due to a `PermissionError`. Unit tests have been added to verify these changes, including tests for the new role name generation method and the updated `ExternalLocation` table. The `MetastoreAssignment` class is also imported in this diff, although its usage is not immediately clear. These changes aim to improve the creation of unique AWS roles for Databricks Labs UCX and enforce role uniqueness on AWS.
* Cache workspace content ([2497](https://github.com/databrickslabs/ucx/issues/2497)). In this release, we have implemented a caching mechanism for workspace content to improve load times and bypass rate limits. The `WorkspaceCache` class handles caching of workspace content, with the `_CachedIO` and `_PathLruCache` classes managing IO operation caching and LRU caching, respectively. The `_CachedPath` class, a subclass of `WorkspacePath`, handles caching of workspace paths. The `open` and `unlink` methods of `_CachedPath` have been overridden to cache results and remove corresponding cache entries. The `guess_encoding` function is used to determine the encoding of downloaded content. Unit tests have been added to ensure the proper functioning of the caching mechanism, including tests for cache reuse, invalidation, and encoding determination. This feature aims to enhance the performance of file operations, making the overall system more efficient for users.
* Changes the security mode for assessment cluster ([2472](https://github.com/databrickslabs/ucx/issues/2472)). In this release, the security mode of the assessment workflow's `main` job cluster has been updated from LEGACY_SINGLE_USER to LEGACY_SINGLE_USER_STANDARD in the workflows.py file. This change disables passthrough and addresses issue [#1717](https://github.com/databrickslabs/ucx/issues/1717). The new data security mode is set via the `data_security_mode` attribute of the `compute.ClusterSpec` object for the `main` job cluster. No new methods have been introduced; only existing functionality related to the cluster's security mode has been modified. Software engineers adopting this project should be aware of the security implications of this change and ensure appropriate data protection measures are in place. Manual testing has been conducted to verify the functionality of this update.
* Do not normalize cases when reformatting SQL queries in CI check ([2495](https://github.com/databrickslabs/ucx/issues/2495)). In this release, the CI workflow for pushing changes to the repository has been updated to improve the behavior of the SQL query reformatting step. Previously, case normalization of SQL queries was causing issues with case-sensitive columns, resulting in blocked CI checks. This release addresses the issue by adding the `--normalize-case false` flag to the `databricks labs lsql fmt` command, which disables case normalization. This modification allows the CI workflow to pass and ensures correct SQL query formatting, regardless of case sensitivity. The change impacts the assessment/interactive directory, specifically a cluster summary query for interactive assessments. This query involves a change in the ORDER BY clause, replacing a normalized case with the original case. Despite these changes, no new methods have been added, and existing functionality has been modified solely to improve CI efficiency and SQL query compatibility.
* Drop source table after successful table move not before ([2430](https://github.com/databrickslabs/ucx/issues/2430)). In this release, we have addressed an issue where the source table was being dropped before a new table was created, which could cause the creation process to fail and leave the source table unavailable. This problem has been resolved by modifying the `_recreate_table` method of the `TableMove` class in the `hive_metastore` package to drop the source table after the new table creation. The updated implementation ensures that the source table remains intact during the creation process, even in case of any issues. This change comes with integration tests and does not involve any modifications to user documentation, CLI commands, workflows, tables, or existing functionality. Additionally, a new test function `test_move_tables_table_properties_mismatch_preserves_original` has been added to `test_table_move.py`, which checks if the original table is preserved when there is a mismatch in table properties during the move operation. The changes also include adding the `pytest` library and the `BadRequest` exception from the `databricks.sdk.errors` package for the new test function. The imports section has been updated accordingly with the removal of `databricks.sdk.errors.NotFound` and the addition of `pytest` and `databricks.sdk.errors.BadRequest`.
* Enabled `principal-prefix-access` command to run as collection ([2450](https://github.com/databrickslabs/ucx/issues/2450)). This commit introduces several improvements to the `principal-prefix-access` command in our open-source library. A new flag `run-as-collection` has been added, allowing the command to run as a collection across multiple AWS accounts. A new `get_workspace_context` function has also been implemented, which encapsulates common functionalities and enhances code reusability. Additionally, the `get_workspace_contexts` method has been developed to retrieve a list of `WorkspaceContext` objects, making the command more efficient when handling collections of workspaces. Furthermore, the `install_on_account` method has been updated to use the new `get_workspace_contexts` method. The `principal-prefix-access` command has been enhanced to accept an optional `acc_client` argument, which is used to retrieve information about the assessment run. These changes improve the functionality and organization of the codebase, making it more efficient, flexible, and easier to maintain for users working with multiple AWS accounts and workspaces.
* Fixed Driver OOM error by increasing the min memory requirement for node from 16 GB to 32 GB ([2473](https://github.com/databrickslabs/ucx/issues/2473)). A modification has been implemented in the `policy.py` file located in the `databricks/labs/ucx/installer` directory, which raises the minimum memory requirement for the node type from 16 GB to 32 GB. This adjustment is intended to prevent driver out-of-memory (OOM) errors during assessments. The `_definition` function in the `policy` class has been updated to incorporate the new memory requirement, which is used when selecting a suitable node type. The rest of the code remains unchanged. This modification addresses issue [#2398](https://github.com/databrickslabs/ucx/issues/2398). While the code has been tested, specific testing details are not provided in the commit message.
* Fixed issue when running create-missing-credential cmd tries to create the role again if already created ([2456](https://github.com/databrickslabs/ucx/issues/2456)). In this release, we have fixed an issue in the `_identify_missing_paths` function within the `access.py` file of the `databricks/labs/ucx/aws` directory, where the `create-missing-credential` command attempted to create a role that had already been created. The root cause was a path comparison using the `match` function, which has been replaced with `startswith` so that a location is considered covered when its path starts with a role's resource path, resolving issue [#2413](https://github.com/databrickslabs/ucx/issues/2413). The `_identify_missing_paths` function loads the UC-compatible roles and iterates through each external location: a location matching the resource path of any role is skipped, while a location matching none of them is added to the `missing_paths` set. A conditional check now returns an empty list when the `missing_paths` set is empty. Unit and integration tests have been added or updated to verify the corrected behavior, though no manual verification on a staging environment is mentioned.
* Fixed issue with Interactive Dashboard not showing output ([2476](https://github.com/databrickslabs/ucx/issues/2476)). In this release, we have resolved an issue with the Interactive Dashboard not displaying output by fixing a bug in the query used for the dashboard. Previously, the query was joining on "request_params.clusterid" and selecting "request_params.clusterid" in the SELECT clause, but the correct field name is "request_params.clusterId". The query has been updated to use "request_params.clusterId" instead, both in the JOIN and SELECT clauses. These changes ensure that the Interactive Dashboard displays the correct output, improving the overall functionality and usability of the product. No new methods were added, and existing functionality was changed within the scope of the Interactive Dashboard query. Manual testing is recommended to ensure that the output is now displayed correctly. Additionally, a change has been made to the 'test_installation.py' integration test file to improve the performance of clusters by updating the `min_memory_gb` argument from 16 GB to 32 GB in the `test_job_cluster_policy` function.
* Fixed support for table/schema scope for the revert table cli command ([2428](https://github.com/databrickslabs/ucx/issues/2428)). In this release, we have enhanced the `revert table` CLI command to support table and schema scopes in the open-source library. The `revert_migrated_tables` function now accepts optional parameters `schema` and `table` of types str or None, which were previously required parameters. Similarly, the `print_revert_report` function in the `tables_migrator` object within `WorkspaceContext` has been updated to accept the same optional parameters. The `revert_migrated_tables` function now uses these optional parameters when calling the `revert_migrated_tables` method of `tables_migrator` within 'ctx'. Additionally, we have introduced a new dictionary called `reverse_seen` and modified the `_get_tables_to_revert` and `print_revert_report` functions to utilize this dictionary, providing more fine-grained control when reverting table migrations. The `delete_managed` parameter is used to determine if managed tables should be deleted. These changes allow users to specify a specific schema and table to revert, rather than reverting all migrated tables within a workspace.
* Refactor view sequencing and return sequenced views if recursion is found ([2499](https://github.com/databrickslabs/ucx/issues/2499)). In this refactoring, the view sequencing logic for table migration has been improved: when recursion is detected among view dependencies, the views sequenced so far are now returned instead of the sequencing failing, addressing issue [#249](https://github.com/databrickslabs/ucx/issues/249).
* Updated databricks-labs-lsql requirement from <0.9,>=0.5 to >=0.5,<0.10 ([2489](https://github.com/databrickslabs/ucx/issues/2489)). In this release, we have updated the version requirements for the `databricks-labs-lsql` package, changing it from greater than 0.5 and less than 0.9 to greater than 0.5 and less than 0.10. This update enables the use of newer versions of the package while maintaining compatibility with existing systems. The `databricks-labs-lsql` package is used for creating dashboards and managing SQL queries in Databricks. The pull request also includes detailed release notes, a comprehensive changelog, and a list of commits for the updated package. We recommend that all users of this package review the release notes and update to the new version to take advantage of the latest features and improvements.
* Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31 ([2417](https://github.com/databrickslabs/ucx/issues/2417)). In this pull request, the `databricks-sdk` dependency has been updated from version `~=0.29.0` to `>=0.29,<0.31` to allow for the latest version of the package, which includes new features, bug fixes, internal changes, and other updates. This update is in response to the release of version `0.30.0` of the `databricks-sdk` library, which includes new features such as DataPlane support and partner support. In addition to the updated dependency, there have been changes to several files, including `access.py`, `fixtures.py`, `test_access.py`, and `test_workflows.py`. These changes include updates to method calls, import statements, and test data to reflect the new version of the `databricks-sdk` library. The `pyproject.toml` file has also been updated to reflect the new dependency version. This pull request does not include any other changes.
* Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13 ([2431](https://github.com/databrickslabs/ucx/issues/2431)). In this pull request, we are updating the `sqlglot` dependency from version `>=25.5.0,<25.12` to `>=25.5.0,<25.13`. This update allows us to use the latest version of the `sqlglot` library, which includes several new features and bug fixes. Specifically, the new version includes support for `TryCast` generation and improvements to the `clickhouse` dialect. It is important to note that the previous version had a breaking change related to treating `DATABASE` as `SCHEMA` in `exp.Create`. Therefore, it is crucial to thoroughly test the changes before merging, as breaking changes may affect existing functionality.
* Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15 ([2453](https://github.com/databrickslabs/ucx/issues/2453)). In this pull request, we have updated the required version range of the `sqlglot` package from `>=25.5.0,<25.13` to `>=25.5.0,<25.15`. This change allows us to install the latest version of the package, which includes several bug fixes and new features. These include improved transpilation of nullable/non-nullable data types and support for TryCast generation in ClickHouse. The changelog for `sqlglot` provides a detailed list of changes in each release, and a list of commits made in the latest release is also included in the pull request. This update will improve the functionality and reliability of our software, as we will now be able to take advantage of the latest features and fixes provided by `sqlglot`.
* Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17 ([2480](https://github.com/databrickslabs/ucx/issues/2480)). In this release, we have updated the requirement range for the `sqlglot` dependency to '>=25.5.0,<25.17' from '<25.15,>=25.5.0'. This change resolves issues [#2452](https://github.com/databrickslabs/ucx/issues/2452) and [#2451](https://github.com/databrickslabs/ucx/issues/2451) and includes several bug fixes and new features in the `sqlglot` library version 25.16.1. The updated version includes support for timezone in exp.TimeStrToTime, transpiling from_iso8601_timestamp from presto/trino to duckdb, and mapping %e to %-d in BigQuery. Additionally, there are changes to the parser and optimizer, as well as other bug fixes and refactors. This update does not introduce any major breaking changes and should not affect the functionality of the project. The `sqlglot` library is used for parsing, analyzing, and rewriting SQL queries, and the new version range provides improved functionality and reliability.
* Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18 ([2488](https://github.com/databrickslabs/ucx/issues/2488)). This pull request updates the `sqlglot` requirement to version 25.5.0 or greater, but less than 25.18, enabling the use of the latest version of `sqlglot` while maintaining compatibility with the current implementation. The changelog and commits for each release from v25.16.1 to v25.17.0 are provided for reference, detailing bug fixes, new features, and breaking changes. Review the release notes before merging to confirm the update aligns with the project's requirements and to take advantage of the latest improvements and fixes in `sqlglot`.
* Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19 ([2509](https://github.com/databrickslabs/ucx/issues/2509)). In this release, we have updated the required version of the `sqlglot` package in our project's dependencies. Previously, we required a version greater than or equal to 25.5.0 and less than 25.18, which has now been updated to require a version greater than or equal to 25.5.0 and less than 25.19. This change was made automatically by Dependabot, a service that helps to keep dependencies up to date, in order to permit the latest version of the `sqlglot` package. The pull request contains a detailed list of the changes made in the `sqlglot` package between versions 25.5.0 and 25.18.0, as well as a list of the commits that were made during this time. These details can be helpful for understanding the potential impact of the update on the project.
* [chore] make `GRANT` migration logic isolated to `MigrateGrants` component ([2492](https://github.com/databrickslabs/ucx/issues/2492)). In this release, the grant migration logic has been isolated to a separate `MigrateGrants` component, enhancing code modularity and maintainability. This new component, along with the `ACLMigrator`, is now responsible for handling grants and Access Control Lists (ACLs) migration. The `MigrateGrants` class takes grant loaders as input, applies grants to a Unity Catalog (UC) table based on a given source table, and is utilized in the `acl_migrator` method. The `ACLMigrator` class manages ACL migration for the migrated tables, taking instances of necessary classes as arguments and setting ACLs for the migrated tables based on the migration status. These changes bring better separation of concerns, making the code easier to understand, test, and maintain.
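
The MLR detection described in the first item above might be sketched as follows. This is an illustrative guess only: it assumes ML Runtime Spark version strings carry an `-ml-` marker (e.g. `13.3.x-ml-scala2.12`), and the function names and failure message are assumptions, not the ucx implementation.

```python
def is_mlr(spark_version) -> bool:
    """Heuristic: ML Runtime images embed an "-ml-" marker in the version string."""
    if not spark_version:
        return False
    return "-ml-" in spark_version


def assess_cluster(spark_version, isolation_mode) -> list:
    """Append a failure when a no-isolation shared cluster runs MLR."""
    failures = []
    if isolation_mode == "NONE" and is_mlr(spark_version):
        failures.append("Unsupported config: no-isolation shared cluster with MLR")
    return failures
```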
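
The clean-up behavior added to `migrate-credentials` follows a common roll-back-on-failure pattern, sketched minimally below. The exception classes here are local stand-ins for `databricks.sdk.errors` types, and `create_with_cleanup` is a hypothetical helper, not the ucx API.

```python
class PermissionDenied(Exception):
    """Stand-in for databricks.sdk.errors.PermissionDenied."""

class NotFound(Exception):
    """Stand-in for databricks.sdk.errors.NotFound."""

class BadRequest(Exception):
    """Stand-in for databricks.sdk.errors.BadRequest."""

CLEANUP_ERRORS = (PermissionDenied, NotFound, BadRequest)


def create_with_cleanup(create_steps, delete_resource):
    """Run creation steps in order; on a known failure, delete everything
    created so far (in reverse order) before re-raising, so no incomplete
    storage credentials or access connectors are left behind."""
    created = []
    try:
        for name, step in create_steps:
            step()
            created.append(name)
    except CLEANUP_ERRORS:
        for name in reversed(created):
            delete_resource(name)
        raise
    return created
```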
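
The workspace content cache described above combines IO caching with LRU eviction and invalidation on delete. A minimal sketch of that idea, assuming a download callable keyed by path (the real `_PathLruCache`/`_CachedPath` classes differ in detail):

```python
from collections import OrderedDict


class PathLruCache:
    """LRU cache over workspace file content, keyed by path."""

    def __init__(self, download, max_entries=128):
        self._download = download  # callable: path -> bytes
        self._max = max_entries
        self._cache = OrderedDict()

    def open(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path]
        data = self._download(path)        # cache miss: hit the API once
        self._cache[path] = data
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict least recently used
        return data

    def unlink(self, path):
        self._cache.pop(path, None)  # drop stale entry when a file is deleted
```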
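
The reordering in the table-move fix above (create first, drop only on success) can be shown with a toy catalog; `move_table` and the dict-based catalog are illustrative stand-ins for the `TableMove._recreate_table` logic, not the real implementation.

```python
def move_table(catalog, src, dst, create):
    """Recreate src as dst, dropping the source only after creation succeeds."""
    catalog[dst] = create(catalog[src])  # may raise; src is still untouched
    del catalog[src]                     # drop the source only on success
```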
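
The `match` vs. `startswith` fix for `create-missing-credential` amounts to prefix matching of locations against role resource paths. A simplified sketch (the real `_identify_missing_paths` works on role and location objects, not bare strings):

```python
def identify_missing_paths(external_locations, role_resource_paths):
    """Return locations not covered by any UC-compatible role.

    A role's resource path covers a location when the location starts with
    that path, so s3://bucket/prefix also covers s3://bucket/prefix/subdir."""
    missing = set()
    for location in external_locations:
        if not any(location.startswith(path) for path in role_resource_paths):
            missing.add(location)
    return sorted(missing)
```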

Dependency updates:

* Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31 ([2417](https://github.com/databrickslabs/ucx/pull/2417)).
* Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13 ([2431](https://github.com/databrickslabs/ucx/pull/2431)).
* Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15 ([2453](https://github.com/databrickslabs/ucx/pull/2453)).
* Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17 ([2480](https://github.com/databrickslabs/ucx/pull/2480)).
* Updated databricks-labs-lsql requirement from <0.9,>=0.5 to >=0.5,<0.10 ([2489](https://github.com/databrickslabs/ucx/pull/2489)).
* Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18 ([2488](https://github.com/databrickslabs/ucx/pull/2488)).
* Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19 ([2509](https://github.com/databrickslabs/ucx/pull/2509)).

0.33.0

Not secure
* Added `validate-table-locations` command for checking overlapping tables across workspaces ([2341](https://github.com/databrickslabs/ucx/issues/2341)). A new command, `validate-table-locations`, has been added to check for overlapping table locations across workspaces before migrating tables. This command is intended to ensure that tables can be migrated across workspaces without issues. The new command is part of the table migration workflows and uses a `LocationTrie` data structure to efficiently search for overlapping table locations. If any overlaps are found, the command logs a warning message and adds the conflicting tables to a list of all conflicts. This list is returned at the end of the command. The `validate-table-locations` command is intended to be run before migrating tables to ensure that the tables can be migrated without conflicts. The command includes a `workspace-ids` flag, which allows users to specify a list of workspace IDs to include in the validation. If this flag is not provided, the command will include all workspaces present in the account. This new command resolves issue [#673](https://github.com/databrickslabs/ucx/issues/673). The `validate_table_locations` method is added to the `AccountAggregate` class and the `ExternalLocations` class has been updated to use the new `LocationTrie` class. The import section has also been updated to include new modules such as `LocationTrie` and `Table` from `databricks.labs.ucx.hive_metastore.locations` and `databricks.labs.ucx.hive_metastore.tables` respectively. Additionally, test cases have been added to ensure the correct functioning of the `LocationTrie` class.
* Added references to hive_metastore catalog in all table references an… ([2419](https://github.com/databrickslabs/ucx/issues/2419)). In this release, we have updated various methods and functions across multiple files to include explicit references to the `hive_metastore` catalog in table references. This change aims to improve the accuracy and consistency of table references in the codebase, enhancing reliability and maintainability. Affected files include `azure.py`, `init_scripts.py`, `pipelines.py`, and others in the `databricks/labs/ucx/assessment` module, as well as test files in the `tests/unit/assessment` and `tests/unit/azure` directories. The `_try_fetch` method has been updated to include the catalog name in table references in all instances, ensuring the correct catalog is referenced in all queries. Additionally, various test functions in affected files have been updated to reference the `hive_metastore` catalog in SQL queries. This update is part of the resolution of issue [#2207](https://github.com/databrickslabs/ucx/issues/2207) and promotes robust handling of catalog, schema, and table naming scenarios in hive metastore migration status management.
* Added support for skipping views when migrating tables and views ([2343](https://github.com/databrickslabs/ucx/issues/2343)). In this release, we've added support for skipping both tables and views during the migration process in the `databricks labs ucx` command, addressing issue [#1937](https://github.com/databrickslabs/ucx/issues/1937). The `skip` command has been enhanced to support skipping views, and new functions `skip_table_or_view` and `load_one` have been introduced to the `Table` class. Appropriate error handling and tests, including unit tests and integration tests, have been implemented to ensure the functionality works as expected. With these changes, users can now skip views during migration and have more flexibility when working with tables in the Unity Catalog.
* Avoid false positives when linting for pyspark patterns ([2381](https://github.com/databrickslabs/ucx/issues/2381)). This release includes enhancements to the PySpark linter aimed at reducing false positives during linting. The linter has been updated to check the originating module when detecting PySpark calls, ensuring that warnings are triggered only for relevant nodes from the pyspark or dbutils modules. Specifically, the `ReturnValueMatcher` and `DirectFilesystemAccessMatcher` classes have been modified to include this new check. These changes improve the overall accuracy of the PySpark linter, ensuring that only pertinent warnings are surfaced during linting. Additionally, the commit includes updated unit tests to verify the correct behavior of the modified linter. Specific improvements have been made to avoid false positives when detecting the `listTables` function in the PySpark catalog, ensuring that the warning is only triggered for the actual PySpark `listTables` method call.
* Bug: Generate custom warning when doing table size check and encountering DELTA_INVALID_FORMAT exception ([2426](https://github.com/databrickslabs/ucx/issues/2426)). A modification has been implemented in the `_safe_get_table_size` method within the `table_size.py` file of the `hive_metastore` package. This change addresses an issue ([#1913](https://github.com/databrickslabs/ucx/issues/1913)) concerning the occurrence of a `DELTA_INVALID_FORMAT` exception while determining the size of a Delta table. Instead of raising an error, the exception is now converted into a warning, and the function proceeds to process the rest of the table. A corresponding warning message has been added to inform users about the issue and suggest checking the table structure. No new methods have been introduced, and existing functionalities have been updated to handle this specific exception more gracefully. The changes have been thoroughly tested with unit tests for the table size check when encountering a `DELTA_INVALID_FORMAT` error, employing a mock backend and a mock Spark session to simulate the error conversion. This change does not affect user documentation, CLI commands, workflows, or tables, and is solely intended for software engineers adopting the project.
* Clean up left over uber principal resources for Azure ([2370](https://github.com/databrickslabs/ucx/issues/2370)). This commit includes modifications to the Azure access module of the UCX project to clean up resources if the creation of the uber principal fails midway. It addresses issues [#2360](https://github.com/databrickslabs/ucx/issues/2360) (Azure part) and [#2363](https://github.com/databrickslabs/ucx/issues/2363), and modifies the command `databricks labs ucx create-uber-principal` to include this functionality. The changes include adding new methods and modifying existing ones for working with Azure resources, such as `StorageAccount`, `AccessConnector`, and `AzureRoleAssignment`. Additionally, new unit and integration tests have been added and manually tested to ensure that the changes work as intended. The commit also includes new fixtures for testing storage accounts and access connectors, and a test case for getting, applying, and deleting storage permissions. The `azure_api_client` function has been updated to handle different input argument lengths and methods such as "get", "put", and "post". A new managed identity, "appIduser1", has been added to the Azure mappings file, and the corresponding role assignments have been updated. The changes include error handling mechanisms for certain scenarios that may arise during the creation of the uber service principal.
* Crawlers: Use `TRUNCATE TABLE` instead of `DELETE FROM` when resetting crawler tables ([2392](https://github.com/databrickslabs/ucx/issues/2392)). In this release, the `.reset()` method for crawlers has been updated to use `TRUNCATE TABLE` instead of `DELETE FROM` when clearing out crawler tables, resulting in more efficient and idiomatic code. This change affects the existing `migrate-data-reconciliation` workflow and is accompanied by updated unit and integration tests to ensure correct functionality. The `reset()` method now accepts a table name argument, which is passed to the newly introduced `escape_sql_identifier()` utility function from the `databricks.labs.ucx.framework.utils` module for added safety. The migration status is now refreshed using the `TRUNCATE TABLE` command, which removes all records from the table, providing improved performance compared to the previous implementation. The `SHOW DATABASES` and `TRUNCATE TABLE` queries are validated in the `refresh_migration_status` workflow test, which now checks if the `TRUNCATE TABLE` query is used instead of `DELETE FROM` when resetting crawler tables.
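A minimal sketch of the new reset path; the real `escape_sql_identifier()` lives in `databricks.labs.ucx.framework.utils`, and this simplified version only backtick-quotes dot-separated parts:

```python
def escape_sql_identifier(path: str) -> str:
    # Simplified: wrap each dot-separated part in backticks, doubling
    # any embedded backtick.
    return ".".join("`" + p.replace("`", "``") + "`" for p in path.split("."))


def reset_crawler_table(execute, full_table_name: str) -> None:
    # TRUNCATE TABLE clears the table in a single operation, which Delta
    # handles more efficiently than a row-by-row DELETE FROM.
    execute(f"TRUNCATE TABLE {escape_sql_identifier(full_table_name)}")
```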
* Detect tables that are not present in the mapping file ([2205](https://github.com/databrickslabs/ucx/issues/2205)). In this release, we have introduced a new method `get_remaining_tables()` that returns a list of tables in the Hive metastore that have not been processed by the migration tool. This method performs a full refresh of the index and checks each table in the Hive metastore against the index to determine if it has been migrated. We have also added a new private method `_is_migrated()` to check if a given table has already been migrated. Additionally, we have replaced the `refresh_migration_status` method with `update_migration_status` in several workflows to present a more accurate representation of the migration process in the dashboard. A new SQL script, 04_1_remaining_hms_tables.sql, has been added to list the remaining tables in Hive Metastore which are not present in the mapping file. We have also added a new test for the table migration job that verifies that tables not present in the mapping file are detected and reported. A new test function `test_refresh_migration_status_published_remained_tables` has been added to ensure that the migration process correctly handles the case where tables have been published to the target metadata store but still remain in the source metadata store. These changes are intended to improve the functionality of the migration tool for Hive metastore tables and resolve issue [#1221](https://github.com/databrickslabs/ucx/issues/1221).
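The core of the check can be sketched like this; it is simplified in that the real method first performs a full refresh of the migration index and works with table objects rather than plain strings:

```python
def _is_migrated(index: set[str], table_key: str) -> bool:
    # Stand-in for the private check against the migration-status index.
    return table_key in index


def get_remaining_tables(hms_tables: list[str], index: set[str]) -> list[str]:
    # Every Hive metastore table absent from the (freshly refreshed)
    # index has not been processed by the migration tool yet.
    return [t for t in hms_tables if not _is_migrated(index, t)]
```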
* Fixed ConcurrentDeleteReadException in migrate-view task during table migration ([2282](https://github.com/databrickslabs/ucx/issues/2282)). In this release, we have implemented a fix for the ConcurrentDeleteReadException that occurred during the `migrate-view` command's migration task. The fix moves the migration-status refresh so that it happens between batches rather than within them. Along with the fix, we added a new method `index()` to the `TableMigrate` class, which checks whether a table has been migrated. This method is utilized in the `_view_can_be_migrated` method to ensure that all dependencies of a view have been migrated before migrating the view. The `index_full_refresh()` method, which earlier performed this check, has been modified to refresh the index between batches instead of within batches. It is worth noting that the changes made have been manually tested, but no unit tests, integration tests, or verification on staging environments have been added. The target audience for this release is software engineers who adopt this project. No new documentation, commands, workflows, or tables have been added or modified in this release.
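The shape of the fix can be illustrated with a small sketch; the callbacks are hypothetical stand-ins for the actual per-view migration and `index_full_refresh()` calls:

```python
def migrate_views_in_batches(batches, migrate_one, refresh_index) -> int:
    # Refresh the migration index once *between* batches, never inside
    # one, so a concurrent task does not read a table that is being
    # rewritten (the source of ConcurrentDeleteReadException).
    migrated = 0
    for i, batch in enumerate(batches):
        if i > 0:
            refresh_index()
        for view in batch:
            migrate_one(view)
            migrated += 1
    return migrated
```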
* Fixed documentation typos: `create-missing-pricipals` -> `create-missing-principals` ([2357](https://github.com/databrickslabs/ucx/issues/2357)). This pull request resolves typographical errors in the `create-missing-principals` command documentation, correcting the mistaken usage of `create-missing-pricipals` throughout the project documentation. The changes encompass the command description, the UCX command section, and the manual process documentation for AWS storage credentials. The `create-missing-principals` command, utilized by Cloud Admins to create and configure new AWS roles for Unity Catalog access, remains functionally unaltered.
* Fixed linting for Spark Python workflow tasks ([2349](https://github.com/databrickslabs/ucx/issues/2349)). This commit updates the linter to support PySpark tasks in workflows by modifying the existing `experimental-workflow-linter` to correctly handle these tasks. Previously, the linter assumed Python files were Jupyter notebooks, but PySpark tasks are top-level Python files run as `__main__`. This change introduces a new `ImportFileResolver` class to resolve imports for PySpark tasks, and updates to the `DependencyResolver` class to properly handle them. Additionally, unit and integration tests have been updated and added to ensure the correct behavior of the linter. The `DependencyResolver` constructor now accepts an additional `import_resolver` argument in some instances. This commit resolves issue [#2213](https://github.com/databrickslabs/ucx/issues/2213) and improves the accuracy and versatility of the linter for different types of Python files.
* Fixed missing `security_policy` when updating SQL warehouse config ([2409](https://github.com/databrickslabs/ucx/issues/2409)). In this release, we have added new methods `GetWorkspaceWarehouseConfigResponseSecurityPolicy` and `SetWorkspaceWarehouseConfigRequestSecurityPolicy` to improve handling of SQL warehouse config security policy. We have introduced a new variable `security_policy` to store the security policy value, which is used when updating the SQL warehouse configuration, ensuring that the required security policy is set and fixing the `InvalidParameterValue: Endpoint security policy is required and must be one of NONE, DATA_ACCESS_CONTROL, PASSTHROUGH` error. Additionally, when the `enable_serverless_compute` error occurs, the new SQL warehouse data access config is printed in the log, allowing users to manually configure the uber principal in the UI. We have also updated the `create_uber_principal` method to set the security policy correctly and added parameterized tests to test the setting of the warehouse configuration security policy. The `test_create_global_spn` method has been updated to include the `security_policy` parameter in the `create_global_spn` method call, and new test cases have been added to verify the warehouse config's security policy is correctly updated. These enhancements will help make the system more robust and user-friendly.
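The essence of the fix, sketched with plain dictionaries rather than the Databricks SDK's warehouse-config request/response types; the `DATA_ACCESS_CONTROL` fallback chosen here is an assumption, not necessarily the default UCX uses:

```python
# Allowed values, per the error message quoted above.
SECURITY_POLICIES = ("NONE", "DATA_ACCESS_CONTROL", "PASSTHROUGH")


def updated_warehouse_config(current: dict) -> dict:
    # Carry the existing security policy over into the updated config
    # (falling back to DATA_ACCESS_CONTROL when none was set), instead
    # of omitting the required field and triggering InvalidParameterValue.
    policy = current.get("security_policy") or "DATA_ACCESS_CONTROL"
    assert policy in SECURITY_POLICIES
    new = dict(current)
    new["security_policy"] = policy
    return new
```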
* Fixed raising `ResourceDoesNotExist` when iterating the log paths ([2382](https://github.com/databrickslabs/ucx/issues/2382)). In this commit, we have improved the handling of the `ResourceDoesNotExist` exception when iterating through log paths in the open-source library. Previously, the exception was not being properly raised or handled, resulting in unreliable code behavior. To address this, we have added unit tests in the `test_install.py` file that accurately reflect the actual behavior when iterating log paths. We have also modified the test to raise the `ResourceDoesNotExist` exception when the result is iterated over, rather than when the method is called. Additionally, we have introduced the `ResourceDoesNotExistIter` class to make it easier to simulate the error during testing. These changes ensure that the code can gracefully handle cases where the specified log path does not exist, improving the overall reliability and robustness of the library. Co-authored by Andrew Snare.
* Generate custom error during installation due to external metastore connectivity issues ([2425](https://github.com/databrickslabs/ucx/issues/2425)). In this release, we have added a new custom error `OperationFailed` to the `InstallUcxError` enumeration in the `databricks/labs/ucx/install.py` file. This change is accompanied by an exception handler that checks if the error message contains a specific string related to AWS credentials, indicating an issue with external metastore connectivity. If this condition is met, a new `OperationFailed` error is raised with a custom message, providing instructions for resolving the external metastore connectivity issue and re-running the UCX installation. This enhancement aims to provide a more user-friendly error message for users encountering issues during UCX installation due to external metastore connectivity. The functionality of the `_create_database` method has been modified to include this new error handling mechanism, with no alterations made to the existing functionality. Although the changes have not been tested using unit tests, integration tests, or staging environments, they have been manually tested on a workspace with incorrect external metastore connectivity.
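The error-handling mechanism can be sketched as below; the exact substring UCX matches on in the backend error is an assumption in this sketch, as is the `execute` callback:

```python
class OperationFailed(RuntimeError):
    """Raised when installation fails for a known, user-fixable reason."""


# The exact substring UCX matches on is an assumption here.
_EXTERNAL_HMS_MARKER = "AWS credentials"


def create_database(execute, sql: str) -> None:
    # Re-raise backend errors that point at external metastore
    # connectivity with actionable instructions; let everything else
    # propagate unchanged.
    try:
        execute(sql)
    except Exception as err:
        if _EXTERNAL_HMS_MARKER in str(err):
            raise OperationFailed(
                "Failed to create schema: external metastore connectivity issue. "
                "Resolve the connectivity problem and re-run the UCX installation."
            ) from err
        raise
```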
* Improve logging when waiting for workflows to complete ([2364](https://github.com/databrickslabs/ucx/issues/2364)). This pull request enhances the logging and error handling of the databricks labs ucx project, specifically when executing workflows. It corrects a bug where `skip_job_wait` was not handled correctly, and now allows for a specified timeout period instead of the default 20 minutes. Job logs are replicated into the local logfile if a workflow times out. The existing `databricks labs ucx ensure-assessment-run` and `databricks labs ucx migrate-tables` commands have been updated, and unit and integration tests have been added. These improvements provide more detailed and informative logs, and ensure the quality of the code through added tests. The behavior of the `wait_get_run_job_terminated_or_skipped` method has been modified in the tests, and the `assign_metastore` function has also been updated, but the specific changes are not specified in the commit message.
* Lint dependencies consistently ([2400](https://github.com/databrickslabs/ucx/issues/2400)). In this release, the `files.py` module in the `databricks/labs/ucx/source_code/linters` package has been updated to ensure consistent linting of jobs. Previously, the linting sequence was not consistent and did not provide inherited context when linting jobs. These issues have been resolved by modifying the `_lint_one` function to build an inherited tree when linting files or notebooks, and by adding a `FileLinter` object to determine which file/notebook linter to use for linting. The `_lint_task` function has also been updated to yield a `LocatedAdvice` object instead of a tuple, and takes an additional argument, `linted_paths`, which is a set of paths that have been previously linted. Additionally, the `_lint_notebook` and `_lint_file` functions have been removed as their functionality is now encompassed by the `_lint_one` function. These changes ensure consistent linting of jobs and inherited context. Unit tests were run and passed. Co-authored by Eric Vergnaud.
* Make Lakeview names unique ([2354](https://github.com/databrickslabs/ucx/issues/2354)). A change has been implemented to guarantee the uniqueness of dataset names in Lakeview dashboards, addressing issue [#2345](https://github.com/databrickslabs/ucx/issues/2345) where non-unique names caused system errors. This change includes renaming the `count` dataset to `fourtytwo` in the `datasets` field and updating the `name` field for a widget query from `count` to 'counter'. These internal adjustments streamline the Lakeview system's functionality while ensuring consistent and unique dataset naming.
* Optimisation: when detecting if a file is a notebook, only read the start instead of the whole file ([2390](https://github.com/databrickslabs/ucx/issues/2390)). In this release, we have optimized the detection of notebook files during linting in a non-Workspace path. The `is_a_notebook` function in the `base.py` file no longer loads the entire file: if the file's content is already available, it checks whether the content starts with the notebook `magic_header`; otherwise it reads just the file header when opening the file, instead of returning False on error. Since reading a whole file can be time-consuming and resource-intensive, this change improves linting performance and reduces resource usage, making the library more efficient and user-friendly for software engineers.
* Retry dashboard install on DeadlineExceeded ([2379](https://github.com/databrickslabs/ucx/issues/2379)). In this release, we have added the DeadlineExceeded exception to the `retried` decorator in the `_create_dashboard` method, which is used to create a Lakeview dashboard from SQL queries in a folder. This modification to the existing functionality is intended to improve the reliability of dashboard installation by retrying the operation for up to four minutes in case of a DeadlineExceeded error. This change resolves issues [#2376](https://github.com/databrickslabs/ucx/issues/2376), [#2377](https://github.com/databrickslabs/ucx/issues/2377), and [#2389](https://github.com/databrickslabs/ucx/issues/2389), which were related to dashboard installation timeouts. Software engineers will benefit from this update as it will ensure successful installation of dashboards, even in scenarios where timeouts were previously encountered.
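The retry behaviour can be sketched with a minimal decorator; UCX relies on the SDK's `retried` decorator, and this self-contained stand-in only illustrates the timeout-bounded retry on `DeadlineExceeded`:

```python
import functools
import time


class DeadlineExceeded(Exception):
    """Stand-in for the SDK's DeadlineExceeded error."""


def retried(on: tuple, timeout_s: float = 240.0, delay_s: float = 0.01):
    # Retry the listed transient errors until the timeout budget
    # (four minutes in the dashboard-install case) is exhausted.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout_s
            while True:
                try:
                    return fn(*args, **kwargs)
                except on:
                    if time.monotonic() >= deadline:
                        raise
                    time.sleep(delay_s)
        return wrapper
    return decorator
```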
* Updated databricks-labs-lsql requirement from <0.8,>=0.5 to >=0.5,<0.9 ([2416](https://github.com/databrickslabs/ucx/issues/2416)). In this update, we have modified the version requirement for the `databricks-labs-lsql` library to allow version 0.8, which includes bug fixes and improvements in dashboard creation and deployment. We have changed the version constraint from '<0.8,>=0.5' to '>=0.5,<0.9' to accommodate the latest version while preventing future major version upgrades. This change enhances the overall design of the system and simplifies the code for managing dashboard deployment. Additionally, we have introduced a new test that verifies the `deploy_dashboard` method is no longer being used, utilizing the `deprecated_call` function from pytest to ensure that calling the method raises a deprecation warning. This ensures that the system remains maintainable and up-to-date with the latest version of the `databricks-labs-lsql` library.
* Updated ext hms detection to include more conf attributes ([2414](https://github.com/databrickslabs/ucx/issues/2414)). In this enhancement, the code for installing UCX has been updated to include additional configuration attributes for detecting external Hive Metastore (HMS). The previous implementation failed to recognize specific attributes like "spark.hadoop.hive.metastore.uris". This update introduces a new list of recognized prefixes for external HMS spark attributes, namely "spark_conf.spark.sql.hive.metastore", "spark_conf.spark.hadoop.hive.metastore", "spark_conf.spark.hadoop.javax.jdo.option", and "spark_conf.spark.databricks.hive.metastore". This change enables the extraction of a broader set of configuration attributes during installation, thereby improving the overall detection and configuration of external HMS attributes.
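Using the prefixes listed above, the broadened detection reduces to a prefix filter; this is a simplified sketch over a flat configuration dictionary, not the actual installer code:

```python
# Recognised prefixes for external Hive metastore configuration.
_EXT_HMS_PREFIXES = (
    "spark_conf.spark.sql.hive.metastore",
    "spark_conf.spark.hadoop.hive.metastore",
    "spark_conf.spark.hadoop.javax.jdo.option",
    "spark_conf.spark.databricks.hive.metastore",
)


def external_hms_conf(conf: dict) -> dict:
    # Keep only attributes whose keys match one of the recognised
    # external-metastore prefixes (str.startswith accepts a tuple).
    return {k: v for k, v in conf.items() if k.startswith(_EXT_HMS_PREFIXES)}
```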
* Updated sqlglot requirement from <25.11,>=25.5.0 to >=25.5.0,<25.12 ([2415](https://github.com/databrickslabs/ucx/issues/2415)). In this pull request, we have updated the `sqlglot` dependency to a new version range, `>=25.5.0,<25.12`, to allow for the latest version of `sqlglot` to be used while ensuring that the version does not exceed 25.12. This change includes several breaking changes, new features, and bug fixes. New features include support for ALTER VIEW AS SELECT, UNLOAD, and other SQL commands, as well as improvements in performance. Bug fixes include resolutions for issues related to OUTER/CROSS APPLY parsing, GENERATE_TIMESTAMP_ARRAY, and other features. Additionally, there are changes that improve the handling of various SQL dialects, including BigQuery, DuckDB, Snowflake, Oracle, and ClickHouse. The changelog and commits also reveal fixes for bugs and improvements in the library since the last version used in the project. By updating the `sqlglot` dependency to the new version range, the project can leverage the newly added features, bug fixes, and performance improvements while ensuring that the version does not exceed 25.12.
* Updated sqlglot requirement from <25.9,>=25.5.0 to >=25.5.0,<25.11 ([2403](https://github.com/databrickslabs/ucx/issues/2403)). In this release, we have updated the required version range for the `sqlglot` library from '[25.5.0, 25.9)' to '[25.5.0, 25.11)'. This update allows us to utilize newer versions of `sqlglot` while maintaining compatibility with previous versions. The `sqlglot` team's latest release, version 25.10.0, includes several breaking changes, new features, bug fixes, and refactors. Notable changes include support for STREAMING tables in Databricks, transpilation of Snowflake's CONVERT_TIMEZONE function for DuckDB, support for GENERATE_TIMESTAMP_ARRAY in BigQuery, and the ability to parse RENAME TABLE as a Command in Teradata. Due to these updates, we strongly advise conducting thorough testing before deploying to production, as these changes may impact the functionality of the project.
* Use `load_table` instead of deprecated `is_view` in failing integration test `test_mapping_skips_tables_databases` ([2412](https://github.com/databrickslabs/ucx/issues/2412)). In this release, the `is_view` parameter of the `skip_table_or_view` method in the `test_mapping_skips_tables_databases` integration test has been deprecated and replaced with a more flexible `load_table` parameter. This change allows for greater control and customization when specifying how a table should be loaded during the test. The `load_table` parameter is a callable that returns a `Table` object, which contains information about the table's schema, name, object type, and table format. This improvement removes the use of a deprecated parameter, enhancing the maintainability of the test code.

Dependency updates:

* Updated sqlglot requirement from <25.9,>=25.5.0 to >=25.5.0,<25.11 ([2403](https://github.com/databrickslabs/ucx/pull/2403)).
* Updated databricks-labs-lsql requirement from <0.8,>=0.5 to >=0.5,<0.9 ([2416](https://github.com/databrickslabs/ucx/pull/2416)).
* Updated sqlglot requirement from <25.11,>=25.5.0 to >=25.5.0,<25.12 ([2415](https://github.com/databrickslabs/ucx/pull/2415)).

0.32.0

Not secure
* Added troubleshooting guide for self-signed SSL cert related error ([2346](https://github.com/databrickslabs/ucx/issues/2346)). In this release, we have added a troubleshooting guide to the README file to address a specific error that may occur when connecting from a local machine to a Databricks Account and Workspace using a web proxy and self-signed SSL certificate. This error, SSLCertVerificationError, can prevent UCX from connecting to the Account and Workspace. To resolve this issue, users can now set the `REQUESTS_CA_BUNDLE` and `CURL_CA_BUNDLE` environment variables to force the requests library to set `verify=False`, and set the `SSL_CERT_DIR` env var pointing to the proxy CA cert for the urllib3 library. This guide will help users understand and resolve this error, making it easier to connect to Databricks Accounts and Workspaces using a web proxy and self-signed SSL certificate.
* Code Compatibility Dashboard: Fix broken links ([2347](https://github.com/databrickslabs/ucx/issues/2347)). In this release, we have addressed and resolved two issues in the Code Compatibility Dashboard of the UCX Migration (Main) project, enhancing its overall usability. Previously, the Markdown panel contained a broken link to the workflow due to an incorrect anchor, and the links in the table widget to the workflow and task definitions did not render correctly. These problems have been rectified, and the dashboard has been manually tested and verified in a staging environment. Additionally, we have updated the `invisibleColumns` section in the SQL file by changing the `fieldName` attribute to 'name', which will now display the `workflow_id` as a link. Before and after screenshots have been provided for visual reference. The corresponding workflow is now referred to as "Jobs Static Code Analysis Workflow".
* Filter out missing import problems for imports within a try-except clause with ImportError ([2332](https://github.com/databrickslabs/ucx/issues/2332)). This release introduces changes to handle missing import problems within a try-except clause that catches ImportError. A new method, `_filter_import_problem_in_try_except`, has been added to filter out import-not-found issues when they occur in such a clause, preventing unnecessary build failures. The `_register_import` method now returns an Iterable[DependencyProblem] instead of yielding problems directly. Supporting classes and methods, including Dependency, DependencyGraph, and DependencyProblem from the databricks.labs.ucx.source_code.graph module, as well as FileLoader and PythonCodeAnalyzer from the databricks.labs.ucx.source_code.notebooks.cells module, have been added. The ImportSource.extract_from_tree method has been updated to accept a DependencyProblem object as an argument. Additionally, a new test case has been included for the scenario where a failing import in a try-except clause goes unreported. Issue [#1705](https://github.com/databrickslabs/ucx/issues/1705) has been resolved, and unit tests have been added to ensure proper functionality.
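The filtering rule can be sketched with a small AST walk; it is simplified in that it only matches `except ImportError:` handlers written as a bare name, not tuples or qualified names:

```python
import ast


def import_guarded_by_import_error(source: str, module: str) -> bool:
    # Report whether `import module` appears inside a try-block whose
    # handlers catch ImportError, in which case a missing-import finding
    # can be filtered out rather than reported.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Try):
            continue
        catches = any(
            isinstance(h.type, ast.Name) and h.type.id == "ImportError"
            for h in node.handlers
        )
        if not catches:
            continue
        for stmt in node.body:
            for sub in ast.walk(stmt):
                if isinstance(sub, ast.Import) and any(a.name == module for a in sub.names):
                    return True
    return False
```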
* Fixed `report-account-compatibility` cli command docstring ([2340](https://github.com/databrickslabs/ucx/issues/2340)). In this release, we have updated the `report-account-compatibility` CLI command's docstring to accurately reflect its functionality, addressing a previous issue where it inadvertently duplicated the `sync-workspace-info` command's description. This command now provides a clear and concise explanation of its purpose: "Report compatibility of all workspaces available in the account." Upon execution, it generates a readiness report for the account, specifically focusing on workspaces where ucx is installed. This enhancement improves the clarity of the CLI's functionality for software engineers, enabling them to understand and effectively utilize the `report-account-compatibility` command.
* Fixed broken table migration workflow links in README ([2286](https://github.com/databrickslabs/ucx/issues/2286)). In this release, we have made significant improvements to the README file of our open-source library, including fixing broken links and adding a mermaid flowchart to demonstrate the table migration workflows. The table migration workflow has been renamed to the table migration process, which includes migrating Delta tables, non-Delta tables, external tables, and views. Two optional workflows have been added for migrating HiveSerDe tables in place and for migrating external tables using CTAS. Additionally, the commands related to table migration have been updated, with the table migration workflow being renamed to the table migration process. These changes are aimed at providing a more comprehensive understanding of the table migration process and enhancing the overall user experience.
* Fixed dashboard queries fail when default catalog is not `hive_metastore` ([2278](https://github.com/databrickslabs/ucx/issues/2278)). In this release, we have addressed an issue where dashboard queries fail when the default catalog is not set to `hive_metastore`. This has been achieved by modifying the existing `databricks labs ucx install` command to always include the `hive_metastore` namespace in dashboard queries. Additionally, the code has been updated to add the `hive_metastore` namespace to the `DashboardMetadata` object used in creating a dashboard from SQL queries in a folder, ensuring queries are executed in the correct database. The commit also includes modifications to the `test_install.py` unit test file to ensure the installation process correctly handles specific configurations related to the `ucx` namespace for managing data storage and retrieval. The changes have been manually tested and verified on a staging environment.
* Improve group migration error reporting ([2344](https://github.com/databrickslabs/ucx/issues/2344)). This PR introduces enhancements to the group migration dashboard, focusing on improved error reporting and a more informative user experience. The documentation widgets have been fine-tuned, and the failed-migration widget now provides formatted failure information with a link to the failed job run. The dashboard will display only failures from the latest workflow run, complete with logs. A new link to the job list has been added in the [workflows](/jobs) section of the documentation to assist users in identifying and troubleshooting issues. Additionally, the SQL query for retrieving group migration failure information has been refactored, improving readability and extracting relevant data using regular expressions. The changes have been tested and verified on the staging environment, providing clearer and more actionable insights during group migrations. The PR is related to previous work in [#2333](https://github.com/databrickslabs/ucx/issues/2333) and [#1914](https://github.com/databrickslabs/ucx/issues/1914), with updates to the UCX Migration (Groups) dashboard, but no new methods have been added.
* Improve type checking in cli command ([2335](https://github.com/databrickslabs/ucx/issues/2335)). This release introduces enhanced type checking in the command line interface (CLI) of our open-source library, specifically in the `lint_local_code` function of the `cli.py` file. By utilizing a newly developed local code linter object, the function now performs more rigorous and accurate type checking for potential issues in the local code. While the functionality remains consistent, this improvement is expected to prevent similar occurrences like issue [#2221](https://github.com/databrickslabs/ucx/issues/2221), ensuring more robust and reliable code. This change underscores our commitment to delivering a high-quality, efficient, and developer-friendly library.
* Lint dependencies in context ([2236](https://github.com/databrickslabs/ucx/issues/2236)). The `InheritedContext` class has been introduced to gather code fragments from parent files or notebooks during linting of child files or notebooks, addressing issues [#2155](https://github.com/databrickslabs/ucx/issues/2155), [#2156](https://github.com/databrickslabs/ucx/issues/2156), and [#2221](https://github.com/databrickslabs/ucx/issues/2221). This new feature includes the addition of the `InheritedContext` class, with methods for building instances from a route of dependencies, appending other `InheritedContext` instances, and finalizing them for use with linting. The `DependencyGraph` class has been updated to support the new functionality, and various classes, methods, and functions for handling the linter context have been added or updated. Unit, functional, and integration tests have been added to ensure the correct functioning of the changes, which improve the linting functionality by allowing it to consider the broader context of the codebase.
* Make ucx pylsp plugin configurable ([2280](https://github.com/databrickslabs/ucx/issues/2280)). This commit introduces the ability to configure the ucx pylsp plugin with cluster information, which can be provided either in a file or by a client and is managed by the pylsp infrastructure. The Spark Connect linter is now only applied to UC Shared clusters, as Single-User clusters run in Spark Classic mode. A new entry point `pylsp_ucx` has been added to the pylsp configuration file. The changes affect the pylsp plugin configuration and the application of the Spark Connect linter. Unit tests and manual testing have been conducted, but integration tests and verification on a staging environment are not included in this release.
* New dashboard: group migration, showing groups that failed to migrate ([2333](https://github.com/databrickslabs/ucx/issues/2333)). In this release, we have developed a new dashboard for monitoring group migration in the UCX Migration (Groups) workspace. This dashboard includes a widget displaying messages related to groups that failed to migrate during the `migrate-groups-experimental` workflow, aiding users in identifying and addressing migration issues. The group migration process consists of several steps, including renaming workspace groups, provisioning account-level groups, and replicating permissions. The release features new methods for displaying and monitoring migration-related messages, as well as links to documentation and workflows for assessing, validating, and removing workspace-level groups post-migration. The new dashboard is currently not connected to the existing system, but it has undergone manual testing and verification on the staging environment. The changes include the addition of a new SQL query file to implement the logic for fetching group migration failures and a new Markdown file displaying the Group Migration Failures section.
* Support spaces in run cmd args ([2330](https://github.com/databrickslabs/ucx/issues/2330)). This commit resolves an issue where the system had trouble handling spaces in command-line arguments when running subprocesses. The previous implementation only accepted a full command line, which it would split on spaces, causing problems when the command line contained arguments with spaces. The new implementation supports argument lists, which are passed as-is to `Popen`, allowing for proper handling of command lines with spaces. This change is incorporated in the `run_command` function of the `utils.py` file and the `_install_pip` method of the `PythonLibraryResolver` class. The `shlex.join()` function has been replaced with direct string formatting for increased flexibility. The feature is intended for use with the `PythonLibraryResolver` class and is co-authored by Eric Vergnaud and Andrew Snare. Integration tests have been enabled to ensure the proper functioning of the updated code.
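The change boils down to accepting either form of command, as in this simplified sketch of a dual-form `run_command`:

```python
import subprocess


def run_command(command) -> tuple[int, str, str]:
    # A plain string is split on whitespace, as before; a list is passed
    # to subprocess as-is, so arguments containing spaces stay intact.
    args = command.split() if isinstance(command, str) else list(command)
    proc = subprocess.run(args, capture_output=True, text=True, check=False)
    return proc.returncode, proc.stdout, proc.stderr
```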
* Updated error messages for SparkConnect linter ([2348](https://github.com/databrickslabs/ucx/issues/2348)). The SparkConnect linter's error messages have been updated to improve clarity and precision. The term `UC Shared clusters` has been replaced with `Unity Catalog clusters in Shared access mode` throughout the codebase, affecting messages related to various unsupported functionalities or practices on these clusters. These changes include warnings about direct Spark log level setting, accessing the Spark Driver JVM or its logger, using `sc`, and employing RDD APIs. This revision enhances user experience by providing more accurate and descriptive error messages, enabling them to better understand and address the issues in their code. The functionality of the linter remains unchanged.
* Updated sqlglot requirement from <25.8,>=25.5.0 to >=25.5.0,<25.9 ([2279](https://github.com/databrickslabs/ucx/issues/2279)). In this update, we have updated the required version range of the `sqlglot` dependency in the 'pyproject.toml' file from 'sqlglot>=25.5.0,<25.8' to 'sqlglot>=25.5.0,<25.9'. This change allows the project to utilize any version of `sqlglot` that is greater than or equal to 25.5.0 and less than 25.9, including the latest version. The update also includes a changelog for the updated version range, sourced from 'sqlglot's official changelog. This changelog includes various bug fixes and new features for several dialects such as BigQuery, DuckDB, and tSQL. Additionally, the parser has undergone some refactors and improvements. The commits section lists the individual commits included in this update.

Dependency updates:

* Updated sqlglot requirement from <25.8,>=25.5.0 to >=25.5.0,<25.9 ([2279](https://github.com/databrickslabs/ucx/pull/2279)).

0.31.0

Not secure
* Added handling for corrupted dashboard state in installation process ([2262](https://github.com/databrickslabs/ucx/issues/2262)). This commit introduces a new method, `_handle_existing_dashboard`, to manage various scenarios related to an existing dashboard during the installation process of UCX, including updating, upgrading from Redash to Lakeview, handling trashed dashboards, and recovering corrupted dashboard references. The `_create_dashboard` method now requires a non-null `parent_path` and has updated its docstring. New unit and integration tests have been added to verify the changes, with a particular focus on handling corrupted dashboard state during the installation process. These tests resolve issue [#2261](https://github.com/databrickslabs/ucx/issues/2261) and improve the reliability of the installation process. Additionally, the commit includes new fixtures such as `ws`, `any_prompt`, and `clusters` to create controlled test environments and ensure proper validation of the dashboard and removal of the database. The `DashboardMetadata` class has been removed from `databricks.labs.lsql.dashboards`, and the `ProductInfo` and `WheelsV2` classes have been updated in `databricks.labs.blueprint.wheels`. The `remove_database` method has also been added to the `MockBackend` class in `databricks.labs.lsql.backends`. This comprehensive update enhances the robustness of the installation process and the handling of different dashboard types and scenarios.
* Added support for migrating Table ACL for SQL Warehouse cluster in AWS using Instance Profile and Azure using SPN ([2258](https://github.com/databrickslabs/ucx/issues/2258)). This pull request introduces support for migrating Table Access Control (ACL) for SQL Warehouse clusters in Amazon Web Services (AWS) using Instance Profiles and Azure using Service Principal Names (SPNs). It includes identifying SQL Warehouse instances, associated Instance Profiles/SPNs, and principals with access to the warehouse, as well as retrieving their permissions and external locations. The code has been enhanced to represent compute and associated permissions more flexibly, improving the project's handling of both clusters and warehouses. New test cases and a retry mechanism for transient errors have been implemented to ensure robustness. The `AzureServicePrincipalCrawler` class has been added for managing SQL Warehouses using SPNs in Azure and Instance Profiles in AWS. These changes resolve issue [#2238](https://github.com/databrickslabs/ucx/issues/2238) and enhance the project's ability to handle ACL migration for SQL Warehouse clusters in AWS and Azure.
* Consistently have `db-temp-` as the backup prefix for renaming groups ([2266](https://github.com/databrickslabs/ucx/issues/2266)). In this release, we are implementing a consistent backup prefix for renaming workspace groups during migration, ensuring codebase and documentation consistency. The `db-temp-` prefix is now used for renaming groups, replacing the previous `ucx-renamed-` prefix in the `rename_workspace_local_groups` task. This change mitigates potential conflicts with account-level groups of the same name. The group migration workflow remains unaltered, except for the `rename_workspace_local_groups` task that now uses the new prefix. Affected files include 'config.py', 'groups.py', and 'test_groups.py', with corresponding changes in the `WorkspaceConfig` class, `Groups` class, and test cases. This feature is experimental, subject to further development, and may change in the future.
* Fixed `astroid` in the upload_dependencies ([2267](https://github.com/databrickslabs/ucx/issues/2267)). In this update, we have added the `astroid` library as a dependent library for UCX in the `upload_wheel_dependencies` function to resolve the reported issue [#2257](https://github.com/databrickslabs/ucx/issues/2257). Previously, the absence of `astroid` from the dependencies caused problems in certain workspaces. To address this, we modified the `_upload_wheel` function to include `astroid` in the list of libraries uploaded as wheel dependencies. This change has been manually tested and confirmed to work in a blocked workspace. No new methods have been added, and existing functionality has been updated within the `_upload_wheel` function to include `astroid` in the uploaded dependencies.
* Group migration: improve robustness when renaming groups ([2263](https://github.com/databrickslabs/ucx/issues/2263)). This pull request introduces changes to the group migration functionality to improve its robustness when renaming groups. Instead of assuming that a group rename has taken effect immediately after renaming it, the code now double-checks to ensure that the rename has taken place. This change affects the `migrate-groups` and `migrate-groups-experimental` workflows, which have been modified accordingly. Additionally, unit tests and existing integration tests have been updated to account for these changes. The `test_rename_groups_should_patch_eligible_groups` and `test_rename_groups_should_wait_for_renames_to_complete` tests have been updated to include a mock `sleep` function, allowing for more thorough testing of the rename process. The `list` and `get` methods of the `workspace_client` are mocked to return different values at different times, simulating the various stages of the rename process. This allows the tests to thoroughly exercise the code that handles group renames and ensures that it handles failure and success cases correctly. The methods `_rename_group`, `_wait_for_group_rename`, and `_wait_for_renamed_groups` have been added or modified to support this functionality. The `_wait_for_workspace_group_deletion` and `_check_workspace_group_deletion` methods have also been updated to support the deletion of original workspace groups. The `delete_original_workspace_groups` method has been modified to use these new and updated methods for deleting groups and confirming that the deletion has taken effect.
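The double-check described above amounts to polling until the rename is visible. A minimal sketch, assuming a zero-argument callable standing in for the workspace `groups.get` lookup (the real methods `_rename_group` and `_wait_for_group_rename` work against the SCIM API):

```python
import time

class GroupRenameIncomplete(RuntimeError):
    """Raised when a group rename has not propagated within the deadline."""

def wait_for_rename(get_display_name, expected_name, timeout=60.0, poll_interval=1.0, sleep=time.sleep):
    """Poll until the group's display name reflects the rename.

    `get_display_name` is a hypothetical stand-in for fetching the group's
    current name; `sleep` is injectable so tests can avoid real waiting.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_display_name() == expected_name:
            return  # rename has taken effect
        sleep(poll_interval)
    raise GroupRenameIncomplete(f"rename to {expected_name!r} did not take effect in time")
```

Injecting `sleep` mirrors how the updated unit tests mock the wait between retries.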
* Install state misses `dashboards` fields ([2275](https://github.com/databrickslabs/ucx/issues/2275)). In this release, we have resolved a bug related to the installation process in the `databricks/labs/blueprint` project that resulted in the omission of the `dashboards` field from the installation state. This bug was introduced in a previous update ([#2229](https://github.com/databrickslabs/ucx/issues/2229)) which parallelized the installation process. This commit addresses the issue by saving the installation state at the end of the `WorkspaceInstallation.run` method, ensuring that the `dashboards` field is included in the state. Additionally, a new method `_install_state.save()` has been added to save the installation state. The changes also include adding a new method `InstallState.from_installation()` and a new test case `test_installation_stores_install_state_keys()` to retrieve the installation state and check for the presence of specific keys (`jobs` and `dashboards`). The `test_uninstallation()` test case has been updated to ensure that the installation and uninstallation processes work correctly. These changes enhance the installation and uninstallation functionality for the `databricks/labs/blueprint` project by ensuring that the installation state is saved correctly and that the `jobs` and `dashboards` keys are stored as expected, providing improved coverage and increased confidence in the functionality. The changes affect the existing command `databricks labs install ucx`.
* Use deterministic names to create AWS external locations ([2271](https://github.com/databrickslabs/ucx/issues/2271)). In this release, we have introduced deterministic naming for AWS external locations in our open-source library, addressing issue [#2270](https://github.com/databrickslabs/ucx/issues/2270). The `run` method in the `locations.py` file has been updated to generate deterministic names for external locations using the new `_generate_external_location_name` method. This method generates names based on the lowercase parts of a file path, joined by underscores, instead of using a prefix and a counter. Additionally, test cases for creating external locations in AWS have been updated to use the new naming convention, improving the predictability and consistency of the external location names. These changes simplify the management and validation of external locations, making it easier for software engineers to maintain and control the names of the external locations.
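The naming rule described above (lowercase path parts joined by underscores) can be sketched like this; the real `_generate_external_location_name` may differ in detail:

```python
def external_location_name(location_url):
    """Derive a deterministic external-location name from a cloud path.

    The scheme (s3://, abfss://, gs://) is stripped and the remaining,
    lowercased path parts are joined with underscores, so the same URL
    always yields the same name -- no prefix-plus-counter involved.
    """
    _, _, rest = location_url.partition("://")
    parts = [part for part in rest.rstrip("/").split("/") if part]
    return "_".join(part.lower() for part in parts)
```

For example, `s3://Bucket/Folder/Sub/` always maps to `bucket_folder_sub`, making the created locations predictable across runs.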

0.30.0

Not secure
* Fixed codec error in md ([2234](https://github.com/databrickslabs/ucx/issues/2234)). In this release, we have addressed a codec error in the `md` file that caused issues on Windows machines due to the presence of curly quotes. This has been resolved by replacing curly quotes with straight quotes. The affected code pertains to the `.setJobGroup` pattern in the `SparkContext` where `spark.addTag()` is used to attach a tag, and `getTags()` and `interruptTag(tag)` are used to act upon the presence or absence of a tag. These APIs are specific to Spark Connect (Shared Compute Mode) and will not work in `Assigned` access mode. Additionally, the release includes updates to the README.md file, providing solutions for various issues related to UCX installation and configuration. These changes aim to improve the user experience and ensure a smooth installation process for software engineers adopting the project, and enhance compatibility and reliability of the code across operating systems. The changes were co-authored by Cor.
* Group manager optimisation: during group enumeration only request the attributes that are needed ([2240](https://github.com/databrickslabs/ucx/issues/2240)). In this optimization update to the `groups.py` file, the `_list_workspace_groups` function has been modified to reduce the number of attributes requested during group enumeration to the minimum set necessary. This improvement is achieved by removing the `members` attribute from the list of requested attributes when it is requested during enumeration. For each group returned by `self._ws.groups.list`, the function now checks if the group is out of scope and, if not, retrieves the group with all its attributes using the `_get_group` function. Additionally, the new `scan_attributes` variable limits the attributes requested during the initial enumeration to "id", "displayName", and "meta". This optimization reduces the risk of timeouts caused by large attributes and improves the performance of group enumeration, particularly in cases where members are requested during enumeration due to API issues.
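The two-phase enumeration can be sketched as below; `list_groups` and `get_group` are hypothetical stand-ins for the workspace SCIM calls, not the real SDK signatures:

```python
SCAN_ATTRIBUTES = "id,displayName,meta"  # minimal set requested during enumeration

def enumerate_groups(list_groups, get_group, is_in_scope):
    """List groups with only the minimal attributes, then fetch full
    details solely for in-scope groups.

    Skipping out-of-scope groups before the detailed fetch avoids paying
    for large attributes like `members`, which caused timeouts.
    """
    full_groups = []
    for summary in list_groups(attributes=SCAN_ATTRIBUTES):
        if not is_in_scope(summary):
            continue  # never request the heavy attributes for this group
        full_groups.append(get_group(summary["id"]))
    return full_groups
```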
* Group migration: additional logging ([2239](https://github.com/databrickslabs/ucx/issues/2239)). In this release, we have implemented logging improvements for group migration within the group manager. These enhancements include the addition of new informational and debug logs aimed at helping to understand potential issues during group migration. The affected functionality includes the existing workflow `group-migration`. New logging statements have been added to numerous methods, such as `rename_groups`, `_rename_group`, `_wait_for_rename`, `_wait_for_renamed_groups`, `reflect_account_groups_on_workspace`, `delete_original_workspace_groups`, and `validate_group_membership`, as well as data retrieval methods including `_workspace_groups_in_workspace`, `_account_groups_in_workspace`, and `_account_groups_in_account`. These changes will provide increased visibility into the group migration process, including starting to rename/reflect groups, checking for renamed groups, and validating group membership.
* Group migration: improve robustness while deleting workspace groups ([2247](https://github.com/databrickslabs/ucx/issues/2247)). This pull request introduces changes to the group manager aimed at enhancing the reliability of deleting workspace groups, addressing an issue where deletion was being skipped for groups that had recently been renamed due to eventual consistency concerns. The changes involve double-checking the deletion of groups by ensuring they can no longer be directly retrieved from the API and are no longer present in the list of groups during enumeration. Additionally, logging has been improved, and the renaming of groups will be updated in a subsequent pull request. The `remove-workspace-local-backup-groups` workflow and related tests have been modified, and new classes indicating incomplete deletion or rename operations have been implemented. These changes improve the robustness of deleting workspace groups, reducing the likelihood of issues arising post-deletion and enhancing overall system consistency.
* Improve error messages in case of connection errors ([2210](https://github.com/databrickslabs/ucx/issues/2210)). In this release, we've made significant improvements to error messages for connection errors in the `databricks labs ucx (un)install` command, addressing part of issue [#1323](https://github.com/databrickslabs/ucx/issues/1323). The changes include the addition of a new import, `RequestsConnectionError` from the `requests` package, and updates to the error handling in the `run` method to provide clearer and more informative messages during connection problems. A new `except` block has been added to handle `TimeoutError` exceptions caused by `RequestsConnectionError`, logging a warning message with information on troubleshooting network connectivity issues. The `configure` method has also been updated with a docstring noting that connection errors are not handled within it. To ensure the improvements work as expected, we've added new manual and integration tests, including a test for a simulated workspace with no internet connection, and a new function to configure such a workspace. The test checks for the presence of a specific warning message in the log output. The changes also include new type annotations and imports. The target audience for this update includes software engineers adopting the project, who will benefit from clearer error messages and guidance when troubleshooting connection problems.
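The error-handling shape can be sketched as follows. This is illustrative only: the real code catches a `TimeoutError` caused by `requests`' `ConnectionError`, and the warning text here is a paraphrase, not the actual message:

```python
import logging

logger = logging.getLogger("ucx.install")

def run_install(step):
    """Run an installation step, turning connectivity failures into an
    actionable warning instead of a stack trace.
    """
    try:
        step()
    except TimeoutError as err:
        logger.warning(
            "Cannot connect to the workspace; check your network connectivity, "
            "VPN and proxy settings, then retry: %s", err,
        )
        return False
    return True
```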
* Increase timeout for sequence of slow preliminary jobs ([2222](https://github.com/databrickslabs/ucx/issues/2222)). In this enhancement, the timeout duration for a series of slow preliminary jobs has been increased from 4 minutes to 6 minutes, addressing issue [#2219](https://github.com/databrickslabs/ucx/issues/2219). The modification is implemented in the `test_running_real_remove_backup_groups_job` function in the `tests/integration/install/test_installation.py` file, where the `get_group` function's `retried` decorator timeout is updated from 4 minutes to 6 minutes. This change improves the system's handling of slow preliminary jobs by allowing more time for the API to delete a group and minimizing errors resulting from insufficient deletion time. The overall functionality and tests of the system remain unaffected.
* Init `RuntimeContext` from debug notebook to simplify interactive debugging flows ([2253](https://github.com/databrickslabs/ucx/issues/2253)). In this release, we have implemented a change to simplify interactive debugging flows in UCX workflows. We have introduced a new feature that initializes the `RuntimeContext` object from a debug notebook. The `RuntimeContext` is a subclass of `GlobalContext` that manages all object dependencies. Previously, all UCX workflows used a `RuntimeContext` instance for any object lookup, which could be complex during debugging. This change pre-initializes the `RuntimeContext` object correctly, making it easier to perform interactive debugging. Additionally, we have replaced the use of `Installation.load_local` and `WorkspaceClient` with the newly initialized `RuntimeContext` object. This reduces the complexity of object lookup and simplifies the code for debugging purposes. Overall, this change will make it easier to debug UCX workflows by pre-initializing the `RuntimeContext` object with the necessary configurations.
* Lint child dependencies recursively ([2226](https://github.com/databrickslabs/ucx/issues/2226)). In this release, we've implemented significant changes to our linting process for enhanced context awareness, particularly in the context of parent-child file relationships. The `DependencyGraph` class in the `graph.py` module has been updated with new methods, including `parent`, `root_dependencies`, `root_paths`, and `root_relative_names`, and an improved `_relative_names` method. These changes allow for more accurate linting of child dependencies. The `lint` function in the `files.py` module has also been modified to accept new parameters and utilize a recursive linting approach for child dependencies. The `databricks labs ucx lint-local-code` command has been updated to include a `paths` parameter and lint child dependencies recursively, improving the linting process by considering parent-child relationships and resulting in better contextual code analysis. The release contains integration tests to ensure the functionality of these changes, addressing issues [#2155](https://github.com/databrickslabs/ucx/issues/2155) and [#2156](https://github.com/databrickslabs/ucx/issues/2156).
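The recursive approach boils down to a depth-first walk over parent-to-child file relationships. A minimal sketch, using a plain dict as a stand-in for the `DependencyGraph` class:

```python
def collect_dependencies(graph, root):
    """Depth-first walk of a parent -> children dependency mapping, so a
    file and everything it transitively loads get linted together.
    """
    seen = []
    def visit(node):
        if node in seen:
            return  # avoid re-visiting shared or cyclic dependencies
        seen.append(node)
        for child in graph.get(node, []):
            visit(child)
    visit(root)
    return seen
```

A linter that processes the returned list in order sees each child in the context of its parents, which is the contextual awareness the change is after.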
* Removed deprecated `install.sh` script ([2217](https://github.com/databrickslabs/ucx/issues/2217)). In this release, we have removed the deprecated `install.sh` script from the codebase, which was previously used to install and set up the environment for the project. This script would check for the presence of Python binaries, identify the latest version, create a virtual environment, and install project dependencies. Going forward, developers will need to utilize an alternative method for installing and setting up the project environment, as the use of this script is now obsolete. We recommend consulting the updated documentation for guidance on the new installation process.
* Tentatively fix failure when running assessment without a hive_metastore ([2252](https://github.com/databrickslabs/ucx/issues/2252)). In this update, we have enhanced the error handling of the `LocalCheckoutContext` class in the `workspace_cli.py` file. Specifically, we have addressed the issue where a fatal failure occurred when running an assessment without a Hive metastore ([#2252](https://github.com/databrickslabs/ucx/issues/2252)) by implementing a more graceful error handling mechanism. Now, when the metastore fails to load during the initialization of a `LinterContext` object, a warning message is logged instead, and the `MigrationIndex` is initialized with an empty list. This change is linked to the resolution of issue [#2221](https://github.com/databrickslabs/ucx/issues/2221). Additionally, we have imported the `MigrationIndex` class from the `hive_metastore.migration_status` module and added a logger to the module. However, please note that functional tests for this specific modification have not been conducted.
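The graceful-fallback pattern can be sketched as below; the loader callable and exception breadth are assumptions for illustration (the real code initializes `MigrationIndex` with an empty list and catches a narrower error):

```python
import logging

logger = logging.getLogger("ucx.lint")

def build_migration_index(load_migration_status):
    """Return the migration entries, falling back to an empty index when
    the Hive metastore cannot be reached, so linting can still proceed.
    """
    try:
        return list(load_migration_status())
    except Exception as err:
        logger.warning("Metastore not available, linting without it: %s", err)
        return []
```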
* Total Storage Credentials count widget for Assessment Dashboard ([2201](https://github.com/databrickslabs/ucx/issues/2201)). In this commit, a new widget has been added to the Assessment Dashboard that displays the current total number of storage credentials created in the workspace, up to a limit of 200. This change includes a new SQL query to retrieve the count of storage credentials from the `inventory.external_locations` table and modifies the display of the widget with customized settings. Additionally, a new warning mechanism has been implemented to prevent migration from exceeding the UC storage credentials limit of 200. A new method, `get_roles_to_migrate`, has been added to `access.py` to retrieve the roles that need to be migrated. If the number of roles exceeds 200, a `RuntimeWarning` is raised. User documentation and manual testing have been updated to reflect these changes, but no unit or integration tests have been added yet. This feature is part of the implementation of issue [#1600](https://github.com/databrickslabs/ucx/issues/1600) and is co-authored by Serge Smertin.
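The limit guard described above can be sketched as follows; the constant and wording are illustrative, though the 200-credential limit and the `RuntimeWarning` come from the changelog entry:

```python
UC_STORAGE_CREDENTIAL_LIMIT = 200  # Unity Catalog cap on storage credentials

def check_roles_to_migrate(roles):
    """Refuse to proceed when migrating all roles would exceed the UC
    storage credential limit, surfacing the problem before migration.
    """
    if len(roles) > UC_STORAGE_CREDENTIAL_LIMIT:
        raise RuntimeWarning(
            f"{len(roles)} roles to migrate exceed the UC storage "
            f"credential limit of {UC_STORAGE_CREDENTIAL_LIMIT}"
        )
    return roles
```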
* Updated dashboard install using latest `lsql` release ([2246](https://github.com/databrickslabs/ucx/issues/2246)). In this release, the install function for the UCX dashboard has been updated in the `databricks/labs/ucx/install.py` file to use the latest `lsql` release. The `databricks labs install ucx` command has been modified to accommodate the updated `lsql` version and now includes new methods for upgrading dashboards from Redash to Lakeview, as well as creating and deleting dashboards in Lakeview, which also feature functionality to publish dashboards. The changes have been manually tested and verified on a staging environment. The query formatting in the dashboard has been improved, and the `--width` parameter is no longer necessary in certain instances. This update streamlines the dashboard installation process, enhances its functionality, and ensures its compatibility with the latest `lsql` release.
* Updated sqlglot requirement from <25.7,>=25.5.0 to >=25.5.0,<25.8 ([2248](https://github.com/databrickslabs/ucx/issues/2248)). In this update, we have adjusted the version requirements for the SQL transpiler library, sqlglot, in our pyproject.toml file. The requirement has been updated from ">=25.5.0, <25.7" to ">=25.5.0, <25.8", allowing us to utilize the latest features and bug fixes available in sqlglot version 25.7.0 while still maintaining our previous version constraint. The changelog from sqlglot's repository has been included in this commit, detailing the new features and improvements introduced in version 25.7.0. A list of commits made since the previous version is also provided. The diff of this commit shows that the change only affects the version constraint for sqlglot and does not impact any other parts of the codebase. This update ensures that we are using the most recent stable version of sqlglot while maintaining backward compatibility.

Dependency updates:

* Updated sqlglot requirement from <25.7,>=25.5.0 to >=25.5.0,<25.8 ([2248](https://github.com/databrickslabs/ucx/pull/2248)).

0.29.0

Not secure
* Added `lsql` lakeview dashboard-as-code implementation ([1920](https://github.com/databrickslabs/ucx/issues/1920)). The open-source library has been updated with new features in its dashboard creation functionality. The `assessment_report` and `estimates_report` jobs, along with their corresponding tasks, have been removed. The `crawl_groups` task has been modified to accept a new parameter, `group_manager`. These changes are part of a larger implementation of the `lsql` Lakeview dashboard-as-code system for creating dashboards. The new implementation has been tested through manual testing, existing unit tests, integration tests, and verification on a staging environment, and is expected to improve the functionality and maintainability of the dashboards. The removal of the `assessment_report` and `estimates_report` jobs and tasks may indicate that their functionality has been incorporated into the new `lsql` implementation or is no longer necessary. The new `crawl_groups` task parameter may be used in conjunction with the new `lsql` implementation to enhance the assessment and estimation of groups.
* Added new widget to get table count ([2202](https://github.com/databrickslabs/ucx/issues/2202)). A new widget has been introduced that presents a table count summary, categorized by type (external or managed), location (DBFS root, mount, cloud), and format (delta, parquet, etc.). This enhancement is complemented by an additional SQL file, responsible for generating necessary count statistics. The script discerns the table type and location through location string analysis and subsequent categorization. The output is structured and ordered by table type. It's important to note that no existing functionality has been altered, and the new feature is self-contained within the added SQL file. To ensure the correct functioning of this addition, relevant documentation and manual tests have been incorporated.
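The location-string analysis the widget's SQL performs can be approximated in Python; this is a heuristic sketch of the bucketing, not the actual query logic, and the category labels are assumptions:

```python
def table_location_category(location):
    """Bucket a table's location string into the categories the count
    widget reports (mount, DBFS root, or cloud storage)."""
    if not location:
        return "unknown"
    if location.startswith("dbfs:/mnt/"):
        return "mount"
    if location.startswith("dbfs:/"):
        return "dbfs root"
    return "cloud"
```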
* Added support for DBFS when building the dependency graph for tasks ([2199](https://github.com/databrickslabs/ucx/issues/2199)). In this update, we have added support for the Databricks File System (DBFS) when building the dependency graph for tasks during workflow assessment. This enhancement allows for the use of wheels, eggs, requirements.txt files, and PySpark jobs located in DBFS when assessing workflows. The `DependencyGraph` object's `register_library` method has been updated to handle paths in both Workspace and DBFS formats. Additionally, we have introduced the `_as_path` method and the `_temporary_copy` context manager to manage file copying and path determination. This development resolves issue [#1558](https://github.com/databrickslabs/ucx/issues/1558) and includes modifications to the existing `assessment` workflow and new unit tests.
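The `_temporary_copy` idea can be sketched as a context manager; the injectable `download` callable is a stand-in for fetching the file through the workspace client, which the real implementation uses:

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def as_local_path(path, download=shutil.copy):
    """Yield a locally readable path for a library, copying it into a
    temporary directory first when it lives on DBFS.
    """
    if str(path).startswith("dbfs:/"):
        with tempfile.TemporaryDirectory() as tmp_dir:
            local = Path(tmp_dir) / Path(str(path)).name
            download(path, local)  # stand-in for a DBFS download
            yield local
        # the temporary copy is removed when the context exits
    else:
        yield Path(path)
```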
* Applied `databricks labs lsql fmt` for SQL files ([2184](https://github.com/databrickslabs/ucx/issues/2184)). The engineering team has developed and applied formatting to several SQL files using the `databricks labs lsql fmt` tool from various pull requests, including <https://github.com/databrickslabs/lsql/pull/221>. These changes improve code readability and consistency without affecting functionality. The formatting includes adding comment delimiters, converting subqueries to nested SELECT statements, renaming columns for clarity, updating comments, modifying conditional statements, and improving indentation. The impacted SQL files include queries related to data migration complexity, assessing data modeling complexity, generating table estimates, and calculating data migration effort. Manual testing has been performed to ensure that the update does not introduce any issues in the installed dashboards.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([2182](https://github.com/databrickslabs/ucx/issues/2182)). In this release, the version of `sigstore/gh-action-sigstore-python` is bumped to 3.0.0 from 2.1.1 in the project's GitHub Actions workflow. This new version brings several changes, additions, and removals, such as the removal of certain settings like `fulcio-url`, `rekor-url`, `ctfe`, and `rekor-root-pubkey`, and output settings like `signature`, `certificate`, and `bundle`. The `inputs` field is now parsed according to POSIX shell lexing rules and is optional if `release-signing-artifacts` is true and the action's event is a `release` event. The default suffix has changed from `.sigstore` to `.sigstore.json`. Additionally, various deprecations present in `sigstore-python`'s 2.x series have been resolved. The upstream release also includes several commits, such as preparing for version 3.0.0, cleaning up workflows, and removing old output settings.
* Consistently cleanup linter codes ([2194](https://github.com/databrickslabs/ucx/issues/2194)). This commit introduces changes to the linting functionality of PySpark, focusing on enhancing code consistency and accuracy. New checks have been added for detecting code incompatibilities with UC Shared Clusters, targeting Python UDF unsupported eval types, spark.catalog.X APIs on DBR versions earlier than 14.3, and the use of commandContext. A new file, python-udfs_14_3.py, containing tests for these incompatibilities has been added. The commit also resolves false linting advice for homonymous method names and updates the code for static analysis message codes, improving self-documentation and maintainability. These changes are limited to the linting functionality of PySpark and do not affect any other functionalities. Co-authored by Eric Vergnaud and Serge Smertin.
* Disable the builtin pip version check when running pip commands ([2214](https://github.com/databrickslabs/ucx/issues/2214)). In this release, we have introduced a modification to disable the built-in pip version check when using pip to install dependencies. This change involves altering the existing workflow of the `_install_pip` method to include the `--disable-pip-version-check` flag in the pip install command, reducing noise in pip-related errors and messages, and enhancing user experience. We have conducted manual and unit testing to ensure that the changes do not introduce any regressions and that existing functionalities remain unaffected. The error message has been updated to reflect the new pip behavior, including the `--disable-pip-version-check` flag in the message. Overall, these changes improve the user experience by reducing unnecessary error messages and providing clearer error information.
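The resulting pip invocation looks roughly like this; the helper name is hypothetical, but `--disable-pip-version-check` is a real pip option:

```python
def pip_install_command(library):
    """Build the pip invocation used to install a dependency, with pip's
    built-in version self-check disabled to keep error output readable.
    """
    return [
        "pip",
        "--disable-pip-version-check",  # suppress "a new pip is available" noise
        "install",
        library,
    ]
```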
* Document `principal-prefix-access` for azure will only list abfss storage accounts ([2212](https://github.com/databrickslabs/ucx/issues/2212)). In this release, we have updated the documentation for the `principal-prefix-access` CLI command in the context of Azure. This command now exclusively lists Azure Storage Blob Gen2 accounts and disregards unsupported storage formats such as wasb:// or adl://. This change is significant as these unsupported storage formats are not compatible with Unity Catalog (UC) and will be disregarded during the migration process. This update clarifies the behavior of the command, ensuring that only relevant storage accounts are displayed. This modification is crucial for users who are migrating credentials to UC, as it prevents the incorporation of unsupported storage accounts, resulting in a more streamlined and efficient migration process.
* Group migration: change error logging format ([2215](https://github.com/databrickslabs/ucx/issues/2215)). In this release, we have updated the error logging format for failed permissions migrations during the experimental group migration workflow to enhance readability and debugging capabilities. Previously, the logs only stated that a migration failure occurred without further details. Now, the new format includes both the source and destination account names, as well as a description of the simulated failure during the migration process. This improves the transparency and usefulness of the error logs for debugging and troubleshooting purposes. Additionally, we have added unit tests to ensure the proper logging of failed migrations, ensuring the reliability of the group migration process for our users. This update demonstrates our commitment to providing clear and informative error messages to make the software engineering experience better.
* Improve error handling as already exists error occurs ([2077](https://github.com/databrickslabs/ucx/issues/2077)). The recent change enhances error handling for the `create-catalogs-schemas` CLI command, addressing an issue where the command would fail if the catalog or schema already existed. The modification involves the introduction of the `_get_missing_catalogs_schemas` method to avoid recreating existing ones. The `create_all_catalogs_schemas` method has been updated to include try-except blocks for `_create_catalog_validate` and `_create_schema` methods, skipping creation if a `BadRequest` error occurs with the message "already exists." This ensures that no overwriting of existing catalogs and schemas takes place. A new test case, "test_create_catalogs_schemas_handles_existing," has been added to verify the command's handling of existing catalogs and schemas. This change resolves issue [#1939](https://github.com/databrickslabs/ucx/issues/1939) and is manually tested; no new methods were added, and existing functionality was changed only within the test file.
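The skip-if-exists guard can be sketched as below; `BadRequest` here is a local stand-in for the SDK error the real code catches:

```python
class BadRequest(Exception):
    """Stand-in for the SDK error raised on a duplicate creation request."""

def create_if_missing(create, name):
    """Create a catalog or schema, treating an 'already exists' rejection
    as success so existing objects are never overwritten.
    """
    try:
        create(name)
    except BadRequest as err:
        if "already exists" not in str(err):
            raise  # any other BadRequest is still a real failure
```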
* Support run assessment as a collection ([1925](https://github.com/databrickslabs/ucx/issues/1925)). This commit introduces the capability to run eligible CLI commands against a collection of workspaces, with an initial implementation for the assessment run command. A new `collection_workspace_id` parameter determines whether the workflow runs on the current installation only, or whether an account context is created to iterate through all workspaces of the specified collection and run the assessment workflow on each. The `join_collection` method has been updated to accept a list of workspace IDs and a boolean `sync` flag, syncing the collection's workspaces when the flag is set to True. The `databricks labs ucx` command has been modified accordingly, and unit tests have been added and existing tests updated to ensure proper functionality.
* Test UCX over Python v3.10, v3.11, and v3.12 ([2195](https://github.com/databrickslabs/ucx/issues/2195)). In this release, we enhance our GitHub Actions CI workflow to test UCX over Python 3.10, 3.11, and 3.12. A new matrix strategy in the `push.yml` workflow file sets the `python-version` dynamically via the `${{ matrix.pyVersion }}` variable, and developers can test UCX against a specific Python version by setting the `HATCH_PYTHON` variable. The `pyproject.toml` file has also been updated, removing the hard Python 3.10 requirement and improving virtual environment integration with popular IDEs. Finally, the `test_migrator_supported_language_with_fixer` function in `test_files.py` has been refactored to test the `migrator.apply` method more efficiently using temporary directories and files. This release aims to ensure compatibility across versions, surface version-specific issues, and improve the developer experience.
* Updated databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 ([2191](https://github.com/databrickslabs/ucx/issues/2191)). In this pull request, the `databricks-labs-blueprint` package requirement has been updated from `~=0.7.0` to `>=0.7,<0.9`. The relaxed range keeps compatibility with the project's requirements while allowing the latest releases of the package. The pull request also carries over release notes and changelog entries from the `databrickslabs/blueprint` repository, covering improvements and bug fixes such as support for Python 3.12, type annotations for path-related unit tests, and fixes for the `WorkspacePath` class. A list of commits and their corresponding hashes is included so engineers can review the changes and confirm compatibility with their projects.
* Updated databricks-labs-lsql requirement from <0.7,>=0.5 to >=0.5,<0.8 ([2189](https://github.com/databrickslabs/ucx/issues/2189)). In this update, the version requirement of the `databricks-labs-lsql` dependency has been updated from `<0.7,>=0.5` to `>=0.5,<0.8`. This change allows for the use of the latest version of the `databricks-labs-lsql` package while ensuring compatibility with the current system. Additionally, this commit includes the release notes, changelog, and commit details from the `databricks-labs-lsql` repository for version 0.7.1. These documents provide information on various bug fixes, improvements, and changes, such as updating the `sigstore/gh-action-sigstore-python` package from 2.1.1 to 3.0.0, using a default factory to create `Tile._position`, and other enhancements. The changelog includes detailed information about releases and features, while the commit details highlight the changes and contributors for each individual commit.
* Updated sqlglot requirement from <25.6,>=25.5.0 to >=25.5.0,<25.7 ([2211](https://github.com/databrickslabs/ucx/issues/2211)). In this update, the `sqlglot` requirement range has been revised from `<25.6,>=25.5.0` to `>=25.5.0,<25.7`, admitting the latest release, v25.6.0, while keeping the version below 25.7. This change is part of issue [#2211](https://github.com/databrickslabs/ucx/issues/2211). The new sqlglot version adds support for ORDER BY ALL, FROM ROWS FROM (...) in PostgreSQL, and exp.TimestampAdd in Presto and Trino, along with modifications to the bigquery, clickhouse, and duckdb dialects and several bug fixes aimed at improving the library's functionality and stability.
* Yield `DependencyProblem` if job on runtime DBR14+ and using .egg dependency ([2020](https://github.com/databrickslabs/ucx/issues/2020)). In this release, we have introduced a new method, `_register_egg`, to handle the registration of libraries in .egg format in the `build_dependency_graph` method. This method checks the runtime version of Databricks. If the version is DBR14 or higher, it yields `DependencyProblem` with code 'not-supported', indicating that installing eggs is no longer supported in Databricks 14.0 or higher. For lower runtime versions, the method downloads the .egg file from the workspace, writes it to a temporary directory, and then registers the library with the `DependencyGraph`. The existing functionality, such as registering libraries in .whl format and registering notebooks, remains unchanged. This release also includes a new test case, `test_job_dependency_problem_egg_dbr14plus`, which creates a job with an .egg dependency and verifies that the expected `DependencyProblem` is raised when using .egg dependencies in a job on Databricks Runtime (DBR) version 14 or higher. This change addresses issue [#1793](https://github.com/databrickslabs/ucx/issues/1793) and improves dependency management, making it easier for software engineers to adopt and work seamlessly with the project.
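The improved group-migration error logging described above can be illustrated with a minimal sketch; the function name, message wording, and logger name here are illustrative, not the exact strings used by UCX:

```python
import logging

logger = logging.getLogger("ucx.group_migration")

def log_failed_permission_migration(source: str, destination: str, reason: str) -> str:
    # Build a message that names both sides of the migration plus the cause,
    # instead of a bare "migration failed" line.
    message = (
        f"failed to migrate permissions from '{source}' "
        f"to '{destination}': {reason}"
    )
    logger.error(message)
    return message

# The resulting log line carries enough context to debug the failure.
msg = log_failed_permission_migration("admins", "account_admins", "simulated failure")
```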
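The skip-if-exists handling in `create-catalogs-schemas` can be sketched as follows; `BadRequest` here is a stand-in for the Databricks SDK error class, and the in-memory catalog set replaces the real workspace calls:

```python
# Minimal sketch of skip-if-exists error handling, assuming the service
# signals duplicates via a BadRequest whose message contains "already exists".
class BadRequest(Exception):
    pass

existing_catalogs = {"main"}

def create_catalog(name: str) -> None:
    # Stand-in for the real catalog-creation API call.
    if name in existing_catalogs:
        raise BadRequest(f"Catalog '{name}' already exists")
    existing_catalogs.add(name)

def create_catalog_if_missing(name: str) -> bool:
    """Create the catalog, skipping (not failing) when it already exists."""
    try:
        create_catalog(name)
        return True
    except BadRequest as e:
        if "already exists" in str(e):
            return False  # skip: never overwrite an existing catalog
        raise  # any other BadRequest is still an error

created = create_catalog_if_missing("main")       # already there -> skipped
created_new = create_catalog_if_missing("sales")  # new -> created
```

The key design point is that only the "already exists" variant of `BadRequest` is swallowed; any other error still propagates.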
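The collection-dispatch logic for the assessment run command can be sketched roughly as below; the function and parameter names are illustrative, and the real implementation iterates workspaces through an account context rather than a plain list:

```python
from typing import Callable, Optional, Sequence

def run_assessment(
    collection_workspace_id: Optional[int],
    current_workspace_id: int,
    collection_workspaces: Sequence[int],
    run_workflow: Callable[[int], None],
) -> list[int]:
    """Run the assessment on the current workspace only, or across every
    workspace in the collection when a collection workspace id is given."""
    if collection_workspace_id is None:
        run_workflow(current_workspace_id)
        return [current_workspace_id]
    ran: list[int] = []
    for ws_id in collection_workspaces:  # account context iterates the collection
        run_workflow(ws_id)
        ran.append(ws_id)
    return ran

ran_all = run_assessment(100, 1, [1, 2, 3], lambda ws: None)  # collection mode
ran_one = run_assessment(None, 1, [1, 2, 3], lambda ws: None)  # current only
```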
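The effect of the widened dependency ranges above can be checked with a simplified version comparison; real resolvers follow PEP 440, but for purely numeric versions a tuple comparison shows why `>=25.5.0,<25.7` admits sqlglot v25.6.0 while the old `<25.6` bound excluded it:

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a purely numeric version like '25.6.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def in_range(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper, mirroring '>=lower,<upper'."""
    return parse(lower) <= parse(version) < parse(upper)

# sqlglot 25.6.0 satisfies the new constraint but not the old one:
new_ok = in_range("25.6.0", "25.5.0", "25.7")  # >=25.5.0,<25.7
old_ok = in_range("25.6.0", "25.5.0", "25.6")  # >=25.5.0,<25.6
```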
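The runtime gate on .egg dependencies can be sketched as a generator that yields a problem for DBR 14.0 or higher; `DependencyProblem` here is a simplified stand-in for the UCX class, and the version-tuple check is illustrative:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class DependencyProblem:
    code: str
    message: str

def check_egg_support(dbr_version: tuple[int, int], library_path: str) -> Iterator[DependencyProblem]:
    """Yield a 'not-supported' problem for .egg libraries on DBR 14.0+."""
    if library_path.endswith(".egg") and dbr_version >= (14, 0):
        yield DependencyProblem(
            code="not-supported",
            message=(
                "Installing eggs is no longer supported on Databricks 14.0 "
                f"or higher: {library_path}"
            ),
        )
    # On lower runtimes the real code proceeds to download and register the egg.

problems = list(check_egg_support((14, 3), "dist/legacy_lib.egg"))
no_problems = list(check_egg_support((13, 3), "dist/legacy_lib.egg"))
```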

Dependency updates:

* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([2182](https://github.com/databrickslabs/ucx/pull/2182)).
* Updated databricks-labs-lsql requirement from <0.7,>=0.5 to >=0.5,<0.8 ([2189](https://github.com/databrickslabs/ucx/pull/2189)).
* Updated databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 ([2191](https://github.com/databrickslabs/ucx/pull/2191)).
* Updated sqlglot requirement from <25.6,>=25.5.0 to >=25.5.0,<25.7 ([2211](https://github.com/databrickslabs/ucx/pull/2211)).
