Databricks-labs-ucx

0.40.0

* Added `google-cloud-core` to known list ([2826](https://github.com/databrickslabs/ucx/issues/2826)). In this release, we have incorporated the `google-cloud-core` library into our project's configuration file, specifying several modules from this library. This change is part of the resolution of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which pertains to working with Google Cloud services. The `google-cloud-core` library offers core functionalities for Google Cloud client libraries, including helper functions, HTTP-related functionalities, testing utilities, client classes, environment variable handling, exceptions, obsolete features, operation tracking, and version management. By adding these new modules to the known list in the configuration file, we can now utilize them in our project as needed, thereby enhancing our ability to work with Google Cloud services.
* Added `gviz-api` to known list ([2831](https://github.com/databrickslabs/ucx/issues/2831)). In this release, we have added the `gviz-api` library to our known library list, specifically specifying the `gviz_api` package within it. This addition enables the proper handling and recognition of components from the `gviz-api` library in the system, thereby addressing a portion of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931). While the specifics of the `gviz-api` library's implementation and usage are not described in the commit message, it is expected to provide functionality related to data visualization. This enhancement will enable us to expand our system's capabilities and provide more comprehensive solutions for our users.
* Added export CLI functionality for assessment results ([2553](https://github.com/databrickslabs/ucx/issues/2553)). A new `export` command-line interface (CLI) function has been added to the open-source library to export assessment results. This feature includes the addition of a new `AssessmentExporter` class in the `export.py` module, which is responsible for exporting assessment results to CSV files inside a ZIP archive. Users can specify the destination path and type of report for the exported results. A notebook utility is also included to run the export from the workspace environment, with default location, unit tests, and integration tests for the notebook utility. The `acl_migrator` method has been optimized for better performance. This new functionality provides more flexibility in exporting assessment results and improves the overall assessment functionality of the library.
* Added functional test related to bug [2850](https://github.com/databrickslabs/ucx/issues/2850) ([#2880](https://github.com/databrickslabs/ucx/issues/2880)). A new functional test has been added to address a bug fix related to issue [#2850](https://github.com/databrickslabs/ucx/issues/2850), which involves reading data from a CSV file located in a volume using Spark's readStream function. The test specifies various options including file format, schema location, header, and compression. The CSV file is loaded from '/Volumes/playground/test/demo_data/' and the schema location is set to '/Volumes/playground/test/schemas/'. Additionally, a unit test has been added and is referenced in the commit. This functional test will help ensure that the bug fix for issue [#2850](https://github.com/databrickslabs/ucx/issues/2850) is working as expected. A sketch of this streaming read appears after this list.
* Added handling for `PermissionDenied` when retrieving `WorkspaceClient`s from account ([2877](https://github.com/databrickslabs/ucx/issues/2877)). In this release, the `workspace_clients` method of the `Account` class in `workspaces.py` has been updated to handle `PermissionDenied` exceptions when retrieving `WorkspaceClient`s. This change introduces a try-except block around the call retrieving the workspace client, which catches the `PermissionDenied` exception and logs a warning message if access to a workspace is denied. If no exception is raised, the workspace client is added to the list of clients as before. The commit also includes a new unit test to verify this functionality. This update addresses issue [#2874](https://github.com/databrickslabs/ucx/issues/2874) and enhances the robustness of the `databricks labs ucx sync-workspace-info` command by ensuring it gracefully handles permission errors during workspace retrieval. A minimal sketch of this handling appears after this list.
* Added testing with Python 3.13 ([2878](https://github.com/databrickslabs/ucx/issues/2878)). The project has been updated to include testing with Python 3.13, in addition to the previously supported versions of Python 3.10, 3.11, and 3.12. This update is reflected in the `.github/workflows/push.yml` file, which now includes '3.13' in the `pyVersion` matrix for the jobs. This addition expands the range of Python versions that the project can be tested and run on, providing increased flexibility and compatibility for users, as well as ensuring continued support for the latest versions of the Python programming language.
* Added used tables in assessment dashboard ([2836](https://github.com/databrickslabs/ucx/issues/2836)). In this update, we introduce a new widget to the assessment dashboard for displaying used tables, enhancing visibility into how tables are utilized within the Databricks environment. This change includes the addition of the `UsedTable` class in the `databricks.labs.ucx.source_code.base` module, which tracks table usage details in the inventory database. Two new methods, `collect_dfsas_from_query` and `collect_used_tables_from_query`, have been implemented to collect data source access and used tables information from a query, with lineage information added to the table details. Additionally, a test function, `test_dashboard_with_prepopulated_data`, has been introduced to prepopulate data for use in the dashboard, ensuring proper functionality of the new feature.
* Avoid resource conflicts in integration tests by using a random dir name ([2865](https://github.com/databrickslabs/ucx/issues/2865)). In this release, we have implemented changes to address resource conflicts in integration tests by introducing random directory names. The `save_locations` method in `conftest.py` has been updated to generate random directory names using the `tempfile.mkdtemp` function, based on the value of the new `make_random` parameter. Additionally, in the `test_migrate.py` file located in the `tests/integration/hive_metastore` directory, the hard-coded directory name has been replaced with a random one generated by the `make_random` function, which is used when creating external tables and specifying the external delta location. Lastly, the `test_move_tables_table_properties_mismatch_preserves_original` function in `test_table_move.py` has been updated to include a randomly generated directory name in the table's external delta and storage location, ensuring that tests can run concurrently without conflicting with each other. These changes resolve the issue described in [#2797](https://github.com/databrickslabs/ucx/issues/2797) and improve the reliability of integration tests.
* Exclude dfsas from used tables ([2841](https://github.com/databrickslabs/ucx/issues/2841)). In this release, we've made significant improvements to the accuracy of table identification and handling in our system. We've excluded certain direct filesystem access patterns from being treated as tables in the current implementation, correcting a previous error. The `collect_tables` method has been updated to exclude table names matching defined direct filesystem access patterns. Additionally, we've added a new method `TableInfoNode` to wrap used tables and the nodes that use them. We've also introduced changes to handle direct filesystem access patterns more accurately, ensuring that the DataFrame API's `spark.table()` function is identified correctly, while the `spark.read.parquet()` function, representing direct filesystem access, is now ignored. These changes are supported by new unit tests to ensure correctness and reliability, enhancing the overall functionality and behavior of the system.
* Fixed known matches false positives for libraries starting with the same name as a library in the known.json ([2860](https://github.com/databrickslabs/ucx/issues/2860)). This commit addresses false positives in known matches for libraries whose names begin with the name of another library in the known.json file. The `module_compatibility` function in the `known.py` file was updated to look for exact matches or parent-module matches, rather than just matches at the beginning of the name. This more nuanced approach ensures that libraries with similar names are not incorrectly flagged as having compatibility issues. Additionally, the `known.json` file is now sorted when constructing module problems, indicating that the order of the entries in this file may have been relevant to the issue being resolved. To ensure the accuracy of the changes, new unit tests were added. The test suite was expanded to include tests for known and unknown compatibility, and a new load test was added for the known.json file. These changes improve the reliability of the known matches feature, which is critical for ensuring the correct identification of compatibility issues. A sketch of the boundary-aware lookup appears after this list.
* Make delta format case sensitive ([2861](https://github.com/databrickslabs/ucx/issues/2861)). In this commit, handling of the delta format is made robust to casing differences, enhancing the reliability of the code. The `TableInMount` class has been updated with a `__post_init__` method that converts the `format` attribute to uppercase, normalizing its case. Additionally, the `Table` class in the `tables.py` file has been modified to include a `__post_init__` method that converts the `table_format` attribute to uppercase during object creation, making format comparisons independent of the original casing. New properties, `is_delta` and `is_hive`, have been added to the `Table` class to check if the table format is delta or hive, respectively. These changes affect the `what` method of the `AclMigrationWhat` enum class, which now checks `is_delta` and `is_hive` instead of comparing `table_format` with `DELTA` and "HIVE". Relevant issues [#2858](https://github.com/databrickslabs/ucx/issues/2858) and [#2840](https://github.com/databrickslabs/ucx/issues/2840) have been addressed, and unit tests have been included to verify the behavior. However, the changes have not been verified on the staging environment yet. A sketch of this normalization appears after this list.
* Make delta format case sensitive ([2862](https://github.com/databrickslabs/ucx/issues/2862)). The recent update, derived from the resolution of issue [#2861](https://github.com/databrickslabs/ucx/issues/2861), introduces a case-sensitive delta format to our open-source library, enhancing the precision of delta table tracking. This change impacts all table format-related code and is accompanied by additional tests for robustness. A new `location` column has been incorporated into the `table_estimates` view, facilitating the determination of delta table location. Furthermore, a new method has been implemented to extract the `location` column from the `table_estimates` view, further refining the project's functionality and accuracy in managing delta tables.
* Verify UCX catalog is accessible at start of `migration-progress-experimental` workflow ([2851](https://github.com/databrickslabs/ucx/issues/2851)). In this release, we have introduced a new `verify_has_ucx_catalog` method in the `Application` class of the `databricks.labs.ucx.contexts` module, which checks for the presence of a UCX catalog in the workspace and returns an instance of the `VerifyHasCatalog` class. This method is used in the `migration-progress-experimental` workflow to verify UCX catalog accessibility, addressing issues [#2577](https://github.com/databrickslabs/ucx/issues/2577) and [#2848](https://github.com/databrickslabs/ucx/issues/2848) and progressing work on [#2816](https://github.com/databrickslabs/ucx/issues/2816). The `verify_has_ucx_catalog` method is decorated with `cached_property` and takes `workspace_client` and `ucx_catalog` as arguments. Additionally, we have added a new `VerifyHasCatalog` class that checks if a specified Unity Catalog (UC) catalog exists in the workspace and updated the import statement to include a `NotFound` exception. We have also added a timeout parameter to the `validate_step` function in the `workflows.py` file, modified the `migration-progress-experimental` workflow to include a new step `verify_prerequisites` in the `table_migration` job cluster, and added unit tests to ensure the proper functioning of these changes. These updates improve the application's ability to interact with UCX catalogs and ensure their presence and accessibility during workflow execution, while also enhancing the robustness and reliability of the `migration-progress-experimental` workflow. A sketch of the catalog check appears after this list.
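
The functional test for [#2850](https://github.com/databrickslabs/ucx/issues/2850) above exercises a streaming CSV read from a volume. A minimal sketch of that read, assuming a Databricks notebook where `spark` is already in scope and using Auto Loader (`cloudFiles`) option names; the exact options in the test may differ:

```python
# Sketch only: option names follow the Auto Loader (cloudFiles) conventions;
# treat the specific option set as illustrative rather than the test's exact code.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/Volumes/playground/test/schemas/")
    .option("header", "true")
    .option("compression", "gzip")
    .load("/Volumes/playground/test/demo_data/")
)
```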
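
The `PermissionDenied` handling from [#2877](https://github.com/databrickslabs/ucx/issues/2877) follows a common SDK pattern. A minimal sketch, assuming the Databricks SDK's `AccountClient`; the function shape here is illustrative, not the exact ucx code:

```python
import logging

from databricks.sdk import AccountClient
from databricks.sdk.errors import NotFound, PermissionDenied

logger = logging.getLogger(__name__)

def workspace_clients(account_client: AccountClient) -> list:
    """Collect a WorkspaceClient per workspace, skipping workspaces we cannot access."""
    clients = []
    for workspace in account_client.workspaces.list():
        try:
            client = account_client.get_workspace_client(workspace)
        except (PermissionDenied, NotFound) as e:
            # Access denied to one workspace should not abort the whole sync.
            logger.warning(f"Cannot get workspace client for {workspace.workspace_name}: {e}")
            continue
        clients.append(client)
    return clients
```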
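
The false-positive fix in `module_compatibility` ([#2860](https://github.com/databrickslabs/ucx/issues/2860)) comes down to matching on module-name boundaries instead of raw string prefixes. A simplified, illustrative version of the lookup:

```python
def module_compatibility(known: dict[str, list], module_name: str) -> list | None:
    """Look up a module by exact name or by parent module.

    A plain prefix check would wrongly match e.g. 'pandasql' against 'pandas';
    walking the dotted name from most to least specific avoids that.
    """
    parts = module_name.split(".")
    for i in range(len(parts), 0, -1):
        candidate = ".".join(parts[:i])
        if candidate in known:
            return known[candidate]
    return None

known = {"pandas": [], "pandas.io": []}
print(module_compatibility(known, "pandas.io.sql"))  # [] -- parent-module match
print(module_compatibility(known, "pandasql"))       # None -- no false positive
```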
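
The case normalization from the two "Make delta format case sensitive" changes ([#2861](https://github.com/databrickslabs/ucx/issues/2861), [#2862](https://github.com/databrickslabs/ucx/issues/2862)) boils down to uppercasing the format once at construction time. A minimal sketch of the pattern (fields trimmed; the real `Table` class carries more):

```python
from dataclasses import dataclass

@dataclass
class Table:
    catalog: str
    database: str
    name: str
    table_format: str

    def __post_init__(self) -> None:
        # Normalize once so later comparisons do not depend on how
        # the metastore happened to case the format string.
        if isinstance(self.table_format, str):
            self.table_format = self.table_format.upper()

    @property
    def is_delta(self) -> bool:
        return self.table_format == "DELTA"

    @property
    def is_hive(self) -> bool:
        return self.table_format == "HIVE"

t = Table("hive_metastore", "default", "events", "delta")
print(t.is_delta)  # True -- 'delta' was normalized to 'DELTA'
```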
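
The catalog verification step ([#2851](https://github.com/databrickslabs/ucx/issues/2851)) is a small existence check against the workspace. A sketch using the SDK's `catalogs.get` API; the class is simplified relative to the description above:

```python
import logging

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound

logger = logging.getLogger(__name__)

class VerifyHasCatalog:
    """Verify that a Unity Catalog catalog exists before a workflow relies on it."""

    def __init__(self, ws: WorkspaceClient, ucx_catalog: str) -> None:
        self._ws = ws
        self._ucx_catalog = ucx_catalog

    def verify(self) -> bool:
        try:
            self._ws.catalogs.get(self._ucx_catalog)
            return True
        except NotFound:
            logger.warning(f"UCX catalog not found: {self._ucx_catalog}")
            return False
```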

0.39.0

* Added `Farama-Notifications` to known list ([2822](https://github.com/databrickslabs/ucx/issues/2822)). A new configuration has been implemented in this release to integrate Farama-Notifications into the existing system, partially addressing issue [#193](https://github.com/databrickslabs/ucx/issues/193).
* Added `aiohttp-cors` library to known list ([2775](https://github.com/databrickslabs/ucx/issues/2775)). In this release, we have added the `aiohttp-cors` library to our project, providing asynchronous Cross-Origin Resource Sharing (CORS) handling for the `aiohttp` library. This addition enhances the robustness and flexibility of CORS management in our relevant projects. The library includes several new modules such as "aiohttp_cors", "aiohttp_cors.abc", "aiohttp_cors.cors_config", "aiohttp_cors.mixin", "aiohttp_cors.preflight_handler", "aiohttp_cors.resource_options", and "aiohttp_cors.urldispatcher_router_adapter", which offer functionalities for configuring and handling CORS in `aiohttp` applications. This change partially resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) and further strengthens our application's security and cross-origin resource sharing capabilities.
* Added `category-encoders` library to known list ([2781](https://github.com/databrickslabs/ucx/issues/2781)). In this release, we've added the `category-encoders` library to our supported libraries, which provides a variety of methods for encoding categorical variables as numerical data, including one-hot encoding and target encoding. This addition resolves part of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which concerned the support of this library. The library has been integrated into our system by adding a new entry for `category-encoders` in the known.json file, which contains several modules and classes corresponding to various encoding methods provided by the library. This enhancement enables software engineers to leverage the capabilities of `category-encoders` library to encode categorical variables more efficiently and effectively.
* Added `cmdstanpy` to known list ([2786](https://github.com/databrickslabs/ucx/issues/2786)). In this release, we have added `cmdstanpy` and `stanio` libraries to our codebase. `cmdstanpy` is a Python library for interfacing with the Stan probabilistic programming language and has been added to the whitelist. This addition enables the use of `cmdstanpy`'s functionalities, including loading, inspecting, and manipulating Stan model objects, as well as running MCMC simulations. Additionally, we have included the `stanio` library, which provides functionality for reading and writing Stan data and model files. These additions enhance the codebase's capabilities for working with probabilistic models, offering expanded options for loading, manipulating, and simulating models written in Stan.
* Added `confection` library to known list ([2787](https://github.com/databrickslabs/ucx/issues/2787)). In this release, the `confection` library, a lightweight, pure Python library for parsing and formatting cookies with two modules for working with cookie headers and utility functions, has been added to the known list of libraries and is now usable within the project. Additionally, several modules from the `srsly` library, a collection of serialization utilities for Python including support for JSON, MessagePack, cloudpickle, and Ruamel YAML, have been added to the known list of libraries, increasing the project's flexibility and functionality in handling serialized data. This partially resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931).
* Added `configparser` library to known list ([2796](https://github.com/databrickslabs/ucx/issues/2796)). In this release, we have added support for the `configparser` library, addressing issue [#1931](https://github.com/databrickslabs/ucx/issues/1931). `Configparser` is a standard Python library used for parsing configuration files. This change not only whitelists the library but also includes the "backports.configparser" and "backports.configparser.compat" modules, providing backward compatibility for older versions of Python. By recognizing and supporting the `configparser` library, users can now utilize it in their code with confidence, knowing that it is a known and supported library. This update also ensures that the backports for older Python versions are recognized, enabling users to leverage the library seamlessly, regardless of the Python version they are using.
* Added `diskcache` library to known list ([2790](https://github.com/databrickslabs/ucx/issues/2790)). A new update has been made to include the `diskcache` library in our open-source library's known list, as detailed in the release notes. This addition brings in multiple modules, including `diskcache`, `diskcache.cli`, `diskcache.core`, `diskcache.djangocache`, `diskcache.persistent`, and `diskcache.recipes`. The `diskcache` library is a high-performance caching system, useful for a variety of purposes such as caching database queries, API responses, or any large data that needs frequent access. By adding the `diskcache` library to the known list, developers can now leverage its capabilities in their projects, partially addressing issue [#1931](https://github.com/databrickslabs/ucx/issues/1931).
* Added `dm-tree` library to known list ([2789](https://github.com/databrickslabs/ucx/issues/2789)). In this release, we have added the `dm-tree` library to our project's known list, enabling its integration and use within our software. The `dm-tree` library is a C++ API that provides functionalities for creating and manipulating tree data structures, with support for sequences and tree benchmarking. This addition expands our range of available data structures, addressing the lack of support for tree data structures and partially resolving issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have been related to the integration of the `dm-tree` library. By incorporating this library, we aim to enhance our project's performance and versatility, providing software engineers with more options for handling tree data structures.
* Added `evaluate` to known list ([2821](https://github.com/databrickslabs/ucx/issues/2821)). In this release, we have added the `evaluate` package and its dependent libraries to our open-source library. The `evaluate` package is a tool for evaluating and analyzing machine learning models, providing a consistent interface to various evaluation tasks. Its dependent libraries include `colorful`, `cmdstanpy`, `comm`, `eradicate`, `multiprocess`, and `xxhash`. The `colorful` library is used for colorizing terminal output, while `cmdstanpy` provides Python infrastructure for Stan, a platform for statistical modeling and high-performance statistical computation. The `comm` library is used for creating and managing IPython comms, and `eradicate` is used for removing commented-out code from Python source files. The `multiprocess` library is used for spawning processes, and `xxhash` implements the fast XXHash hashing algorithms. This addition partly resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), providing enhanced functionality for evaluating machine learning models.
* Added `future` to known list ([2823](https://github.com/databrickslabs/ucx/issues/2823)). In this commit, we have added the `future` module, a compatibility layer for Python 2 and Python 3, to the project's known list in the configuration file. This module provides a wide range of backward-compatible tools and fixers to smooth over the differences between the two major versions of Python. It includes numerous sub-modules such as "future.backports", "future.builtins", "future.moves", and "future.standard_library", among others, which offer backward-compatible features for various parts of the Python standard library. The commit also includes related modules like "libfuturize", "libpasteurize", and `past` and their respective sub-modules, which provide tools for automatically converting Python 2 code to Python 3 syntax. These additions enhance the project's compatibility with both Python 2 and Python 3, providing developers with an easier way to write cross-compatible code. By adding the `future` module and related tools, the project can take full advantage of the features and capabilities provided, simplifying the process of writing code that works on both versions of the language.
* Added `google-api-core` to known list ([2824](https://github.com/databrickslabs/ucx/issues/2824)). In this commit, we have added the `google-api-core` and `proto-plus` packages to our codebase. The `google-api-core` package brings in a collection of modules for low-level support of Google Cloud services, such as client options, gRPC helpers, and retry mechanisms. This addition enables access to a wide range of functionalities for interacting with Google Cloud services. The `proto-plus` package includes protobuf-related modules, simplifying the handling and manipulation of protobuf messages. This package includes datetime helpers, enums, fields, marshaling utilities, message definitions, and more. These changes enhance the project's versatility, providing users with a more feature-rich environment for interacting with external services, such as those provided by Google Cloud. Users will benefit from the added functionality and convenience provided by these packages.
* Added `google-auth-oauthlib` and dependent libraries to known list ([2825](https://github.com/databrickslabs/ucx/issues/2825)). In this release, we have added the `google-auth-oauthlib` and `requests-oauthlib` libraries and their dependencies to our repository to enhance OAuth2 authentication flow support. The `google-auth-oauthlib` library is utilized for Google's OAuth2 client authentication and authorization flows, while `requests-oauthlib` provides OAuth1 and OAuth2 support for the `requests` library. This change partially resolves the missing dependencies issue and improves the project's ability to handle OAuth2 authentication flows with Google and other providers.
* Added `greenlet` to known list ([2830](https://github.com/databrickslabs/ucx/issues/2830)). In this release, we have added the `greenlet` library to the known list in the configuration file, addressing part of issue [#193](https://github.com/databrickslabs/ucx/issues/193).
* Added `gymnasium` to known list ([2832](https://github.com/databrickslabs/ucx/issues/2832)). A new update has been made to include the popular open-source `gymnasium` library in the project's configuration file. The library provides various environments, spaces, and wrappers for developing and testing reinforcement learning algorithms, and includes modules such as "gymnasium.core", "gymnasium.envs", "gymnasium.envs.box2d", "gymnasium.envs.classic_control", "gymnasium.envs.mujoco", "gymnasium.envs.phys2d", "gymnasium.envs.registration", "gymnasium.envs.tabular", "gymnasium.envs.toy_text", "gymnasium.experimental", "gymnasium.logger", "gymnasium.spaces", and "gymnasium.utils", each with specific functionality. This addition enables developers to utilize the library without having to modify any existing code and take advantage of the latest features and bug fixes. This change partly addresses issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), likely related to using `gymnasium` in the project, allowing developers to now use it for developing and testing reinforcement learning algorithms.
* Added and populate UCX `workflow_runs` table ([2754](https://github.com/databrickslabs/ucx/issues/2754)). In this release, we have added and populated a new `workflow_runs` table in the UCX project to track the status of workflow runs and handle concurrent writes. This update resolves issue [#2600](https://github.com/databrickslabs/ucx/issues/2600) and is accompanied by modifications to the `migration-progress-experimental` workflow, new `WorkflowRunRecorder` and `ProgressTrackingInstallation` classes, and updated user documentation. We have also added unit tests, integration tests, and a `record_workflow_run` method in the `MigrationWorkflow` class. The new table and methods have been tested to ensure they correctly record workflow run information. However, there are still some issues to address, such as deciding on getting workflow run status from `parse_log_task`.
* Added collection of used tables from Python notebooks and files and SQL queries ([2772](https://github.com/databrickslabs/ucx/issues/2772)). This commit introduces the collection and storage of table usage information as part of linting jobs to enable tracking of legacy table usage and lineage. The changes include the modification of existing workflows, addition of new tables and views, and the introduction of new classes such as `UsedTablesCrawler`, `LineageAtom`, and `TableInfoNode`. The new classes and methods support tracking table usage and lineage in Python notebooks, files, and SQL queries. Unit tests and integration tests have been added and updated to ensure the correct functioning of this feature. This is the first pull request in a series of three, with the next two focusing on using the table information in queries and displaying results in the assessment dashboard.
* Changed logic of direct filesystem access linting ([2766](https://github.com/databrickslabs/ucx/issues/2766)). This commit modifies the direct filesystem access (DFSA) linting logic to reduce false positives and improve precision. Previously, all string constants matching a DFSA pattern were detected, with false positives filtered on a case-by-case basis. The new approach narrows DFSA detection to instances originating from `spark` or `dbutils` modules, ensuring relevance and minimizing false alarms. The commit introduces new methods, such as `is_builtin()` and `get_call_name()`, to determine whether a given node is a built-in. Additionally, it includes unit tests and updates to the test cases in `test_directfs.py` to reflect the new detection criteria. This change enhances the linting process and enables developers to maintain better control over direct filesystem access within the `spark` and `dbutils` modules. A simplified sketch of the call-root check appears after this list.
* Fixed integration issue when collecting tables ([2817](https://github.com/databrickslabs/ucx/issues/2817)). In this release, we have addressed integration issues related to table collection in the Databricks Labs UCX project. We have introduced a new `UsedTablesCrawler` class to crawl tables in paths and queries, which resolves issues reported in tickets [#2800](https://github.com/databrickslabs/ucx/issues/2800) and [#2808](https://github.com/databrickslabs/ucx/issues/2808). Additionally, we have updated the `directfs_access_crawler_for_paths` and `directfs_access_crawler_for_queries` methods to work with the new `UsedTablesCrawler` class. We have also made changes to the `workflow_linter` method to include the new `used_tables_crawler_for_paths` property. Furthermore, we have refactored the `lint` method of certain classes to a `collect_tables` method, which returns an iterable of `UsedTable` objects to improve table collection. The `lint` method now processes the collected tables and raises advisories as needed, while the `apply` method remains unchanged. Integration tests were executed as part of this commit.
* Increase test coverage ([2818](https://github.com/databrickslabs/ucx/issues/2818)). In this update, we have expanded the test suite for the `Tree` class in our Python AST codebase with several new unit tests. These tests are designed to verify various behaviors, including checking for `None` returns, validating string truncation, ensuring `NotImplementedError` exceptions are raised during node appending and method calls, and testing the correct handling of global variables. Additionally, we have included tests that ensure a constant is not from a specific module. This enhancement signifies our dedication to improving test coverage and consistency, which will aid in maintaining code quality, detecting unintended side effects, and preventing regressions in future development efforts.
* Strip preliminary comments in pip cells ([2763](https://github.com/databrickslabs/ucx/issues/2763)). In this release, we have addressed an issue in the processing of pip commands preceded by non-MAGIC comments, ensuring that pip-based library management in Databricks notebooks functions correctly. The changes include stripping preliminary comments and handling the case where the pip command is preceded by a single '%' or '!'. Additionally, a new unit test has been added to validate the behavior of a notebook containing a malformed pip cell. This test checks that the notebook can still be parsed and built into a dependency graph without issues, even in the presence of non-MAGIC comments preceding the pip install command. The code for the test is written in Python and uses the Notebook, Dependency, and DependencyGraph classes to parse the notebook and build the dependency graph. The overall functionality of the code remains unchanged, and the code now correctly processes pip commands in the presence of non-MAGIC comments. A sketch of the comment-stripping logic appears after this list.
* Temporarily ignore `MANAGED` HMS tables on external storage location ([2837](https://github.com/databrickslabs/ucx/issues/2837)). This release introduces changes to the behavior of the `_migrate_external_table` method in the `table_migrate.py` file, specifically for handling managed tables located on external storage. Previously, the method attempted to migrate any external table, but with this change, it now checks if the object type is 'MANAGED'. If it is, a warning message is logged, and the migration is skipped due to UCX's lack of support for migrating managed tables on external storage. This change affects the existing workflow, specifically the behavior of the `migrate_dbfs_root_tables` function in the HMS table migration test suite. The function now checks for the absence of certain SQL queries, specifically those involving `SYNC TABLE` and `ALTER TABLE`, in the `backend.queries` list to ensure that queries related to managed tables on external storage locations are excluded. This release includes unit tests and integration tests to verify the changes and ensure proper behavior for the modified workflow. Issue [#2838](https://github.com/databrickslabs/ucx/issues/2838) has been resolved with this commit.
* Updated sqlglot requirement from <25.23,>=25.5.0 to >=25.5.0,<25.25 ([2765](https://github.com/databrickslabs/ucx/issues/2765)). In this release, we have updated the sqlglot requirement in the pyproject.toml file to allow for any version greater than or equal to 25.5.0 but less than 25.25. This resolves a conflict in the previous requirement, which ranged from >=25.5.0 to <25.23. The update includes several bug fixes, refactors, and new features, such as support for the OVERLAY function in PostgreSQL and a flag to automatically exclude Keep diff nodes. Additionally, the check_deploy job has been simplified, and the supported dialect count has increased from 21 to 23. This update ensures that the project remains up-to-date and compatible with the latest version of sqlglot, while also improving functionality and stability.
* Whitelists catalogue library ([2780](https://github.com/databrickslabs/ucx/issues/2780)). In this release, we've whitelisted the `catalogue` library, which partially addresses issue [#193](https://github.com/databrickslabs/ucx/issues/193). Adding the library to the known list allows code assessment to recognize `catalogue` modules instead of flagging them as unknown, making assessment results for projects that depend on it more accurate and easier to interpret.
* Whitelists circuitbreaker ([2783](https://github.com/databrickslabs/ucx/issues/2783)). The `circuitbreaker` library has been whitelisted by adding a new entry for `circuitbreaker` to the `known.json` configuration file, containing an empty list, which indicates that no compatibility problems are known for its modules. This development partially addresses issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) and broadens the set of third-party libraries recognized during code assessment.
* Whitelists cloudpathlib ([2784](https://github.com/databrickslabs/ucx/issues/2784)). In this release, we have whitelisted the cloudpathlib library by adding it to the known.json file. Cloudpathlib is a Python library for manipulating cloud paths, and includes several modules for interacting with various cloud storage systems. Each module has been added to the known.json file with an empty list, indicating that no critical issues have been found in these modules. However, we have added warnings for the use of direct filesystem references in specific classes and methods within the cloudpathlib.azure.azblobclient, cloudpathlib.azure.azblobpath, cloudpathlib.cloudpath, cloudpathlib.gs.gsclient, cloudpathlib.gs.gspath, cloudpathlib.local.implementations.azure, cloudpathlib.local.implementations.gs, cloudpathlib.local.implementations.s3, cloudpathlib.s3.s3client, and cloudpathlib.s3.s3path modules. The warning message indicates that the use of direct filesystem references is deprecated and will be removed in a future release. This change addresses a portion of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931). An illustration of the `known.json` entry shape appears after this list.
* Whitelists colorful ([2785](https://github.com/databrickslabs/ucx/issues/2785)). In this release, we have added support for the `colorful` library, a Python package for generating ANSI escape codes to colorize terminal output. The library contains several modules, including "ansi", "colors", "core", "styles", "terminal", and "utils", all of which have been whitelisted and added to the "known.json" file. This change resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) and broadens the range of approved libraries that can be used in the project, enabling more flexible and visually appealing terminal output.
* Whitelists cymem ([2793](https://github.com/databrickslabs/ucx/issues/2793)). In this release, we have made changes to the known.json file to whitelist the use of the cymem package in our project. This new entry includes sub-entries such as "cymem", "cymem.about", "cymem.tests", and "cymem.tests.test_import", which likely correspond to specific components or aspects of the package that require whitelisting. This change partially addresses issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have been caused by the use or testing of the cymem package. It is important to note that this commit does not modify any existing functionality or add any new methods; rather, it simply grants permission for the cymem package to be used in our project.
* Whitelists dacite ([2795](https://github.com/databrickslabs/ucx/issues/2795)). In this release, we have whitelisted the dacite library in our known.json file. Dacite is a library that enables the instantiation of Python classes with type hints, providing more robust and flexible object creation. By whitelisting dacite, users of our project can now utilize this library in their code without encountering any compatibility issues. This change partially addresses issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have involved dacite or type hinting more generally, thereby enhancing the overall functionality and flexibility of our project for software engineers.
* Whitelists databricks-automl-runtime ([2794](https://github.com/databrickslabs/ucx/issues/2794)). A new change has been implemented to whitelist the `databricks-automl-runtime` in the "known.json" file, enabling several nested packages and modules related to Databricks' auto ML runtime for forecasting and hyperparameter tuning. The newly added modules provide functionalities for data preprocessing and model training, including handling time series data, missing values, and one-hot encoding. This modification addresses a portion of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), improving the library's compatibility with Databricks' auto ML runtime.
* Whitelists dataclasses-json ([2792](https://github.com/databrickslabs/ucx/issues/2792)). A new configuration has been added to the "known.json" file, whitelisting the `dataclasses-json` library, which provides serialization and deserialization functionality to Python dataclasses. This change partially resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) and introduces new methods for serialization and deserialization through this library. Additionally, the libraries `marshmallow` and its associated modules, as well as "typing-inspect," have also been whitelisted, adding further serialization and deserialization capabilities. It's important to note that these changes do not affect existing functionality, but instead provide new options for handling these data structures.
* Whitelists dbl-tempo ([2791](https://github.com/databrickslabs/ucx/issues/2791)). A new library, dbl-tempo, has been whitelisted and is now approved for use in the project. This library provides functionality related to tempo, including interpolation, intervals, resampling, and utility methods. These new methods have been added to the known.json file, indicating that they are now recognized and approved for use. This change is critical for maintaining backward compatibility and project maintainability. It addresses part of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) and ensures that any new libraries or methods are thoroughly vetted and documented before implementation. Software engineers are encouraged to familiarize themselves with the new library and its capabilities.
* whitelist blis ([2776](https://github.com/databrickslabs/ucx/issues/2776)). In this release, we have added the high-performance computing library `blis` to our whitelist, partially addressing issue [#1931](https://github.com/databrickslabs/ucx/issues/1931). The blis library is optimized for various CPU architectures and provides dense linear algebra capabilities, which can improve the performance of workloads that utilize these operations. With this change, the blis library and its components have been included in our system's whitelist, enabling users to leverage its capabilities. Familiarity with high-performance libraries and their impact on system performance is essential for software engineers, and the addition of blis to our whitelist is a testament to our commitment to providing optimal performance.
* whitelists brotli ([2777](https://github.com/databrickslabs/ucx/issues/2777)). In this release, we have partially addressed issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) by adding support for the Brotli data compression algorithm in our project. The Brotli JSON object and an empty array for `brotli` have been added to the "known.json" configuration file to recognize and support its use. This change does not modify any existing functionality or introduce new methods, but rather whitelists Brotli as a supported algorithm for future use in the project. This enhancement allows for more flexibility and options when working with data compression, providing software engineers with an additional tool for optimization and performance improvements.
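
The narrowed DFSA detection ([#2766](https://github.com/databrickslabs/ucx/issues/2766)) keys off the root of a call chain. The real linter works on astroid nodes with helpers like `get_call_name()`; the following plain-`ast` sketch only illustrates the narrowing rule:

```python
import ast

def call_root_name(node: ast.AST) -> str | None:
    """Walk a chain like spark.read.parquet(...) down to its root identifier."""
    while isinstance(node, (ast.Call, ast.Attribute)):
        node = node.func if isinstance(node, ast.Call) else node.value
    return node.id if isinstance(node, ast.Name) else None

def is_candidate_dfsa_call(node: ast.Call) -> bool:
    # Only calls rooted at spark or dbutils are considered for DFSA advice now.
    return call_root_name(node) in {"spark", "dbutils"}

call = ast.parse("spark.read.parquet('/mnt/raw/events')").body[0].value
print(is_candidate_dfsa_call(call))  # True

call = ast.parse("open('/mnt/raw/events')").body[0].value
print(is_candidate_dfsa_call(call))  # False -- no longer flagged by this rule
```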
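
The pip-cell fix ([#2763](https://github.com/databrickslabs/ucx/issues/2763)) skips comment lines that precede the magic. A sketch of the stripping logic; the function name and return shape are illustrative, not the ucx API:

```python
def first_magic_line(cell_source: str) -> str | None:
    """Return the pip magic line of a cell, skipping preliminary comment lines."""
    for line in cell_source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and leading comments are ignored
        if stripped.startswith(("%pip", "!pip")):
            return stripped
        return None  # the first real line is not a pip command
    return None

cell = "# install deps first\n# (needed for the demo)\n%pip install databricks-labs-ucx"
print(first_magic_line(cell))  # %pip install databricks-labs-ucx
```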
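
Many of the entries above add libraries to `known.json`. Shown as a Python dict for illustration, an entry maps each module to a list of known problems, where an empty list means "known and clean"; the exact problem-record fields shown here are an assumption:

```python
# Hypothetical excerpt mirroring the shape of known.json entries.
known = {
    "gviz_api": [],  # empty list: module is known and raises no advice
    "cloudpathlib.cloudpath": [
        {  # assumed record shape for a known problem
            "code": "direct-filesystem-access",
            "message": "Use of direct filesystem references is deprecated",
        },
    ],
}
```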

Dependency updates:

* Updated sqlglot requirement from <25.23,>=25.5.0 to >=25.5.0,<25.25 ([2765](https://github.com/databrickslabs/ucx/pull/2765)).

0.38.0

* Added Py4j implementation of tables crawler to retrieve a list of HMS tables in the assessment workflow ([2579](https://github.com/databrickslabs/ucx/issues/2579)). In this release, we have added a Py4j implementation of a tables crawler to retrieve a list of Hive Metastore tables in the assessment workflow. A new `FasterTableScanCrawler` class has been introduced, which can be used in the Assessment Job based on a feature flag to replace the old Scala code, allowing for better logging during table scans. The existing `assessment.crawl_tables` workflow now utilizes the new py4j crawler instead of the scala one. Integration tests have been added to ensure the functionality works correctly. The commit also includes a new method for listing table names in the specified database and improvements to error handling and logging mechanisms. The new Py4j tables crawler enhances the functionality of the assessment workflow by improving error handling, resulting in better logging and faster table scanning during the assessment process. This change is part of addressing issue [#2190](https://github.com/databrickslabs/ucx/issues/2190) and was co-authored by Serge Smertin.
* Added `create-ucx-catalog` cli command ([2694](https://github.com/databrickslabs/ucx/issues/2694)). A new CLI command, `create-ucx-catalog`, has been added to create a catalog for migration tracking that can be used across multiple workspaces. The command creates a UCX catalog for tracking migration status and artifacts, and is created by running `databricks labs ucx create-ucx-catalog` and specifying the storage location for the catalog. Relevant user documentation, unit tests, and integration tests have been added for this command. The `assign-metastore` command has also been updated to allow for the selection of a metastore when multiple metastores are available in the workspace region. This change improves the migration tracking feature and enhances the user experience.
* Added experimental `migration-progress-experimental` workflow ([2658](https://github.com/databrickslabs/ucx/issues/2658)). This commit introduces an experimental workflow, `migration-progress-experimental`, which refreshes the inventory for various resources such as clusters, grants, jobs, pipelines, policies, tables, TableMigrationStatus, and UDFs. The workflow can be triggered using the `databricks labs ucx migration-progress` CLI command and uses a new implementation of a Scala-based crawler, `TablesCrawler`, which will eventually replace the current implementation. The new workflow is a duplicate of most of the `assessment` pipeline's functionality but with some differences, such as the use of `TablesCrawler`. Relevant user documentation has been added, along with unit tests, integration tests, and a screenshot of a successful staging environment run. The new workflow is expected to run on a schedule in the future. This change resolves [#2574](https://github.com/databrickslabs/ucx/issues/2574) and progresses [#2074](https://github.com/databrickslabs/ucx/issues/2074).
* Added handling for `InternalError` in `Listing.__iter__` ([2697](https://github.com/databrickslabs/ucx/issues/2697)). This release introduces improved error handling in the `Listing.__iter__` method of the `Generic` class, located in the `workspace_access/generic.py` file. Previously, only `NotFound` exceptions were handled, but now both `InternalError` and `NotFound` exceptions are caught and logged appropriately. This change enhances the robustness of the method, which is responsible for listing objects of a specific type and returning them as `GenericPermissionsInfo` objects. To ensure the correct functionality, we have added new unit tests and manual testing. The logging of the `InternalError` exception is properly handled in the `GenericPermissionsSupport` class when listing serving endpoints. This behavior is verified by the newly added test function `test_internal_error_in_serving_endpoints_raises_warning` and the updated `test_serving_endpoints_not_enabled_raises_warning`. A minimal sketch of the widened exception handling appears after this list.
* Added handling for `PermissionDenied` when listing accessible workspaces ([2733](https://github.com/databrickslabs/ucx/issues/2733)). A new `can_administer` method has been added to the `Workspaces` class in the `workspaces.py` file, which allows for more fine-grained control over which users can administer workspaces. This method checks if the user has access to a given workspace and is a member of the workspace's `admins` group, indicating that the user has administrative privileges for that workspace. If the user does not have access to the workspace or is not a member of the `admins` group, the method returns `False`. Additionally, error handling in the `get_accessible_workspaces` method has been improved by adding a `PermissionDenied` exception to the list of exceptions that are caught and logged. New unit tests have been added for the `AccountWorkspaces` class of the `databricks.labs.blueprint.account` module to ensure that the new method is functioning as intended, specifically checking if a user is a workspace administrator based on whether they belong to the `admins` group. The linked issue [#2732](https://github.com/databrickslabs/ucx/issues/2732) is resolved by this change. All changes have been manually and unit tested. A sketch of the administrator check appears after this list.
* Added static code analysis results to assessment dashboard ([2696](https://github.com/databrickslabs/ucx/issues/2696)). This commit introduces two new tasks, `assess_dashboards` and `assess_workflows`, to the existing assessment dashboard for identifying migration problems in dashboards and workflows. These tasks analyze embedded queries and notebooks for migration issues and collect direct filesystem access patterns requiring attention. Upon completion, the results are stored in the inventory database and displayed on the Migration dashboard. Additionally, two new widgets, job/query problem widgets and directfs access widgets, have been added to enhance the dashboard's functionality by providing additional information related to code compatibility and access control. Integration tests using mock data have been added and manually tested to ensure the proper functionality of these new features. This update improves the overall assessment and compatibility checking capabilities of the dashboard, making it easier for users to identify and address issues related to Unity Catalog compatibility in their workflows and dashboards.
* Added unskip CLI command to undo a skip on schema or a table ([2727](https://github.com/databrickslabs/ucx/issues/2727)). This pull request introduces a new CLI command, "unskip", which allows users to reverse a previously applied `skip` on a schema or table. The `unskip` command accepts a required `--schema` parameter and an optional `--table` parameter. A new function, also named "unskip", has been added, which takes the same parameters as the `skip` command. The function checks for the required `--schema` parameter and creates a new WorkspaceContext object to call the appropriate method on the table_mapping object. Two new methods, `unskip_schema` and "unskip_table_or_view", have been added to the HiveMapping class. These methods remove the skip mark from a schema or table, respectively, and handle exceptions such as NotFound and BadRequest. The get_tables_to_migrate method has been updated to consider the unskipped tables or schemas. Currently, the feature is tested manually and has not been added to the user documentation.
* Added unskip CLI command to undo a skip on schema or a table ([2734](https://github.com/databrickslabs/ucx/issues/2734)). A new `unskip` CLI command has been added to the project, which allows users to remove the `skip` mark set by the existing `skip` command on a specified schema or table. This command takes an optional `--table` flag, and if not provided, it will unskip the entire schema. The new functionality is accompanied by a unit test and relevant user documentation, and addresses issue [#1938](https://github.com/databrickslabs/ucx/issues/1938). The implementation includes the addition of the `unskip_table_or_view` method, which generates the appropriate `ALTER TABLE/VIEW` statement to remove the skip marker, and updates to the `unskip_schema` method to include the schema name in the `ALTER SCHEMA` statement. Additionally, exception handling has been updated to include `NotFound` and `BadRequest` exceptions. This feature simplifies the process of undoing a skip on a schema, table, or view in the Hive metastore, which previously required manual editing of the Hive metastore properties. A sketch of the underlying `ALTER` statements appears after this list.
* Assess source code as part of the assessment ([2678](https://github.com/databrickslabs/ucx/issues/2678)). This commit introduces enhancements to the assessment workflow, including the addition of two new tasks for evaluating source code from SQL queries in dashboards and from notebooks/files in jobs and tasks. The existing `databricks labs install ucx` command has been modified to incorporate linting during the assessment. The `QueryLinter` class has been updated to accept an additional argument for linting source code. These changes have been thoroughly tested through integration tests to ensure proper functionality. Co-authored by Eric Vergnaud.
* Bump astroid version, pylint version and drop our f-string workaround ([2746](https://github.com/databrickslabs/ucx/issues/2746)). In this update, we have bumped the versions of astroid and pylint to 3.3.1 and removed workarounds related to f-string inference limitations in previous versions of astroid (< 3.3). These workarounds were necessary for handling issues such as uninferrable sys.path values and the lack of f-string inference in loops. We have also updated corresponding tests to reflect these changes and improve the overall code quality and maintainability of the project. These changes are part of a larger effort to update dependencies and simplify the codebase by leveraging the latest features of updated tools and removing obsolete workarounds.
* Delete temporary files when running solacc ([2750](https://github.com/databrickslabs/ucx/issues/2750)). This commit includes changes to the `solacc.py` script to improve the linting process for the `solacc` repository, specifically targeting the issue of excessive temporary files that were exceeding CI storage capacity. The modifications include linting the repository on a per-top-level `solution` basis, where each solution resides within the top folders and is independent of others. Post-linting, temporary files and directories registered in `PathLookup` are deleted to enhance storage efficiency. Additionally, this commit prepares for improving false positive detection and introduces a new `SolaccContext` class that tracks various aspects of the linting process, providing more detailed feedback on the linting results. This change does not introduce new functionality or modify existing functionality, but rather optimizes the linting process for the `solacc` repository, maintaining CI storage capacity levels within acceptable limits.
* Don't report direct filesystem access for API calls ([2689](https://github.com/databrickslabs/ucx/issues/2689)). This release introduces enhancements to the Direct File System Access (DFSA) linter, resolving false positives in API call reporting. The `ws.api_client.do` call previously triggered inaccurate direct filesystem access alerts, which have been addressed by adding new methods to identify HTTP call parameters and specific API calls. The linter now disregards DFSA patterns within known API calls, eliminating false positives with relative URLs and duplicate advice from SparkSqlPyLinter. Additionally, improvements in the `python_ast.py` and `python_infer.py` files include the addition of `is_instance_of` and `is_from_module` methods, along with safer inference methods to prevent infinite recursion and enhance value inference. These changes significantly improve the DFSA linter's accuracy and effectiveness when analyzing code containing API calls.
* Enables cli cmd `databricks labs ucx create-catalog-schemas` to apply catalog/schema acl from legacy hive_metastore ([2676](https://github.com/databrickslabs/ucx/issues/2676)). The new release introduces a `databricks labs ucx create-catalog-schemas` command, which applies catalog/schema Access Control List (ACL) from a legacy hive_metastore. This command modifies the existing `table_mapping` method to include a new `grants_crawler` parameter in the `CatalogSchema` constructor, enabling the application of ACLs from the legacy hive_metastore. A corresponding unit test is included to ensure proper functionality. The `CatalogSchema` class in the `databricks.labs.ucx.hive_metastore.catalog_schema` module has been updated with a new argument `hive_acl` and the integration of the `GrantsCrawler` class. The `GrantsCrawler` class is responsible for crawling the Hive metastore and retrieving grants for catalogs, schemas, and tables. The `prepare_test` function has been updated to include the `hive_acl` argument and the `test_catalog_schema_acl` function has been updated to test the new functionality, ensuring that the correct grant statements are generated for a wider range of principals and catalogs/schemas. These changes improve the functionality and usability of the `databricks labs ucx create-catalog-schemas` command, allowing for a more seamless transition from a legacy hive metastore.
* Fail `make test` on coverage below 90% ([2682](https://github.com/databrickslabs/ucx/issues/2682)). A new change has been introduced to the pyproject.toml file to enhance the codebase's quality and robustness by ensuring that the test coverage remains above 90%. This has been accomplished by adding the `--cov-fail-under=90` flag to the `test` and `coverage` scripts in the `[tool.hatch.envs.default.scripts]` section. This flag will cause the `make test` command to fail if the coverage percentage falls below the specified value of 90%, ensuring that all new changes are thoroughly tested and that the codebase maintains a minimum coverage threshold. This is a best practice for maintaining code coverage and improving the overall quality and reliability of the codebase.
* Fixed DFSA false positives from f-string fragments ([2679](https://github.com/databrickslabs/ucx/issues/2679)). This commit addresses false positive direct filesystem access (DFSA) reports in Python code, specifically in f-string fragments containing forward slashes and curly braces. The linter has been updated to accurately detect DFSA paths while avoiding false positives, and it now checks for `JoinedStr` fragments in string constants. Additionally, the commit rectifies issues with duplicate advices reported by `SparkSqlPyLinter`. No new features or major functionality changes have been introduced; instead, the focus has been on improving the reliability and accuracy of DFSA detection. Co-authored by Eric Vergnaud, this commit includes new unit tests and refinements to the DFSA linter, specifically addressing false positive patterns like `f"/Repos/{thing1}/sdk-{thing2}-{thing3}"`. To review these changes, consult the updated tests in the `tests/unit/source_code/linters/test_directfs.py` file, such as the new test case for the f-string pattern causing false positives. By understanding these improvements, you'll ensure your project adheres to the latest updates, maintaining quality and accurate DFSA detection.
* Fixed failing integration tests that perform a real assessment ([2736](https://github.com/databrickslabs/ucx/issues/2736)). In this release, we have made significant improvements to the integration tests in the `assessment` workflow, by reducing the scope of the assessment and improving efficiency and reliability. We have removed several object creation functions and added a new function `populate_for_linting` for linting purposes. The `populate_for_linting` function adds necessary information to the installation context, and is used to ensure that the integration tests still have the required data for linting. We have also added a pytest fixture `populate_for_linting` to set up a minimal amount of data in the workspace for linting purposes. These changes have been implemented in the `test_workflows.py` file in the integration/assessment directory. This will help to ensure that the tests are not unnecessarily extensive, and that they are able to accurately assess the functionality of the library.
* Fixed sqlglot crasher with 'drop schema ...' statement ([2758](https://github.com/databrickslabs/ucx/issues/2758)). In this release, we have addressed a crash in linting code that uses the `sqlglot` library, triggered by the `drop schema` statement. A new method, `_unsafe_lint_expression`, prevents the crash by checking whether the current expression is a `Use`, `Create`, or `Drop` statement and updating the `schema` attribute accordingly (see the sketch after this list). The linter now correctly handles the `drop schema` statement and returns a `Deprecation` warning if the table being processed is in the `hive_metastore` catalog and has been migrated to the Unity Catalog. Unit tests have been added to ensure the correct behavior, and the linter for `from table` SQL now parses and handles the `drop schema` statement without raising errors. These changes improve the library's overall reliability and stability.
* Fixed test failure: `test_table_migration_job_refreshes_migration_status[regular-migrate-tables]` ([2625](https://github.com/databrickslabs/ucx/issues/2625)). In this release, we have addressed issues [#2621](https://github.com/databrickslabs/ucx/issues/2621) and [#2537](https://github.com/databrickslabs/ucx/issues/2537) and fixed the failing test `test_table_migration_job_refreshes_migration_status[regular-migrate-tables]`. The `index` and `index_full_refresh` methods in `table_migrate.py` now accept a new `force_refresh` flag; when set to `True`, these methods ensure that the migration status is up-to-date. The `ViewsMigrationSequencer` class now passes `force_refresh=True` to the `index` method, and the `TableMigrationStatus` class in `table_migration_status.py` accepts an optional `force_refresh` parameter in its `index` method. A unit test has been updated to assert the correct behavior when updating the migration status.
* Fixes error message ([2759](https://github.com/databrickslabs/ucx/issues/2759)). The `load` method of the `mapping.py` file in the `databricks/labs/ucx/hive_metastore` package has been updated to correct an error message displayed when a `NotFound` exception is raised. The previous message suggested running an incorrect command, which has been updated to the correct one: "Please run: databricks labs ucx create-table-mapping". This change does not add any new methods or alter existing functionality, but instead focuses on improving the user experience by providing accurate information when an error occurs. The scope of this change is limited to updating the error message, and no other modifications have been made.
* Fixes issue of circular dependency of migrate-location ACL ([2741](https://github.com/databrickslabs/ucx/issues/2741)). This release resolves a circular dependency affecting the ACL migration performed by the `migrate-locations` command.
* Fixes source table alias disappearance during migrate_views ([2726](https://github.com/databrickslabs/ucx/issues/2726)). This release fixes the loss of the source table's alias when converting CREATE VIEW SQL from the legacy Hive metastore to the Unity Catalog; the migration process now retains the original alias of the table (see the sketch after this list). A new parameter, `alias`, has been added to the Table class, and the `apply` method in the `from_table.py` file has been updated accordingly. A new test case, `test_migrate_view_alias_test`, verifies the correct handling of table aliases during migration, and the unit tests have been thoroughly exercised to confirm the correctness of the changes, including handling potential intermittent failures caused by external dependencies.
* Py4j table crawler: suggestions/fixes for describing tables ([2684](https://github.com/databrickslabs/ucx/issues/2684)). This release introduces significant improvements and fixes to the Py4J-based table crawler, enhancing its capability to describe tables effectively. The code for fetching table properties over the bridge has been updated, and error tracing has been improved by fetching each table property individually and providing a Python backtrace for JVM-side errors. Scala `Option` values unboxing issues have been resolved, and a small optimization detects partitioned tables without materializing the collection. The table's `.viewText()` property is now properly handled as a Scala `Option`, the `catalog` argument is explicitly verified to be `hive_metastore`, and a new static method, `_option_as_python`, safely extracts values from a Scala `Option`. The `_describe` method has been refactored to handle exceptions more gracefully and to improve readability. These changes result in better functionality, error handling, logging, and performance when describing tables within a specified catalog and database. The linked issues [#2658](https://github.com/databrickslabs/ucx/issues/2658) and [#2579](https://github.com/databrickslabs/ucx/issues/2579) are progressed through these updates, and appropriate testing has been conducted to ensure the improvements' effectiveness.
* Speedup assessment workflow by making DBFS root table size calculation parallel ([2745](https://github.com/databrickslabs/ucx/issues/2745)). In this release, the DBFS root table size calculation in the assessment workflow has been parallelized, improving performance (an illustrative sketch follows this list). The `cached_property table_size_crawler` in `src/databricks/labs/ucx/contexts/workflow_task.py` now receives the additional argument `self.config.include_databases`, and the `TablesCrawler` class takes a generic type parameter `Table`, enabling type hinting and more robust type checking. The unit test file `test_table_size.py` in the `hive_metastore` directory has been updated to handle corrupt tables and invalid delta format errors more effectively, and a new entry, `databricks-pydabs`, has been added to the "known.json" file. Overall, these changes improve the efficiency and scalability of the assessment workflow for calculating DBFS root table size.
* Updated databricks-labs-blueprint requirement from <0.9,>=0.8 to >=0.8,<0.10 ([2747](https://github.com/databrickslabs/ucx/issues/2747)). In this update, the requirement for `databricks-labs-blueprint` has been updated to version `>=0.8,<0.10` in the `pyproject.toml` file. This change allows the project to utilize the latest features and bug fixes included in version 0.9.0 of the `databricks-labs-blueprint` library. Notable updates in version 0.9.0 consist of the addition of Databricks CLI version as part of routed command telemetry and support for Unicode Byte Order Mark (BOM) in file upload and download operations. Additionally, various bug fixes and improvements have been implemented for the `WorkspacePath` class, including the addition of `stat()` methods and improved compatibility with different versions of Python.
* Updated databricks-labs-lsql requirement from <0.12,>=0.5 to >=0.5,<0.13 ([2688](https://github.com/databrickslabs/ucx/issues/2688)). In this update, the version requirement of the `databricks-labs-lsql` library has been changed from a version greater than or equal to 0.5 and less than 0.12 to a version greater than or equal to 0.5 and less than 0.13. This allows the project to utilize the latest version of 'databricks-labs-lsql', which includes new methods for differentiating between a table that has never been written to and one with zero rows in the MockBackend class. Additionally, the update adds support for various filter types and improves testing coverage and reliability. The release notes and changelog for the updated library are provided in the commit message for reference.
* Updated documentation to explain the usage of collections and eligible commands ([2738](https://github.com/databrickslabs/ucx/issues/2738)). The latest update to the Databricks Labs UCX tool introduces the `join-collection` command, which joins two or more workspaces into a collection, allowing for streamlined and consolidated command execution across multiple workspaces. This feature is available to Account admins on the Databricks account and to Workspace admins on the workspaces to be joined, and it requires UCX to be installed on the workspace. To run collection-eligible commands, users can simply pass the `--run-as-collection=True` flag. This makes it easier to manage and execute commands on multiple workspaces.
* Updated sqlglot requirement from <25.22,>=25.5.0 to >=25.5.0,<25.23 ([2687](https://github.com/databrickslabs/ucx/issues/2687)). In this pull request, we have updated the version requirement for the `sqlglot` library in the pyproject.toml file. The previous requirement specified a version greater than or equal to 25.5.0 and less than 25.22, but we have updated it to allow for versions greater than or equal to 25.5.0 and less than 25.23. This change allows us to use the latest version of 'sqlglot', while still ensuring compatibility with other dependencies. Additionally, this pull request includes a detailed changelog from the `sqlglot` repository, which provides information on the features, bug fixes, and changes included in each version. This can help us understand the scope of the update and how it may impact our project.
* [DOCUMENTATION] Improve documentation on using account profile for `sync-workspace-info` cli command ([2683](https://github.com/databrickslabs/ucx/issues/2683)). The `sync-workspace-info` CLI command has been added to the Databricks Labs UCX package, which uploads the workspace configuration to all workspaces in the Databricks account where the `ucx` tool is installed. This feature requires Databricks Account Administrator privileges and is necessary to create an immutable default catalog mapping for the table migration process. It also serves as a prerequisite for the `create-table-mapping` command. To utilize this command, users must configure the Databricks CLI profile with access to the Databricks account console, available at "accounts.cloud.databricks.com" or "accounts.azuredatabricks.net". Additionally, the documentation for using the account profile with the `sync-workspace-info` command has been enhanced, addressing issue [#1762](https://github.com/databrickslabs/ucx/issues/1762).
* [DOCUMENTATION] Improve documentation when installing UCX from a machine with restricted internet access ([2690](https://github.com/databrickslabs/ucx/issues/2690)). A new section has been added to the `ADVANCED` installation section of the UCX library documentation, providing detailed instructions for installing UCX with a company-hosted PyPI mirror. This feature is intended for environments with restricted internet access, allowing users to bypass the public PyPI index and use a company-controlled mirror instead. Users will need to add all UCX dependencies to the company-hosted PyPI mirror and set the `PIP_INDEX_URL` environment variable to the mirror URL during installation. The solution also includes a prompt asking the user if their workspace blocks internet access. Additionally, the documentation has been updated to clarify that UCX requires internet access to connect to GitHub for downloading the tool, specifying the necessary URLs that need to be accessible. This update aims to improve the installation process for users with restricted internet access and provide clear instructions and prompts for installing UCX on machines with limited internet connectivity.
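For the coverage gate above, a minimal sketch of what the `[tool.hatch.envs.default.scripts]` section in `pyproject.toml` might look like; apart from `--cov-fail-under=90`, the script names' exact pytest arguments are assumptions:

```toml
[tool.hatch.envs.default.scripts]
# fail the whole run when total coverage drops below 90%
test = "pytest --cov src --cov-fail-under=90"
coverage = "pytest --cov src --cov-report=html --cov-fail-under=90"
```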
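To illustrate the f-string false positive fixed above, a hypothetical example of the two cases the linter must now tell apart:

```python
# The f-string below is assembled from fragments containing "/" and "{...}",
# but it is a Workspace path, not a direct filesystem access (DFSA); it was
# previously reported as a false positive.
thing1, thing2, thing3 = "team", "python", "v2"
workspace_path = f"/Repos/{thing1}/sdk-{thing2}-{thing3}"

# A genuine DFSA reference, which the linter should still flag:
dfsa_path = "dbfs:/mnt/raw/events"
```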
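For the `drop schema` fix above, a minimal sketch (not UCX's actual linter code) of guarding on the expression type with `sqlglot` before inspecting table references:

```python
import sqlglot
from sqlglot import expressions as exp

stmt = sqlglot.parse_one("DROP SCHEMA hive_metastore.old_schema", read="databricks")
if isinstance(stmt, (exp.Use, exp.Create, exp.Drop)):
    # statements like these change or drop the current schema; track that
    # instead of linting them as table accesses (which previously crashed)
    print(type(stmt).__name__, "->", stmt.sql(dialect="databricks"))
```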
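For the view-migration alias fix above, a minimal sketch assuming a sqlglot-based rewrite: the source table is moved to a Unity Catalog location while the `AS t` alias is left untouched. The target catalog and schema names are placeholders:

```python
import sqlglot
from sqlglot import expressions as exp

view_sql = "CREATE VIEW v AS SELECT t.id FROM hive_metastore.db.src AS t"
tree = sqlglot.parse_one(view_sql, read="databricks")
for table in tree.find_all(exp.Table):
    table.set("catalog", exp.to_identifier("main"))  # placeholder target catalog
    table.set("db", exp.to_identifier("db"))
    # table.args["alias"] (the "AS t" part) is deliberately not modified
print(tree.sql(dialect="databricks"))
# CREATE VIEW v AS SELECT t.id FROM main.db.src AS t
```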
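For the parallel table-size calculation above, an illustrative sketch only; UCX's crawler internals differ, and the size function here is a placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

def table_size_bytes(table: str) -> int:
    # placeholder: in practice this would inspect the table's DBFS root location
    return 0

tables = ["db.a", "db.b", "db.c"]
with ThreadPoolExecutor(max_workers=8) as pool:
    # compute sizes concurrently instead of one table at a time
    sizes = dict(zip(tables, pool.map(table_size_bytes, tables)))
```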

Dependency updates:

* Updated sqlglot requirement from <25.22,>=25.5.0 to >=25.5.0,<25.23 ([2687](https://github.com/databrickslabs/ucx/pull/2687)).
* Updated databricks-labs-lsql requirement from <0.12,>=0.5 to >=0.5,<0.13 ([2688](https://github.com/databrickslabs/ucx/pull/2688)).
* Updated databricks-labs-blueprint requirement from <0.9,>=0.8 to >=0.8,<0.10 ([2747](https://github.com/databrickslabs/ucx/pull/2747)).

0.37.0

Not secure
* Added ability to run create-missing-principals command as collection ([2675](https://github.com/databrickslabs/ucx/issues/2675)). This release introduces the capability to run the `create-missing-principals` command as a collection in the UCX tool with the new optional flag `run-as-collection`, giving more control and flexibility when managing cloud resources across multiple workspaces. The existing `create-missing-principals` command accepts a new `run_as_collection` parameter; when set to True, the command iterates over a list of `WorkspaceContext` objects and executes the necessary actions for each workspace (a sketch of this pattern follows this list). Additionally, a new `AccountClient` parameter facilitates the retrieval of all workspaces associated with a specific account. New test functions in `test_cli.py` exercise this functionality on AWS and Azure: the `acc_client` argument enables running the tests with an authenticated AWS or Azure client, and the `MockPrompts` object simulates user responses to the prompts displayed during execution of the command.
* Added storage for direct filesystem references in code ([2526](https://github.com/databrickslabs/ucx/issues/2526)). The open-source library has been updated with a new table `directfs_in_paths` to store Direct File System Access (DFSA) records, extending support for managing and collecting DFSAs as part of addressing issue [#2350](https://github.com/databrickslabs/ucx/issues/2350) and [#2526](https://github.com/databrickslabs/ucx/issues/2526). The changes include a new class `DirectFsAccessCrawlers` and methods for handling DFSAs, as well as linting, testing, and a manually verified schema upgrade. Additionally, a new SQL query deprecates the use of direct filesystem references. The commit is co-authored by Eric Vergnaud, Serge Smertin, and Andrew Snare.
* Added task for linting queries ([2630](https://github.com/databrickslabs/ucx/issues/2630)). This commit introduces a new `QueryLinter` class for linting SQL queries in the workspace, similar to the existing `WorkflowLinter` for jobs. The `QueryLinter` checks for any issues in dashboard queries and reports them in a new `query_problems` table. The commit also includes the addition of unit tests, integration tests, and manual testing of the schema upgrade. The `QueryLinter` has been updated to include a `TableMigrationIndex` object, which is currently set to an empty list and will be updated in a future commit. This change improves the quality of the codebase by ensuring that all SQL queries are properly linted and any issues are reported, allowing for better maintenance and development of the system. The commit is co-authored by multiple developers, including Eric Vergnaud, Serge Smertin, Andrew Snare, and Cor. Additionally, a new linting rule, "direct-filesystem-access", has been introduced to deprecate the use of direct filesystem references in favor of more abstracted file access methods in the project's codebase.
* Adopt `databricks-labs-pytester` PyPI package ([2663](https://github.com/databrickslabs/ucx/issues/2663)). In this release, the `pyproject.toml` file has been updated: the `pytest` package has been bumped from 8.1.0 to 8.3.3, and the `databricks-labs-pytester` package has been added with a minimum version of 0.2.1. Adopting the `databricks-labs-pytester` PyPI package moves fixture usage from `mixins.fixtures` into its own top-level library. This affects various test files, including `test_jobs.py`, by replacing the `get_purge_suffix` fixture with `watchdog_purge_suffix` to standardize the approach to creating and managing temporary directories and files used in tests. Additionally, new fixtures have been introduced in a separate PR for testing the `databricks.labs.ucx` package, including `debug_env_name`, `product_info`, `inventory_schema`, `make_lakeview_dashboard`, `make_dashboard`, `make_dbfs_data_copy`, `make_mounted_location`, `make_storage_dir`, `sql_exec`, and `migrated_group`. These fixtures simplify the testing process by providing preconfigured resources that can be used in the tests. The `redash.py` file has been removed from the `databricks/labs/ucx/mixins` directory as the Redash API is being deprecated and replaced with a new library.
* Assessment: crawl UDFs as a task in parallel to tables instead of implicitly during grants ([2642](https://github.com/databrickslabs/ucx/issues/2642)). This release changes how User Defined Functions (UDFs) are crawled during the assessment workflow. Previously, UDFs were crawled implicitly by the GrantsCrawler, which requested a snapshot from the UDFSCrawler before it had executed. With this update, UDFs are crawled as their own task, running in parallel with tables before grants crawling begins. This modification addresses issue [#2574](https://github.com/databrickslabs/ucx/issues/2574), which requires grants and UDFs to be refreshable, but only once within a given workflow run. A new method, crawl_udfs, iterates over all UDFs in the Hive Metastore of the current workspace and persists their metadata in a table named .udfs. This inventory is utilized when scanning securable objects for issues with grants that cannot be migrated to Unity Catalog. The crawl_grants task now depends on crawl_udfs, crawl_tables, and setup_tacl, ensuring that UDFs are crawled before grants are.
* Collect direct filesystem access from queries ([2599](https://github.com/databrickslabs/ucx/issues/2599)). This commit introduces support for extracting Direct File System Access (DirectFsAccess) records from workspace queries, adding a new table `directfs_in_queries` and a new view `directfs` that unions `directfs_in_paths` with the new table. The DirectFsAccessCrawlers class has been refactored into two separate constructors, `DirectFsAccessCrawler.for_paths` and `DirectFsAccessCrawler.for_queries`, and a new `QueryLinter` class has been introduced to check queries for DirectFsAccess records. Unit tests and manual tests have been conducted to ensure the correct functioning of the schema upgrade. The commit is co-authored by Eric Vergnaud, Serge Smertin, and Andrew Snare.
* Fixed failing integration test: `test_reflect_account_groups_on_workspace_skips_groups_that_already_exists_in_the_workspace` ([2624](https://github.com/databrickslabs/ucx/issues/2624)). In this release, we have made updates to the group migration workflow, addressing an issue ([#2623](https://github.com/databrickslabs/ucx/issues/2623)) where the integration test `test_reflect_account_groups_on_workspace_skips_groups_that_already_exists_in_the_workspace` failed in unhandled scenarios where a workspace group already existed with the same name as an account group to be reflected. The changes include a new method, `_workspace_groups_in_workspace()`, which checks for the existence of workspace groups, along with modifications to the `group-migration` workflow and the integration test `test_reflect_account_groups_on_workspace_skips_account_groups_when_a_workspace_group_has_same_name`. To enhance consistency and robustness, two new tests for the `GroupManager` class, `test_reflect_account_groups_on_workspace_warns_skipping_when_a_workspace_group_has_same_name` and `test_reflect_account_groups_on_workspace_logs_skipping_groups_when_already_reflected_on_workspace`, verify that a group is skipped (with a warning logged) when a workspace group with the same name exists, and that groups already reflected on the workspace are logged as skipped. These improvements ensure that the system behaves as expected during group migration when workspace groups and account groups share the same name.
* Fixed failing solution accelerator verification tests ([2648](https://github.com/databrickslabs/ucx/issues/2648)). This release fixes an issue in the LocalCodeLinter class that prevented it from normalizing Python code at the notebook cell level. The LocalCodeLinter constructor now includes a notebook loader, and a conditional block in the lint_path method determines the correct loader to use based on whether the path is a notebook. These changes allow the linter to handle Python code within Jupyter notebook cells. The fix was manually verified by running `make solacc` on the files that failed in CI. This commit is co-authored by Eric Vergnaud; apart from this fix, the linter's functionality remains unchanged.
* Fixed handling of potentially corrupt `state.json` of UCX workflows ([2673](https://github.com/databrickslabs/ucx/issues/2673)). This commit introduces a fix for potential corruption of `state.json` files in UCX workflows, addressing issue [#2673](https://github.com/databrickslabs/ucx/issues/2673) and resolving [#2667](https://github.com/databrickslabs/ucx/issues/2667); an illustrative sketch of the recovery idea follows this list. It updates the import statement in `install.py`, introduces a new `with_extra` function, and centralizes the deletion of jobs, improving code maintainability. Two new methods check whether a job is managed by UCX. Additionally, the commit removes deprecation warnings for direct filesystem references in pytester fixtures and adjusts the known.json file to accurately reflect the project's state. A new `Task` method is added for defining UCX workflow tasks, and several test cases are updated to ensure the correct handling of jobs during the uninstallation process. Overall, these changes enhance the reliability and user-friendliness of the UCX workflow installation process.
* Let `create-catalog-schemas` command run as collection ([2653](https://github.com/databrickslabs/ucx/issues/2653)). The `create-catalog-schemas` and `validate-external-locations` commands in the `databricks labs ucx` package have been updated to operate as collections, allowing for simultaneous execution on multiple workspaces. These changes, which resolve issue [#2609](https://github.com/databrickslabs/ucx/issues/2609), include the addition of new parameters and flags to the command signatures and method signatures, as well as updates to the existing functionality for creating catalogs and schemas. The changes have been manually tested and accompanied by unit tests, with integration tests to be added in a future update. The `create-catalog-schemas` command now accepts a list of workspace clients and a `run_as_collection` parameter, and skips existing catalogs and schemas while logging a message. The `validate-external-locations` command also operates as a collection, though specific details about this change are not provided.
* Let `create-uber-principal` command run on collection of workspaces ([2640](https://github.com/databrickslabs/ucx/issues/2640)). The `create-uber-principal` command now supports running on a collection of workspaces, allowing more efficient management of service principals across multiple workspaces. A new flag, `run-as-collection`, when set to true, runs the command on all workspaces in the collection that have UCX installed; the command continues to grant STORAGE_BLOB_READER access to Azure storage accounts and to identify S3 buckets used in AWS workspaces. Under the hood, the `create_uber_principal` method in the `access.py` and `cli.py` files accepts a new `run_as_collection` parameter that, when set to True, retrieves a collection of workspace contexts and executes the necessary operations for each context; supporting methods such as `aws_profile` have been updated to ensure the correct cloud provider is used. The command's behavior has been isolated from the underlying `ucx` functionality by introducing mock values for the uber service principal ID and policy ID. Manual testing and unit tests have been added, including tests that verify the command behaves correctly when run on a collection of workspaces and that errors are handled for unsupported cloud providers and missing subscription IDs; integration tests will be added in a future PR.
* Let `migrate-acls` command run as collection ([2664](https://github.com/databrickslabs/ucx/issues/2664)). The `migrate-acls` command in the `labs.yml` file has been updated to facilitate the migration of access control lists (ACLs) from a legacy metastore to a UC metastore for a collection of workspaces with Unity Catalog (UC) installed. This command now supports running as a collection, enabled by a new optional flag `run-as-collection`. When set to true, the command will run for all workspaces with UC installed, enhancing efficiency and ease of use. The new functionality has been manually tested and verified with added unit tests. However, integration tests are yet to be added. The command is part of the `databricks/labs/ucx` module and is implemented in the `cli.py` file. This update addresses issue [#2611](https://github.com/databrickslabs/ucx/issues/2611) and includes both manual and unit tests.
* Let `migrate-dbsql-dashboards` command to run as collection ([2656](https://github.com/databrickslabs/ucx/issues/2656)). The `migrate-dbsql-dashboards` command in the `databricks labs ucx` command group has been updated to support running as a collection, allowing it to migrate queries for all dashboards in one or more workspaces. This new feature is achieved by adding an optional flag `run-as-collection` to the command. If set to True, the command will be executed for all workspaces with ucx installed, resolving issue [#2612](https://github.com/databrickslabs/ucx/issues/2612). The `migrate-dbsql-dashboards` function has been updated to take additional parameters `ctx`, `run_as_collection`, and `a`. The `ctx` parameter is an optional `WorkspaceContext` object, which can be used to specify the context for a single workspace. If not provided, the function will retrieve a list of `WorkspaceContext` objects for all workspaces. The `run_as_collection` parameter is a boolean flag indicating whether the command should run as a collection. If set to True, the function will iterate over all workspaces and migrate queries for all dashboards in each workspace. The `a` parameter is an optional `AccountClient` object for authentication. Unit tests have been added to ensure that the new functionality works as expected. This feature will be useful for users who need to migrate many dashboards at once. Integration tests will be added in a future update after issue [#2507](https://github.com/databrickslabs/ucx/issues/2507) is addressed.
* Let `migrate-locations` command run as collection ([2652](https://github.com/databrickslabs/ucx/issues/2652)). The `migrate-locations` command in the `databricks labs ucx` library for AWS and Azure has been enhanced to support running as a collection of workspaces, allowing for more efficient management of external locations. This has been achieved by modifying the existing `databricks labs ucx migrate-locations` command and adding a `run_as_collection` flag to specify that the command should run for a collection of workspaces. The changes include updates to the `run` method in `locations.py` to return a list of strings containing the URLs of missing external locations, and the addition of the `_filter_unsupported_location` method to filter out unsupported locations. A new `_get_workspace_contexts` function has been added to return a list of `WorkspaceContext` objects based on the provided `WorkspaceClient`, `AccountClient`, and named parameters. The commit also includes new test cases for handling unsupported cloud providers and testing the `run as collection` functionality with multiple workspaces, as well as manual and unit tests. Note that due to current limitations in unit testing, the `run as collection` tests for both Azure and AWS raise exceptions.
* Let `migrate-tables` command run as collection ([2654](https://github.com/databrickslabs/ucx/issues/2654)). The `migrate-tables` command in the `labs.yml` configuration file has been updated to support running as a collection of workspaces with UCX installed. This change includes adding a new flag `run_as_collection` that, when set to `True`, allows the command to run on all workspaces in the collection, and modifying the existing command to accept an `AccountClient` object and `WorkspaceContext` objects. The function `_get_workspace_contexts` is used to retrieve the `WorkspaceContext` objects for each workspace in the collection. Additionally, the `migrate_tables` command now checks for the presence of hiveserde and external tables and prompts the user to run the `migrate-external-hiveserde-tables-in-place-experimental` and `migrate-external-tables-ctas` workflows, respectively. The command's documentation and tests have also been updated to reflect this new functionality. Integration tests will be added in a future update. These changes improve the scalability and efficiency of the `migrate-tables` command, allowing for easier and more streamlined execution across multiple workspaces.
* Let `validate-external-locations` command run as collection ([2649](https://github.com/databrickslabs/ucx/issues/2649)). In this release, the `validate-external-locations` command has been updated to support running as a collection, allowing it to operate on multiple workspaces simultaneously. New parameters `ctx`, `run_as_collection`, and `a` have been added to the command in the `cli.py` file. The `ctx` parameter specifies the current workspace context; when `run_as_collection` is set to True, the contexts are obtained through the `_get_workspace_contexts` function, which queries all available workspaces associated with the given account client `a`. The `save_as_terraform_definitions_on_workspace` method is then called to save the external locations as Terraform definitions on the workspace. The unit tests have been updated with a test case that verifies this functionality. These changes improve the validation process for external locations across multiple workspaces while ensuring sequential execution of statements within the command.
* Let `validate-groups-membership` command to run as collection ([2657](https://github.com/databrickslabs/ucx/issues/2657)). The latest commit introduces an optional `run-as-collection` flag to the `validate-groups-membership` command in the `labs.yml` configuration file. This flag, when set to true, enables the command to run for a collection of workspaces equipped with UCX. The updated `validate-groups-membership` command in `databricks/labs/ucx/cli.py` now accepts new arguments: `ctx`, `run_as_collection`, and `a`. This change resolves issue [#2613](https://github.com/databrickslabs/ucx/issues/2613) and includes updated unit and manual tests, ensuring thorough functionality verification. The new feature allows software engineers to validate group memberships across multiple workspaces simultaneously, enhancing efficiency and ease of use. When run as a collection, the command validates groups at both the account and workspace levels, comparing memberships for each specified workspace context.
* Removed installing on workspace log message in `_get_installer` ([2641](https://github.com/databrickslabs/ucx/issues/2641)). In this enhancement, a confusing log message in the `_get_installer` function in the `install.py` file, which incorrectly indicated that UCX was being installed when it was not, has been removed and relocated to a more accurate position in the codebase. The behavior of `_get_installer` is otherwise unchanged; only the logging has been adjusted. This change eliminates confusion about the installation of UCX, enhancing the overall user experience.
* Support multiple subscription ids for command line commands ([2647](https://github.com/databrickslabs/ucx/issues/2647)). The `databricks labs ucx` tool now supports multiple subscription IDs for the `create-uber-principal`, `guess-external-locations`, `migrate-credentials`, and `migrate-locations` commands. This change allows users to specify multiple subscriptions for scanning storage accounts, improving management for users who handle multiple subscriptions simultaneously. Relevant flags in the `labs.yml` configuration file have been updated, and unit tests, as well as manual testing, have been conducted to ensure proper functionality. In the `cli.py` file, the `create_uber_principal` and `principal_prefix_access` functions have been updated to accept a list of subscription IDs, affecting the `create_uber_principal` and `principal_prefix_access` commands. The `azure_subscription_id` property has been renamed to `azure_subscription_ids`, modifying the `azureResources` constructor and ensuring correct handling of the subscription IDs.
* Updated databricks-labs-lsql requirement from <0.11,>=0.5 to >=0.5,<0.12 ([2666](https://github.com/databrickslabs/ucx/issues/2666)). In this release, we have updated the version requirement for the `databricks-labs-lsql` library in the 'pyproject.toml' file from a version greater than or equal to 0.5 and less than 0.11 to a version greater than or equal to 0.5 and less than 0.12. This change allows us to use the latest version of the `databricks-labs-lsql` library while still maintaining a version range constraint. This library provides functionality for managing and querying data in Databricks, and this update ensures compatibility with the project's existing dependencies. No other changes are included in this commit.
* Updated sqlglot requirement from <25.21,>=25.5.0 to >=25.5.0,<25.22 ([2633](https://github.com/databrickslabs/ucx/issues/2633)). In this pull request, we have updated the `sqlglot` dependency requirement in the `pyproject.toml` file. The previous requirement was for a minimum version of 25.5.0 and less than 25.21, which has now been changed to a minimum version of 25.5.0 and less than 25.22. This update allows us to utilize the latest version of `sqlglot`, up to but not including version 25.22. While the changelog and commits for the latest version of `sqlglot` have been provided for reference, the specific changes made to the project as a result of this update are not detailed in the pull request description. Therefore, as a reviewer, it is essential to verify the compatibility of the updated `sqlglot` version with our project and ensure that any necessary modifications have been made to accommodate the new version.
* Fixed `test_running_real_remove_backup_groups_job` timeout ([2651](https://github.com/databrickslabs/ucx/issues/2651)). In this release, we have adjusted the `test_running_real_remove_backup_groups_job` test case by increasing the timeout of an inner task from 90 seconds to 3 minutes, addressing the timeout issue reported in issue [#2639](https://github.com/databrickslabs/ucx/issues/2639). Integration tests have been incorporated to verify correct behavior; the functionality of the code itself remains unaffected. This makes the testing process more reliable and efficient, improving the overall quality of the open-source library.
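For the run-as-collection pattern used by `create-missing-principals` above (and by the other collection-enabled commands in this release), a simplified sketch; the function name and the per-workspace action are assumptions, not UCX's exact signatures:

```python
from databricks.sdk import AccountClient, WorkspaceClient

def run_create_missing_principals(a: AccountClient, run_as_collection: bool = False) -> None:
    if run_as_collection:
        # one workspace client per workspace in the account
        clients = [a.get_workspace_client(w) for w in a.workspaces.list()]
    else:
        clients = [WorkspaceClient()]
    for ws in clients:
        # placeholder for the actual per-workspace command logic
        print(f"creating missing principals in {ws.config.host}")
```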
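For the corrupt `state.json` handling above, an illustrative sketch only; the path handling and recovery behavior are assumptions, not UCX's exact logic:

```python
import json

def load_state(path: str) -> dict:
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # treat a missing or unreadable state file as absent instead of
        # crashing the installer; the state can then be rebuilt
        return {}
```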

Dependency updates:

* Updated sqlglot requirement from <25.21,>=25.5.0 to >=25.5.0,<25.22 ([2633](https://github.com/databrickslabs/ucx/pull/2633)).
* Updated databricks-labs-lsql requirement from <0.11,>=0.5 to >=0.5,<0.12 ([2666](https://github.com/databrickslabs/ucx/pull/2666)).

0.36.0

Not secure
* Added `upload` and `download` cli commands to `upload` and `download` a file to/from a collection of workspaces ([2508](https://github.com/databrickslabs/ucx/issues/2508)). In this release, the Databricks Labs UCX CLI has been extended with new `upload` and `download` commands. The `upload` command uploads a file to a single workspace or a collection of workspaces, while the `download` command downloads a CSV file from a single workspace or a collection of workspaces, making it more efficient to move the same file to or from multiple workspaces. Both commands display a warning or information message upon completion and verify that the file schema is correct before uploading CSV files. The feature includes new methods for uploading and downloading files across multiple workspaces, as well as new unit and integration tests. Users can refer to the contributing instructions to help improve the project.
* Added ability to run `create-table-mapping` command as collection ([2602](https://github.com/databrickslabs/ucx/issues/2602)). This PR introduces the capability to run the `create-table-mapping` command as a collection in the `databricks labs ucx` CLI, providing increased flexibility and automation for workflows. A new optional boolean flag, `run-as-collection`, has been added to the `create-table-mapping` command, allowing users to indicate if they want to run it as a collection with a default value of False. The updated `create_table_mapping` function now accepts additional arguments, enabling efficient creation of table mappings for multiple workspaces. Users are encouraged to test this feature in various scenarios and provide feedback for further improvements.
* Added comment on the source tables to capture that they have been deprecated ([2548](https://github.com/databrickslabs/ucx/issues/2548)). A new method, `_sql_add_migrated_comment(self, table: Table, target_table_key: str)`, has been added to the `table_migrate.py` file to mark deprecated source tables with a comment indicating their deprecated status and directing users to the new table. This method is currently being used in three existing methods within the same file to add comments to deprecated tables as part of the migration process. In addition, a new SQL query has been added to set a comment on the source table `hive_metastore.db1_src.managed_dbfs`, indicating that it is deprecated and directing users to the new table `ucx_default.db1_dst.managed_dbfs`. A unit test has also been updated to ensure that the migration process correctly adds the deprecation comment to the source table. This change is part of a larger effort to deprecate and migrate data from old tables to new tables and provides guidance for users to migrate to the new table.
* Added documentation for PrincipalACL migration and delete-missing-principal cmd ([2552](https://github.com/databrickslabs/ucx/issues/2552)). In this open-source library release, the UCX project has added a new command `delete-missing-principals`, applicable only for AWS, to delete IAM roles created by UCX. This command lists all IAM roles generated by the `principal-prefix-access` command and allows for the selection of multiple roles to delete. It checks if the selected roles are mapped to any storage credentials and seeks confirmation before deleting the role and its associated inline policy. Additionally, updates have been made to the `create-uber-principal` and `migrate-locations` commands to apply location ACLs from existing clusters and grant necessary permissions to users. The `create-catalogs-schemas` command has been updated to apply catalog and schema ACLs from existing clusters for both Azure and AWS. The `migrate-tables` command has also been updated to apply table and view ACLs from existing clusters for both Azure and AWS. The documentation of commands that require admin privileges in the UCX project has also been updated.
* Added linting for `spark.sql(...)` calls ([2558](https://github.com/databrickslabs/ucx/issues/2558)). This commit introduces linting for `spark.sql(...)` calls to enhance code quality and consistency, addressing issue [#2558](https://github.com/databrickslabs/ucx/issues/2558). Previously, SparkSqlPyLinter only checked for table migration, not other SQL linters such as the DirectFsAccess linters; additional SQL linters are now applied to `spark.sql(...)` calls, improving the overall linting functionality of the system. The commit also introduces an abstract base class called Fixer, which enforces the inclusion of a `name` property for all derived classes (see the sketch after this list), along with minor improvements to the codebase. The commit resolves issue [#2551](https://github.com/databrickslabs/ucx/issues/2551) and updates `test_functional.py` to test `spark-sql-directfs.py`, ensuring the proper functioning of the linted `spark.sql(...)` calls.
* Document: clarify that the `assessment` job is not intended to be re-run ([2560](https://github.com/databrickslabs/ucx/issues/2560)). In this release, we have updated the behavior of the `assessment` job for Databricks Labs Unity Catalog (UCX) to address confusion around its re-run functionality. Moving forward, the `assessment` job should only be executed once during the initial setup of UCX and should not be re-run to refresh the inventory or findings. If a re-assessment is necessary, UCX will need to be reinstalled first. This change aligns the actual functionality of the `assessment` job and will not affect the daily job that updates parts of the inventory. The `assessment` workflow is designed to detect incompatible entities and provide information for the migration process. It can be executed in parallel or sequentially, and its output is stored in Delta tables for further analysis and decision-making through the assessment report.
* Enabled `migrate-credentials` command to run as collection ([2532](https://github.com/databrickslabs/ucx/issues/2532)). In this pull request, the `migrate-credentials` command in the UCX project's CLI has been updated with a new optional flag, `run_as_collection`, which allows the command to operate on multiple workspaces as a collection. This change introduces the `get_contexts` function and modifies the `delete_missing_principals` function to support the new functionality. The `migrate-credentials` command's behavior for Azure and AWS has been updated to accept an additional `acc_client` argument in its tests. Comprehensive tests and documentation have been added to ensure the reliability and robustness of the new functionality. It is recommended to review the attached testing evidence and ensure the new functionality works as intended without introducing any unintended side effects.
* Escape column names in target tables of the table migration ([2563](https://github.com/databrickslabs/ucx/issues/2563)). In this release, the `escape_sql_identifier` function in the `utils.py` file has been enhanced with a new `maxsplit` parameter, providing more control over the maximum number of splits performed on the input string (a sketch of the escaping logic follows this list); this addresses issue [#2544](https://github.com/databrickslabs/ucx/issues/2544) as part of the existing table migration workflow. The "tables.py" file in the "databricks/labs/ucx/hive_metastore" directory has been updated to escape column names in target tables, preventing SQL injection attacks. Additionally, a new `ColumnInfo` class and several utility functions have been added to the `fixtures.py` file in the `databricks.labs.ucx` project for generating SQL schemas and column casting. The integration tests for migrating Hive Metastore tables have been updated with new tests to handle column names that require escaping. Lastly, the `test_manager.py` file in the `tests/unit/workspace_access` directory has been refactored by removing the `mock_backend` fixture and adding the `test_inventory_permission_manager_init` method to test the initialization of the `PermissionManager` class. These changes improve security, functionality, and test coverage for software engineers utilizing these libraries in their projects.
* Explain why metastore is checked to exist in group migration workflow in docstring ([2614](https://github.com/databrickslabs/ucx/issues/2614)). In the updated `workflows.py` file, the docstring for the `verify_metastore_attached` method has been revised to explain the necessity of checking if a metastore is attached to the workspace. The reason for this check is that account level groups are only available when a metastore is attached, which is crucial for the group migration workflow to function properly. The method itself remains the same, only verifying the presence of a metastore attached to the workspace and causing the workflow to fail if no metastore is found. This modification enhances the clarity of the metastore check's importance in the context of the group migration workflow.
* Fixed infinite recursion when visiting a dependency graph ([2562](https://github.com/databrickslabs/ucx/issues/2562)). This change addresses infinite recursion that could occur when visiting a dependency graph, particularly when many files in a package import the package itself. The `visit` method now visits each parent/child pair only once, preventing the recursion (see the sketch after this list). The `dependencies` property has been added to the DependencyGraph class, and the `DependencyGraphVisitor` class has been introduced to handle visiting nodes and tracking visited pairs. These modifications improve the robustness of dependency resolution; unit tests have been added to ensure correct behavior, and the change unblocks a previous pull request. The functionality of the code otherwise remains unchanged.
* Fixed migrate acls CLI command ([2617](https://github.com/databrickslabs/ucx/issues/2617)). In this release, the `migrate acls` command in the ucx project's CLI has been updated to address issue [#2617](https://github.com/databrickslabs/ucx/issues/2617). The changes include the removal of ACL type parameters from the `migrate ACL` command, simplifying its usage and eliminating the need for explicit type specifications. The `legacy_table_acl` and `principal` parameters have been removed from the `migrate_acls` function, while the `hms_fed` parameter remains unchanged and retains its default value if not explicitly provided. These modifications streamline the ACL migration process in the ucx CLI, making it easier for users to manage access control lists.
* Fixes pip install statement in debug notebook ([2545](https://github.com/databrickslabs/ucx/issues/2545)). In this release, we have addressed an issue in the debug notebook where the pip install statement for wheel was incorrectly surrounded by square brackets, causing the notebook run to fail. We have removed the superfluous square brackets and modified the `remote_wheels` list to be joined as a string before being passed to the DEBUG_NOTEBOOK format. It is important to note that this change solely affects the debug notebook and does not involve any alterations to user documentation, CLI commands, workflows, or tables. Furthermore, no new methods have been added, and existing functionality remains unchanged. The change has been manually tested for accuracy, but it does not include any unit tests, integration tests, or staging environment verification.
* More escaping of SQL identifiers ([2530](https://github.com/databrickslabs/ucx/issues/2530)). This commit includes updates to SQL identifier escaping, addressing a missed SQL statement in one of the crawlers and adding support for less-known Spark/Databricks corner cases where backticks in names of identifiers need to be doubled when quoting. The `escape_sql_identifier` function has been modified to consider this new case, and the changes affect the existing `migrate-data-reconciliation` workflow. Additionally, the `TableIdentifier` class has been updated to properly escape identifiers, handling the backticks-in-names scenario. These improvements ensure better handling of SQL identifiers, improving the overall functionality of the codebase. Unit tests have been updated to reflect these changes.
* Retry deploy workflow on `InternalError` ([2525](https://github.com/databrickslabs/ucx/issues/2525)). In the 'workflows.py' file, the `_deploy_workflow` function now uses the `retried` decorator to handle `InternalError` exceptions during workflow creation, automatically retrying in case of internal errors and thereby addressing issue [#2522](https://github.com/databrickslabs/ucx/issues/2522) (see the sketch after this list). The retry mechanism is configured with a timeout of 2 minutes to prevent extended waiting in case of persistent issues. This change is part of our ongoing efforts to ensure a robust and fault-tolerant deployment process.
* Updated databricks-labs-lsql requirement from <0.10,>=0.5 to >=0.5,<0.11 ([2580](https://github.com/databrickslabs/ucx/issues/2580)). In this release, we have relaxed the version constraint for the databricks-labs-lsql package: previously the package was required to be at least 0.5 and below 0.10; it must now be at least 0.5 and below 0.11. This update allows users to utilize the latest version of the package, which includes new features and bug fixes. For more detailed information on the changes included in this update, please refer to the changelog and release notes provided in the commit message.
* Updated sqlglot requirement from <25.20,>=25.5.0 to >=25.5.0,<25.21 ([2549](https://github.com/databrickslabs/ucx/issues/2549)). In this pull request, we are updating the sqlglot requirement in the pyproject.toml file from a range of >=25.5.0,<25.20 to >=25.5.0,<25.21. This change allows for the installation of the latest version of sqlglot, while ensuring that the version does not exceed 25.21. The update was made in response to a pull request from Dependabot, which identified a new version of sqlglot. The PR includes details of the sqlglot changelog and commits, but as reviewers, we can focus on the specific change made to our project. The sqlglot package is a SQL parser and transpiler that we use as a dependency in this project. This update will ensure that our project is using the latest version of this package, which may include bug fixes, new features, or improvements in performance.
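For the Fixer base class mentioned in the `spark.sql(...)` linting entry above, a minimal sketch of the described pattern; the `apply` method and its signature are assumptions beyond the `name` property named in the release note:

```python
from abc import ABC, abstractmethod

class Fixer(ABC):
    @property
    @abstractmethod
    def name(self) -> str:
        """Identifier of the fixer, e.g. the diagnostic code it can fix."""

    @abstractmethod
    def apply(self, code: str) -> str:
        """Return the code with the fix applied (assumed interface)."""
```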
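For the identifier escaping described in the entries above (the `maxsplit` parameter, and the doubling of backticks inside names covered by the "More escaping" entry), a minimal sketch; UCX's real implementation handles more edge cases:

```python
def escape_sql_identifier(path: str, *, maxsplit: int = 2) -> str:
    # split "catalog.schema.table" into at most maxsplit+1 parts, then quote
    # each part with backticks, doubling any backtick inside a name
    parts = path.split(".", maxsplit)
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)

assert escape_sql_identifier("cat.sch.tbl.with.dots") == "`cat`.`sch`.`tbl.with.dots`"
assert escape_sql_identifier("we`ird") == "`we``ird`"
```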
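For the dependency-graph recursion fix above, a minimal sketch of the idea: remembering (parent, child) pairs already visited so a package that imports itself cannot recurse forever. The graph representation is a placeholder:

```python
def visit(node: str, children: dict[str, list[str]], visited: set | None = None) -> None:
    if visited is None:
        visited = set()
    for child in children.get(node, []):
        pair = (node, child)
        if pair in visited:
            continue  # this edge was already walked; stops infinite recursion
        visited.add(pair)
        visit(child, children, visited)

graph = {"pkg": ["pkg.mod"], "pkg.mod": ["pkg"]}  # a module importing its own package
visit("pkg", graph)  # terminates instead of recursing forever
```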
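For the retry mechanism above, a minimal sketch using the `retried` helper from the Databricks SDK with the 2-minute timeout from the release note; the function body is a placeholder:

```python
from datetime import timedelta

from databricks.sdk.errors import InternalError
from databricks.sdk.retries import retried

@retried(on=[InternalError], timeout=timedelta(minutes=2))
def deploy_workflow() -> None:
    # placeholder: create or update the job; retried automatically whenever
    # the API raises InternalError, giving up after 2 minutes
    ...
```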

Dependency updates:

* Updated sqlglot requirement from <25.20,>=25.5.0 to >=25.5.0,<25.21 ([2549](https://github.com/databrickslabs/ucx/pull/2549)).
* Updated databricks-labs-lsql requirement from <0.10,>=0.5 to >=0.5,<0.11 ([2580](https://github.com/databrickslabs/ucx/pull/2580)).

0.35.0

Not secure
* Added `databricks labs ucx delete-credential` cmd to delete the UC roles created by UCX ([2504](https://github.com/databrickslabs/ucx/issues/2504)). In this release, we've added several new commands to the `labs.yml` file for managing Unity Catalog (UC) roles in Databricks, specifically for AWS. The new commands include `databricks labs ucx delete-missing-principals` and `databricks labs ucx delete-credential`. The `databricks labs ucx delete-missing-principals` command helps manage UC roles created through the `create-missing-principals` cmd by listing all the UC roles in AWS and allowing users to select roles to delete. It also checks for unused roles before deletion. The `databricks labs ucx delete-credential` command deletes UC roles created by UCX and is equipped with an optional `aws-profile` flag for authentication purposes. Additionally, we've added a new method `delete_uc_role` in the `access.py` file for deleting UC roles and introduced new test cases to ensure correct behavior. These changes resolve issue [#2359](https://github.com/databrickslabs/ucx/issues/2359), improving the overall management of UC roles in AWS.
* Added basic documentation for linter message codes ([2536](https://github.com/databrickslabs/ucx/issues/2536)). A new section, "Linter Message Codes," has been added to the README file, providing detailed explanations, examples, and resolution instructions for various linter message codes related to Unity Catalog (UC) migration. To help users familiarize themselves with the different message codes that may appear during linting, a new command, `python tests/integration/source_code/message_codes.py`, has been implemented. Running this command will display a list of message codes, including `cannot-autofix-table-reference`, `catalog-api-in-shared-clusters`, `changed-result-format-in-uc`, `dbfs-read-from-sql-query`, `dbfs-usage`, `dependency-not-found`, `direct-filesystem-access`, `implicit-dbfs-usage`, `jvm-access-in-shared-clusters`, `legacy-context-in-shared-clusters`, `not-supported`, `notebook-run-cannot-compute-value`, `python-udf-in-shared-clusters`, `rdd-in-shared-clusters`, `spark-logging-in-shared-clusters`, `sql-parse-error`, `sys-path-cannot-compute-value`, `table-migrated-to-uc`, `to-json-in-shared-clusters`, and `unsupported-magic-line`. Users are encouraged to review these message codes and their corresponding explanations to ensure a smooth migration to Unity Catalog.
* Added linters for direct filesystem access in Python and SQL code ([2519](https://github.com/databrickslabs/ucx/issues/2519)). In this release, linters have been added for detecting direct filesystem access (DFSA) in Python and SQL code, specifically addressing direct filesystem access in the Unity Catalog. Initially, the linters only detect DBFS, but the plan is to expand detection to all DFSAs. This change is part of issue [#2519](https://github.com/databrickslabs/ucx/issues/2519) and includes new unit tests. The linters flag code that accesses the file system directly, which is not allowed in Unity Catalog, including SQL queries that read from DBFS and Python code that reads from or displays data using DBFS paths. Developers are required to modify such code to use Unity Catalog tables or volumes instead, ensuring that their code is compatible with Unity Catalog's deprecation of direct filesystem access and DBFS, ultimately resulting in better project integration and performance.
* Clean up left over uber principal resources for AWS ([2449](https://github.com/databrickslabs/ucx/issues/2449)). In this release, we have made significant improvements to our open-source library, particularly in the management of AWS resources and permissions. We have added new methods to handle the creation and deletion of Uber instance profiles and external locations in AWS. The `create_uber_principal` method in the `AWSResourcePermissions` class has been updated to allow more fine-grained control over the migration process and ensure proper creation and configuration of all necessary resources. Additionally, we have introduced a new `AWSResources` class and `aws_resource_permissions` property in the `aws_cli_ctx` fixture to improve the management of AWS resources and permissions. We have also added new unit tests to ensure proper error handling when creating AWS IAM roles for Unity Catalog Migration (ucx) in specific scenarios. These changes enhance the functionality, test coverage, and overall quality of our library.
* Improve log warning about skipped grants ([2517](https://github.com/databrickslabs/ucx/issues/2517)). In this release, the warning messages logged by UCX during Access Control List (ACL) migration and User-Defined Function (UDF) migration have been improved. Previously, a generic warning was logged when a specific Hive metastore grant could not be mapped. The warnings now include the action type of the Hive metastore grant that failed to map. Additionally, `unittest.mock` is used to create a mock of the `GroupManager` class in tests, and a new `MigrateGrants` class has been introduced, which applies a list of grant loaders to a specific table. These changes improve logging and error handling, giving engineers a clear picture of which grants were skipped during migration.
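
  A rough sketch of the more specific warning, with hypothetical names and a simplified mapping; the real `MigrateGrants` logic differs:

  ```python
  import logging
  from dataclasses import dataclass

  logger = logging.getLogger("ucx.grants")

  # Hypothetical subset of Hive action types that map cleanly onto UC privileges.
  MAPPABLE_ACTIONS = {"SELECT", "MODIFY"}

  @dataclass
  class Grant:
      principal: str
      action_type: str

  def migrate_grants(table_full_name: str, grants: list[Grant]) -> None:
      for grant in grants:
          if grant.action_type not in MAPPABLE_ACTIONS:
              # The improved message names the Hive action type instead of a generic "skipped".
              logger.warning(
                  "Skipped grant on %s: Hive action %r cannot be mapped to a UC privilege",
                  table_full_name,
                  grant.action_type,
              )
              continue
          logger.info("Migrating %s for %s on %s", grant.action_type, grant.principal, table_full_name)

  migrate_grants("main.sales.orders", [Grant("analysts", "OWN"), Grant("analysts", "SELECT")])
  ```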
* Support sql notebooks in functional tests ([2513](https://github.com/databrickslabs/ucx/issues/2513)). This pull request introduces support for SQL notebooks in the functional testing framework, expanding its capabilities beyond Python notebooks. The changes include migrating relevant tests from `test_notebook_linter` to functional tests, as well as introducing new classes and methods to support SQL notebook testing. These changes improve the flexibility and scope of the testing framework, enabling developers to test SQL notebooks and ensuring that they meet quality standards. The commit also includes the addition of a new SQL notebook for demonstrating Unity Catalog table migrations, as well as modifications to various tests and regular expressions to accommodate SQL notebooks. Note that environment variables `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are currently hardcoded as `any`, requiring further updates based on the specific testing environment.
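
  One plausible way (an assumption, not the project's actual fixture) to replace the hardcoded `any` placeholders with real environment lookups in the functional tests:

  ```python
  import os

  import pytest

  @pytest.fixture
  def workspace_env() -> dict[str, str]:
      """Resolve workspace credentials, skipping the test when they are absent."""
      host = os.environ.get("DATABRICKS_HOST", "any")
      token = os.environ.get("DATABRICKS_TOKEN", "any")
      if "any" in (host, token):
          pytest.skip("set DATABRICKS_HOST and DATABRICKS_TOKEN to run functional tests")
      return {"DATABRICKS_HOST": host, "DATABRICKS_TOKEN": token}
  ```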
* Updated sqlglot requirement from <25.19,>=25.5.0 to >=25.5.0,<25.20 ([2533](https://github.com/databrickslabs/ucx/issues/2533)). This update raises the upper bound on the `sqlglot` dependency from 25.19 to 25.20 while keeping the minimum at 25.5.0, admitting the 25.19 series that the previous pin excluded. The changelog and commit history for sqlglot 25.19, including its breaking changes, new features, and bug fixes, are linked from the pull request. Users are encouraged to review the changelog and test their code against the new version; a quick smoke test follows below.
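
  A quick smoke test (the query is illustrative) to confirm existing SQL still parses under the new pin:

  ```python
  import sqlglot

  sql = "SELECT order_id, SUM(amount) AS total FROM main.sales.orders GROUP BY order_id"
  # transpile() parses and regenerates the statement; a parse failure raises ParseError.
  print(sqlglot.transpile(sql, read="databricks", write="databricks")[0])
  ```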
* [chore] fixed `make fmt` warnings related to sdk upgrade ([2534](https://github.com/databrickslabs/ucx/issues/2534)). In this change, warnings related to a recent SDK upgrade have been addressed by modifying the `create` function in the `fixtures.py` file. The `create` function is responsible for generating a `Wait[ServingEndpointDetailed]` object, which contains information about the endpoint name and its core configuration. The `ServedModelInput` instance was previously created using positional arguments, but has been updated to use named arguments instead (`model_name`, `model_version`, `workload_size`, `scale_to_zero_enabled`). This modification enhances code readability and maintainability, making it easier for software engineers to understand and modify the codebase.
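
  The shape of the change, with placeholder values (the surrounding fixture code is omitted):

  ```python
  from databricks.sdk.service.serving import ServedModelInput

  # Before: positional arguments, brittle against SDK signature changes.
  # served_model = ServedModelInput("my-model", "1", "Small", True)

  # After: named arguments, as the fixture now does.
  served_model = ServedModelInput(
      model_name="my-model",
      model_version="1",
      workload_size="Small",
      scale_to_zero_enabled=True,
  )
  ```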

Dependency updates:

* Updated sqlglot requirement from <25.19,>=25.5.0 to >=25.5.0,<25.20 ([2533](https://github.com/databrickslabs/ucx/pull/2533)).

