databricks-labs-ucx

Latest version: v0.57.0


0.46.0

Not secure
* Added `lazy_loader` to known list ([2991](https://github.com/databrickslabs/ucx/issues/2991)). With this commit, the `lazy_loader` module has been added to the known list in the configuration file, addressing a portion of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have been caused by the discovery or loading of this module. The `lazy_loader` is a package or module that, once added to the known list, will be recognized and loaded by the system. This change does not affect any existing functionality or introduce new methods. The commit solely updates the known.json file to include `lazy_loader` with an empty list, indicating that it is ready for use. This modification will enable the correct loading and recognition of the `lazy_loader` module in the system.
* Added `librosa` to known list ([2992](https://github.com/databrickslabs/ucx/issues/2992)). In this update, we have added several open-source libraries to the known list in the configuration file, including `librosa`, `llvmlite`, `msgpack`, `pooch`, `soundfile`, and `soxr`. These libraries are commonly used in data engineering, machine learning, and scientific computing tasks. `librosa` is a Python library for audio and music analysis, while `llvmlite` is a lightweight Python interface to the LLVM compiler infrastructure. `msgpack` is a binary serialization format like JSON, `pooch` is a package for managing external data files, `soundfile` is a library for reading and writing audio files, and `soxr` is a library for high-quality audio resampling. Each library has an empty list next to it for specifying additional configuration related to the library. This update partially resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) by adding `librosa` to the known list, ensuring that these libraries will be properly recognized and utilized by the codebase.
* Added `linkify-it-py` to known list ([2993](https://github.com/databrickslabs/ucx/issues/2993)). In this release, we have added support for two new open-source packages, `linkify-it-py` and `uc-micro-py`, to enhance the software's functionality and compatibility. The addition of `linkify-it-py` and its constituent modules, as well as the incorporation of `uc-micro-py` with its modules and classes, aims to expand the software's capabilities. These changes are related to the resolution of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), and they will enable the software to work seamlessly with these packages, thereby providing a better user experience.
* Added `lz4` to known list ([2994](https://github.com/databrickslabs/ucx/issues/2994)). In this release, we have added support for the LZ4 lossless data compression algorithm, which is known for its focus on compression and decompression speed. The implementation includes four variants: lz4, lz4.block, lz4.frame, and lz4.version, each providing different levels of compression and decompression speed and flexibility. This addition expands the range of supported compression algorithms, providing more options for users to choose from and partially addressing issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) related to supporting additional compression algorithms. This improvement will be beneficial to software engineers working with data compression in their projects.
* Fixed `SystemError: AST constructor recursion depth mismatch` failing the entire job ([3000](https://github.com/databrickslabs/ucx/issues/3000)). This PR introduces more deterministic, Go-style, error handling for parsing Python code, addressing issues that caused the entire job to fail due to a `SystemError: AST constructor recursion depth mismatch` ([#3000](https://github.com/databrickslabs/ucx/issues/3000)) and bug [#2976](https://github.com/databrickslabs/ucx/issues/2976). It includes removing the `AstroidSyntaxError` import, adding an import for `SqlglotError`, and updating the `SqlParseError` exception to `SqlglotError` in the `lint` method of the `SqlLinter` class. Additionally, abstract classes `TablePyCollector` and `DfsaPyCollector` and their respective methods for collecting tables and direct file system accesses have been removed. The `PythonSequentialLinter` class, previously handling multiple responsibilities, has also been removed, enhancing code modularity, understandability, maintainability, and testability. The changes affect the `base.py`, `python_ast.py`, and `python_sequential_linter.py` modules.
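The Go-style error handling described above can be sketched with the standard-library `ast` module (illustrative names only; ucx's actual logic in `python_ast.py` differs in detail). Parse failures are returned as values rather than raised, so one unparseable file cannot abort the whole job:

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass
class ParseFailure:
    """Describes why a source file could not be parsed."""

    message: str


def maybe_parse(code: str) -> tuple[ast.AST | None, ParseFailure | None]:
    """Return (tree, None) on success or (None, failure) on any parser error,
    instead of letting SyntaxError/SystemError escape and fail the entire run."""
    try:
        return ast.parse(code), None
    except (SyntaxError, SystemError, MemoryError, ValueError) as e:
        return None, ParseFailure(f"{type(e).__name__}: {e}")


tree, failure = maybe_parse("print('ok')")
assert failure is None
```

Callers check the second element of the tuple and record a problem for that file, then continue with the next one.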
* Skip applying permissions for workspace system groups to Unity Catalog resources ([2997](https://github.com/databrickslabs/ucx/issues/2997)). This commit introduces changes to the ACL-related code in the `databricks labs ucx create-catalog-schemas` command and the `migrate-table-*` workflow, skipping the application of permissions for workspace system groups in the Unity Catalog. These system groups, which include 'admins', do not exist at the account level. To ensure the correctness of these modifications, unit and integration tests have been added, including a test that checks the proper handling of user privileges in system groups during catalog schema creation. The `AccessControlResponse` object has been updated for the `admins` and `users` groups, granting them specific permissions for a workspace and warehouse object, respectively, enhancing the system's functionality in multi-user environments with system groups.

0.45.0

Not secure
* Added DBFS Root resolution when HMS Federation is enabled ([2947](https://github.com/databrickslabs/ucx/issues/2947)). This commit introduces a DBFS resolver for use with HMS (Hive Metastore) federation, enabling accurate resolution of DBFS root locations when HMS federation is enabled. A new `_resolve_dbfs_root()` class method is added to the `MountsCrawler` class, and a boolean argument `enable_hms_federation` is included in the `MountsCrawler` constructor, providing better handling of federation functionality. The commit also adds a test function, `test_resolve_dbfs_root_in_hms_federation`, to validate the resolution of DBFS roots with HMS federation. The test covers special cases, such as the `/user/hive/metastore` path, and utilizes `LocationTrie` for more accurate location guessing. These changes aim to improve the overall DBFS root resolution when using HMS federation.
* Added `jax-jumpy` to known list ([2959](https://github.com/databrickslabs/ucx/issues/2959)). In this release, we have added the `jax-jumpy` package to the list of known packages in our system. `jax-jumpy` is a Python-based numerical computation library, which includes modules such as `jumpy`, `jumpy._base_fns`, `jumpy.core`, `jumpy.lax`, `jumpy.numpy`, `jumpy.numpy._factory_fns`, `jumpy.numpy._transform_data`, `jumpy.numpy._types`, `jumpy.numpy.linalg`, `jumpy.ops`, and `jumpy.random`. These modules are now recognized by our system, which partially resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have been caused by the integration of the `jax-jumpy` package. Engineers can now utilize the capabilities of this library in their numerical computations.
* Added `joblibspark` to known list ([2960](https://github.com/databrickslabs/ucx/issues/2960)). In this release, we have added support for the `joblibspark` library in our system by updating the `known.json` file, which keeps track of various libraries and their associated components. This change is a part of the resolution for issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) and includes new elements such as `doc`, `doc.conf`, `joblibspark`, `joblibspark.backend`, and `joblibspark.utils`. These additions enable the system to recognize and manage the new components related to `joblibspark`, allowing for improved compatibility and functionality.
* Added `jsonpatch` to known list ([2969](https://github.com/databrickslabs/ucx/issues/2969)). In this release, we have added `jsonpatch` to the list of known libraries in the `known.json` file. Jsonpatch is a library used for applying JSON patches, which allow for partial updates to a JSON document. By including jsonpatch in the known list, developers can now easily utilize its functionality for JSON patching, and any necessary dependencies will be handled automatically. This change partially addresses issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have been caused by the use or integration of jsonpatch. We encourage developers to take advantage of this new addition to enhance their code and efficiently make partial updates to JSON documents.
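To illustrate what JSON patching does, here is a minimal stdlib sketch of the `add`/`replace`/`remove` operations; the real `jsonpatch` package implements the full RFC 6902 specification and should be preferred in practice:

```python
from __future__ import annotations

import copy


def apply_patch(doc: dict, patch: list[dict]) -> dict:
    """Minimal RFC 6902-style patch apply: supports add/replace/remove on
    object keys only (the jsonpatch library covers the full specification)."""
    result = copy.deepcopy(doc)  # patches never mutate the input document
    for op in patch:
        *parents, key = op["path"].lstrip("/").split("/")
        target = result
        for part in parents:  # walk down to the parent of the target key
            target = target[part]
        if op["op"] in ("add", "replace"):
            target[key] = op["value"]
        elif op["op"] == "remove":
            del target[key]
        else:
            raise NotImplementedError(op["op"])
    return result


doc = {"user": {"name": "Ada"}, "active": False}
patched = apply_patch(doc, [
    {"op": "replace", "path": "/active", "value": True},
    {"op": "add", "path": "/user/email", "value": "ada@example.com"},
])
```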
* Added `langchain-community` to known list ([2970](https://github.com/databrickslabs/ucx/issues/2970)). A new entry for `langchain-community` has been added to the configuration file for known language chain components in this release. This entry includes several sub-components such as 'langchain_community.agents', 'langchain_community.callbacks', 'langchain_community.chat_loaders', 'langchain_community.chat_message_histories', 'langchain_community.chat_models', 'langchain_community.cross_encoders', 'langchain_community.docstore', 'langchain_community.document_compressors', 'langchain_community.document_loaders', 'langchain_community.document_transformers', 'langchain_community.embeddings', 'langchain_community.example_selectors', 'langchain_community.graph_vectorstores', 'langchain_community.graphs', 'langchain_community.indexes', 'langchain_community.llms', 'langchain_community.memory', 'langchain_community.output_parsers', 'langchain_community.query_constructors', 'langchain_community.retrievers', 'langchain_community.storage', 'langchain_community.tools', 'langchain_community.utilities', and 'langchain_community.utils'. Currently, these sub-components are empty and have no additional configuration or code. This change partially resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), but the specifics of the issue and how these components will be used are still unclear.
* Added `langcodes` to known list ([2971](https://github.com/databrickslabs/ucx/issues/2971)). A new `langcodes` library has been added to the project, addressing part of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931). This library includes several modules that provide functionalities related to language codes and their manipulation, including `langcodes`, `langcodes.build_data`, `langcodes.data_dicts`, `langcodes.language_distance`, `langcodes.language_lists`, `langcodes.registry_parser`, `langcodes.tag_parser`, and `langcodes.util`. Additionally, the memory-efficient trie (prefix tree) data structure library, `marisa-trie`, has been included in the known list. It is important to note that no existing functionality has been altered in this commit.
* Addressing Ownership Conflict when creating catalog/schemas ([2956](https://github.com/databrickslabs/ucx/issues/2956)). This release introduces new functionality to handle ownership conflicts during catalog/schema creation in our open-source library. The `_apply_from_legacy_table_acls` method has been enhanced with two loops to address non-own grants and own grants separately. This ensures proper handling of ownership conflicts by generating and executing UC grant SQL for each grant type, with appropriate exceptions. Additionally, a new helper function, `this_type_and_key()`, has been added to improve code readability. The release also introduces new methods, GrantsCrawler and Rule, in the hive_metastore package of the labs.ucx module, responsible for populating views and mapping source and destination objects. The test_catalog_schema.py file has been updated to include tests for creating catalogs and schemas with legacy ACLs, utilizing the new Rule method and GrantsCrawler. Issue [#2932](https://github.com/databrickslabs/ucx/issues/2932) has been addressed with these changes, which include adding new methods and updating existing tests for hive_metastore.
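The two-loop ordering described above can be sketched as follows (hypothetical `Grant` type and SQL; ucx's actual `_apply_from_legacy_table_acls` differs). Applying ownership transfers last ensures that handing off ownership cannot revoke the right to apply the remaining grants:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str
    action: str  # e.g. "SELECT", "MODIFY", or "OWN"


def apply_grants(grants: list[Grant]) -> list[str]:
    """Emit UC grant SQL in two passes: regular grants first, ownership last."""
    statements: list[str] = []
    for g in grants:  # first loop: non-ownership grants
        if g.action != "OWN":
            statements.append(f"GRANT {g.action} ON TABLE t TO `{g.principal}`")
    for g in grants:  # second loop: ownership transfers
        if g.action == "OWN":
            statements.append(f"ALTER TABLE t OWNER TO `{g.principal}`")
    return statements


stmts = apply_grants([Grant("bob", "OWN"), Grant("alice", "SELECT")])
```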
* Clarify `skip` and `unskip` commands work on views ([2962](https://github.com/databrickslabs/ucx/issues/2962)). In this release, the `skip` and `unskip` commands in the databricks labs UCX tool have been updated to clarify their functionality on views and to make it more explicit with the addition of the `--view` flag. These commands allow users to skip or unskip certain schemas, tables, or views during the table migration process. This is useful for temporarily disabling migration of a particular schema, table, or view. Unit tests have been added to ensure the correct behavior of these commands when working with views. Two new methods have been added to test the behavior of the `unskip` command when a schema or table is specified, and two additional methods test the behavior of the `unskip` command when a view or no schema is specified. Finally, two methods test that an error message is logged when both the `--table` and `--view` flags are specified.
* Fixed issue with migrating MANAGED hive_metastore table to UC ([2928](https://github.com/databrickslabs/ucx/issues/2928)). This commit addresses an issue with migrating Hive Metastore (HMS) MANAGED tables to Unity Catalog (UC) as EXTERNAL, where deleting a MANAGED table can result in data loss. To prevent this, a new option `CONVERT_TO_EXTERNAL` has been added to the `migrate_tables` method for migrating managed tables to UC as external, ensuring that the HMS managed table is converted to an external table in HMS and UC, and protecting against data loss when deleting a managed table that has been migrated to UC as external. Additionally, new caching properties have been added for better performance, and existing methods have been modified to handle the migration of managed tables to UC as external. Tests, including unit and integration tests, have been added to ensure the proper functioning of these changes. It is important to note that changing MANAGED tables to EXTERNAL can have potential consequences on regulatory data cleanup, and the impact of this change should be carefully validated for existing workloads.
* Let `create-catalogs-schemas` reuse `MigrateGrants` so that it applies group renaming ([2955](https://github.com/databrickslabs/ucx/issues/2955)). The `create-catalogs-schemas` command in the `databricks labs ucx` package has been enhanced to reuse the `MigrateGrants` function, enabling group renaming and eliminating redundant code. The `migrate-tables` workflow remains functionally the same. Changes include modifying the `CatalogSchema` class to accept a `migrate_grants` argument, introducing new `Catalog` and `Schema` dataclasses, and updating various methods in the `hive_metastore` module. Unit and integration tests have been added and manually verified to ensure proper functionality. The `MigrateGrants` class has been updated to accept two `SecurableObject` arguments and sort matched grants. The `from_src_dst` function in `mapping.py` now includes a new `as_uc_table` method and updates to `as_uc_table_key`. Addressing issues [#2934](https://github.com/databrickslabs/ucx/issues/2934), [#2932](https://github.com/databrickslabs/ucx/issues/2932), and [#2955](https://github.com/databrickslabs/ucx/issues/2955), the changes also include a new `key` property for the `tables.py` file, and updates to the `test_create_catalogs_schemas` and `test_migrate_tables` test functions.
* Updated sqlglot requirement from <25.25,>=25.5.0 to >=25.5.0,<25.26 ([2968](https://github.com/databrickslabs/ucx/issues/2968)). An update has been made to the sqlglot requirement in the pyproject.toml file, changing the version range from allowing versions 25.5.0 and later, but less than 25.25, to a range that allows for versions 25.5.0 and later, but less than 25.26. This change was implemented to allow for the latest version of sqlglot compatible with the project's requirements. The commit message includes a detailed changelog for sqlglot version 25.25.0, highlighting breaking changes, new features, bug fixes, and refactors. The commits included in the pull request are also outlined in the message, providing a comprehensive overview of the updates made.
* Use `LocationTrie` to infer a list of UC external locations ([2965](https://github.com/databrickslabs/ucx/issues/2965)). This pull request introduces the `LocationTrie` class to improve the efficiency of determining external locations in UC storage. The `LocationTrie` class, implemented as a dataclass, maintains a hierarchical tree structure to store location paths and offers methods for inserting, finding, and checking if a table exists in the trie. It also refactors the existing `_parse_location` method and adds the `_parse_url` method to handle specific cases of JDBC URLs. The `find` method and `_external_locations` method are updated to use the new `LocationTrie` class, ensuring a more efficient way of determining overlapping storage prefixes. Additionally, the test file for external locations has been modified to include new URL formats and correct previously incorrect ones, enhancing compatibility with various URL formats. Overall, these changes are instrumental in preparing the codebase for future federated SQL connections by optimizing the determination of external locations and improving compatibility with diverse URL formats.
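The idea behind `LocationTrie` can be sketched as a prefix tree over storage-path segments (a simplified illustration, not ucx's actual implementation): inserting each known external location once makes "is some registered location a prefix of this table's path?" a single walk down the tree instead of a scan over all locations:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocationTrie:
    """Hierarchical prefix tree over storage URLs (illustrative sketch)."""

    children: dict[str, "LocationTrie"] = field(default_factory=dict)
    is_location: bool = False

    @staticmethod
    def _parts(url: str) -> list[str]:
        # "s3://bucket/data" -> ["s3", "bucket", "data"]
        return [p for p in url.replace("://", "/").split("/") if p]

    def insert(self, url: str) -> None:
        node = self
        for part in self._parts(url):
            node = node.children.setdefault(part, LocationTrie())
        node.is_location = True

    def has_prefix(self, url: str) -> bool:
        """True if some previously inserted location is a prefix of `url`."""
        node = self
        for part in self._parts(url):
            if node.is_location:
                return True
            node = node.children.get(part)
            if node is None:
                return False
        return node.is_location


trie = LocationTrie()
trie.insert("s3://bucket/data")
```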
* Warn when table has a column without a name during table migration ([2984](https://github.com/databrickslabs/ucx/issues/2984)). In this release, we have introduced changes to the table migration feature in the databricks project to address issue [#2891](https://github.com/databrickslabs/ucx/issues/2891). We have renamed the `_migrate_table` method in the `TableMigration` class to `_safe_migrate_table` and added a new `_migrate_table` method that includes a try-except block to catch a specific Spark AnalysisException caused by an invalid column name. If this exception is caught, a warning is logged, and the function returns False. This change ensures that the migration process continues even when it encounters a table with a column without a name, generating a warning instead of raising an error. Additionally, we have added unit tests to verify the new functionality, including the introduction of a new mock backend `MockBackendWithGeneralException` to raise a general exception for testing purposes and a new test case `test_migrate_tables_handles_table_with_empty_column` to validate the warning message generated when encountering a table with an empty column during migration.
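The skip-and-warn pattern described above can be sketched as follows (a simplified stand-in; ucx's `_safe_migrate_table` catches Spark's `AnalysisException` specifically, and `fake_migrate` below is a hypothetical callable for demonstration):

```python
from __future__ import annotations

import logging

logger = logging.getLogger("migrate")


def safe_migrate_table(table: str, migrate) -> bool:
    """Wrap a single-table migration so one bad table logs a warning
    instead of aborting the whole run."""
    try:
        return migrate(table)
    except Exception as e:  # ucx narrows this to Spark's AnalysisException
        logger.warning("failed-to-migrate %s: %s", table, e)
        return False


def migrate_all(tables: list[str], migrate) -> list[str]:
    """Migrate every table, collecting the ones that succeeded."""
    return [t for t in tables if safe_migrate_table(t, migrate)]


def fake_migrate(table: str) -> bool:
    """Hypothetical migration callable that fails on one table."""
    if table == "bad":
        raise ValueError("Invalid column name ''")
    return True
```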

Dependency updates:

* Updated sqlglot requirement from <25.25,>=25.5.0 to >=25.5.0,<25.26 ([2968](https://github.com/databrickslabs/ucx/pull/2968)).

0.44.0

Not secure
* Added `imbalanced-learn` to known list ([2943](https://github.com/databrickslabs/ucx/issues/2943)). A new open-source library, "imbalanced-learn," has been added to the project's known list of libraries, providing various functionalities for handling imbalanced datasets. The addition includes modules such as "imblearn", "imblearn._config", "imblearn._min_dependencies", "imblearn._version", "imblearn.base", and many others, enabling features such as over-sampling, under-sampling, combining sampling techniques, and creating ensembles. This change partially resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have been related to the handling of imbalanced datasets, thereby enhancing the project's ability to manage such datasets.
* Added `importlib_resources` to known list ([2944](https://github.com/databrickslabs/ucx/issues/2944)). In this update, we've added the `importlib_resources` package to the known list in the `known.json` file. This package offers a consistent and straightforward interface for accessing resources such as data files and directories in Python packages. It includes several modules, including `importlib_resources`, `importlib_resources._adapters`, `importlib_resources._common`, `importlib_resources._functional`, `importlib_resources._itertools`, `importlib_resources.abc`, `importlib_resources.compat`, `importlib_resources.compat.py38`, `importlib_resources.compat.py39`, `importlib_resources.future`, `importlib_resources.future.adapters`, `importlib_resources.readers`, and `importlib_resources.simple`. These modules provide various functionalities for handling resources within a Python package. By adding this package to the known list, we enable its usage and integration with the project's codebase. This change partially addresses issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), improving the management and accessibility of resources within our Python packages.
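The `importlib_resources` package is the PyPI backport of the standard library's `importlib.resources`; the stdlib variant is used below to demonstrate the shared `files()` API, building a throwaway package on the fly so the example is self-contained:

```python
import sys
import tempfile
from importlib import resources  # stdlib equivalent of the importlib_resources backport
from pathlib import Path

# Build a disposable package with a bundled data file.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "settings.txt").write_text("threshold=5\n")

sys.path.insert(0, str(root))

# files() returns a Traversable; joinpath/read_text work the same whether the
# package lives on disk or inside a zip/wheel.
text = resources.files("demo_pkg").joinpath("settings.txt").read_text()
```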
* Dependency update: ensure we install with at least version 0.9.1 of `databricks-labs-blueprint` ([2950](https://github.com/databrickslabs/ucx/issues/2950)). In the updated `pyproject.toml` file, the version constraint for the `databricks-labs-blueprint` dependency has been revised to range between 0.9.1 and 0.10, specifically targeting 0.9.1 or higher. This modification ensures the incorporation of a fixed upstream issue (databrickslabs/blueprint[#157](https://github.com/databrickslabs/ucx/issues/157)), which was integrated in the 0.9.1 release. This adjustment was triggered by a preceding change ([#2920](https://github.com/databrickslabs/ucx/issues/2920)) that standardized notebook paths, thereby addressing issue [#2882](https://github.com/databrickslabs/ucx/issues/2882), which was dependent on this upstream correction. By adopting this upgrade, users receive the most recent dependency version, ensuring the remediation of the aforementioned issue.
* Fixed an issue with source table deleted after migration ([2927](https://github.com/databrickslabs/ucx/issues/2927)). In this release, we have addressed an issue where a source table was marked as migrated even after it was deleted following migration. An exception handling mechanism has been added to the `is_migrated` method to return `True` and log a warning message if the source table does not exist, indicating that it has been migrated. A new test function, `test_migration_index_deleted_source`, has also been included to verify the migration index behavior when the source table no longer exists. This function creates a source and destination table, sets the destination table's `upgraded_from` property to the source table, drops the source table, and checks if the migration index contains the source table and if an error message was recorded, indicating that the source table no longer exists. The `get_seen_tables` method remains unchanged in this diff.
* Improve robustness of `sqlglot` failure handling ([2952](https://github.com/databrickslabs/ucx/issues/2952)). This PR improves the robustness of ucx's error handling around the `sqlglot` library, specifically targeting issues with inadequate parsing quality. The `collect_table_infos` method has been updated and renamed to `collect_used_tables` to accurately gather information about tables used in a SQL expression. The `lint_expression` and `collect_tables` methods have also been updated to use the new `collect_used_tables` method for better accuracy. Additionally, methods such as `find_all` and `walk_expressions`, along with the test suite for the SQL parser, have been enhanced to handle potential failures and unsupported SQL syntax more gracefully, returning empty lists or logging warning messages instead of raising errors. These changes improve the reliability of ucx's `sqlglot`-based linting, enabling it to handle unexpected input more effectively.
* Log warnings when mounts are discovered on incorrect cluster type ([2929](https://github.com/databrickslabs/ucx/issues/2929)). The `migrate-tables` command in the ucx project's CLI now includes a verification step to ensure the successful completion of a prerequisite assessment workflow before execution. If this workflow has not been completed, a warning message is logged and the command is not executed. A new exception handling mechanism has been implemented for the `dbutils.fs.mounts()` method, which logs a warning and skips mount point discovery if an exception is raised. A new unit test has been added to verify that a warning is logged when attempting to discover mounts on an incompatible cluster type. The diff also includes a new method `VerifyProgressTracking` for verifying progress tracking and updates to existing test methods to include verification of successful runs and error handling before assessment. These changes improve the handling of edge cases in the mount point discovery process, add warnings for mounts on incorrect cluster types, and increase test coverage with progress tracking verification.
* `create-uber-principal` fixes and improvements ([2941](https://github.com/databrickslabs/ucx/issues/2941)). This change introduces fixes and improvements to the `create-uber-principal` functionality within the `databricks-sdk-py` project, specifically targeting the Azure access module. The main enhancements include addressing an issue with the Databricks warehouses API by adding the `set_workspace_warehouse_config_wrapper` function, modifying the command to request the uber principal name only when necessary, improving storage account crawl logic, and introducing new methods to manage workspace-level configurations. Error handling mechanisms have been fortified through added and modified try-except blocks. Additionally, several unit and integration tests have been implemented and verified to ensure the functionality is correct and running smoothly. These changes improve the overall robustness and versatility of the `create-uber-principal` command, directly addressing issues [#2764](https://github.com/databrickslabs/ucx/issues/2764), [#2771](https://github.com/databrickslabs/ucx/issues/2771), and progressing on [#2949](https://github.com/databrickslabs/ucx/issues/2949).

0.43.0

Not secure
* Added `imageio` to known list ([2942](https://github.com/databrickslabs/ucx/issues/2942)). In this release, we have added `imageio` to our library's known list, which includes all its modules, sub-modules, testing, and typing packages. This change addresses issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have been caused by a dependency or compatibility issue. The `imageio` library offers I/O functionality for scientific imaging data, and its addition is expected to expand the library's supported formats and functionality. As a result, software engineers can leverage the enhanced capabilities to handle scientific imaging data more effectively.
* Added `ipyflow-core` to known list ([2945](https://github.com/databrickslabs/ucx/issues/2945)). In this release, the project has expanded its capabilities by adding two open-source libraries to a known list contained in a JSON file. The first library, `ipyflow-core`, brings a range of modules for the data model, experimental features, frontend, kernel, patches, shell, slicing, tracing, types, and utils. The second library, `pyccolo`, offers fast and adaptable code transformation using abstract syntax trees, with functionalities including code rewriting, import hooks, syntax augmentation, and tracing, along with various utility functions. By incorporating these libraries into the project, we aim to enhance its overall efficiency and versatility, providing software engineers with access to a broader set of tools and capabilities.
* Added `isodate` to known list ([2946](https://github.com/databrickslabs/ucx/issues/2946)). In this release, we have added the `isodate` package to our library's known package list, which resolves part of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931). The `isodate` package provides several modules for parsing and manipulating ISO 8601 dated strings, including `isodate`, `isodate.duration`, `isodate.isodates`, `isodate.isodatetime`, `isodate.isoduration`, `isodate.isoerror`, `isodate.isostrf`, `isodate.isotime`, `isodate.isotzinfo`, and `isodate.tzinfo`. This addition enhances our compatibility and integration with the `isodate` package in the larger system, enabling users to utilize the full functionality of the `isodate` package in their applications.
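For a sense of what `isodate` handles, here is a stdlib-only sketch that parses a day/time subset of ISO 8601 durations; the real package supports the full grammar, including years, months, and fractional components:

```python
import re
from datetime import timedelta


def parse_simple_duration(text: str) -> timedelta:
    """Parse a day/time-only subset of ISO 8601 durations, e.g. 'P3DT2H30M'.
    The isodate package implements the complete specification."""
    m = re.fullmatch(
        r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?", text
    )
    if not m:
        raise ValueError(f"not a supported duration: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
```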
* Experimental command for enabling HMS federation ([2939](https://github.com/databrickslabs/ucx/issues/2939)). In this release, we have introduced an experimental feature for enabling HMS (Hive Metastore) federation through a new `enable-hms-federation` command in the labs.yml file. This command, when enabled, will create a federated HMS catalog synced with the workspace HMS in a hierarchical manner, facilitating migration and integration of HMS models. Additionally, we have added an optional `enable_hms_federation` constructor argument to the `Locations` class in the locations.py file. Setting this flag to True enables a fallback mode for AWS resources to use HMS for data access. The `HiveMetastoreFederationEnabler` class is introduced with an `enable()` method to modify the workspace configuration and enable HMS federation. These changes aim to provide a more streamlined experience for users working with complex modeling systems, and careful testing and feedback are encouraged on this experimental feature.
* Experimental support for HMS federation ([2283](https://github.com/databrickslabs/ucx/issues/2283)). In this release, we introduce experimental support for Hive Metastore (HMS) federation in our open-source library. A new `HiveMetastoreFederation` class has been implemented, enabling the registration of an internal HMS as a federated catalog. This class utilizes the `WorkspaceClient` object from the `databricks.sdk` library to create necessary connections and handles permissions for successful federation. Additionally, a new file `test_federation.py` has been added, containing unit tests to demonstrate the functionality of HMS federation, including the creation of federated catalogs and handling of existing connections. As this is an experimental feature, users should expect potential issues and are encouraged to provide feedback to help improve its functionality.
* Fixed `InvalidParameterValue` failure for scanning jobs running on interactive clusters that got deleted ([2935](https://github.com/databrickslabs/ucx/issues/2935)). In this release, we have addressed an issue where an `InvalidParameterValue` error was not being handled properly during scanning jobs run on interactive clusters that were deleted. This error has now been added to the exceptions handled in the `_register_existing_cluster_id` and `_register_cluster_info` methods. These methods retrieve information about an existing cluster or its ID, and if the cluster is not found or an invalid parameter value is provided, they now yield a `DependencyProblem` object with an appropriate error message. This `DependencyProblem` object is used to indicate that there is a problem with the dependencies required for the job, preventing it from running successfully. By handling this error, the code ensures that the job can fail gracefully and provide a clear and informative error message to the user, avoiding any potential confusion or unexpected behavior.
* Improve logging when skipping legacy grant in `create-catalogs-schemas` ([2933](https://github.com/databrickslabs/ucx/issues/2933)). In this update, the `create-catalogs-schemas` process has been improved with enhanced logging for skipped legacy grants. This change is a follow-up to previous issue [#2917](https://github.com/databrickslabs/ucx/issues/2917) and progresses issue [#2932](https://github.com/databrickslabs/ucx/issues/2932). The `_apply_from_legacy_table_acls` and `_update_principal_acl` methods now include more descriptive logging when a legacy grant is skipped, providing information about the type of grant being skipped and clarifying that it is not supported in the Unity Catalog. Additionally, a new method `get_interactive_cluster_grants` has been added to the `principal_acl` object, returning a list of grants specific to the interactive cluster. The `hive_acl` object is now autospec'd after the `principal_acl.get_interactive_cluster_grants` call. The `test_catalog_schema_acl` function has been updated to reflect these changes. New grants have been added to the `hive_grants` list, including grants for `user1` with `USE` action type on `hive_metastore` catalog and grants for `user2` with `USAGE` action type on `schema3` database. A new grant for `user4` with `DENY` action type on `schema3` database has also been added, but it is skipped in the logging due to it not being supported in UC. Skipped legacy grants for `DENY` action type on `catalog2` catalog and 'catalog2.schema2' database are also included in the commit. These updates improve the clarity and usefulness of the logs, making it easier for users to understand what is happening during the migration of grants to UC and ensuring that unsupported grants are not inadvertently included in the UC.
* Notebook linting: ensure path-type is preserved during linting ([2923](https://github.com/databrickslabs/ucx/issues/2923)). In this release, we have enhanced the type safety of the `NotebookResolver` class in the `loaders.py` module by introducing a new type variable `PathT`. This change includes an update to the `_adjust_path` method, which ensures the preservation of the original file suffix when adding the ".py" suffix for Python notebooks. This addresses a potential issue where a `WorkspacePath` instance could be incorrectly converted to a generic `Path` instance, causing downstream errors. Although this change may potentially resolve issue [#2888](https://github.com/databrickslabs/ucx/issues/2888), the reproduction steps for that issue were not provided in the commit message. It is important to note that while this change has been manually tested, it does not include any new unit tests, integration tests, or staging environment verification.
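The type-preservation idea can be illustrated with a small sketch; `WorkspacePath` here is a hypothetical stand-in for the real class, and `adjust_path` is a simplified rendition of `_adjust_path`:

```python
from pathlib import PurePosixPath
from typing import TypeVar

# bound type variable: whatever path subclass goes in comes back out
PathT = TypeVar("PathT", bound=PurePosixPath)


class WorkspacePath(PurePosixPath):
    """Hypothetical stand-in for the SDK's WorkspacePath class."""


def adjust_path(path: PathT) -> PathT:
    if path.suffix:
        return path  # keep the original suffix untouched
    # pathlib's with_suffix() constructs an instance of the *same* class,
    # so a WorkspacePath stays a WorkspacePath instead of decaying to a
    # generic path type
    return path.with_suffix(".py")


adjusted = adjust_path(WorkspacePath("/Workspace/Users/someone/notebook"))
```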

0.42.0

Not secure
* Added `google-cloud-storage` to known list ([2827](https://github.com/databrickslabs/ucx/issues/2827)). In this release, we have added the `google-cloud-storage` library, along with its various modules and sub-modules, to our project's known list in a JSON file. Additionally, we have included the `google-crc32c` and `google-resumable-media` libraries. These libraries provide functionalities such as content addressable storage, checksum calculation, and resumable media upload and download. This change is a partial resolution to issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which is likely related to the integration or usage of these libraries in the project. Software engineers should take note of these additions and how they may impact the project's functionality.
* Added `google-crc32c` to known list ([2828](https://github.com/databrickslabs/ucx/issues/2828)). With this commit, we have added the `google-crc32c` library to our system's known list, addressing part of issue [#1931](https://github.com/databrickslabs/ucx/issues/1931). This addition enhances the overall functionality of the system by providing efficient and high-speed CRC32C computation when utilized. The `google-crc32c` library is known for its performance and reliability, and by incorporating it into our system, we aim to improve the efficiency and robustness of the CRC32C computation process. This enhancement is part of our ongoing efforts to optimize the system and ensure a more efficient experience for our end-users. With this change, users can expect faster and more reliable CRC32C computations in their applications.
* Added `holidays` to known list ([2906](https://github.com/databrickslabs/ucx/issues/2906)). In this release, we have expanded the known list in our open-source library to include a new `holidays` category, aimed at supporting tracking of holidays for different countries, religions, and financial institutions. This category includes several subcategories, such as calendars, countries, deprecation, financial holidays, groups, helpers, holiday base, mixins, observed holiday base, registry, and utils. Each subcategory contains an empty list, allowing for future data storage related to holidays. This change partially resolves issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), and represents a significant step towards supporting a more comprehensive range of holiday tracking needs in our library. Software engineers may utilize this new feature to build applications that require tracking and management of various holidays and related data.
* Added `htmlmin` to known list ([2907](https://github.com/databrickslabs/ucx/issues/2907)). In this update, we have added the `htmlmin` library to the `known.json` configuration file's list of known libraries. This addition enables the use and management of `htmlmin` and its components, including `htmlmin.command`, `htmlmin.decorator`, `htmlmin.escape`, `htmlmin.main`, `htmlmin.middleware`, `htmlmin.parser`, `htmlmin.python3html`, and `htmlmin.python3html.parser`. This change partially addresses issue [#1931](https://github.com/databrickslabs/ucx/issues/1931), which may have been caused by the integration or usage of `htmlmin`. Software engineers can now utilize `htmlmin` and its features in their projects, thanks to this enhancement.
* Document preparing external locations when creating catalogs ([2915](https://github.com/databrickslabs/ucx/issues/2915)). Databricks Labs' UCX tool has been updated to incorporate the preparation of external locations when creating catalogs during the upgrade to Unity Catalog (UC). This enhancement involves the addition of new documentation outlining how to physically separate data in storage within UC, adhering to Databricks' best practices. The `create-catalogs-schemas` command has been updated to create UC catalogs and schemas based on a mapping file, allowing users to reuse previously created external locations or establish new ones outside of UCX. For data separation, users can leverage external locations when using subpaths, providing flexibility in data management during the upgrade process.
* Fixed `KeyError` from `assess_workflows` task ([2919](https://github.com/databrickslabs/ucx/issues/2919)). In this release, we have made significant improvements to error handling in our open-source library. We have fixed a KeyError in the `assess_workflows` task and modified the `_safe_infer_internal` and `_unsafe_infer_internal` methods to handle both `InferenceError` and `KeyError` during inference. When an error occurs, we now log the error message with the node and yield a `Uninferable` object. Additionally, we have updated the `do_infer_values` method of the `_LocalInferredValue` class to yield an iterator of iterables of `NodeNG` objects. We have added multiple unit tests for inferring values in Python code, including cases for handling externally defined values and their absence. These changes ensure that our library can handle errors more gracefully and provide more informative feedback during inference, making it more robust and easier to use in software engineering projects.
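A sketch of the widened error handling, with stand-ins for astroid's `InferenceError` and `Uninferable` sentinel (assumed names; the real logging and inference plumbing is more involved):

```python
import logging

logger = logging.getLogger("databricks.labs.ucx")


class InferenceError(Exception):
    """Stand-in for astroid's InferenceError."""


Uninferable = object()  # stand-in for astroid's Uninferable sentinel


def safe_infer(node, infer):
    """Yield inferred values for `node`; on InferenceError *or* KeyError,
    log the failure and yield Uninferable instead of crashing the task."""
    try:
        yield from infer(node)
    except (InferenceError, KeyError) as e:
        logger.warning(f"Inference failed for {node}: {e}")
        yield Uninferable


def broken_lookup(node):
    raise KeyError(node)
    yield  # unreachable, but makes this a generator


values = list(safe_infer("some_node", broken_lookup))
```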
* Fixed `OSError: [Errno 95]` bug in `assess_workflows` task by skipping GIT-sourced workflows from static code analysis ([2924](https://github.com/databrickslabs/ucx/issues/2924)). In this release, we have resolved the `OSError: [Errno 95]` bug in the `assess_workflows` task that occurred while performing static code analysis on GIT-sourced workflows. A new attribute `Source` has been introduced in the `jobs` module of the `databricks.sdk.service` package to identify the source of a notebook task. If the notebook task source is GIT, a new `DependencyProblem` is raised, indicating that notebooks in GIT should be analyzed using the `databricks labs ucx lint-local-code` CLI command. The `_register_notebook` method has been updated to check if the notebook task source is GIT and return an appropriate `DependencyProblem` message. This change enhances the reliability of the `assess_workflows` task by avoiding the aforementioned bug and provides a more informative message when notebooks are sourced from GIT. This change is part of our ongoing effort to improve the project's quality and reliability and benefits software engineers who adopt the project.
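The source check can be sketched as follows; `Source`, `NotebookTask`, and `DependencyProblem` are simplified stand-ins for the real SDK and ucx types:

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Mirrors databricks.sdk.service.jobs.Source."""
    WORKSPACE = "WORKSPACE"
    GIT = "GIT"


@dataclass
class NotebookTask:
    notebook_path: str
    source: Source = Source.WORKSPACE


@dataclass
class DependencyProblem:
    code: str
    message: str


def register_notebook(task: NotebookTask) -> list[DependencyProblem]:
    if task.source == Source.GIT:
        # static analysis cannot open notebooks living in a remote Git repo,
        # so skip them and point the user at the local linting command
        return [DependencyProblem(
            "not-supported",
            "Notebooks sourced from GIT should be analyzed with the "
            "`databricks labs ucx lint-local-code` CLI command",
        )]
    return []  # ... resolve and register the workspace notebook as before ...


problems = register_notebook(NotebookTask("nb.py", source=Source.GIT))
```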
* Fixed absolute path normalisation in source code analysis ([2920](https://github.com/databrickslabs/ucx/issues/2920)). In this release, we have addressed an issue with the Workspace API not supporting relative subpaths such as "/a/b/../c", which has been resolved by resolving workspace paths before calling the API. This fix is backward compatible and ensures the correct behavior of the source code analysis. Additionally, we have added integration tests and co-authored this commit with Eric Vergnaud and Serge Smertin. Furthermore, we have added a new test case that supports relative grand-parent paths in the dependency graph construction, utilizing a new `NotebookLoader` class. This loader is responsible for loading the notebook content and metadata given a path, and this new test case exercises the path resolution logic when a notebook depends on another notebook located two levels up in the directory hierarchy. These changes improve the robustness and reliability of the source code analysis in the presence of relative paths.
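The fix boils down to collapsing relative segments client-side before calling the API; a minimal illustration using only the standard library:

```python
import posixpath


def resolve_workspace_path(path: str) -> str:
    """The Workspace API rejects relative subpaths such as '/a/b/../c',
    so normalise them before making the call."""
    return posixpath.normpath(path)


print(resolve_workspace_path("/a/b/../c"))  # /a/c
```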
* Fixed downloading wheel libraries from DBFS on mounted Azure Storage failing with access denied ([2918](https://github.com/databrickslabs/ucx/issues/2918)). In this release, we have introduced enhancements to the library's handling of registering and downloading wheel libraries from DBFS on mounted Azure Storage, addressing an issue that resulted in access denied errors. The changes include improved error handling with the addition of a `try-except` block to handle potential `BadRequest` exceptions and the inclusion of three new methods to register different types of libraries. The `_register_requirements_txt` method reads requirements files and registers each library specified in the file, logging a warning message for any references to other requirements or constraints files. The `_register_whl` method creates a temporary copy of the given wheel file in the local file system and registers it, while the `_register_egg` method checks the runtime version and yields a `DependencyProblem` if the version is greater than (14, 0). These changes simplify the code and enhance error handling while addressing the reported issues related to registering libraries. The changes are implemented in the `jobs.py` file located in the `databricks/labs/ucx/source_code` directory, which also includes the import of the `BadRequest` exception class from `databricks.sdk.errors`.
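The runtime-version gate for eggs can be sketched like this (a simplified stand-in for `_register_egg`; the real method also downloads and registers the file, and `DependencyProblem` is simplified here):

```python
from dataclasses import dataclass


@dataclass
class DependencyProblem:
    code: str
    message: str


def register_egg(runtime_version: tuple, egg_path: str):
    """Egg installation was dropped in Databricks Runtime 14.0, so surface a
    DependencyProblem instead of attempting to register the library."""
    if runtime_version > (14, 0):
        yield DependencyProblem(
            "not-supported",
            f"Installing eggs is no longer supported on runtime {runtime_version}: {egg_path}",
        )
        return
    # ... create a temporary local copy of the egg and register it ...


assert list(register_egg((13, 3), "dist/lib.egg")) == []
problems = list(register_egg((15, 4), "dist/lib.egg"))
```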
* Fixed issue with migrating MANAGED hive_metastore table to UC ([2892](https://github.com/databrickslabs/ucx/issues/2892)). In this release, we have implemented changes to address the issue of migrating HMS (Hive Metastore) managed tables to UC (Unity Catalog) as EXTERNAL. Historically, deleting a managed table also removed the underlying data, leading to potential data loss and making the UC table unusable. The new approach provides options to mitigate these issues, including migrating as EXTERNAL or cloning the data to maintain integrity. These changes aim to prevent accidental data deletion, ensure data recoverability, and avoid inconsistencies when new data is added to either HMS or UC. We have introduced new class attributes, methods, and parameters in relevant modules such as `WorkspaceConfig`, `Table`, `migrate_tables`, and `install.py`. These modifications support the new migration strategies and allow for more flexibility in managing how tables are migrated and how data is handled. The upgrade process can be triggered using the `migrate-tables` UCX command or by running the table migration workflows deployed to the workspace. Thorough testing and documentation have been performed to minimize risks of data inconsistencies during migration. It is crucial to understand the implications of these changes and carefully consider the trade-offs before migrating managed tables to UC as EXTERNAL.
* Improve creating UC catalogs ([2898](https://github.com/databrickslabs/ucx/issues/2898)). In this release, the process of creating Unity Catalog (UC) catalogs has been significantly improved with the resolution of several issues discovered during manual testing. The `databricks labs ucx create-ucx-catalog/create-catalogs-schemas` command has been updated to ensure a better user experience and enhance consistency. Changes include requesting the catalog location even if the catalog already exists, eliminating multiple loops over storage locations, and improving logging and matching storage locations. The code now includes new checks to avoid requesting a catalog's storage location if it already exists and updates the behavior of the `_create_catalog_validate` and `_validate_location` methods. Additionally, new unit tests have been added to verify these changes. Under the hood, a new method, `get_catalog`, has been introduced to the `WorkspaceClient` class, and several test functions, such as `test_create_ucx_catalog_skips_when_ucx_catalogs_exists` and `test_create_all_catalogs_schemas_creates_catalogs`, have been implemented to ensure the proper functioning of the updated command. This release addresses issue [#2879](https://github.com/databrickslabs/ucx/issues/2879) and enhances the overall process of creating UC catalogs, making it more efficient and reliable.
* Improve logging when skipping a grant in `create-catalogs-schemas` ([2917](https://github.com/databrickslabs/ucx/issues/2917)). In this release, the logging for skipping grants in the `_update_principal_acl` method of the `CatalogSchema` class has been improved. The code now logs a more detailed message when it cannot identify a UC grant for a specific grant object, indicating that the grant is a legacy grant that is not supported in UC, along with the grant's action type and associated object. This change provides more context for debugging and troubleshooting purposes. Additionally, the functionality of using a `DENY` grant instead of a `USAGE` grant for a specific principal and schema in the hive metastore has been introduced. The test case `test_catalog_schema_acl()` in the `test_catalog_schema.py` file has been updated to reflect this new behavior. A new test case `test_create_all_catalogs_schemas_logs_untranslatable_grant(caplog)` has also been added to verify the new logging behavior for skipping legacy grants that are not supported in UC. These changes improve the logging system and enhance the `CatalogSchema` class functionality in the open-source library.
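The log-and-skip behavior can be sketched as below; `apply_grant` and the `to_uc_sql` translation hook are hypothetical simplifications of the real `_update_principal_acl` logic:

```python
import logging

logger = logging.getLogger("databricks.labs.ucx")


def apply_grant(action_type: str, object_name: str, to_uc_sql):
    """Translate a legacy HMS grant via the caller-supplied `to_uc_sql` hook;
    when there is no UC equivalent (e.g. DENY), log why the grant is skipped
    instead of dropping it silently."""
    sql = to_uc_sql(action_type, object_name)
    if sql is None:
        logger.warning(
            f"Skipping legacy grant that is not supported in UC: "
            f"{action_type} on {object_name}"
        )
        return None
    return sql


def no_deny(action, obj):
    # DENY has no UC translation in this sketch
    return None if action == "DENY" else f"GRANT {action} ON {obj} TO someone"


result = apply_grant("DENY", "schema3", no_deny)
```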
* Verify migration progress prerequisites during UCX catalog creation ([2912](https://github.com/databrickslabs/ucx/issues/2912)). In this update, a new method `verify()` has been added to the `verify_progress_tracking` object in the `workspace_context` object to verify the prerequisites for UCX catalog creation. The prerequisites include the existence of a UC metastore, a UCX catalog, and a successful `assessment` job run. If the assessment job is pending or running, the code will wait up to 1 hour for it to finish before considering the prerequisites unmet. This feature includes modifications to the `create-ucx-catalog` CLI command and adds unit tests. This resolves issue [#2816](https://github.com/databrickslabs/ucx/issues/2816) and ensures that the migration progress prerequisites are met before creating the UCX catalog. The `VerifyProgressTracking` class has been added to the `databricks.labs.ucx.progress.install` module and is used in the `Application` class. The changes include a new `timeout` argument to specify the waiting time for pending or running assessment jobs. The commit also includes several new unit tests for the `VerifyProgressTracking` class and modifications to the `test_install.py` file in the `tests/unit/progress` directory. The code has been manually tested and meets the requirements.
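The waiting behavior can be sketched as a simple polling loop; `verify_assessment_run` and the state strings are illustrative assumptions, not the real `VerifyProgressTracking` API:

```python
import time
from datetime import timedelta


def verify_assessment_run(get_run_state, timeout=timedelta(hours=1), poll_seconds=1.0):
    """Block until the assessment job leaves PENDING/RUNNING, or raise once
    the timeout elapses; `get_run_state` is a caller-supplied callable."""
    deadline = time.monotonic() + timeout.total_seconds()
    while time.monotonic() < deadline:
        state = get_run_state()
        if state not in ("PENDING", "RUNNING"):
            return state  # the prerequisite run has finished
        time.sleep(poll_seconds)
    raise TimeoutError("Timed out waiting for the assessment job to finish")


# usage with a fake state source that settles on the third poll
states = iter(["PENDING", "RUNNING", "TERMINATED"])
final = verify_assessment_run(lambda: next(states), poll_seconds=0)
```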

0.41.0

Not secure
* Added UCX history schema and table for storing UCX's artifact ([2744](https://github.com/databrickslabs/ucx/issues/2744)). In this release, we have introduced a new dataclass `Historical` to store UCX artifacts for migration progress tracking, including attributes such as workspace identifier, job run identifier, object type, object identifier, data, failures, owner, and UCX version. The `ProgressTrackingInstallation` class has been updated to include a new method for deploying a table for historical records using the `Historical` dataclass. Additionally, we have modified the `databricks labs ucx create-ucx-catalog` command, and updated the integration test file `test_install.py` to include a parametrized test function for checking if the `workflow_runs` and `historical` tables are created by the UCX installation. We have also renamed the function `test_progress_tracking_installation_run_creates_workflow_runs_table` to `test_progress_tracking_installation_run_creates_tables` to reflect the addition of the new table. These changes add necessary functionality for tracking UCX migration progress and provide associated tests to ensure correctness, thereby improving UCX's progress tracking functionality and resolving issue [#2572](https://github.com/databrickslabs/ucx/issues/2572).
* Added `hjson` to known list ([2899](https://github.com/databrickslabs/ucx/issues/2899)). In this release, we are excited to announce the addition of support for the Hjson library, addressing partial resolution for issue [#1931](https://github.com/databrickslabs/ucx/issues/1931) related to configuration. This change integrates the following Hjson modules: hjson, hjson.compat, hjson.decoder, hjson.encoder, hjson.encoderH, hjson.ordered_dict, hjson.scanner, and hjson.tool. Hjson is a powerful library that enhances JSON functionality by providing comments and multi-line strings. By incorporating Hjson into our library's known list, users can now leverage its advanced features in a more streamlined and cohesive manner, resulting in a more versatile and efficient development experience.
* Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 ([2894](https://github.com/databrickslabs/ucx/issues/2894)). In this version bump from acceptance/v0.3.0 to 0.3.1 of the databrickslabs/sandbox library, several enhancements and bug fixes have been implemented. These changes include updates to the README file with instructions on how to use the library with the databricks labs sandbox command, fixes for the `unsupported protocol scheme` error, and the addition of more git-related libraries. Additionally, dependency updates for golang.org/x/crypto from version 0.16.0 to 0.17.0 have been made in the /go-libs and /runtime-packages directories. This version also introduces new commits that allow larger logs from acceptance tests and implement experimental OIDC refresh token rotation. The tests using this library have been updated to utilize the new version to ensure compatibility and functionality.
* Fixed `AttributeError: 'UsedTable' has no attribute 'table'` by adding more type checks ([2895](https://github.com/databrickslabs/ucx/issues/2895)). In this release, we have made significant improvements to the library's type safety and robustness in handling `UsedTable` objects. We fixed an AttributeError related to the `UsedTable` class not having a `table` attribute by adding more type checks in the `collect_tables` method of the `TablePyCollector` and `CollectTablesVisit` classes. We also introduced `AstroidSyntaxError` exception handling and logging. Additionally, we renamed the `table_infos` variable to `used_tables` and changed its type to `list[JobProblem]` in the `collect_tables_from_tree` and `_SparkSqlAnalyzer.collect_tables` functions. We added conditional statements to check for the presence of required attributes before yielding a new `TableInfoNode`. A new unit test file, `test_context.py`, has been added to exercise the `tables_collector` method, which extracts table references from a given code snippet, improving the linter's table reference extraction capabilities.
* Fixed `TokenError` in assessment workflow ([2896](https://github.com/databrickslabs/ucx/issues/2896)). In this update, we've implemented a bug fix to improve the robustness of the assessment workflow in our open-source library. Previously, the code only caught parse errors during the execution of the workflow, but parse errors were not the only cause of failures. This commit changes the exception being caught from `ParseError` to the more general `SqlglotError`, which is the common ancestor of both `ParseError` and `TokenError`. By catching the more general `SqlglotError`, the code is now able to handle both parse errors and tokenization errors, providing a more robust solution. The `walk_expressions` method has been updated to catch `SqlglotError` instead of `ParseError`. This change allows the assessment workflow to handle a wider range of issues that may arise during the execution of SQL code, making it more versatile and reliable. The `SqlglotError` class has been imported from the `sqlglot.errors` module. This update enhances the assessment workflow's ability to handle more complex SQL queries, ensuring smoother execution.
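Since the fix is about catching the common ancestor, it can be shown with a stand-in hierarchy mirroring `sqlglot.errors`:

```python
# Stand-in hierarchy mirroring sqlglot.errors, where SqlglotError is the
# common ancestor of both ParseError and TokenError.
class SqlglotError(Exception): ...
class ParseError(SqlglotError): ...
class TokenError(SqlglotError): ...


def walk_expressions(parse_sql):
    try:
        return parse_sql()
    except SqlglotError as e:  # previously only ParseError was caught
        print(f"Failed to analyze SQL: {e}")
        return None


def bad_tokens():
    # tokenization failures raise TokenError, which the old handler missed
    raise TokenError("unterminated string literal")


result = walk_expressions(bad_tokens)
```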
* Fixed `assessment` workflow failure for jobs running tasks on existing interactive clusters ([2889](https://github.com/databrickslabs/ucx/issues/2889)). In this release, we have implemented changes to address a failure in the `assessment` workflow when jobs are run on existing interactive clusters (issue [#2886](https://github.com/databrickslabs/ucx/issues/2886)). The fix includes modifying the `jobs.py` file by adding a try-except block when loading libraries for an existing cluster, utilizing a new exception type `ResourceDoesNotExist` to handle cases where the cluster does not exist. Furthermore, the `_register_cluster_info` function has been enhanced to manage situations where the existing cluster is not found, raising a `DependencyProblem` with the message 'cluster-not-found'. This ensures the workflow can continue running jobs on other clusters or with other configurations. Overall, these enhancements improve the system's robustness by gracefully handling edge cases and preventing workflow failure due to non-existent clusters.
* Ignore UCX inventory database in HMS while scanning tables ([2897](https://github.com/databrickslabs/ucx/issues/2897)). In this release, changes have been implemented in the `tables.py` file of the `databricks/labs/ucx/hive_metastore` directory to address the issue of mistakenly scanning the UCX inventory database during table scanning. The `_all_databases` method has been updated to exclude the UCX inventory database by checking if the database name matches the schema name and skipping it if so. This change affects the `_crawl` and `_get_table_names` methods, which no longer process the UCX inventory schema when scanning for tables. A TODO comment has been added to the `_get_table_names` method, suggesting potential removal of the UCX inventory schema check in future releases. This change ensures accurate and efficient table scanning, and avoids mistakenly treating the UCX inventory schema as a database to be scanned.
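The exclusion logic can be sketched as a small generator; `all_databases` is a simplified stand-in for the real `_all_databases` method:

```python
def all_databases(list_database_names, ucx_inventory_schema: str):
    """Enumerate Hive metastore databases, skipping UCX's own inventory
    schema so the crawler does not scan the tables it writes itself."""
    for name in list_database_names():
        if name == ucx_inventory_schema:
            continue  # the inventory schema is not user data
        yield name


databases = list(all_databases(lambda: ["sales", "ucx", "hr"], "ucx"))
```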
* Tech debt: fix situations where `next()` isn't being used properly ([2885](https://github.com/databrickslabs/ucx/issues/2885)). In this commit, technical debt related to the proper usage of Python's built-in `next()` function has been addressed in several areas of the codebase. Previously, there was an assumption that `None` would be returned if there is no next value, which is incorrect. This commit updates and fixes the implementation to correctly handle cases where `next()` is used. Specifically, the `get_dbutils_notebook_run_path_arg`, `of_language` class method in the `CellLanguage` class, and certain methods in the `test_table_migrate.py` file have been updated to correctly handle situations where there is no next value. The `has_path()` method has been removed, and the `prepend_path()` method has been updated to insert the given path at the beginning of the list of system paths. Additionally, a test case for checking table in mount mapping with table owner has been included. These changes improve the robustness and reliability of the code by ensuring that it handles edge cases related to the `next()` function and paths correctly.
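The correct idiom is to pass an explicit default to `next()`; a minimal illustration:

```python
def first_match(values, predicate):
    """next() raises StopIteration on an exhausted iterator unless an
    explicit default is supplied -- it does *not* return None by itself."""
    return next((v for v in values if predicate(v)), None)


assert first_match([1, 3, 4, 6], lambda v: v % 2 == 0) == 4
assert first_match([1, 3], lambda v: v % 2 == 0) is None
```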
* [chore] apply `make fmt` ([2883](https://github.com/databrickslabs/ucx/issues/2883)). In this release, the `make_random` parameter has been removed from the `save_locations` method in the `conftest.py` file for the integration tests. This method is used to save a list of `ExternalLocation` objects to the `external_locations` table in the inventory database, and it no longer requires the `make_random` parameter. In the updated implementation, the `save_locations` method creates a single `ExternalLocation` object with a specific string and priority based on the workspace environment (Azure or AWS), and then uses the SQL backend to save the list of `ExternalLocation` objects to the database. This change simplifies the `save_locations` method and makes it more reusable throughout the test suite.

Dependency updates:

* Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 ([2894](https://github.com/databrickslabs/ucx/pull/2894)).
