Databricks-labs-remorph

Latest version: v0.8.0

0.8.0

* Added IR for stored procedures ([1161](https://github.com/databrickslabs/remorph/issues/1161)). In this release, we have made significant enhancements to the project by adding support for stored procedures. We have introduced a new `CreateVariable` case class to manage variable creation within the intermediate representation (IR), and the previous standalone `SetVariable` case class, now redundant, has been superseded by a new implementation among the statement nodes listed below. A new `CaseStatement` class has been added to represent SQL case statements with value match, and a `CompoundStatement` class has been implemented to enable encapsulation of a sequence of logical plans within a single compound statement. The `DeclareCondition`, `DeclareContinueHandler`, and `DeclareExitHandler` case classes have been introduced to handle conditional logic and exit handlers in stored procedures. New classes `DeclareVariable`, `ElseIf`, `ForStatement`, `If`, `Iterate`, `Leave`, `Loop`, `RepeatUntil`, `Return`, `SetVariable`, and `Signal` have been added to the project to provide more comprehensive support for procedural language features and control flow management in stored procedures. We have also included SnowflakeCommandBuilder support for stored procedures and updated the `visitExecuteTask` method to handle stored procedure calls using the new `SetVariable` node.
* Added Variant Support ([998](https://github.com/databrickslabs/remorph/issues/998)). In this commit, support for the Variant datatype has been added to the create table functionality, enhancing the system's compatibility with Snowflake's datatypes. A new VariantType has been introduced, which allows for more comprehensive handling of data during create table operations. Additionally, a `remarks VARIANT` line is added in the CREATE TABLE statement and the corresponding spec test has been updated. The Variant datatype is a flexible datatype that can store different types of data, such as arrays, objects, and strings, offering increased functionality for users working with variant data. Furthermore, this change enables the use of the Variant datatype in Snowflake tables and improves the data modeling capabilities of the system.
* Added `PySpark` generator ([1026](https://github.com/databrickslabs/remorph/issues/1026)). The engineering team has developed a new `PySpark` generator for the `com.databricks.labs.remorph.generators` package. This addition introduces a new parameter, `logical`, of type `Generator[ir.LogicalPlan, String]`, in the `SQLGenerator` for SQL queries. A new abstract class `BasePythonGenerator` has been added, which extends the `Generator` class and generates Python code. An `ExpressionGenerator` class has also been added, which extends `BasePythonGenerator` and is responsible for generating Python code for `ir.Expression` objects. A new `LogicalPlanGenerator` class has been added, which extends `BasePythonGenerator` and is responsible for generating Python code for a given `ir.LogicalPlan`. A new `StatementGenerator` class has been implemented, which converts `Statement` objects into Python code. A new Python-generating class, `PythonGenerator`, has been added, which includes the implementation of an abstract syntax tree (AST) for Python in Scala. This AST includes classes for various Python language constructs. Additionally, new implicit classes for `PythonInterpolator`, `PythonOps`, and `PythonSeqOps` have been added to allow for the creation of PySpark code using the Remorph framework. The `AndOrToBitwise` rule has been implemented to convert `And` and `Or` expressions to their bitwise equivalents. The `DotToFCol` rule has been implemented to transform code that references columns using dot notation in a DataFrame to use the `col` function with a string literal of the column name instead (an illustrative before/after for both rules appears after this list). A new `PySparkStatements` object and `PySparkExpressions` class have been added, which provide functionality for transforming expressions in a data processing pipeline to PySpark equivalents. The `SnowflakeToPySparkTranspiler` class has been added to transpile Snowflake queries to PySpark code. Finally, a `PySpark` generator has been wired into the `Transpiler` class alongside the existing `SqlGenerator`-based implementation, enhancing the `Transpiler` with `PySpark` output and improving serialization efficiency.
* Added `debug-bundle` command for folder-to-folder translation ([1045](https://github.com/databrickslabs/remorph/issues/1045)). In this release, we have introduced a `debug-bundle` command to the remorph project's CLI, specifically added to the `proxy_command` function, which already includes the `debug-script`, `debug-me`, and `debug-coverage` commands. This new command enhances the tool's debugging capabilities, allowing developers to generate a bundle of translated queries for folder-to-folder translation tasks. The `debug-bundle` command accepts three flags: `dialect`, `src`, and `dst`, specifying the SQL dialect, source directory, and destination directory, respectively. Furthermore, the update includes refactoring the `FileSetGenerator` class in the `com.databricks.labs.remorph.generators.orchestration` package, adding a `debug-bundle` command to the `Main` object, and updating the `FileQueryHistoryProvider` method in the `ApplicationContext` trait. These improvements focus on providing a convenient way to convert folder-based SQL scripts to other formats like SQL and PySpark, enhancing the translation capabilities of the project.
* Added `ruff` Python formatter proxy ([1038](https://github.com/databrickslabs/remorph/issues/1038)). In this release, we have added support for the `ruff` Python formatter in our project's continuous integration and development workflow. We have also introduced a new `FORMAT` stage in the `WorkflowStage` object in the `Result` Scala object to include formatting as a separate step in the workflow. A new `RuffFormatter` class has been added to format Python code using the `ruff` tool, and a `StandardInputPythonSubprocess` class has been included to run a Python subprocess and capture its output and errors. Additionally, a proxy for the `ruff` formatter has been added to the SnowflakeToPySparkTranspilerTest and to the transpiler library's test code, improving the readability of the transpiled Python code generated by the SnowflakeToPySparkTranspiler and enforcing format and style conventions in Python code. These changes aim to improve the development and testing experience for the project and ensure that the code follows the desired formatting and style standards.
* Added baseline for translating workflows ([1042](https://github.com/databrickslabs/remorph/issues/1042)). In this release, several new features have been added to the open-source library to improve the translation of workflows. A new dependency for the Jackson YAML data format library, version 2.14.0, has been added to the pom.xml file to enable processing YAML files and converting them to Java objects. A new `FileSet` class has been introduced, which provides an in-memory data structure to manage a set of files, allowing users to add, retrieve, and remove files by name and persist the contents of the files to the file system. A new `FileSetGenerator` class has been added that generates a `FileSet` object from a `JobNode` object, enabling the translation of workflows by generating all necessary files for a workspace. A new `DefineJob` class has been developed to define a new rule for processing `JobNode` objects in the Remorph system, converting instances of `SuccessPy` and `SuccessSQL` into `PythonNotebookTask` and `SqlNotebookTask` objects, respectively. Additionally, various new classes, such as `GenerateBundleFile`, `QueryHistoryToQueryNodes`, `ReformatCode`, `TryGeneratePythonNotebook`, `TryGenerateSQL`, `TrySummarizeFailures`, `InformationFile`, `SuccessPy`, `SuccessSQL`, `FailedQuery`, `Migration`, `PartialQuery`, `QueryPlan`, `RawMigration`, `Comment`, and `PlanComment`, have been introduced to provide a more comprehensive and nuanced job orchestration framework. The `Library` case class has been updated to better separate concerns between library configuration and code assets. These changes address issue [#1042](https://github.com/databrickslabs/remorph/issues/1042) and provide a more robust and flexible workflow translation solution.
* Added correct generation of `databricks.yml` for `QueryHistory` ([1044](https://github.com/databrickslabs/remorph/issues/1044)). The FileSet class in the FileSet.scala file has been updated to include a new method that correctly generates the `databricks.yml` file for the `QueryHistory` feature. The feature creates three files in total: two SQL notebooks with translated and formatted queries, and a `databricks.yml` file that defines an asset bundle for orchestrating the cross-compiled queries. The new method in the FileSet class writes the content to the file using the `Files.write` method from the `java.nio.file` package instead of the previously used `PrintWriter`. The FileSetGenerator class has been updated to include the new `databricks.yml` file generation, and new rules and methods have been added to improve the accuracy and consistency of schema definitions in the generated orchestration files. Additionally, the DefineJob and DefineSchemas classes have been introduced to simplify the orchestration generation process.
* Added documentation around Transformation ([1043](https://github.com/databrickslabs/remorph/issues/1043)). In this release, the Transformation class in our open-source library has been enhanced with detailed documentation, type parameters, and new methods. The class represents a stateful computation that produces an output of type Out while managing a state of type State. The new methods include map and flatMap for modifying the output and chaining transformations, as well as run and runAndDiscardState for executing the computation with a given initial state and producing a Result containing the output and state. Additionally, we have introduced a new trait called TransformationConstructors that provides constructors for successful transformations, error transformations, lifted results, state retrieval, replacement, and updates. The CodeGenerator trait in our code generation library has also been updated with several new methods for more control and flexibility in the code generation process. These include commas and spaces for formatting output, updateGenCtx for updating the GeneratorContext, nest and unnest for indentation, withIndentedBlock for producing nested blocks of code, and withGenCtx for creating transformations that use the current GeneratorContext. A minimal, self-contained sketch of the `Transformation` pattern appears after this list.
* Added tests for Snow ARRAY_REMOVE function ([979](https://github.com/databrickslabs/remorph/issues/979)). In this release, we have added tests for the Snowflake ARRAY_REMOVE function in the SnowflakeToDatabricksTranspilerTest. The tests, currently ignored, demonstrate the usage of the ARRAY_REMOVE function with different data types, such as integers and doubles. A TODO comment is included for a test case involving VARCHAR casting, to be enabled once the necessary casting functionality is implemented. This update enhances the library's capabilities and ensures that the ARRAY_REMOVE function can handle a variety of data types. Software engineers can refer to these tests to understand the usage of the ARRAY_REMOVE function in the transpiler and the planned casting functionality.
* Avoid non-local return ([1052](https://github.com/databrickslabs/remorph/issues/1052)). In this release, the `render` method of the `generators` package object in the `com.databricks.labs.remorph` package has been updated to avoid using non-local returns and follow recommended coding practices. Instead of returning early from the method, it now uses `Option` to track failures and a `try-catch` block to handle exceptions. In cases of exception during string concatenation, the method sets the `failureOpt` variable to `Some(lift(KoResult(WorkflowStage.GENERATE, UncaughtException(e))))`. Additionally, the test file "CodeInterpolatorSpec.scala" has been modified to fix an issue with exception handling. In the updated code, new variables for each argument are introduced, and the problematic code is placed within an interpolated string, allowing for proper exception handling. This release enhances the robustness and reliability of the code interpolator and ensures that the method follows recommended coding practices.
* Collect errors in `Phase` ([1046](https://github.com/databrickslabs/remorph/issues/1046)). The open-source library Remorph has received significant updates, focusing on enhancing error collection and simplifying the Transformation class. The changes include a new method `recordError` in the abstract Phase trait and its concrete implementations for collecting errors during each phase. The Transformation class has been simplified by removing the unused Phase parameter, while the Generator, CodeGenerator, and FileSetGenerator have been specialized to use Transformation without the Phase parameter. The TryGeneratePythonNotebook, TryGenerateSQL, CodeInterpolator, and TBASeqOps classes have been updated for a more concise and focused state. The imports have been streamlined, and the PySparkGenerator, SQLGenerator, and PlanParser have been modified to remove the unused Phase type parameter. A new test file, TransformationTest, has been added to check the error collection functionality in the Transformation class. Overall, these enhancements improve the reliability, readability, and maintainability of the Remorph library.
* Correctly generate `F.fn_name` for builtin PySpark functions ([1037](https://github.com/databrickslabs/remorph/issues/1037)). This commit introduces changes to the generation of `F.fn_name` for builtin PySpark functions in the PySparkExpressions.scala file, specifically for PySpark's builtin functions (`fn`). It includes a new case to handle these functions by converting them to their lowercase equivalent in Python using `Locale.getDefault`. Additionally, changes have been made to handle window specifications more accurately, such as using `ImportClassSideEffect` with `windowSpec` and generating and applying a window function (`fn`) over it. The `LAST_VALUE` function has been modified to `LAST` in the SnowflakeToDatabricksTranspilerTest.scala file, and new methods such as `First`, `Last`, `Lag`, `Lead`, and `NthValue` have been added to the SnowflakeCallMapper class. These changes improve the accuracy, flexibility, and compatibility of PySpark when working with built-in functions and window specifications, making the codebase more readable, maintainable, and efficient.
* Create Command Extended ([1033](https://github.com/databrickslabs/remorph/issues/1033)). In this release, the open-source library has been updated with several new features related to table management and SQL code generation. A new method `replaceTable` has been added to the `LogicalPlanGenerator` class, which generates SQL code for a `ReplaceTableCommand` IR node and replaces an existing table with the same name if it already exists. Additionally, support has been added for generating SQL code for an `IdentityConstraint` IR node, which specifies whether a column is an auto-incrementing identity column. The `CREATE TABLE` statement has been updated to include the `AUTOINCREMENT` and `REPLACE` constraints, and a new `IdentityConstraint` case class has been introduced to extend the capabilities of the `UnnamedConstraint` class. The `TSqlDDLBuilder` class has also been updated to handle the `IDENTITY` keyword more effectively. A new command implementation with `AUTOINCREMENT` and `REPLACE` constraints has been added, and a new SQL script has been included in the functional tests for testing CREATE DDL statements with identity columns. Finally, the SQL transpiler has been updated to support the `CREATE OR REPLACE PROCEDURE` syntax, providing more flexibility and convenience for users working with stored procedures in their SQL code. These updates aim to improve the functionality and ease of use of the open-source library for software engineers working with SQL code and table management. A hedged before/after pair for the identity-column DDL appears after this list.
* Don't draft automated releases ([995](https://github.com/databrickslabs/remorph/issues/995)). In this release, we have made a modification to the release.yml file in the .github/workflows directory by removing the "draft: true" line. This change removes the creation of draft releases in the automated release process, simplifying it and making it more straightforward for users to access new versions of the software. The job section of the release.yml file now only includes the `release` job, with the "release-signing-artifacts: true" still enabled, ensuring that the artifacts are signed. This improvement enhances the overall release process, making it more efficient and user-friendly.
* Enhance the Snow ARRAY_SORT function support ([994](https://github.com/databrickslabs/remorph/issues/994)). With this release, the Snowflake ARRAY_SORT function now supports Boolean literals as parameters, improving its functionality. The changes include validating Boolean parameters in SnowflakeCallMapper.scala, throwing a TranspileException for unsupported arguments, and simplifying the IR using the DBSQL SORT_ARRAY function. Additionally, new test cases have been added to SnowflakeCallMapperSpec for the ARRAY_SORT and ARRAY_SLICE functions. The SnowflakeToDatabricksTranspilerTest class has also been updated with new test cases that cover the enhanced ARRAY_SORT function's various overloads and combinations of Boolean literals, NULLs, and a custom sorting function. This ensures that invalid usage is caught during transpilation, providing better error handling and supporting more use cases. An illustrative input/output pair for this mapping appears after this list.
* Ensure that unparsable text is not lost in the generated output ([1012](https://github.com/databrickslabs/remorph/issues/1012)). In this release, we have implemented an enhancement to the error handling strategy in the ANTLR-generated parsers for SQL. This change records where parsing failed and gathers un-parsable input, preserving them as custom error nodes in the ParseTree at strategic points. The new custom error strategy allows visitors for higher level rules such as `sqlCommand` in Snowflake and `sqlClauses` in TSQL to check for an error node in the children and generate an Ir node representing the un-parsed text. Additionally, new methods have been introduced to handle error recovery, find the highest context in the tree for the particular parser, and recursively find the context. A separate improvement is planned to ensure the PlanParser no longer stops when syntax errors are discovered, allowing safe traversal of the ParseTree. This feature is targeted towards software engineers working with SQL parsing and aims to improve error handling and recovery.
* Fetch table definitions for TSQL ([986](https://github.com/databrickslabs/remorph/issues/986)). This pull request introduces a new `TableDefinition` case class that encapsulates metadata properties for tables in TSQL, such as catalog name, schema name, table name, location, table format, view definition, columns, table size, and comments. A `TSqlTableDefinitions` class has been added with methods to retrieve table definitions, all schemas, and all catalogs from TSQL. The `SnowflakeTypeBuilder` is updated to parse data types from TSQL. The `SnowflakeTableDefinitions` class has been refactored to use the new `TableDefinition` case class and retrieve table definitions more efficiently. The changes also include adding two new test cases to verify the correct retrieval of table definitions and catalogs for TSQL.
* Fixed handling of projected expressions in `TreeNode` ([1159](https://github.com/databrickslabs/remorph/issues/1159)). In this release, we have addressed the handling of projected expressions in the `TreeNode` class, resolving issues [#1072](https://github.com/databrickslabs/remorph/issues/1072) and [#1159](https://github.com/databrickslabs/remorph/issues/1159). The `expressions` method in the `Plan` abstract class has been modified to include the `final` keyword, restricting overriding in subclasses. This method now returns all expressions present in a query from the current plan operator and its descendants. Additionally, we have introduced a new private method, `seqToExpressions`, used for recursively finding all expressions from a given sequence. The `Project` class, representing a relational algebra operation that projects a subset of columns in a table, now utilizes a new `columns` parameter instead of `expressions`. Similar changes have been applied to other classes extending `UnaryNode`, such as `Join`, `Deduplicate`, and `Hint`. The `values` parameter of the `Values` class has also been altered to accurately represent input values. A new test class, `JoinTest`, has been introduced to verify the correct propagation of expressions in join nodes, ensuring intended data transformations.
* Handling any_keys_match from presto ([1048](https://github.com/databrickslabs/remorph/issues/1048)). In this commit, we have added support for the `any_keys_match` Presto function in Databricks by implementing it using existing Databricks functions. The `any_keys_match` function checks if any keys in a map match a given condition. Specifically, we have introduced two new classes, `MapKeys` and `ArrayExists`, which allow us to extract keys from the input map and check if any of the keys satisfy the given condition using the `exists` function. This is accomplished by renaming `exists` to `array_exists` to better reflect its purpose. Additionally, we have provided a Databricks SQL query that mimics the behavior of the `any_keys_match` function in Presto and added tests to ensure that it works as expected. These changes enable users to perform equivalent operations with a consistent syntax in Databricks and Presto. The resulting rewrite is illustrated after this list.
* Improve IR for job nodes ([1041](https://github.com/databrickslabs/remorph/issues/1041)). The open-source library has undergone improvements to the Intermediate Representation (IR) for job nodes. This release introduces several significant changes: the `JobNode` class has been refactored to extend the `TreeNode` class, and a new abstract class `LeafJobNode` overrides the `children` method to always return an empty `Seq`; the `ClusterSpec` case class now includes a `toSDK` method that properly initializes and sets the values of the fields in the SDK `ClusterSpec` object; the `NewClusterSpec` class updates the types of several optional fields and reworks its `toSDK` method for better conversion to the SDK format; the `Job` class, which previously represented a job in the IR of workflows, has been removed; the `JobCluster` case class changes its `newCluster` attribute from `ClusterSpec` to `NewClusterSpec`; the `JobEmailNotifications` class now extends `LeafJobNode`, adding new methods and overriding existing ones; the `JobNotificationSettings` class replaces its original `toSDK` method with a new implementation for a more accurate SDK representation of job notification settings; and the `JobParameterDefinition` class updates its `toSDK` method for more efficient conversion to the SDK format. These changes simplify the representation of job nodes, align the codebase more closely with the underlying SDK, and improve overall code maintainability and compatibility with other Remorph components.
* Query History From Folder ([991](https://github.com/databrickslabs/remorph/issues/991)). The Estimator class in the Remorph project has been updated to enhance the query history interface by reading metadata from a folder, improving its ability to handle queries from different users and increasing the accuracy of estimation reports. The Anonymizer class has also been updated to handle cases where the user field is missing, ensuring the anonymization process can proceed smoothly and preventing potential errors. A new FileQueryHistory class has been added to provide query history functionality by reading metadata from a specified folder. The SnowflakeQueryHistory class has been updated to directly implement the history() method and include new fields in the ExecutedQuery objects, such as 'id', 'source', 'timestamp', 'duration', 'user', and 'filename'. A new ExecutedQuery case class has been introduced with optional `user` and `filename` fields, and a new QueryHistoryProvider trait has been added with a method history() that returns a QueryHistory object containing a sequence of ExecutedQuery objects, enhancing the query history interface's flexibility and power. Test suites and test data for the Anonymizer and TableGraph classes have been updated to accommodate these changes, allowing for more comprehensive testing of query history functionality. A FileQueryHistorySpec test file has been added to test the FileQueryHistory class's ability to correctly extract queries from SQL files, ensuring the class works as expected.
* Rework serialization using circe+jackson ([1163](https://github.com/databrickslabs/remorph/issues/1163)). In pull request [#1163](https://github.com/databrickslabs/remorph/issues/1163), the serialization mechanism in the project has been refactored to use the Circe and Jackson libraries, replacing the existing ujson library. This change includes the addition of the Circe, Circe-generic-extras, and Circe-jackson libraries, which are licensed under the Apache 2.0 license. The project now includes the copyright notices and license information for all open-source projects that have contributed code to it, ensuring compliance with open-source licenses. The `CoverageTest` class has been updated to incorporate error encoding using Circe and Jackson libraries, and the `EstimationReport` case classes no longer have implicit `ReadWriter` instances defined using macroRW; Circe and Jackson encode and decode instances are now defined elsewhere in the codebase. The `BaseQueryRunner` abstract class has been updated to handle both parsing and transpilation errors in a more uniform way, using a `failures` field instead of `transpilation_error` or `parsing_error`. Additionally, a new file, `encoders.scala`, has been introduced, which defines encoders for serializing objects to JSON using the Circe and Jackson libraries. These changes aim to improve serialization and deserialization performance and capabilities, simplify the codebase, and ensure consistent and readable output.
* Some window functions do not support window frame conditions ([999](https://github.com/databrickslabs/remorph/issues/999)). The Snowflake expression builder has been updated to correct the default window frame specifications for certain window functions and modify the behavior of the ORDER BY clause in these functions. This change ensures that the expression builder generates the correct SQL syntax for functions that do not support window frames, such as `LAG`, `DENSE_RANK`, `LEAD`, `PERCENT_RANK`, `RANK`, and `ROW_NUMBER`, improving the compatibility and reliability of the generated queries. Additionally, a unit test for the `SnowflakeExpressionBuilder` has been updated to account for changes in the way window functions are handled, enhancing the accuracy of the builder in generating valid SQL for window functions in Snowflake.
* Split workflow definitions into sensible packages ([1039](https://github.com/databrickslabs/remorph/issues/1039)). The AutoScale class has been refactored and moved to a new package, `com.databricks.labs.remorph.intermediate.workflows.clusters`, extending `JobNode` from `com.databricks.labs.remorph.intermediate.workflows`. It now includes a case class for auto-scaling that takes optional integer arguments `maxWorkers` and `minWorkers`, and a single method `apply` that creates and configures a cluster using the Databricks SDK's `ComputeService`. The AwsAttributes and AzureAttributes classes have also been moved to the `com.databricks.labs.remorph.intermediate.workflows.clusters` package and now extend `JobNode`. These classes manage AWS and Azure-related attributes for compute resources in a workflow. The ClientsTypes case class has been moved to a new clusters sub-package within the workflows package and now extends `JobNode`, and the ClusterLogConf class has been moved to a new clusters package. The JobDeployment class has been refactored and moved to `com.databricks.labs.remorph.intermediate.workflows.jobs`, and the JobEmailNotifications, JobsHealthRule, and WorkspaceStorageInfo classes have been moved to new packages and now import `JobNode`. These changes improve the organization and maintainability of the codebase, making it easier to understand and navigate.
* TO_NUMBER/TO_DECIMAL/TO_NUMERIC without precision and scale ([1053](https://github.com/databrickslabs/remorph/issues/1053)). This pull request introduces improvements to the transpilation process for handling cases where precision and scale are not specified for TO_NUMBER, TO_DECIMAL, or TO_NUMERIC Snowflake functions. The updated transpiler now automatically applies default values when these parameters are omitted, with precision set to the maximum allowed value of 38 and scale set to 0. A new method has been added to manage these cases, and four new test cases have been included to verify the transpilation of TO_NUMBER and TO_DECIMAL functions without specified precision and scale, and with various input formats. This change ensures consistent behavior across different SQL dialects for cases where precision and scale are not explicitly defined in the conversion functions. A before/after pair illustrating these defaults appears after this list.
* Table comments captured as part of Snowflake Table Definition ([989](https://github.com/databrickslabs/remorph/issues/989)). In this release, we have added support for capturing table comments as part of Snowflake Table Definitions in the remorph library. This includes modifying the TableDefinition case class to include an optional comment field, and updating the SQL query in the SnowflakeTableDefinitions class to retrieve table comments. A new integration test for Snowflake table definitions has also been introduced to ensure the proper functioning of the new feature. This test creates a connection to the Snowflake database, retrieves a list of table definitions, and checks for the presence of table comments. These changes are part of our ongoing efforts to improve metadata capture for Snowflake tables (Note: The commit message references issue [#945](https://github.com/databrickslabs/remorph/issues/945) on GitHub, which this pull request is intended to close.)
* Use Transformation to get rid of the ctx parameter in generators ([1040](https://github.com/databrickslabs/remorph/issues/1040)). The `Generating` class has undergone significant changes, removing the `ctx` parameter and introducing two new phases, `Parsing` and `BuildingAst`, in the sealed trait `Phase`. The `Parsing` phase extends `Phase` with a previous phase of `Init` and contains the source code and filename. The `BuildingAst` phase extends `Phase` with a previous phase of `Parsing` and contains the parsed tree and the previous phase. The `Optimizing` phase now contains the unoptimized plan and the previous phase. The `Generating` phase now contains the optimized plan, the current node, the total statements, the transpiled statements, the `GeneratorContext`, and the previous phase. Additionally, the `TransformationConstructors` trait has been updated to allow for the creation of Transformation instances specific to a certain phase of a workflow. The `runQuery` method in the `BaseQueryRunner` abstract class has been updated to use a new `transpile` method provided by the `Transpiler` trait, and the `Estimator` class in the `Estimation` module has undergone changes to remove the `ctx` parameter in generators. Overall, these changes simplify the implementation, improve code maintainability, and enable better separation of concerns in the codebase.
* With Recursive ([1000](https://github.com/databrickslabs/remorph/issues/1000)). In this release, we have introduced several enhancements for `With Recursive` statements in SQL parsing and processing for the Snowflake database. A new IR (Intermediate Representation) for With Recursive CTE (Common Table Expression) has been implemented in the SnowflakeAstBuilder.scala file. A new case class, WithRecursiveCTE, which extends RelationCommon and includes two members, ctes and query, is now built by the SnowflakeRelationBuilder class in the databricks/labs/remorph project. The buildColumns method in the SnowflakeRelationBuilder class has been updated to handle cases where columnList is null and extract column names differently. Additionally, a new test has been added in SnowflakeAstBuilderSpec.scala that verifies the correct handling of a recursive CTE query. These enhancements improve the support for recursive queries in the Snowflake database, enabling more powerful and flexible querying capabilities for developers and data analysts working with complex data structures. The shape of statement covered is illustrated after this list.
* [chore] fixed query coverage report ([1160](https://github.com/databrickslabs/remorph/issues/1160)). In this release, we have addressed the issue [#1160](https://github.com/databrickslabs/remorph/issues/1160) related to the query coverage report. We have implemented changes to the `QueryRunner` abstract class in the `com.databricks.labs.remorph.coverage` package. The `ReportEntryReport` constructor now accepts a new parameter `parsed`, which is set to 1 if there is no transpilation error and 0 otherwise. Previously, `parsed` was always set to 1, regardless of the presence of a transpilation error. We also updated the `extractQueriesFromFile` and `extractQueriesFromFolder` methods in the `FileQueryHistory` class to return a single `ExecutedQuery` instance, allowing for better query coverage reporting. Additionally, we modified the behavior of the `history()` method of the `fileQueryHistory` object in the `FileQueryHistorySpec` test case. The method now returns a query history object with a single query having a `source` including the text "SELECT * FROM table1;" and "SELECT * FROM table2;", effectively merging the previous two queries into one. These changes ensure that the report accurately reflects whether the query was successfully transpiled, parsed, and stored in the query history. It is crucial to test thoroughly any parts of the code that rely on the `history()` method to return separate queries, as the behavior of the method has changed.
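
The `DotToFCol` and `AndOrToBitwise` rules from [1026](https://github.com/databrickslabs/remorph/issues/1026) are easiest to see as before/after pairs. A hedged illustration follows; the rule behavior is taken from the entry above, but the exact generated formatting is an assumption:

```scala
object PySparkRuleExamples {
  // DotToFCol: dot-notation column references become F.col("...") calls
  val dotToFColBefore = "df.account_id > 100"
  val dotToFColAfter  = """F.col("account_id") > 100"""

  // AndOrToBitwise: logical And/Or in the IR surface as PySpark's bitwise operators
  val andOrToBitwiseBefore = "a AND b"
  val andOrToBitwiseAfter  = "(a) & (b)"
}
```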
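
The `Transformation` class documented in [1043](https://github.com/databrickslabs/remorph/issues/1043) is a stateful computation with `map`, `flatMap`, and run methods. Below is a minimal, self-contained sketch of that pattern, assuming a plain `State => (State, Out)` encoding; the real class additionally threads a `Result` and richer phase state:

```scala
final case class Transformation[State, Out](run: State => (State, Out)) {
  // Modify the output while threading the state through unchanged.
  def map[B](f: Out => B): Transformation[State, B] =
    Transformation(s => { val (s2, out) = run(s); (s2, f(out)) })

  // Chain a follow-up transformation that depends on this one's output.
  def flatMap[B](f: Out => Transformation[State, B]): Transformation[State, B] =
    Transformation(s => { val (s2, out) = run(s); f(out).run(s2) })

  // Execute with an initial state and keep only the output.
  def runAndDiscardState(initial: State): Out = run(initial)._2
}

// Usage: chaining two state-threading steps with for-comprehension syntax.
val pipeline = for {
  a <- Transformation[Int, Int](s => (s + 1, s * 10))
  b <- Transformation[Int, Int](s => (s + 1, a + s))
} yield b
```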
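
For the identity-column work in [1033](https://github.com/databrickslabs/remorph/issues/1033), a hedged before/after pair: the input uses the `AUTOINCREMENT` constraint named in the entry, and the output shape relies on Databricks SQL's `GENERATED ALWAYS AS IDENTITY` syntax. The exact transpiler output text is an assumption.

```scala
val snowflakeIdentityDdl =
  "CREATE OR REPLACE TABLE orders (id INT AUTOINCREMENT, amount DECIMAL(10, 2))"
val databricksIdentityShape =
  "CREATE OR REPLACE TABLE orders (id BIGINT GENERATED ALWAYS AS IDENTITY, amount DECIMAL(10, 2))"
```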
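
One plausible input/output pair for the enhanced `ARRAY_SORT` handling in [994](https://github.com/databrickslabs/remorph/issues/994) (building on the transpilation added in 0.7.0 by [973](https://github.com/databrickslabs/remorph/issues/973)). `SORT_ARRAY` is the Databricks function the entry names, and it defaults to ascending order with NULLs first, matching the Boolean literals shown; the exact rendered text is an assumption.

```scala
val snowflakeArraySort  = "SELECT ARRAY_SORT(arr, TRUE, TRUE) FROM t"
val databricksArraySort = "SELECT SORT_ARRAY(arr) FROM t"
```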
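
The `any_keys_match` rewrite from [1048](https://github.com/databrickslabs/remorph/issues/1048), written out as SQL strings. `exists` and `map_keys` are the Databricks functions the entry names; the exact rewritten text is illustrative.

```scala
val prestoAnyKeysMatch = "SELECT any_keys_match(tags, k -> k LIKE 'env:%') FROM resources"
val databricksRewrite  = "SELECT exists(map_keys(tags), k -> k LIKE 'env:%') FROM resources"
```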
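
The defaults from [1053](https://github.com/databrickslabs/remorph/issues/1053) as a before/after pair. Precision 38 and scale 0 come from the entry; rendering `TO_NUMBER` as a `DECIMAL` cast is an assumption for illustration.

```scala
val snowflakeToNumber  = "SELECT TO_NUMBER(col1) FROM t"
val databricksToNumber = "SELECT CAST(col1 AS DECIMAL(38, 0)) FROM t"
```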
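
The kind of statement the new `WithRecursiveCTE` IR from [1000](https://github.com/databrickslabs/remorph/issues/1000) represents: a standard recursive CTE walking a hierarchy, shown here as a Scala string.

```scala
val recursiveCte =
  """WITH RECURSIVE managers (employee_id, manager_id) AS (
    |  SELECT employee_id, manager_id FROM employees WHERE manager_id IS NULL
    |  UNION ALL
    |  SELECT e.employee_id, e.manager_id
    |  FROM employees e JOIN managers m ON e.manager_id = m.employee_id
    |)
    |SELECT * FROM managers""".stripMargin
```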

0.7.0

* Added private key authentication for sf ([917](https://github.com/databrickslabs/remorph/issues/917)). This commit adds support for private key authentication to the Snowflake data source connector, providing users with more flexibility and security. The `cryptography` library is used to process the user-provided private key, with priority given to the `pem_private_key` secret, followed by the `sfPassword` secret. If neither secret is found, an exception is raised. However, password-based authentication is still used when JDBC options are provided, as Spark JDBC does not currently support private key authentication. A new exception class, `InvalidSnowflakePemPrivateKey`, has been added for handling invalid or malformed private keys. Additionally, new tests have been included for reading data with private key authentication, handling malformed private keys, and checking for missing authentication keys. The NOTICE file has been updated to include the `cryptography` library's copyright and license information.
* Added support for `PARSE_JSON` and `VARIANT` datatype ([906](https://github.com/databrickslabs/remorph/issues/906)). This commit introduces support for the `PARSE_JSON` function and `VARIANT` datatype in the Snowflake parser, addressing issue [#894](https://github.com/databrickslabs/remorph/issues/894). The implementation removes the experimental dialect, enabling support for the `VARIANT` datatype and using `PARSE_JSON` for it. The `variant_explode` function is also utilized. When transpiling Snowflake code, whenever the `:` operator is encountered in a SELECT statement, the accessed value is treated as a `VARIANT` on the Databricks side to handle differences between Snowflake and Databricks in accessing variant types. These changes are implemented using ANTLR.
* Added upgrade script and modified metrics sql ([990](https://github.com/databrickslabs/remorph/issues/990)). In this release, the open-source library has been updated with several improvements to the metrics system, database upgrades, and setup process. The setup_spark_remote.sh script now checks if the Spark server is running by pinging localhost:4040, rather than sleeping for a fixed time, allowing for faster execution and more accurate server status determination. The metrics table's insert statement has been updated to cast values to Bigint for better handling of larger counts. An upgrade script has been added to facilitate required modifications, and the setup_spark_remote.sh script has been modified to validate URLs. A new Python file for upgrading the metrics table's data types has been added, which includes a function to recreate the table with the correct data types for specific columns. The upgrade_common module now includes several functions for upgrading database tables, and a new unit test file, test_upgrade_common.py, has been added with test cases for these functions. Lastly, the upgrade script for v0.4.0 has been updated to simplify the process of checking if the main table in the reconcile metadata needs to be recreated and to add an `operation_name` column. These changes improve the library's functionality, accuracy, and robustness, particularly for larger datasets and upgrading processes, enhancing the overall user experience.
* Basic CTAS Implementation ([968](https://github.com/databrickslabs/remorph/issues/968)). This pull request adds basic support for the CREATE TABLE AS SELECT (CTAS) statement in Snowflake, enabling users to create a new table by selecting data from an existing table or query. In the LogicalPlanGenerator class, a new method has been implemented to handle CTAS statements, which generates the appropriate SQL command for creating a table based on the result of a select query. The SnowflakeDDLBuilder class now includes a relationBuilder class member for building relations based on Snowflake DDL input, and the visitCreateTableAsSelect method has been overridden to parse CTAS statements and construct corresponding IR objects. The test suite has been expanded to include a new spec for CTAS statements and a test case for the CTAS statement "CREATE TABLE t1 AS SELECT c1, c2 FROM t2;". Additionally, a new test file "test_ctas_complex.sql" has been added, containing SQL statements for creating a new table by selecting columns from multiple tables and computing new columns through various data manipulations. The implementation also includes adding new SQL statements for CTAS in both Snowflake and Databricks dialects, allowing for testing the CTAS functionality for both.
* Create repeatable estimator for Snowflake query history ([924](https://github.com/databrickslabs/remorph/issues/924)). This commit introduces a new coverage estimation tool for analyzing query history in a database, initially implemented for Snowflake. The tool parses and transpiles query history into Databricks SQL and reports on the percentage of query history it can process. It includes a new `SnowflakePlanParser` class that handles Snowflake query plans, a `SqlGenerator` class that generates Databricks SQL from optimized logical plans, and a `dialect` method that returns the dialect string. The long-term plan is to extend this functionality to other supported databases and dialects and include a report on SQL complexity. Additionally, test cases have been added to the `AnonymizerTest` class to ensure the correct functionality of the `Anonymizer` class, which anonymizes executed queries when provided with a `PlanParser` object. The `Anonymizer` class is intended to be used as part of the coverage estimation tool, which will provide analysis of query history for various databases.
* Created a mapping dict for algo for each dialect at layer level ([934](https://github.com/databrickslabs/remorph/issues/934)). A series of changes have been implemented to improve the reconciliation process and the handling of hash algorithms in the open-source library. A dictionary mapping each dialect to its hash algorithm has been introduced at the layer level to enhance the reconciliation process. The `get_hash_transform` function now accepts a new `layer` argument and returns a list of hash algorithms from the `HashAlgoMapping` dictionary. A new `HashAlgoMapping` class has been added to map algorithms to a dialect for hashing, replacing the previous `DialectHashConfig` class. A new function `get_dialect` has been introduced to retrieve the dialect based on the layer. The `_hash_transform` function and the `build_query` method have been updated to use the `layer` parameter when determining the dialect. These changes provide more precise control over the algorithm used for hash transformation based on the source layer and the target dialect, resulting in improved reconciliation performance and accuracy.
* Fetch TableDefinitions from Snowflake ([904](https://github.com/databrickslabs/remorph/issues/904)). A new `SnowflakeTableDefinitions` class has been added to simplify the discovery of Snowflake table metadata. This class establishes a connection with a Snowflake database through a Connection object, and provides methods such as `getDataType` and `getTableDefinitionQuery` to parse data types and generate queries for table definitions. Moreover, it includes a `getTableDefinitions` method to retrieve all table definitions in a Snowflake database as a sequence of `TableDefinition` objects, which encapsulates various properties of each table. The class also features methods for retrieving all catalogs and schemas in a Snowflake database. Alongside `SnowflakeTableDefinitions`, a new test class, `SnowflakeTableDefinitionTest`, has been introduced to verify the behavior of `getTableDefinitions` and ensure that the class functions as intended.
* Guide user on missing configuration file ([930](https://github.com/databrickslabs/remorph/issues/930)). In this commit, the `_verify_recon_table_config` method in the `runner.py` file of the `databricks/labs/remorph` package has been updated to handle missing reconcile table configurations during installation. When the reconcile table configuration is not found, an error message will now display the name of the requested configuration file. This enhancement helps users identify the specific configuration file they need to provide in their workspace, addressing issue [#919](https://github.com/databrickslabs/remorph/issues/919). This commit is co-authored by Ludovic Claude.
* Implement more missing visitor functions for Snowflake and TSQL ([975](https://github.com/databrickslabs/remorph/issues/975)). In this release, we have added several missing visitor methods for the Snowflake and TSQL builder classes to improve the reliability and maintainability of our parser. Previously, when a visitor method was missing, the default visitor was called, causing the visit of all children of the ParseTree, which was not ideal. This could lead to incorrect results due to a slight modification in the ANTLR grammar inadvertently breaking the visitor. In this release, we have implemented several new visitor methods for both Snowflake and TSQL builder classes, including the `visitDdlCommand` method in the `SnowflakeDDLBuilder` class and the `visitDdlClause` method in the `TSqlDDLBuilder` class. These new methods ensure that the ParseTree is traversed correctly and that the correct IR node is returned. The `visitDdlCommand` method checks for different types of DDL commands, such as create, alter, drop, and undrop, and calls the appropriate method for each type. The `visitDdlClause` method contains a sequence of methods corresponding to different DDL clauses and calls the first non-null method in the sequence. These changes significantly improve the robustness of our parser and enhance the reliability of our code.
* Introduce typed errors ([981](https://github.com/databrickslabs/remorph/issues/981)). This commit introduces typed errors in the form of a new class, `UnexpectedNode`, and several case classes including `ParsingError`, `UnsupportedDataType`, `WrongNumberOfArguments`, `UnsupportedArguments`, and `UnsupportedDateTimePart` in various packages, as part of the ongoing effort to replace exception throwing with returning `Result` types in future pull requests. These changes will improve error handling and provide more context and precision for errors, facilitating debugging and maintenance of the remorph library and data type generation functionality. The `TranspileException` class is now constructed with specific typed error instances, and the `ErrorCollector` and `ErrorDetail` classes have been updated to use `ParsingError`. Additionally, the `SnowflakeCallMapper` and `SnowflakeTimeUnits` classes have been updated to use the new typed error mechanism, providing more precise error handling for Snowflake-specific functions and expressions. A hedged sketch of the error hierarchy appears after this list.
* Miscellaneous improvements to Snowflake parser ([952](https://github.com/databrickslabs/remorph/issues/952)). This diff brings several miscellaneous improvements to the Snowflake parser in the open-source library, targeting increased parse and transpilation success rates. The modifications include updating the `colDecl` rule to allow optional data types, introducing an `objectField` rule, and enabling date and timestamp literals as strings. Additionally, the parser has been refined to handle identifiers more efficiently, such as hashes within the AnonymizerTest. The expected Ast for certain test cases has also been updated to improve parser accuracy. These changes aim to create a more robust and streamlined Snowflake parser, minimizing parsing errors and enhancing overall user experience for project adopters. Furthermore, the error handling and reporting capabilities of the Snowflake parser have been improved with new case classes, `IndividualError` and `ErrorsSummary`, and updated error messages.
* Moved intermediate package out of parsers ([972](https://github.com/databrickslabs/remorph/issues/972)). In this release, the `intermediate` package has been refactored out of the `parsers` package, aligning with the design principle that parsers should depend on the intermediate representation instead of the other way around. This change affects various classes and methods across the project, all of which have been updated to import the `intermediate` package from its new location. No new functionality has been introduced, but the refactoring improves the package structure and dependency management. The `EstimationAnalyzer` class in the `coverage/estimation` package has been updated to import classes from the new location of the `intermediate` package, and its `evaluateTree` method has been updated to use the new import path for `LogicalPlan` and `Expression`. Other affected classes include `SnowflakeTableDefinitions`, `SnowflakeLexer`, `SnowflakeParser`, `SnowflakeTypeBuilder`, `GeneratorContext`, `DataTypeGenerator`, `IRHelpers`, and multiple test files.
* Patch Function without Brackets ([907](https://github.com/databrickslabs/remorph/issues/907)). This commit introduces new lexer and parser rules to handle Snowflake SQL functions without parentheses, specifically impacting CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP. The new rules allow these functions to be used without parentheses, consistent with Snowflake SQL. This change fixes functional tests and includes documentation for the affected functions. However, there is a pending task to add or fix more test cases to ensure comprehensive testing of the new rules. Additionally, the syntax of the SELECT statement for the CURRENT_TIMESTAMP function has been updated, removing the need for the parameter 'col1'. This change simplifies the syntax for certain SQL functions in the codebase and improves the consistency and reliability of the functional tests.
* Root Table ([936](https://github.com/databrickslabs/remorph/issues/936)). This PR introduces a new class `TableGraph` that extends `DependencyGraph` and implements the `LazyLogging` trait. This class builds a graph of tables and their dependencies based on query history and table definitions. It provides methods to add nodes and edges, build the graph, and retrieve root, upstream, and downstream tables. The `DependencyGraph` trait offers a more structured and flexible way to handle table dependencies. This change is part of the Root Table feature (issue [936](https://github.com/databrickslabs/remorph/issues/936)) that identifies root tables in a graph of table dependencies, closing issue [#23](https://github.com/databrickslabs/remorph/issues/23). The PR includes a new `TableGraphTest` class that demonstrates the use of these methods and verifies their behavior for better data flow understanding and optimization.
* Snowflake Merge Implementation ([964](https://github.com/databrickslabs/remorph/issues/964)). In this release, we have implemented the Merge statement for the Snowflake parser, which enables updating or deleting rows in a target table based on matches with a source table, and inserting new rows into the target table when there are no matches. This feature includes updates to the SnowflakeDMLBuilder and SnowflakeExpressionBuilder classes, allowing for proper handling of column names and MERGE queries. Additionally, we have added test cases to the SnowflakeASTBuilder, SnowflakeDMLBuilderSpec, and SnowflakeToDatabricksTranspiler to ensure the accurate translation and execution of MERGE statements for the Snowflake dialect. These changes bring important database migration and synchronization capabilities to our open-source library, improving its functionality and usability for software engineers. A representative MERGE statement appears after this list.
* TSQL: Implement CREATE TABLE ([911](https://github.com/databrickslabs/remorph/issues/911)). This commit implements the TSQL CREATE TABLE command and its various options and forms, including CTAS, graph node syntax, and analytics variants, as well as syntactical differences for SQL Server. The DDL and DML visitors have been moved from the AST and Relation visitors to separate classes for better responsibility segregation. The LogicalPlanGenerator class has been updated to generate unique constraints, primary keys, foreign keys, check constraints, default value constraints, and identity constraints for the CREATE TABLE command. Additionally, new classes for generating SQL options and handling unresolved options during transpilation have been added to enhance the parser's capability to manage various options and forms. These changes improve the transpilation of TSQL code and the organization of the codebase, making it easier to maintain and extend.
* Transpile Snow ARRAY_SORT function ([973](https://github.com/databrickslabs/remorph/issues/973)). In this release, we have implemented support for the Snowflake ARRAY_SORT function in our open-source library. This feature has been added as part of issue [#973](https://github.com/databrickslabs/remorph/issues/973), and it involves the addition of two new private methods, `arraySort` and `makeArraySort`, to the `SnowflakeCallMapper` class. The `arraySort` method takes a sequence of expressions as input and sorts the array using the `makeArraySort` method. The `makeArraySort` method handles both null and non-null values, sorts the array in ascending or descending order based on the provided parameter, and determines the position of null or small values based on the nulls first parameter. The sorted array is then returned as an `ir.ArraySort` expression. This functionality allows for the sorting of arrays in Snowflake SQL to be translated to equivalent code in the target language. This enhancement simplifies the process of working with arrays in Snowflake SQL and provides users with a more streamlined experience.
* Transpile Snow MONTHS_BETWEEN function correctly ([963](https://github.com/databrickslabs/remorph/issues/963)). In this release, the remorph library's SnowflakeCallMapper class in the com/databricks/labs/remorph/parsers/snowflake/rules package has been updated to handle the MONTHS_BETWEEN function. A new case has been added that creates a MonthsBetween object with the first two arguments of the function call and a boolean value of true. This change enhances compatibility and ensures that the output accurately reflects the intended functionality. Additionally, new test cases have been introduced to the SnowflakeCallMapperSpec for the transpilation of the MONTHS_BETWEEN function. These test cases demonstrate accurate mapping of the function to the MonthsBetween class and proper casting of inputs as dates or timestamps, improving the reliability and precision of date and time calculations. A before/after pair appears after this list.
* Updated Installation to handle install errors ([962](https://github.com/databrickslabs/remorph/issues/962)). In this release, we've made significant improvements to the `remorph` project, addressing and resolving installation errors that were occurring during the installation process in development mode. We've introduced a new `ProductInfo` class in the `wheels` module, which provides information about the products being installed. This change replaces the use of `WheelsV2` in two test functions. Additionally, we've updated the `workspace_installation` method in `application.py` to handle installation errors more effectively, addressing the dependency on workspace `.remorph` due to wheels. We've also added new methods to `installation.py` to manage local and remote version files, and updated the `_upgrade_reconcile_workflow` function to ensure the correct wheel path is used during installation. These changes improve the overall quality of the codebase, making it easier for developers to adopt and maintain the project, and ensure a more seamless installation experience for users.
* Updated catalog operations logging ([910](https://github.com/databrickslabs/remorph/issues/910)). In this release, the setup process for the catalog, schema, and volume in the configurator module has been simplified and improved. The previous implementation repeatedly prompted the user for input until the correct input was provided or a maximum number of attempts was reached. The updated code now checks if the catalog, schema, or volume already exists and either uses it or prompts the user to create it once. If the user does not have the necessary privileges to use the catalog, schema, or volume, an error message is logged and the installation is aborted. New methods have been added to check for necessary privileges, such as `has_necessary_catalog_access`, `has_necessary_schema_access`, and `has_necessary_volume_access`, which return a boolean indicating whether the user has the necessary privileges and log an error message with the missing privileges if not. The logging for catalog operations in the install.py file has also been updated to check for privileges at the end of the process and list any missing privileges for each catalog object. Additionally, changes have been made to the unit tests for the ResourceConfigurator class to ensure that the system handles cases where the user does not have the necessary permissions to access catalogs, schemas, or volumes, preventing unauthorized access and maintaining the security and integrity of the system.
* Updated remorph reconcile workflow to use wheels instead of pypi ([884](https://github.com/databrickslabs/remorph/issues/884)). In this release, the installation process for the Remorph library has been updated to allow for the use of locally uploaded wheel files instead of downloading the package from PyPI. This change includes updates to the `install` and `_deploy_jobs` methods in the `recon.py` file to accept a new `wheel_paths` argument, which is used to pass the path of the Remorph wheel file to the `deploy_recon_job` method. The `_upgrade_reconcile_workflow` function in the `v0.4.0_add_main_table_operation_name_column.py` file has also been updated to upload the wheel package to the workspace and pass its path to the `deploy_reconcile_job` method. Additionally, the `deploy_recon_job` method in the `JobDeployment` class now accepts a new `wheel_file` argument, which represents the name of the wheel file for the remorph library. These changes address issues faced by customers with no public internet access and enable the use of new features before they are released on PyPI. The `test_recon.py` file in the `tests/unit/deployment` directory has also been updated to reflect these changes.
* Upgrade script Implementation ([777](https://github.com/databrickslabs/remorph/issues/777)). In this release, we've implemented an upgrade script as part of pull request [#777](https://github.com/databrickslabs/remorph/issues/777), which resolves issue [#769](https://github.com/databrickslabs/remorph/issues/769). This change introduces a new `Upgrades` class in `application.py` that accepts `product_info` and `installation` as parameters and includes a cached property `wheels` for improved performance. Additionally, we've added new methods to the `WorkspaceInstaller` class for handling upgrade-related tasks, including the creation of a `ProductInfo` object, interacting with the Databricks SDK, and handling potential errors. We've also added a test case to ensure that upgrades are applied correctly on more recent versions. These changes are part of our ongoing effort to enhance the management and application of upgrades to installed products.
* bug fix for to_array function ([961](https://github.com/databrickslabs/remorph/issues/961)). A bug fix has been implemented to improve the `TO_ARRAY` function in our open-source library. Previously, this function expected only one parameter, but it has been updated to accept two parameters, with the second being optional. This change brings the function in line with other functions in the class, improving flexibility and ensuring backward compatibility. The `TO_ARRAY` function is used to convert a given expression to an array if it is not null and return null otherwise. The commit also includes updates to the `Generator` class, where a new entry for the `ToArray` expression has been added to the `expression_map` dictionary. Additionally, a new `ToArray` class has been introduced as a subclass of `Func`, allowing the function to handle a variable number of arguments more gracefully. Relevant updates have been made to the functional tests for the `to_array` function for both Snowflake and Databricks SQL, demonstrating its handling of null inputs and comparing it with the corresponding ARRAY function in each SQL dialect. Overall, these changes enhance the functionality and adaptability of the `TO_ARRAY` function. A minimal sketch of the variable-argument shape appears after this list.
* feat: Implement all of TSQL predicates except for SOME ALL ANY ([922](https://github.com/databrickslabs/remorph/issues/922)). In this commit, we have implemented the IR generation for several TSQL predicates, including IN, IS, BETWEEN, LIKE, EXISTS, and FREETEXT, thereby improving the parser's ability to handle a wider range of TSQL syntax. The `TSqlParser` class has been updated with new methods and changes to existing ones, including the addition of new labeled expressions to the `predicate` rule. Additionally, we have corrected an error in the LIKE predicate's implementation, allowing the ESCAPE character to accept a full expression that evaluates to a single character at runtime, rather than assuming it to be a single character at parse time. These changes provide more flexibility and adherence to the TSQL standard, enhancing the overall functionality of the project for our adopters. An illustrative IR sketch of the corrected `LIKE` predicate follows this list.
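
To make the `to_array` change above concrete, here is a minimal Scala sketch of a variable-argument function node. The types are illustrative stand-ins that borrow the `Func` and `ToArray` names from the entry; they are not the project's real classes, and the rendering is simplified.

```scala
// Illustrative stand-ins, not remorph's actual IR classes.
sealed trait Expression
case class Literal(value: String) extends Expression
case class Column(name: String) extends Expression

// A function node carrying a variable number of arguments, so that
// TO_ARRAY(expr) and TO_ARRAY(expr, fmt) parse into the same shape.
abstract class Func extends Expression { def args: Seq[Expression] }
case class ToArray(args: Seq[Expression]) extends Func

object ToArrayDemo extends App {
  def gen(e: Expression): String = e match {
    case Literal(v)    => v
    case Column(n)     => n
    // The second argument is optional; whatever is present is rendered.
    case ToArray(args) => s"TO_ARRAY(${args.map(gen).mkString(", ")})"
  }

  println(gen(ToArray(Seq(Column("col1")))))                   // TO_ARRAY(col1)
  println(gen(ToArray(Seq(Column("col1"), Literal("'fmt'"))))) // TO_ARRAY(col1, 'fmt')
}
```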
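
The `LIKE` correction above is easiest to see in IR form: the `ESCAPE` clause holds a full expression rather than a parse-time character. This is a hedged sketch with made-up types (`Expr`, `Like`), not the parser's real output.

```scala
object LikeSketch extends App {
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Str(value: String) extends Expr
  // ESCAPE carries an arbitrary expression that must evaluate to a single
  // character at runtime, rather than a character fixed at parse time.
  case class Like(left: Expr, pattern: Expr, escape: Option[Expr]) extends Expr

  def gen(e: Expr): String = e match {
    case Col(n) => n
    case Str(s) => s"'$s'"
    case Like(l, p, esc) =>
      val base = s"${gen(l)} LIKE ${gen(p)}"
      esc.fold(base)(x => s"$base ESCAPE ${gen(x)}")
  }

  println(gen(Like(Col("a"), Str("%x!_%"), Some(Str("!"))))) // a LIKE '%x!_%' ESCAPE '!'
}
```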

0.6.0

* Added query history retrieval from Snowflake ([874](https://github.com/databrickslabs/remorph/issues/874)). This release introduces query history retrieval from Snowflake, enabling expanded compatibility and data source options for the system. The update includes adding the Snowflake JDBC driver and its dependencies to the `pom.xml` file, and the implementation of a new `SnowflakeQueryHistory` class to retrieve query history from Snowflake. The `Anonymizer` object is also added to anonymize query histories by fingerprinting queries based on their structure. Additionally, several case classes are added to represent various types of data related to query execution and table definitions in a Snowflake database. A new `EnvGetter` class is also included to retrieve environment variables for use in testing. Test files for the `Anonymizer` and `SnowflakeQueryHistory` classes are added to ensure proper functionality. A hedged sketch of structure-based query fingerprinting appears after this list.
* Added support for `ALTER TABLE`: `ADD COLUMNS`, `DROP COLUMNS`, `RENAME COLUMNS`, and `DROP CONSTRAINTS` ([861](https://github.com/databrickslabs/remorph/issues/861)). In this release, support for various `ALTER TABLE` SQL commands has been added to our open-source library, including `ADD COLUMNS`, `DROP COLUMNS`, `RENAME COLUMNS`, and `DROP CONSTRAINTS`. These features have been implemented in the `LogicalPlanGenerator` class, which now includes a new private method `alterTable` that takes a context and an `AlterTableCommand` object and returns an `ALTER TABLE` SQL statement. Additionally, a new sealed trait `TableAlteration` has been introduced, with four case classes extending it to handle specific table alteration operations. The `SnowflakeTypeBuilder` class has also been updated to parse and build Snowflake-specific SQL types for these commands. These changes provide improved functionality for managing and manipulating tables in Snowflake, making it easier for users to work with and modify their data. The new functionality has been tested using the `SnowflakeToDatabricksTranspilerTest` class, which specifies Snowflake `ALTER TABLE` commands and the expected transpiled results. A simplified sketch of the `TableAlteration` shape appears after this list.
* Added support for `STRUCT` types and conversions ([852](https://github.com/databrickslabs/remorph/issues/852)). This change adds support for `STRUCT` types and conversions in the system by implementing new `StructType`, `StructField`, and `StructExpr` classes for parsing, data type inference, and code generation. It also maps the `OBJECT_CONSTRUCT` from Snowflake and introduces updates to various case classes such as `JsonExpr`, `Struct`, and `Star`. These improvements enhance the system's capability to handle complex data structures, ensuring better compatibility with external data sources and expanding the range of transformations available for users. Additionally, the changes include the addition of test cases to verify the functionality of generating SQL data types for `STRUCT` expressions and handling JSON literals more accurately.
* Minor upgrades to Snowflake parameter processing ([871](https://github.com/databrickslabs/remorph/issues/871)). This commit includes minor upgrades to Snowflake parameter processing, enhancing the consistency and readability of the code. The changes normalize parameter generation to use `${}` syntax for clarity and to align with Databricks notebook examples. An extra coverage test for variable references within strings has been added. The specific changes include updating a SELECT statement in a Snowflake SQL query to use `${}` for parameter processing. The commit also introduces a new SQL file for functional tests related to Snowflake's parameter processing, which includes commented-out and alternate-syntax versions of a query. This commit is part of continuous efforts to improve the functionality, reliability, and usability of the Snowflake parameter processing feature.
* Patch/reconcile support temp views ([901](https://github.com/databrickslabs/remorph/issues/901)). The latest update to the remorph-reconcile library adds support for temporary views, a new feature that was not previously available. With this change, the system can now handle `global_temp` for temporary views by modifying the `_get_schema_query` method to return a query for the `global_temp` schema if the schema name is set as such. Additionally, the `read_data` method was updated to correctly handle the namespace and catalog for temporary views. The new variable `namespace_catalog` has been introduced, which is set to `hive_metastore` if the catalog is not set, and to the original catalog with the added schema otherwise. The `table_with_namespace` variable is then updated to use the `namespace_catalog` and table name, allowing for correct querying of temporary views. These modifications enable remorph-reconcile to work seamlessly with temporary views, enhancing its flexibility and functionality. The updated unit tests reflect these changes, with assertions to ensure that the correct SQL statements are being generated and executed for temporary views.
* Reconcile Table Recon JSON filename updates ([866](https://github.com/databrickslabs/remorph/issues/866)). The Remorph project has implemented a change to the naming convention and placement of the configuration file for the table reconciliation process. The configuration file, previously named according to individual preference, must now follow the pattern `recon_config_<DATA_SOURCE>_<SOURCE_CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json` and be placed in the `.remorph` directory within the Databricks Workspace. Examples of Table Recon filenames for Snowflake, Oracle, and Databricks source systems have been provided for reference. Additionally, the `data_source` field in the config file has been updated to accurately reflect the data source. The case of the filename should now match the case of `SOURCE_CATALOG_OR_SCHEMA` as defined in the config. Compliance with this new naming convention and placement is required for the successful execution of the table reconciliation process. A tiny helper spelling out this naming pattern appears after this list.
* [snowflake] parse parameters ([855](https://github.com/databrickslabs/remorph/issues/855)). The open-source library has undergone changes related to Scalafmt configuration, Snowflake SQL parsing, and the introduction of a new `ExpressionGenerator` class method. The Scalafmt configuration change introduces a new `docstrings.wrap` option set to `false`, disabling docstring wrapping at the specified column limit. The `danglingParentheses.preset` option is also set to `false`, disabling the formatting rule for unnecessary parentheses. In Snowflake SQL parsing, new token types, lexer modes, and parser rules have been added to improve the parsing of string literals and other elements. A new `variable` method in the `ExpressionGenerator` class generates SQL expressions for `ir.Variable` objects. A new `Variable` case class has been added to represent a variable in an expression, and the `SchemaReference` case class now takes a single child expression. The `SnowflakeDDLBuilder` class has a new method, `extractString`, to safely extract strings from ANTLR4 context objects. The `SnowflakeErrorStrategy` object now includes new parameters for parsing Snowflake syntax, and the Snowflake LexerSpec test class has new methods for filling tokens from an input string and dumping the token list. Tests have been added for various string literal scenarios, and the SnowflakeAstBuilderSpec includes a new test case for handling the `translate amps` functionality. The Snowflake SQL queries in the test file have been updated to standardize parameter referencing syntax, improving consistency and readability.
* fixed current_date() generation ([890](https://github.com/databrickslabs/remorph/issues/890)). This release includes a fix for an issue with the generation of the `current_date()` function in SQL queries, specifically for the Snowflake dialect. A test case in the `sqlglot-incorrect` category has been updated to use the correct syntax for the `CURRENT_DATE` function, which includes parentheses (`SELECT CURRENT_DATE() FROM tabl;`). Additionally, the `current_date()` function is now called consistently throughout the tests, either as `CURRENT_DATE` or `CURRENT_DATE()`, depending on the syntax required by Snowflake. No new methods were added, and the existing functionality was changed only to correct the `current_date()` generation. This improvement ensures accurate and consistent generation of the `current_date()` function across different SQL dialects, enhancing the reliability and accuracy of the tests.
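
For the query history entry above, one plausible way to fingerprint a query by structure, as the `Anonymizer` is described as doing, is to strip literals and hash the normalized text. The regexes and hash choice below are assumptions for illustration, not the project's actual logic.

```scala
import java.security.MessageDigest

object FingerprintSketch extends App {
  // Replace string and numeric literals with placeholders so queries that
  // differ only in constants share one fingerprint (an assumed heuristic).
  def normalize(sql: String): String =
    sql.toUpperCase
      .replaceAll("'[^']*'", "?")
      .replaceAll("\\b\\d+\\b", "?")
      .replaceAll("\\s+", " ")
      .trim

  def fingerprint(sql: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(normalize(sql).getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // Same structure, different constants => same fingerprint.
  println(fingerprint("SELECT * FROM t WHERE id = 42"))
  println(fingerprint("SELECT * FROM t WHERE id = 99"))
}
```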
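
The `ALTER TABLE` entry above describes a sealed trait with four case classes; a cut-down approximation might look like this. The names follow the entry (`TableAlteration`, `alterTable`), but the members and the exact SQL rendering are simplified assumptions.

```scala
object AlterTableSketch extends App {
  sealed trait TableAlteration
  case class AddColumn(name: String, dataType: String) extends TableAlteration
  case class DropColumns(names: Seq[String]) extends TableAlteration
  case class RenameColumn(from: String, to: String) extends TableAlteration
  case class DropConstraint(name: String) extends TableAlteration

  // One statement per alteration; exact dialect spelling (COLUMN vs COLUMNS,
  // parenthesized lists, etc.) is simplified here.
  def alterTable(table: String, a: TableAlteration): String = a match {
    case AddColumn(n, t)    => s"ALTER TABLE $table ADD COLUMN $n $t"
    case DropColumns(ns)    => s"ALTER TABLE $table DROP COLUMN ${ns.mkString(", ")}"
    case RenameColumn(f, t) => s"ALTER TABLE $table RENAME COLUMN $f TO $t"
    case DropConstraint(n)  => s"ALTER TABLE $table DROP CONSTRAINT $n"
  }

  println(alterTable("t", RenameColumn("old_name", "new_name")))
  // ALTER TABLE t RENAME COLUMN old_name TO new_name
}
```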
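
To spell out the filename convention above, here is a hypothetical helper; `reconConfigFilename` is not part of remorph, it simply restates the `recon_config_<DATA_SOURCE>_<SOURCE_CATALOG_OR_SCHEMA>_<REPORT_TYPE>.json` pattern in code (in Scala, for consistency with the other sketches).

```scala
object ReconFilenameSketch extends App {
  // Hypothetical helper: builds the expected Table Recon config filename.
  // The catalog/schema segment must keep its original case, per the entry above.
  def reconConfigFilename(dataSource: String, sourceCatalogOrSchema: String, reportType: String): String =
    s"recon_config_${dataSource}_${sourceCatalogOrSchema}_${reportType}.json"

  println(reconConfigFilename("snowflake", "sample_data", "all"))
  // recon_config_snowflake_sample_data_all.json
}
```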

0.5.0

* Added Translation Support for `!` as `commands` and `&` for `Parameters` ([771](https://github.com/databrickslabs/remorph/issues/771)). This commit adds translation support for using `!` as commands and `&` as parameters in Snowflake code within the remorph tool, enhancing compatibility with Snowflake syntax. The `!set exit_on_error=true` command, which previously caused an error, is now treated as a comment and prepended with `--` in the output. The `&` symbol, previously unrecognized, is converted to its Databricks equivalent `$`, which represents parameters, allowing for proper handling of Snowflake SQL code containing `!` commands and `&` parameters. These changes improve the compatibility and robustness of remorph with Snowflake code and enable more efficient processing of Snowflake SQL statements. Additionally, the commit introduces a new test suite for Snowflake commands, enhancing code coverage and ensuring proper functionality of the transpiler. A sketch of these two rewrites appears after this list.
* Added `LET` and `DECLARE` statements parsing in Snowflake PL/SQL procedures ([548](https://github.com/databrickslabs/remorph/issues/548)). This commit introduces support for parsing `DECLARE` and `LET` statements in Snowflake PL/SQL procedures, enabling variable declaration and assignment. It adds new grammar rules, refactors code using `ScalarSubquery`, and implements IR visitors for `DECLARE` and `LET` statements with variable assignment and ResultSet assignment. The `RETURN` statement and parameterized expressions are also now supported. Note that `CURSOR` is not yet covered. These changes allow for improved processing and handling of Snowflake PL/SQL code, enhancing the overall functionality of the library.
* Added logger statements in get_schema function ([756](https://github.com/databrickslabs/remorph/issues/756)). In this release, enhanced logging has been implemented in the metadata (schema) fetch functions, specifically in the `get_schema` function and other metadata fetch functions within the Oracle and Snowflake data source modules. The changes include logger statements that log the schema query, start time, and end time, providing better visibility into the performance and behavior of these functions during debugging or monitoring. The logging functionality is implemented using the built-in `logging` module, and timestamps are obtained using the `datetime` module. In the `SnowflakeDataSource` class, a RuntimeError or PySparkException is raised if the user's current role lacks the necessary privileges to access the specified Information Schema object. The INFORMATION_SCHEMA table in Snowflake is used to fetch the schema, with the query modified to handle unquoted and quoted identifiers and the ordinal position of columns. The `get_schema_query` function has also been updated to improve the formatting of the SQL query used to fetch schema information. The schema fetching method remains unchanged, but these enhancements provide more detailed logging for debugging and monitoring purposes.
* Aggregates Reconcile CLI Implementation ([770](https://github.com/databrickslabs/remorph/issues/770)). The `Aggregates Reconcile CLI Implementation` commit introduces a new command-line interface (CLI) for reconcile jobs, specifically for aggregated data. This change adds a new parameter, `operation_name`, to the `run` method in the `runner.py` file, which determines the type of reconcile operation to perform. A new function, `_trigger_reconcile_aggregates`, has been implemented to reconcile aggregate data based on provided configurations and log the reconciliation process outcome. Additionally, new methods for defining job parameters and settings, such as `max_concurrent_runs` and `parameters`, have been included. This CLI implementation enhances the customizability and control of the reconciliation process for users, allowing them to focus on specific use cases and data aggregations. The changes also include new test cases in `test_runner.py` to ensure the proper behavior of the `ReconcileRunner` class when the `aggregates-reconcile` `operation_name` is set.
* Aggregates Reconcile Updates ([784](https://github.com/databrickslabs/remorph/issues/784)). This commit introduces significant updates to the `Table Deployment` feature, enabling it to support `Aggregate Tables` deployment and modifying the persistence logic for tables. Notable changes include the addition of a new `aggregates` attribute to the `Table` class in the configuration, which allows users to specify aggregate functions and optionally group by specific columns. The reconcile process now captures mismatch data, missing rows in the source, and missing rows in the target in the recon metrics tables. Furthermore, the aggregates reconcile process supports various aggregate functions like min, max, count, sum, avg, median, mode, percentile, stddev, and variance. The documentation has been updated to reflect these improvements. The commit also removes the `percentile` function from the reconciliation configuration and modifies the `aggregate_metrics` SQL query, enhancing the flexibility of the `Table Deployment` feature for `Aggregate Tables`. Users should note that the `percentile` function is no longer a valid option and should update their code accordingly.
* Aggregates Reconcile documentation ([779](https://github.com/databrickslabs/remorph/issues/779)). In this commit, the Aggregates Reconcile utility has been enhanced with new documentation and visualizations for improved understanding and usability. The utility now includes a flow diagram, visualization, and README file illustrating how it compares specific aggregate metrics between source and target data residing on Databricks. A new configuration sample is added, showcasing the reconciliation of two tables using various aggregate functions, join columns, transformations, filters, and JDBC ReaderOptions configurations. The commit also introduces two Mermaid flowchart diagrams, depicting the reconciliation process with and without a `group by` operation. Additionally, new flow diagram visualizations in PNG and GIF formats have been added, aiding in understanding the process flow of the Aggregates Reconcile feature. The reconcile configuration samples in the documentation have also been updated with a spelling correction for clarity.
* Bump sqlglot from 25.6.1 to 25.8.1 ([749](https://github.com/databrickslabs/remorph/issues/749)). In this version update, the `sqlglot` dependency has been bumped from 25.6.1 to 25.8.1, bringing several bug fixes and new features related to various SQL dialects such as BigQuery, DuckDB, and T-SQL. Notable changes include support for BYTEINT in BigQuery, improved parsing and transpilation of StrToDate in ClickHouse, and support for SUMMARIZE in DuckDB. Additionally, there are bug fixes for DuckDB and T-SQL, including wrapping left IN clause json extract arrow operand and handling JSON_QUERY with a single argument. The update also includes refactors and changes to the ANNOTATORS and PARSER modules to improve dialect-aware annotation and consistency. This pull request is compatible with `sqlglot` version 25.6.1 and below and includes a detailed list of commits and their corresponding changes.
* Generate window functions ([772](https://github.com/databrickslabs/remorph/issues/772)). In this release, we have added support for generating SQL `WINDOW` and `SortOrder` expressions in the `ExpressionGenerator` class. This enhancement includes the ability to generate a `WINDOW` expression with a window function, partitioning and ordering clauses, and an optional window frame, using the `window` and `frameBoundary` methods. The `sortOrder` method now generates the SQL `SortOrder` expression, which includes the expression to sort by, sort direction, and null ordering. Additional methods `orNull` and `doubleQuote` return a string representing a NULL value and a string enclosed in double quotes, respectively. These changes provide increased flexibility for handling more complex expressions in SQL. Additionally, new test cases have been added to the `ExpressionGeneratorTest` to ensure the correct generation of SQL window functions, specifically the `ROW_NUMBER()` function with various partitioning, ordering, and framing specifications. These updates improve the robustness and functionality of the `ExpressionGenerator` class for generating SQL window functions. A compact sketch of `OVER` clause assembly appears after this list.
* Implement TSQL specific function call mapper ([765](https://github.com/databrickslabs/remorph/issues/765)). This commit introduces several new features to enhance compatibility between TSQL and Databricks SQL. A new method, `interval`, has been added to generate a Databricks SQL compatible string for intervals in a TSQL expression. The `expression` method has been updated to handle certain functions directly, improving translation efficiency. Specifically, the DATEADD function is now translated to Databricks SQL's DATE_ADD, ADD_MONTHS, and `<date> + INTERVAL n <unit>` constructs. The changes also include a new sealed trait `KnownIntervalType`, a new case class `KnownInterval`, and a new class `TSqlCallMapper` for mapping TSQL functions to Databricks SQL equivalents. Furthermore, the commit introduces new tests for TSQL specific function call mappers, ensuring proper translation of TSQL functions to Databricks SQL compatible constructs. These improvements collectively facilitate better integration and compatibility between TSQL and Databricks SQL. A simplified sketch of this dispatch appears after this list.
* Improve TSQL and Snowflake parser and lexer ([757](https://github.com/databrickslabs/remorph/issues/757)). In this release, the open-source library's Snowflake and TSQL lexers and parsers have been improved for better functionality and robustness. For the Snowflake lexer, unnecessary escape sequence processing has been removed, and various options have been corrected to be simple strings. The lexer now accepts a question mark as a placeholder for prepared statements in Snowflake statements. The TSQL lexer has undergone minor improvements, such as aligning the catch-all rule name with Snowflake. The Snowflake parser now accepts the question mark as a `PARAM` placeholder and simplifies the `typeFileformat` rule to accept a single `STRING` token. Additionally, several new keywords have been added to the TSQL lexer, improving consistency and clarity. These changes aim to simplify lexer and parser rules, enhance option handling and placeholders, and ensure consistency between Snowflake and TSQL.
* Patch Information Schema Predicate Pushdown for Snowflake ([764](https://github.com/databrickslabs/remorph/issues/764)). In this release, we have implemented Information Schema predicate pushdown for Snowflake, resolving issue [#7](https://github.com/databrickslabs/remorph/issues/7).
* TSQL: Implement correct grammar for CREATE TABLE in all forms ([796](https://github.com/databrickslabs/remorph/issues/796)). In this release, the TSqlLexer's CREATE TABLE statement grammar has been updated and expanded to support new keywords and improve accuracy. The newly added keywords `EDGE`, `FILETABLE`, `NODE`, and `NODES` enable correct parsing of CREATE TABLE statements using graph nodes and FILETABLE functionality. Existing keywords such as `DROP_EXISTING`, `DYNAMIC`, `FILENAME`, and `FILTER` have been refined for better precision. Furthermore, the introduction of the `tableIndices` rule standardizes the order of columns in the table. These enhancements improve the T-SQL parser's robustness and consistency, benefiting users in creating and managing tables in their databases.
* TSQL: Implement grammar for CREATE DATABASE and CREATE DATABASE SCOPED OPTION ([788](https://github.com/databrickslabs/remorph/issues/788)). In this release, we have implemented the TSQL grammar for `CREATE DATABASE` and `CREATE DATABASE SCOPED OPTION` statements, addressing inconsistencies with TSQL documentation. The implementation was initially intended to cover the entire process from grammar to code generation. However, to simplify other DDL statements, the work was split into separate grammar-only pull requests. The diff introduces new methods such as `createDatabaseScopedCredential`, `createDatabaseOption`, and `databaseFilestreamOption`, while modifying the existing `createDatabase` method. The `createDatabaseScopedCredential` method handles the creation of a database scoped credential, which was previously part of `createDatabaseOption`. The `createDatabaseOption` method now focuses on handling individual options, while `databaseFilestreamOption` deals with filesystem specifications. Note that certain options, like `DEFAULT_LANGUAGE`, `DEFAULT_FULLTEXT_LANGUAGE`, and more, have been marked as TODO and will be addressed in future updates.
* TSQL: Improve transpilation coverage ([766](https://github.com/databrickslabs/remorph/issues/766)). In this update, various enhancements have been made to improve the coverage of TSQL transpilation and address bugs in code generation, particularly for the `ExpressionGenerator` class in the `com/databricks/labs/remorph/generators/sql` package, and the `TSqlExpressionBuilder`, `TSqlFunctionBuilder`, `TSqlCallMapper`, and `QueryRunner` classes. Changes include adding support for new cases, modifying code generation behavior, improving test coverage, and updating existing tests for better TSQL code generation. Specific additions include new methods for handling bitwise operations, converting CHECKSUM_AGG calls to a sequence of MD5 function calls, and handling Fn instances. The `QueryRunner` class has been updated to include both the actual and expected outputs in error messages for better debugging purposes. Additionally, the test file for the `DATEADD` function has been updated to ensure proper syntax and consistency. All these modifications aim to improve the reliability, accuracy, and compatibility of TSQL transpilation, ensuring better functionality and coverage for the Remorph library's transformation capabilities.
* [chore] speedup build process by not running unit tests twice ([842](https://github.com/databrickslabs/remorph/issues/842)). In this commit, the build process for the open-source library has been optimized by removing the execution of unit tests during the build phase in the Maven build process. A new plugin for the Apache Maven Surefire Plugin has been added, with the group ID set to "org.apache.maven.plugins", artifact ID set to "maven-surefire-plugin", and version set to "3.1.2". The configuration for this plugin includes a `skipTests` attribute set to "true", ensuring that tests are not run twice, thereby improving the build process speed. The existing ScalaTest Maven plugin configuration remains unchanged, allowing Scala tests to still be executed during the test phase. Additionally, the Maven Compiler Plugin has been upgraded to version 3.11.0, and the release parameter has been set to 8, ensuring that the Java compiler used during the build process is compatible with Java 8. The version numbers for several libraries, including os-lib, mainargs, ujson, scalatest, and exec-maven-plugin, are now being defined using properties, allowing Maven to manage and cache these libraries more efficiently. These changes improve the build process's performance and reliability without affecting the existing functionality.
* [internal] better errors for call mapper ([816](https://github.com/databrickslabs/remorph/issues/816)). In this release, the `ExpressionGenerator` class in the `com.databricks.labs.remorph.generators.sql` package has been updated to handle exceptions during the conversion of input functions to Databricks expressions. A try-catch block has been added to catch `IndexOutOfBoundsException` and provide a more descriptive error message, including the name of the problematic function and the error message associated with the exception. A `TranspileException` with the message `not implemented` is now thrown when encountering a function for which a translation to Databricks expressions is not available. The `IsTranspiledFromSnowflakeQueryRunner` class in the `com.databricks.labs.remorph.coverage` package has also been updated to include the name of the exception class in the error message for better error identification when a non-fatal error occurs during parsing. Additionally, the import statement for `Formatter` has been moved to ensure alphabetical order. These changes improve error handling and readability, thereby enhancing the overall user experience for developers interacting with the codebase.
* [snowflake] map more functions to Databricks SQL ([826](https://github.com/databrickslabs/remorph/issues/826)). This commit introduces new private methods `andPredicate` and `orPredicate` to the ExpressionGenerator class in the `com.databricks.labs.remorph.generators.sql` package, enhancing the generation of SQL expressions for AND and OR logical operators, and improving readability and correctness of complex logical expressions. The LogicalPlanGenerator class in the `sql` package now supports more flexibility in inserting data into a target relation, enabling users to choose between overwriting the existing data or appending to it. The `FROM_JSON` function in the CallMapper class has been updated to accommodate an optional third argument, providing more flexibility in handling JSON-related transformations. A new class, `CastParseJsonToFromJson`, has been introduced to improve the performance of data processing pipelines that involve parsing JSON data in Snowflake using the `PARSE_JSON` function. Additional Snowflake SQL functions have been mapped to Databricks SQL IR, enhancing compatibility and functionality. The ExpressionGeneratorTest class now generates predicates without parentheses, simplifying and improving readability. Mappings for several Snowflake functions to Databricks SQL have been added, enhancing compatibility with Databricks SQL. The `sqlFiles` sequence in the `NestedFiles` class is now sorted before being mapped to `AcceptanceTest` objects, ensuring consistent order for testing or debugging purposes. A semicolon has been added to the end of a SQL query in a test file for Snowflake DML insert functionality, ensuring proper query termination.
* [sql] generate `INSERT INTO ...` ([823](https://github.com/databrickslabs/remorph/issues/823)). In this release, we have made significant updates to our open-source library. The ExpressionGenerator.scala file has been updated to convert boolean values to lowercase instead of uppercase when generating INSERT INTO statements, ensuring SQL code consistency. A new method `insert` has been added to the `LogicalPlanGenerator` class to generate INSERT INTO SQL statements based on the `InsertIntoTable` input. We have introduced a new case class `InsertIntoTable` that extends `Modification` to simplify the API for DML operations other than SELECT. The SQL ExpressionGenerator now generates boolean literals in lowercase, and new test cases have been added to ensure the correct generation of INSERT and JOIN statements. Lastly, we have added support for generating INSERT INTO statements in SQL for specified database tables, improving cross-platform compatibility. These changes aim to enhance the library's functionality and ease of use for software engineers.
* [sql] generate basic JSON access ([835](https://github.com/databrickslabs/remorph/issues/835)). In this release, we have added several new features and improvements to our open-source library. The `ExpressionGenerator` class now includes a new method, `jsonAccess`, which generates SQL code to access a JSON object's properties, handling different types of elements in the path. The `TO_JSON` function in the `StructsToJson` class has been updated to accept an optional expression as an argument, enhancing its flexibility. The `SnowflakeCallMapper` class now includes a new method, `lift`, and a new feature to generate basic JSON access, with corresponding updates to test cases and methods. The SQL logical plan generator has been refined to generate star projections with escaped identifiers, handling complex table and database names. We have also added new methods and test cases to the `SnowflakeCallMapper` class to convert Snowflake structs into JSON strings and cast Snowflake values to specific data types. These changes improve the library's ability to handle complex JSON data structures, enhance functionality, and ensure the quality of generated SQL code.
* [sql] generate basic `CREATE TABLE` definition ([829](https://github.com/databrickslabs/remorph/issues/829)). In this release, the open-source library's SQL generation capabilities have been enhanced with the addition of a new `createTable` method to the `LogicalPlanGenerator` class. This method generates a `CREATE TABLE` definition for a given `ir.CreateTableCommand`, producing a SQL statement with a comma-separated list of column definitions. Each column definition includes the column name, data type, and any applicable constraints, generated using the `DataTypeGenerator.generateDataType` method and the newly-introduced `constraint` method. Additionally, the `project` method has been updated to incorporate a `FROM` clause in the generated SQL statement when the input of the project node is not `ir.NoTable()`. These improvements extend the functionality of the `LogicalPlanGenerator` class, allowing it to generate `CREATE TABLE` statements for input catalog ASTs, thereby better supporting data transformation use cases. A new test for the `CreateTableCommand` has been added to the `LogicalPlanGeneratorTest` class to validate the correct transpilation of the `CreateTableCommand` to a `CREATE TABLE` SQL statement.
* [sql] generate basic `TABLESAMPLE` ([830](https://github.com/databrickslabs/remorph/issues/830)). In this commit, the open-source library's `LogicalPlanGenerator` class has been updated to include a new method, `tableSample`, which generates SQL representations of table sampling operations. Previously, the class only handled `INSERT`, `DELETE`, and `CREATE TABLE` commands. With this enhancement, the generator can now produce SQL statements using the `TABLESAMPLE` clause, allowing for the selection of a sample of data from a table based on various sampling methods and a seed value for repeatable sampling. The newly supported sampling methods include row-based probabilistic, row-based fixed amount, and block-based sampling. Additionally, a new test case has been added for the `LogicalPlanGenerator` related to the `TableSample` class, validating the correct transpilation of named tables and fixed row sampling into the `TABLESAMPLE` clause with specified parameters. This improvement ensures that the generated SQL code accurately represents the desired table sampling settings. A simplified sketch of the sampling-method dispatch appears after this list.
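
The `!` command and `&` parameter translation above amounts to two line-level rewrites. The sketch below expresses them in Scala for consistency with the other examples; the regex and the blanket comment-out of `!` lines are illustrative assumptions, not remorph's implementation.

```scala
object SnowflakeCommandSketch extends App {
  // '!' commands (e.g. "!set exit_on_error=true") are not executable on
  // Databricks, so they are kept as comments; '&' parameters become
  // Databricks '$' parameters.
  def translateLine(line: String): String =
    if (line.trim.startsWith("!")) s"-- $line"
    else line.replaceAll("&(\\w+)", "\\$$1")

  println(translateLine("!set exit_on_error=true"))      // -- !set exit_on_error=true
  println(translateLine("SELECT * FROM t WHERE c = &p")) // SELECT * FROM t WHERE c = $p
}
```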
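
The window-function entry above reduces to assembling an `OVER` clause from optional parts. `Window` and `SortOrder` here are simplified stand-ins for the IR described in that entry.

```scala
object WindowSketch extends App {
  case class SortOrder(expr: String, direction: String = "ASC", nulls: String = "NULLS LAST")
  case class Window(function: String, partitionBy: Seq[String], orderBy: Seq[SortOrder], frame: Option[String])

  def sortOrder(s: SortOrder): String = s"${s.expr} ${s.direction} ${s.nulls}"

  // Assemble OVER (PARTITION BY ... ORDER BY ... <frame>), omitting absent clauses.
  def window(w: Window): String = {
    val parts = Seq(
      if (w.partitionBy.nonEmpty) Some(s"PARTITION BY ${w.partitionBy.mkString(", ")}") else None,
      if (w.orderBy.nonEmpty) Some(s"ORDER BY ${w.orderBy.map(sortOrder).mkString(", ")}") else None,
      w.frame
    ).flatten
    s"${w.function} OVER (${parts.mkString(" ")})"
  }

  println(window(Window("ROW_NUMBER()", Seq("dept"), Seq(SortOrder("salary", "DESC")), None)))
  // ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC NULLS LAST)
}
```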
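
The `DATEADD` mapping described in the TSQL call-mapper entry above dispatches on the datepart. This match is a simplified sketch of that three-way split; the accepted datepart spellings are assumptions.

```scala
object DateAddSketch extends App {
  // Dispatch on the TSQL datepart, mirroring the three target constructs
  // named in the entry above (DATE_ADD, ADD_MONTHS, and + INTERVAL).
  def dateAdd(datepart: String, amount: String, date: String): String =
    datepart.toLowerCase match {
      case "day" | "dd" | "d"   => s"DATE_ADD($date, $amount)"
      case "month" | "mm" | "m" => s"ADD_MONTHS($date, $amount)"
      case other                => s"$date + INTERVAL $amount ${other.toUpperCase}"
    }

  println(dateAdd("day", "7", "col1"))   // DATE_ADD(col1, 7)
  println(dateAdd("month", "2", "col1")) // ADD_MONTHS(col1, 2)
  println(dateAdd("year", "1", "col1"))  // col1 + INTERVAL 1 YEAR
}
```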
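
For the `TABLESAMPLE` entry above, the three sampling methods and the optional seed fit naturally into a small ADT. The type names echo the entry, but the SQL rendering (especially the block-based form) is a guess for illustration.

```scala
object TableSampleSketch extends App {
  sealed trait SamplingMethod
  case class RowProbability(percent: Double) extends SamplingMethod   // row-based probabilistic
  case class RowFixedAmount(rows: Long) extends SamplingMethod        // row-based fixed amount
  case class BlockProbability(percent: Double) extends SamplingMethod // block-based

  def tableSample(table: String, m: SamplingMethod, seed: Option[Long]): String = {
    val method = m match {
      case RowProbability(p)   => s"$p PERCENT"
      case RowFixedAmount(n)   => s"$n ROWS"
      case BlockProbability(p) => s"BLOCK ($p)" // rendering here is a guess for illustration
    }
    val repeatable = seed.map(s => s" REPEATABLE ($s)").getOrElse("")
    s"$table TABLESAMPLE ($method)$repeatable"
  }

  println(tableSample("t", RowFixedAmount(100), Some(42)))
  // t TABLESAMPLE (100 ROWS) REPEATABLE (42)
}
```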

Dependency updates:

* Bump sqlglot from 25.6.1 to 25.8.1 ([749](https://github.com/databrickslabs/remorph/pull/749)).

0.4.1

* Aggregate Queries Reconciliation ([740](https://github.com/databrickslabs/remorph/issues/740)). This release introduces several changes to enhance the functionality of the project, including the implementation of Aggregate Queries Reconciliation, addressing issue [#503](https://github.com/databrickslabs/remorph/issues/503). A new property, `aggregates`, has been added to the base class of the query builder module to support aggregate queries reconciliation. A `generate_final_reconcile_aggregate_output` function has been added to generate the final reconcile output for aggregate queries. A new SQL file creates a table called `aggregate_details` to store details about aggregate reconciles, and a new column, `operation_name`, has been added to the `main` table in the `installation` reconciliation query. Additionally, new classes and methods have been introduced for handling aggregate queries and their reconciliation, and new SQL tables and columns have been created for storing and managing rules for aggregating data in the context of query reconciliation. Unit tests have been added to ensure the proper functioning of aggregate queries reconciliation and reconcile aggregate data in the context of missing records.
* Generate GROUP BY / PIVOT ([747](https://github.com/databrickslabs/remorph/issues/747)). The `LogicalPlanGenerator` class in the remorph library has been updated to support generating GROUP BY and PIVOT clauses for SQL queries. A new private method, `aggregate`, has been added to handle two types of aggregates: `GroupBy` and `Pivot`. For `GroupBy`, it generates a GROUP BY clause with the specified grouping expressions. For `Pivot`, it generates a PIVOT clause where the specified column is used as the pivot column and the specified values are used as the pivot values, compatible with Spark SQL. If the aggregate type is unsupported, a `TranspileException` is thrown. Additionally, new test cases have been introduced for the `LogicalPlanGenerator` class in the `com.databricks.labs.remorph.generators.sql` package to support testing the transpilation of `Aggregate` expressions with GROUP BY and PIVOT clauses, ensuring proper handling and transpilation of these expressions. A cut-down sketch of this branching appears after this list.
* Implement error strategy for Snowflake parsing and use error strategy for all parser instances ([760](https://github.com/databrickslabs/remorph/issues/760)). In this release, we have developed an error strategy specifically for Snowflake parsing that translates raw token names and parser rules into more user-friendly SQL error messages. This strategy is applied consistently across all parser instances, ensuring a unified error handling experience. Additionally, we have refined the DBL_DOLLAR rule in the SnowflakeLexer grammar to handle escaped dollar signs correctly. These updates improve the accuracy and readability of error messages for SQL authors, regardless of the parsing tool or transpiler used. Furthermore, we have updated the TSQL parsing error strategy to match the new Snowflake error strategy implementation, providing a consistent error handling experience across dialects.
* Incremental improvement to error messages - article selection ([711](https://github.com/databrickslabs/remorph/issues/711)). In this release, we have implemented an incremental improvement to the error messages generated during T-SQL code parsing. This change introduces a new private method, `articleFor`, which determines whether to use `a` or `an` in the generated messages based on the first letter of the following word. The `generateMessage` method has been updated to use this new method when constructing the initial error message and subsequent messages when there are multiple expected tokens. This improvement ensures consistent use of articles `a` or `an` in the error messages, enhancing their readability for software engineers working with T-SQL code. The rule is small enough to show in full; a sketch appears after this list.
* TSQL: Adds tests and support for SELECT OPTION(...) generation ([755](https://github.com/databrickslabs/remorph/issues/755)). In this release, we have added support for generating code for the TSQL `SELECT ... OPTION(...)` clause in the codebase. This new feature includes the ability to transpile any query hints supplied with a SELECT statement as comments in the output code, allowing for easier assessment of query performance after transpilation. The OPTION clause is now generated as comments, including MAXRECURSION, string options, boolean options, and auto options. Additionally, we have added new tests and updated the TSqlAstBuilderSpec test class with new and updated test cases to cover the new functionality. The implementation is focused on generating code for the OPTION clause, and does not affect the actual execution of the query. The changes are limited to the ExpressionGenerator class and its associated methods, and the TSqlRelationBuilder class, without affecting other parts of the codebase.
* TSQL: IR implementation of MERGE ([719](https://github.com/databrickslabs/remorph/issues/719)). The open-source library has been updated to include a complete implementation of the TSQL MERGE statement's IR (Intermediate Representation), bringing it in line with Spark SQL. The `LogicalPlanGenerator` class now includes a `generateMerge` method, which generates the SQL code for the MERGE statement, taking a `MergeIntoTable` object containing the target and source tables, merge condition, and merge actions as input. The `MergeIntoTable` class has been added as a case class to represent the logical plan of the MERGE INTO command and extends the `Modification` trait. The `LogicalPlanGenerator` class also includes a new `generateWithOptions` method, which generates SQL code for the WITH OPTIONS clause, taking a `WithOptions` object containing the input and options as children. Additionally, the `TSqlRelationBuilder` class has been updated to handle the MERGE statement's parsing, introducing new methods and updating existing ones, such as `visitMerge`. The `TSqlToDatabricksTranspiler` class has been updated to include support for the TSQL MERGE statement, and the `ExpressionGenerator` class has new tests for options, columns, and arithmetic expressions. A new optimization rule, `TrapInsertDefaultsAction`, has been added to handle the behavior of the DEFAULT keyword during INSERT statements. The commit also includes test cases for the `MergeIntoTable` logical operator and the T-SQL merge statement in the `TSqlAstBuilderSpec`. A miniature sketch of `MERGE` rendering appears after this list.
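
The GROUP BY / PIVOT entry above branches on the aggregate kind and throws on anything unsupported. This is a self-contained approximation with stand-in types; in particular, the `SUM(v)` inside the PIVOT is a placeholder, since the real generator would presumably take the aggregate expression from the plan.

```scala
object AggregateSketch extends App {
  sealed trait AggregateKind
  case class GroupBy(groupingExprs: Seq[String]) extends AggregateKind
  case class Pivot(column: String, values: Seq[String]) extends AggregateKind
  case object OtherAggregate extends AggregateKind // stands in for anything unsupported

  case class TranspileException(msg: String) extends RuntimeException(msg)

  def aggregate(input: String, kind: AggregateKind): String = kind match {
    case GroupBy(exprs)   => s"$input GROUP BY ${exprs.mkString(", ")}"
    // Spark SQL-style PIVOT: the column is the pivot column, values the pivot values.
    case Pivot(col, vals) => s"$input PIVOT (SUM(v) FOR $col IN (${vals.mkString(", ")}))"
    case other            => throw TranspileException(s"unsupported aggregate: $other")
  }

  println(aggregate("SELECT * FROM t", GroupBy(Seq("a", "b"))))
  println(aggregate("SELECT * FROM t", Pivot("quarter", Seq("'Q1'", "'Q2'"))))
}
```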
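
The article-selection fix above is small enough to sketch in full. `articleFor` below follows the described first-letter rule; note that it keys on spelling, not pronunciation.

```scala
object ArticleSketch extends App {
  // Choose "a" or "an" from the first letter of the following word, as described above.
  def articleFor(word: String): String =
    if (word.nonEmpty && "aeiouAEIOU".contains(word.head)) "an" else "a"

  println(s"Expecting ${articleFor("identifier")} identifier") // Expecting an identifier
  println(s"Expecting ${articleFor("keyword")} keyword")       // Expecting a keyword
}
```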
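
And a miniature of the `generateMerge` rendering described above, using a flattened stand-in for `MergeIntoTable` (the real node presumably holds expression trees and action lists rather than pre-rendered strings).

```scala
object MergeSketch extends App {
  // Simplified stand-in for the MergeIntoTable logical plan node.
  case class MergeIntoTable(target: String, source: String, condition: String,
                            matchedAction: String, notMatchedAction: String)

  def generateMerge(m: MergeIntoTable): String =
    s"""MERGE INTO ${m.target}
       |USING ${m.source}
       |ON ${m.condition}
       |WHEN MATCHED THEN ${m.matchedAction}
       |WHEN NOT MATCHED THEN ${m.notMatchedAction}""".stripMargin

  println(generateMerge(MergeIntoTable(
    "t", "s", "t.id = s.id",
    "UPDATE SET t.v = s.v",
    "INSERT (id, v) VALUES (s.id, s.v)")))
}
```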

0.4.0

* Added TSql transpiler ([734](https://github.com/databrickslabs/remorph/issues/734)). In this release, we have developed a new open-source library feature that enhances the transpilation of T-SQL code to Databricks-compatible code. The new TSqlToDatabricksTranspiler class has been added, which extends the Transpiler abstract class and defines the transpile method. This method converts T-SQL code to Databricks-compatible code by creating a lexer, token stream, parser, and parsed tree from the input string using TSqlLexer, CommonTokenStream, and tSqlFile. The parsed tree is then passed to the TSqlAstBuilder's visit method to generate a logical plan, which is optimized using an optimizer object with rules such as PullLimitUpwards and TopPercentToLimitSubquery. The optimized plan is then passed to the LogicalPlanGenerator's generate method to generate the final transpiled code. Additionally, a new class, IsTranspiledFromTSqlQueryRunner, has been added to the QueryRunner object to transpile T-SQL queries into the format expected by the Databricks query runner using the new TSqlToDatabricksTranspiler. The AcceptanceTestRunner class in the com.databricks.labs.remorph.coverage package has been updated to replace TSqlAstBuilder with IsTranspiledFromTSqlQueryRunner in the TSqlAcceptanceSuite class, indicating a change in the code responsible for handling TSql queries. This new feature aims to provide a smooth and efficient way to convert T-SQL code to Databricks-compatible code for further processing and execution.
* Added the missing info for the reconciliation documentation ([520](https://github.com/databrickslabs/remorph/issues/520)). In this release, we have made significant improvements to the reconciliation feature of our open-source library. We have added a configuration folder and provided a template for creating configuration files for specific table sources. The config files will contain necessary configurations for table-specific reconciliation. Additionally, we have included a note in the transformation section detailing the usage of user-defined functions (UDFs) in transformation expressions, with an example UDF called `sort_array_input()` provided. The reconcile configuration sample documentation has also been added, with a JSON configuration for reconciling tables using various operations like drop, join, transformation, threshold, filter, and JDBC ReaderOptions. The commit also includes examples of source and target tables, data overviews, and reconciliation configurations for various scenarios, such as basic config and column mapping, user transformations, explicit select, explicit drop, filters, and thresholds comparison, among others. These changes aim to make it easier for users to set up and execute the reconciliation process for specific table sources and provide clear and concise information about using UDFs in transformation expressions for reconciliation. This commit is co-authored by Vijay Pavan Nissankararao and SundarShankar89.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([555](https://github.com/databrickslabs/remorph/issues/555)). In this pull request, the sigstore/gh-action-sigstore-python dependency is being updated from version 2.1.1 to 3.0.0. This new version includes several changes and improvements, such as the addition of recursive globbing with ** to the inputs and the removal of certain settings like fulcio-url, rekor-url, ctfe, and rekor-root-pubkey. The signature, certificate, and bundle output settings have also been removed. Furthermore, the way inputs are parsed has been changed, and they are now made optional under certain conditions. The default suffix has been updated to .sigstore.json. The 3.0.0 version also resolves various deprecations present in sigstore-python's 2.x series and supports CI runners that use PEP 668 to constrain global package prefixes.
* Bump sqlglot from 25.1.0 to 25.5.1 ([534](https://github.com/databrickslabs/remorph/issues/534)). In the latest release, the `sqlglot` package has been updated from version 25.1.0 to 25.5.1, which includes bug fixes, breaking changes, new features, and refactors for parsing, analyzing, and rewriting SQL queries. The new version introduces optimizations for coalesced USING columns, preserves EXTRACT(date_part FROM datetime) calls, decouples NVL() from COALESCE(), and supports FROM CHANGES in Snowflake. It also provides configurable transpilation of Snowflake VARIANT, and supports view schema binding options for Spark and Databricks. The update addresses several issues, such as the use of timestamp with time zone over timestamptz, switch off table alias columns generation, and parse rhs of x::varchar(max) into a type. Additionally, the update cleans up CurrentTimestamp generation logic for Teradata. The `sqlglot` dependency has also been updated in the 'experimental.py' file of the 'databricks/labs/remorph/snow' module, along with the addition of a new private method `_parse_json` to the `Generator` class for parsing JSON data. Software engineers should review the changes and update their code accordingly, as conflicts with existing code will be resolved automatically by Dependabot, as long as the pull request is not altered manually.
* Bump sqlglot from 25.5.1 to 25.6.1 ([585](https://github.com/databrickslabs/remorph/issues/585)). In this release, the `sqlglot` dependency is updated from version 25.5.1 to 25.6.1 in the 'pyproject.toml' file. This update includes bug fixes, breaking changes, new features, and improvements. Breaking changes consist of updates to the QUALIFY clause in queries and the canonicalization of struct and array inline constructor. New features include support for ORDER BY ALL, FROM ROWS FROM (...), RPAD & LPAD functions, and exp.TimestampAdd. Bug fixes address issues related to the QUALIFY clause in queries, expansion of SELECT * REPLACE, RENAME, transpiling UDFs from Databricks, and more. The pull request also includes a detailed changelog, commit history, instructions for triggering Dependabot actions, and commands for reference, with the exception of the compatibility score for the new version, which is not taken into account in this pull request.
* Feature/reconcile table mismatch threshold ([550](https://github.com/databrickslabs/remorph/issues/550)). This commit enhances the reconciliation process in the open-source library with several new features, addressing issue [#504](https://github.com/databrickslabs/remorph/issues/504). A new `get_record_count` method is added to the `Reconcile` class, providing record count data for source and target tables, facilitating comprehensive analysis of table mismatches. A `CountQueryBuilder` class is introduced to build record count queries for different layers and SQL dialects, ensuring consistency in data processing. The `Thresholds` class is refactored into `ColumnThresholds` and `TableThresholds`, allowing for more granular control over comparisons and customizable threshold settings. New methods `_is_mismatch_within_threshold_limits` and `_insert_into_metrics_table` are added to the `recon_capture.py` file, improving fine-grained control over the reconciliation process and preventing false positives. Additionally, new classes, methods, and data structures have been implemented in the `execute` module to handle reconciliation queries and data more efficiently. These improvements contribute to a more accurate and robust reconciliation system.
* Feature: introduce core transpiler ([715](https://github.com/databrickslabs/remorph/issues/715)). A new core transpiler, `SnowflakeToDatabricksTranspiler`, has been introduced to convert Snowflake queries into Databricks SQL, streamlining integration and compatibility between the two systems. This transpiler is integrated into the coverage test suites for thorough testing, and is used to convert various types of logical plans, handling cases such as `Batch`, `WithCTE`, `Project`, `NamedTable`, `Filter`, and `Join`. The `SnowflakeToDatabricksTranspiler` class tokenizes input Snowflake query strings, initializes a `SnowflakeParser` instance, parses the input Snowflake query, generates a logical plan, and applies the `LogicalPlanGenerator` to the logical plan to generate the equivalent Databricks SQL query. Additionally, the `SnowflakeAstBuilder` class has been updated to alter the way `Batch` logical plans are built and improve overall functionality of the transpiler.
* Fixed LEFT and RIGHT JOIN syntax in Snowflake ANTLR grammar ([526](https://github.com/databrickslabs/remorph/issues/526)). A fix has been implemented to address issues with the Snowflake ANTLR grammar related to the proper parsing of LEFT and RIGHT JOIN statements. Previously, the keywords LEFT and RIGHT were incorrectly allowed as identifiers, but they are hard keywords that must be escaped to be used as column names. This change updates the grammar to escape these keywords in JOIN statements, improving the overall parsing of queries that include LEFT and RIGHT JOINs. Additionally, semantic predicates have been suggested to handle cases where LEFT or RIGHT are used as column names without escaping, although this is not yet implemented. To ensure the correctness of the updated grammar, new tests have been added to the SnowflakeAstBuilderSpec for LEFT and RIGHT JOINs, which check that the Abstract Syntax Tree (AST) is built correctly for these queries.
* Fixed Snowflake Acceptance Testcases Failures ([531](https://github.com/databrickslabs/remorph/issues/531)). In this release, updates have been made to the acceptance testcases for various SQL functions in the open-source library. The DENSE_RANK function's testcase has been updated with a window specification and ORDER BY clause in both Snowflake and Databricks SQL syntaxes, ensuring accurate test results. The LAG function's testcase now includes a PARTITION BY and ORDER BY clause, as well as the NULLS LAST keyword in Databricks SQL, for improved accuracy and consistency. The SQL queries in the Snowflake testcase for the `last_value` function have been updated with a window specification, ORDER BY clause, and NULLS LAST directive for Databricks SQL. Test case failures in the Snowflake acceptance testsuite have been addressed with updates to the LEAD function, MONTH_NAME to MONTHNAME renaming, and DATE_FORMAT to TO_DATE conversion, improving reliability and consistency. The ntile function's testcase has been updated with PARTITION BY and ORDER BY clauses, and the NULLS LAST keyword has been added to the Databricks SQL query. The SQL query for null-safe equality comparison has been updated with a conditional expression compatible with Snowflake. The ranking function's testcase has been improved with the appropriate partition and order by clauses, and the NULLS LAST keyword has been added to the Databricks SQL query, enhancing accuracy and consistency. Lastly, updates have been made to the ROW_NUMBER function's testcase, ensuring accurate and consistent row numbering for both Snowflake and Databricks.
* Fixed TSQL transpiler ([735](https://github.com/databrickslabs/remorph/issues/735)). In this release, we have implemented a fix for the TSQL transpiler, addressing issue [#7](https://github.com/databrickslabs/remorph/issues/7). This enhancement allows the library to accurately convert TSQL code into an equivalent format that is compatible with other databases. The fix resolves reported bugs related to incorrect syntax interpretation, thereby improving the overall reliability and functionality of the transpiler. Software engineers and developers relying on TSQL compatibility for cross-database operations will benefit from this improvement. We encourage users to test and provide feedback on this updated feature.
* Fixed `SELECT TOP X PERCENT` IR translation for TSQL ([733](https://github.com/databrickslabs/remorph/issues/733)). In this release, we have made several enhancements to the open-source library to improve compatibility with T-SQL and Catalyst. We have added a new dependency, `pprint_${scala.binary.version}` version 0.8.1 from the `com.lihaoyi` group, to provide advanced pretty-printing functionality for Scala. We have also fixed the translation of the TSQL `SELECT TOP X PERCENT` feature in the parser for intermediate expressions, addressing the difference in syntax between TSQL and SQL for limiting the number of rows returned by a query. Additionally, we have modified the implementation of the `WITH` clause and added a new expression for the `SELECT TOP` clause in T-SQL, improving the compatibility of the codebase with T-SQL and aligning it with Catalyst. We also introduced a new abstract class `Rule` and a case class `Rules` in the `com.databricks.labs.remorph.parsers.intermediate` package to fix the `SELECT TOP X PERCENT` IR translation for TSQL by adding new rules. Furthermore, we have added a new Scala file, `subqueries.scala`, containing abstract class `SubqueryExpression` and two case classes that extend it, and made changes to the `trees.scala` file to improve the tree string representation for better readability and consistency in the codebase. These changes aim to improve the overall functionality of the library and make it easier for new users to understand and adopt the project.
* Fixed invalid null constraint and FQN ([517](https://github.com/databrickslabs/remorph/issues/517)). In this change, we addressed issues [#516](https://github.com/databrickslabs/remorph/issues/516) and [#517](https://github.com/databrickslabs/remorph/issues/517), which involved resolving an invalid null constraint and correcting a fully qualified name (FQN) in our open-source library. The `read_data` function in the `databricks.py` file was updated to improve null constraint handling and ensure the FQN is valid. Previously, the catalog was always appended to the table name, potentially resulting in an invalid FQN and null constraint issues. Now, the code checks if the catalog exists before appending it to the table name, and if not provided, the schema and table name are concatenated directly. Additionally, we removed the NOT NULL constraint from the catalog field in the source_table and target_table structs in the main SQL file, allowing null values for this field. These changes maintain backward compatibility and enhance the overall functionality and robustness of the project, ensuring accurate query results and avoiding potential errors.
* Generate SQL for arithmetic operators ([726](https://github.com/databrickslabs/remorph/issues/726)). In this release, we have introduced a new private method `arithmetic` to the `ExpressionGenerator` class that generates SQL for arithmetic operations, including unary minus, unary plus, multiplication, division, modulo, addition, and subtraction. This improves the readability and maintainability of the code by separating concerns and making the functionality more explicit. Additionally, we have introduced a new trait named `Arithmetic` to group arithmetic expressions together, which enables easier manipulation and identification of arithmetic expressions in the code. A new test suite has also been added for arithmetic operations in the `ExpressionGenerator` class, which improves test coverage and ensures the correct SQL is generated for these operations. These changes provide a welcome addition for developers looking to extend the functionality of the `ExpressionGenerator` class for arithmetic operations.
* Generate SQL for bitwise operators. The `ExpressionGenerator` class in the remorph project has been updated to support generating SQL for bitwise operators (OR, AND, XOR, NOT) through the addition of a new private method `bitwise` that converts bitwise operations to equivalent SQL expressions. The `expression` method has also been updated to utilize the new `bitwise` method for any input `ir.Expression` that is a bitwise operation. To facilitate this change, a new trait called `Bitwise` and updated case classes for bitwise operations, including `BitwiseNot`, `BitwiseAnd`, `BitwiseOr`, and `BitwiseXor`, have been implemented. The updated case classes extend the new `Bitwise` trait and include the `dataType` override method to return the data type of the left expression. A new test case in `ExpressionGeneratorTest` for bitwise operators has been added to validate the functionality, and the `expression` method in `ExpressionGenerator` now utilizes `GeneratorContext()` instead of `new GeneratorContext()`. These changes enable the `ExpressionGenerator` to generate SQL code for bitwise operations, expanding its capabilities. A cut-down sketch of the `Bitwise` trait and generator appears after this list.
* Generate `.. LIKE ..` ([723](https://github.com/databrickslabs/remorph/issues/723)). In this commit, the `ExpressionGenerator` class has been enhanced with a new method, `like`, which generates SQL `LIKE` expressions. The method takes a `GeneratorContext` object and an `ir.Like` object as arguments, and returns a string representation of the `LIKE` expression. It uses the `expression` method to generate the left and right sides of the `LIKE` operator, and also handles the optional escape character (illustrated in a sketch below). Additionally, the `timestampLiteral` and `dateLiteral` methods have been updated to take an `ir.Literal` object and better handle `NULL` values. A new test case has also been added for the `ExpressionGenerator` class, which checks the `like` function and includes examples for basic usage and usage with an escape character. This commit improves the functionality of the `ExpressionGenerator` class, allowing it to handle `LIKE` expressions and better handle `NULL` values for timestamps and dates.
* Generate `DISTINCT`, `*` ([739](https://github.com/databrickslabs/remorph/issues/739)). In this release, we've enhanced the ExpressionGenerator class to support generating `DISTINCT` and `*` (star) expressions in SQL queries. Previously, the class did not handle these cases, resulting in incomplete or incorrect SQL queries. With the introduction of the `distinct` method, the class can now generate the `DISTINCT` keyword followed by the expression it applies to, and the `star` method produces the `*` symbol, optionally qualified by the name of the object (table or subquery) to which it applies (both helpers are sketched after this list). These improvements make the ExpressionGenerator class more robust and compatible with various SQL dialects, resulting in more accurate query outcomes. We've also added new test cases for the `ExpressionGenerator` class to ensure that `DISTINCT` and `*` expressions are generated correctly. Additionally, support for generating SQL `DISTINCT` and `*` (wildcard) has been added to the transpilation of Logical Plans to SQL, specifically in the `ir.Project` class. This ensures that the correct SQL `SELECT * FROM table` syntax is generated when a wildcard is used in the expression list. These enhancements significantly improve the functionality and compatibility of our open-source library.
* Generate `LIMIT` ([732](https://github.com/databrickslabs/remorph/issues/732)). A new method has been added to generate a SQL LIMIT clause for a logical plan in the data transformation tool. A new `case` branch has been implemented in the `generate` method of the `LogicalPlanGenerator` class to handle the `ir.Limit` case, which generates the SQL LIMIT clause. If a percentage limit is specified, the tool will throw an exception as this is not currently supported. Expression generation in the `ir.Project`, `ir.Filter`, and new `ir.Limit` cases now goes through the `ExpressionGenerator`'s `generate` method to ensure consistent expression generation. A new case class `Limit` has been added to the `com.databricks.labs.remorph.parsers.intermediate.relations` package, which extends the `UnaryNode` class and has four parameters: `input`, `limit`, `is_percentage`, and `with_ties` (sketched below). This new class enables limiting the number of rows returned by a query, with the ability to specify a percentage of rows or include ties in the result set. Additionally, a new test case has been added to the `LogicalPlanGeneratorTest` class to verify the transpilation of a `Limit` node to its SQL equivalent, ensuring that the `Limit` node is correctly handled during transpilation.
* Generate `OFFSET` SQL clauses ([736](https://github.com/databrickslabs/remorph/issues/736)). The Remorph project's latest update introduces a new OFFSET clause generation feature for SQL queries in the LogicalPlanGenerator class. This change adds support for skipping a specified number of rows before returning results in a SQL query, enhancing the library's query generation capabilities. The implementation includes a new case in the match statement of the generate function to handle ir.Offset nodes, creating a string representation of the OFFSET clause from the provided offset expression (see the sketch after this list). Additionally, the commit includes a new test case in the LogicalPlanGeneratorTest class to validate the OFFSET clause generation, ensuring that the LogicalPlanGenerator can translate Offset AST nodes into corresponding SQL statements. Overall, this update enables the generation of more comprehensive SQL queries with OFFSET support, providing software engineers with greater flexibility for pagination and other data processing tasks.
* Generate `ORDER BY` SQL clauses ([737](https://github.com/databrickslabs/remorph/issues/737)). This commit introduces new classes and enumerations for sort direction and null ordering, as well as an updated SortOrder case class, enabling the generation of ORDER BY SQL clauses (the new enumerations are sketched after this list). The LogicalPlanGenerator and SnowflakeExpressionBuilder classes have been modified to utilize these changes, allowing for more flexible and customizable sorting and null ordering when generating SQL queries. Additionally, the TSqlRelationBuilderSpec test suite has been updated to reflect these changes, and new test cases have been added to ensure the correct transpilation of ORDER BY clauses. Overall, these improvements enhance the Remorph project's capability to parse and generate SQL expressions with various sorting scenarios, providing a more robust and maintainable codebase.
* Generate `UNION` / `EXCEPT` / `INTERSECT` ([731](https://github.com/databrickslabs/remorph/issues/731)). In this release, we have introduced support for the `UNION`, `EXCEPT`, and `INTERSECT` set operations in our data processing system's generator. A new `unknown` method has been added to the `Generator` trait to return a `TranspileException` when encountering unsupported operations, allowing for better error handling and more informative error messages. The `LogicalPlanGenerator` class in the remorph project has been extended to support generating `UNION`, `EXCEPT`, and `INTERSECT` SQL operations with the addition of a new parameter, `explicitDistinct`, to enable explicit specification of `DISTINCT` for these set operations. A new test suite has been added to the `LogicalPlanGenerator` to test the generation of these operations using the `SetOperation` class, which now has four possible set operations: `UnionSetOp`, `IntersectSetOp`, `ExceptSetOp`, and `UnspecifiedSetOp` (sketched below). With these changes, our system can handle a wider range of input and provide meaningful error messages for unsupported operations, making it more versatile in handling complex SQL queries.
* Generate `VALUES` SQL clauses ([738](https://github.com/databrickslabs/remorph/issues/738)). The latest commit introduces a new feature to generate `VALUES` SQL clauses in the context of the logical plan generator. A new case branch has been implemented in the `generate` method to manage `ir.Values` expressions, converting input data (lists of lists of expressions) into a string representation compatible with `VALUES` clauses. The existing functionality remains unchanged. Additionally, a new test case has been added for the `LogicalPlanGenerator` class, which checks the correct transpilation to `VALUES` SQL clauses. This test case ensures that the `ir.Values` method, which takes a sequence of sequences of literals, generates the corresponding `VALUES` SQL clause, specifically checking the input `Seq(Seq(ir.Literal(1), ir.Literal(2)), Seq(ir.Literal(3), ir.Literal(4)))` against the SQL clause `"VALUES (1,2), (3,4)"`. A sketch reproducing this behavior appears after this list.
* Generate predicate expressions. This commit introduces the generation of predicate expressions as part of the SQL ExpressionGenerator in the `com.databricks.labs.remorph.generators.sql` package, enabling the creation of more complex SQL expressions. The changes include the addition of a new private method, `predicate(ctx: GeneratorContext, expr: Expression)`, to handle predicate expressions, and the introduction of two new predicate expression types, LessThan and LessThanOrEqual, for comparing the relative ordering of two expressions (see the sketch below). Existing predicate expression types have been updated with consistent naming. Additionally, the commit incorporates improvements to the handling of comparison operators in the SnowflakeExpressionBuilder and TSqlExpressionBuilder classes, addressing bugs and ensuring precise predicate expression generation. The `ParserTestCommon` trait has also been updated to reorder certain operators in a logical plan, maintaining consistent comparison results in tests. New test cases have been added to several test suites to ensure the correct interpretation and generation of predicate expressions involving different data types and search conditions. Overall, these enhancements provide more fine-grained comparison of expressions, enable more nuanced condition checking, and improve the robustness and accuracy of the SQL expression generation process.
* Merge remote-tracking branch 'origin/main'. In this update, the `ExpressionGenerator` class in the `com.databricks.labs.remorph.generators.sql` package has been enhanced with two new private methods: `dateLiteral` and `timestampLiteral`. These methods are designed to generate the SQL literal representation of `DateType` and `TimestampType` expressions, respectively. The introduction of these methods addresses the previous limitations of formatting date and timestamp values directly, which lacked extensibility and required duplicated code for handling null values. By extracting the formatting logic into separate methods, this commit improves code maintainability and reusability and makes the `ExpressionGenerator` class easier to read. The `dateLiteral` method handles `DateType` values by formatting them using the `dateFormat` `SimpleDateFormat` instance, returning `NULL` if the value is missing. Likewise, the `timestampLiteral` method formats `TimestampType` values using the `timeFormat` `SimpleDateFormat` instance, returning `NULL` if the value is missing. A sketch of these helpers follows this list.
* Modified dataclass for table threshold and added documentation ([714](https://github.com/databrickslabs/remorph/issues/714)). A series of modifications have been implemented to enhance the threshold configuration and validation for table reconciliation in the open-source library. The `TableThresholds` dataclass has been updated to accept a string for the `model` attribute, replacing the previously used `TableThresholdModel` Enum. Additionally, a new `validate_threshold_model` method has been added to `TableThresholds` to ensure proper validation of the `model` attribute. A new exception class, `InvalidModelForTableThreshold`, has been introduced to handle invalid settings. Column-specific thresholds can now be set using the `ColumnThresholds` configuration option. The `recon_capture.py` and `recon_config.py` files have been updated accordingly, and the documentation has been revised to clarify these changes. These improvements offer greater flexibility and control for users configuring thresholds while also refining validation and error handling.
* Support CTAS in TSQL Grammar and add more comparison operators ([545](https://github.com/databrickslabs/remorph/issues/545)). In this release, we have added support for the CTAS (CREATE TABLE AS SELECT) statement in the TSQL (T-SQL) grammar, as well as introduced new comparison operators: !=, !<, and !>. The CTAS statement allows for the creation of a new table by selecting from an existing table or query, potentially improving code readability and performance. The new comparison operators provide alternative ways of expressing inequalities (their normalization is sketched after this list), increasing flexibility for developers. The keyword `REPLICATE` has also been added for creating a full copy of a database or availability group. These changes enhance the overall functionality of the TSQL grammar and improve the user's ability to express various operations in TSQL. The CTAS statement is implemented as a new rule, and the new comparison operators are added as methods in the TSqlExpressionBuilder class. These changes provide increased capability and flexibility for TSQL parsing and query handling. The new methods add no external dependencies, keeping the project self-contained. The additions have been tested with the TSqlExpressionBuilderSpec test suite, ensuring the functionality and compatibility of the TSQL parser.
* Support translation of TSQL IGNORE NULLS clause in windowing functions ([511](https://github.com/databrickslabs/remorph/issues/511)). The latest change introduces support for translating the TSQL IGNORE NULLS and RESPECT NULLS clauses in windowing functions to their equivalents in Databricks SQL. In TSQL, these clauses appear after the function name and before the OVER clause, affecting how the functions handle null values. Databricks SQL represents this functionality with an optional trailing boolean parameter for specific windowing functions. With this update, when the IGNORE NULLS clause is specified in TSQL, a boolean option is appended to the corresponding Databricks SQL windowing functions, with RESPECT NULLS as the default. This enhancement is facilitated by a new private method, `buildNullIgnore`, which adds the boolean parameter to the original expression when IGNORE NULLS is specified in the OVER clause (sketched below). The alteration is exemplified in new test examples for the TSqlFunctionSpec, which include testing the LEAD function with and without the IGNORE NULLS clause, and updates to the translation of functions with non-standard syntax.
* TSQL: Implement TSQL UPDATE/DELETE statements ([540](https://github.com/databrickslabs/remorph/issues/540)). In this release, we have added support for TSQL UPDATE and DELETE statements in all syntactical forms, including UDF column transformations, in the TSqlParser.g4 file. The implementation of both statements is done in a single file and they share many common clauses. We have also introduced two new case classes, UpdateTable and MergeTables, in the extensions.scala file to implement the TSQL UPDATE and DELETE statements, respectively. Additionally, we have added new methods to handle various clauses and elements associated with these statements in the TSqlErrorStrategy class. A series of tests have been included to ensure the correct translation of various UPDATE and DELETE queries to their respective Abstract Syntax Trees (ASTs). These changes bring TSQL UPDATE and DELETE statement functionality to the project and allow for their use in a variety of contexts, providing developers with more flexibility and control when working with TSQL UPDATE and DELETE statements in the parser.
* TSQL: Implement translation of INSERT statement ([515](https://github.com/databrickslabs/remorph/issues/515)). In this release, we have implemented the TSQL INSERT statement in its entirety, including all target options, optional clauses, and Common Table Expressions (CTEs) in our open-source library. The change includes updates to the TSqlParser.g4 file to support the INSERT statement's various clauses, such as TOP, INTO, WITH TABLE HINTS, outputClause, and optionClause. We have also added new case classes to the TSQL AST to handle various aspects of the INSERT statement, including LocalVarTable, Output, InsertIntoTable, DerivedRows, DefaultValues, and Default. The TSqlExpressionBuilder and TSqlRelationBuilder classes have been updated to support the new INSERT statement, including handling output column lists, aliases, and JSON clauses. We have added specification tests to TSqlAstBuilderSpec.scala to demonstrate the various ways that the INSERT statement can be written and to ensure the correct translation of TSQL INSERT statements into the remorph project.
* TSQL: Remove the SIGN fragment from numerical tokens ([547](https://github.com/databrickslabs/remorph/issues/547)). In this release, we have made changes to the TSQL expression builder in the remorph project that affect how negative and positive numerical literals are parsed. Negative literals, such as -2, will now be parsed as UMinus(Literal(2)) instead of Literal(-2), and positive literals, such as +1, will be parsed as UPlus(Literal(1)) instead of Literal(1) (see the sketch after this list). This change was made to address issue [#546](https://github.com/databrickslabs/remorph/issues/546), but it is not an ideal solution as it may cause inconvenience in downstream processes. The affected numerical tokens include INT, HEX, FLOAT, REAL, and MONEY, which have been simplified by removing the SIGN fragment. We have updated the buildPrimitive method to handle INT, REAL, and FLOAT token types and ensure that numerical tokens are parsed correctly. We intend to keep issue [#546](https://github.com/databrickslabs/remorph/issues/546) open for further exploration of a better solution. The tests have been updated to reflect these changes.
* TSQL: Simplifies named table tableSource, implements columnAlias list ([512](https://github.com/databrickslabs/remorph/issues/512)). This change introduces significant updates to the TSqlParser's grammar for tableSource, simplifying and consolidating rules related to table aliases and column alias lists. A new Relation called TableWithHints has been added to collect and process table hints, some of which have direct counterparts in the Catalyst optimizer or can be used as comments for migration purposes. The TSqlExpressionBuilder and TSqlRelationBuilder classes have been modified to handle table hints and column aliases, and the TSqlAstBuilderSpec test suite has been updated to include new tests for table hints in T-SQL SELECT statements. These changes aim to improve parsing, handling, and optimization of table sources, table hints, and column aliases in TSQL queries.
* TSQL: Support generic FOR options ([525](https://github.com/databrickslabs/remorph/issues/525)). In this release, we have added support for parsing T-SQL (Transact-SQL) options that contain the keyword `FOR` using the standard syntax `[FOR]`. This change is necessary as `FOR` cannot be used directly as an identifier without escaping, as it would otherwise be seen as a table alias in a `SELECT` statement. The ANTLR rule for parsing generic options has been expanded to handle these special cases correctly, allowing for the proper parsing of options such as `OPTIMIZE FOR UNKNOWN` in a `SELECT` statement. Additionally, a new case has been added to the OptionBuilder class to handle the `FOR` keyword and elide it, converting `OPTIMIZE FOR UNKNOWN` to `OPTIMIZE` with an id of 'UNKNOWN'. This ensures the proper handling of options containing `FOR` and avoids any conflicts with the `FOR` clause in T-SQL statements. This change was implemented by Valentin Kasas and involves adding a new alternative to the `genericOption` rule in the TSqlParser.g4 file, but no new methods have been added.
* Updated Dialect Variable Name ([535](https://github.com/databrickslabs/remorph/issues/535)). In this release, the `source` variable name in the `QueryBuilder` class, which refers to the `Dialect` instance, has been updated to `engine` to accurately reflect its meaning as referring to either `Source` or `Target`. This change includes updating the usage of `source` to `engine` in the `build_query`, `_get_with_clause`, and `build_threshold_query` methods, as well as removing unnecessary parentheses in a list. These changes improve the code's clarity, accuracy, and readability, while maintaining the overall functionality of the affected methods.
* Use Oracle library only if the recon source is Oracle ([532](https://github.com/databrickslabs/remorph/issues/532)). In this release, we have added a new `ReconcileConfig` configuration object and a `SourceType` enumeration in the `databricks.labs.remorph.config` and `databricks.labs.remorph.reconcile.constants` modules, respectively. These objects are introduced to determine whether to include the Oracle JDBC driver library in a reconciliation job's task libraries. The `deploy_job` method and the `_job_recon_task` method have been updated to use the `_recon_config` attribute to decide whether to include the Oracle JDBC driver library. Additionally, the `_deploy_reconcile_job` method in the `install.py` file has been modified to include a new parameter called `reconcile`, which is passed as an argument from the `_config` object. This change enhances the flexibility and customization of the reconcile job deployment. Furthermore, new fixtures `oracle_recon_config` and `snowflake_reconcile_config` have been introduced for `ReconcileConfig` objects with Oracle and Snowflake specific configurations, respectively. These fixtures are used in the test functions for deploying jobs, ensuring that the tests are more focused and better reflect the actual behavior of the code.
* [chore] Make singletons for the relevant `DataType` instances ([705](https://github.com/databrickslabs/remorph/issues/705)). This commit introduces case objects for various data types, such as NullType, StringType, and others, effectively making them singletons (sketched after this list). This change simplifies the creation of data type instances and ensures that each type has a single instance throughout the application. The new case objects are utilized in building SQL expressions, affecting functions such as Cast and TRY_CAST in the TSqlExpressionBuilder class. Additionally, the test file TSqlExpressionBuilderSpec.scala has been updated to include the new case objects and handle errors by returning null. The data types tested include integer types, decimal types, date and time types, string types, binary types, and JSON. The primary goal of this change is to improve the management and identification of data types in the codebase, as well as to enhance code readability and maintainability.
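
For the `SELECT TOP X PERCENT` fix above (733), a minimal, hypothetical sketch of the `Rule`/`Rules` pattern it mentions; only the two class names come from the change itself, everything else is illustrative:

```scala
object RuleSketch {
  abstract class Rule[T] {
    def apply(plan: T): T
  }

  // Applies each contained rule in order, threading the plan through.
  case class Rules[T](rules: Rule[T]*) extends Rule[T] {
    override def apply(plan: T): T = rules.foldLeft(plan)((p, rule) => rule(p))
  }
}
```

Folding a sequence of IR rewrites into one `Rules` value keeps the translator's entry point simple: callers apply a single rule and get the fully rewritten plan back.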
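
For the arithmetic-operator change (726), a sketch of what a private `arithmetic` helper can look like, assuming simplified stand-ins for the project's `ir.Expression` hierarchy:

```scala
object ArithmeticSketch {
  sealed trait Expression
  case class Literal(value: Int) extends Expression
  case class UMinus(child: Expression) extends Expression
  case class Add(left: Expression, right: Expression) extends Expression
  case class Multiply(left: Expression, right: Expression) extends Expression

  def expression(expr: Expression): String = expr match {
    case Literal(v) => v.toString
    case other      => arithmetic(other)
  }

  // Each arithmetic node renders its operands recursively via `expression`.
  private def arithmetic(expr: Expression): String = expr match {
    case UMinus(child)         => s"-${expression(child)}"
    case Add(left, right)      => s"${expression(left)} + ${expression(right)}"
    case Multiply(left, right) => s"${expression(left)} * ${expression(right)}"
    case other                 => s"/* unsupported: $other */"
  }
}
// import ArithmeticSketch._
// expression(Add(Literal(1), Multiply(Literal(2), Literal(3)))) => "1 + 2 * 3"
```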
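
For the bitwise-operator change, a sketch of the `Bitwise` grouping trait and its case classes, with `dataType` following the left operand as described; the surrounding types are simplified stand-ins:

```scala
object BitwiseSketch {
  sealed trait DataType
  case object IntegerType extends DataType

  sealed trait Expression { def dataType: DataType }
  case class Column(name: String, dataType: DataType) extends Expression

  // Grouping trait: lets a generator match every bitwise operation at once.
  sealed trait Bitwise extends Expression

  case class BitwiseAnd(left: Expression, right: Expression) extends Bitwise {
    override def dataType: DataType = left.dataType
  }
  case class BitwiseOr(left: Expression, right: Expression) extends Bitwise {
    override def dataType: DataType = left.dataType
  }
  case class BitwiseXor(left: Expression, right: Expression) extends Bitwise {
    override def dataType: DataType = left.dataType
  }
  // NOT is unary, so its type follows its only child.
  case class BitwiseNot(child: Expression) extends Bitwise {
    override def dataType: DataType = child.dataType
  }
}
```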
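
For the `LIKE` generation change (723), a sketch of a `like` helper handling the optional escape character; the real method renders `ir.Expression` operands via `expression`, whereas this sketch takes pre-rendered strings:

```scala
object LikeSketch {
  def like(left: String, right: String, escape: Option[Char]): String = {
    // The ESCAPE clause is emitted only when an escape character is present.
    val escapeClause = escape.map(c => s" ESCAPE '$c'").getOrElse("")
    s"$left LIKE $right$escapeClause"
  }
}
// LikeSketch.like("name", "'J%'", Some('!')) => "name LIKE 'J%' ESCAPE '!'"
// LikeSketch.like("name", "'J%'", None)      => "name LIKE 'J%'"
```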
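
For the `DISTINCT` and `*` change (739), a sketch of the `distinct` and `star` helpers; the parameter shapes are assumptions:

```scala
object SelectListSketch {
  def distinct(expr: String): String = s"DISTINCT $expr"

  // A star optionally qualified by the table or subquery it applies to.
  def star(objectName: Option[String]): String =
    objectName.map(name => s"$name.*").getOrElse("*")
}
// SelectListSketch.star(Some("t")) => "t.*"; SelectListSketch.star(None) => "*"
```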
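
For the `LIMIT` change (732), a sketch of the `Limit` node with the four parameters named above and a generator branch that rejects percentage limits; the types are simplified stand-ins:

```scala
object LimitSketch {
  sealed trait LogicalPlan
  case class NamedTable(name: String) extends LogicalPlan
  case class Limit(
      input: LogicalPlan,
      limit: Long,
      is_percentage: Boolean = false,
      with_ties: Boolean = false
  ) extends LogicalPlan

  def generate(plan: LogicalPlan): String = plan match {
    case NamedTable(name) => s"SELECT * FROM $name"
    case Limit(_, _, true, _) =>
      // Percentage limits are not currently supported, per the entry above.
      throw new UnsupportedOperationException("percentage LIMIT is not supported")
    case Limit(input, n, _, _) => s"${generate(input)} LIMIT $n"
  }
}
// import LimitSketch._
// generate(Limit(NamedTable("t"), 10)) => "SELECT * FROM t LIMIT 10"
```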
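
For the `OFFSET` change (736), a sketch of the new `ir.Offset` branch in the generator's match statement, again with simplified types:

```scala
object OffsetSketch {
  sealed trait LogicalPlan
  case class NamedTable(name: String) extends LogicalPlan
  case class Offset(input: LogicalPlan, offset: Long) extends LogicalPlan

  def generate(plan: LogicalPlan): String = plan match {
    case NamedTable(name) => s"SELECT * FROM $name"
    case Offset(input, n) => s"${generate(input)} OFFSET $n"
  }
}
// import OffsetSketch._
// generate(Offset(NamedTable("t"), 5)) => "SELECT * FROM t OFFSET 5"
```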
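
For the `ORDER BY` change (737), a sketch of sort-direction and null-ordering enumerations plus a `SortOrder` node; the member names are illustrative, not necessarily the project's:

```scala
object OrderBySketch {
  sealed trait SortDirection
  case object Ascending extends SortDirection
  case object Descending extends SortDirection

  sealed trait NullOrdering
  case object NullsFirst extends NullOrdering
  case object NullsLast extends NullOrdering

  case class SortOrder(expr: String, direction: SortDirection, nulls: NullOrdering)

  def orderBy(orders: Seq[SortOrder]): String =
    orders.map { o =>
      val dir = o.direction match {
        case Ascending  => "ASC"
        case Descending => "DESC"
      }
      val nul = o.nulls match {
        case NullsFirst => "NULLS FIRST"
        case NullsLast  => "NULLS LAST"
      }
      s"${o.expr} $dir $nul"
    }.mkString("ORDER BY ", ", ", "")
}
// import OrderBySketch._
// orderBy(Seq(SortOrder("a", Ascending, NullsLast))) => "ORDER BY a ASC NULLS LAST"
```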
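
For the set-operation change (731), a sketch of a `SetOperation` node carrying the four operations named above, with the `explicitDistinct` flag; operand rendering is simplified to strings, and the error raised for `UnspecifiedSetOp` stands in for the new `unknown` handling:

```scala
object SetOpSketch {
  sealed trait SetOp
  case object UnionSetOp extends SetOp
  case object IntersectSetOp extends SetOp
  case object ExceptSetOp extends SetOp
  case object UnspecifiedSetOp extends SetOp

  case class SetOperation(left: String, right: String, op: SetOp)

  def generate(s: SetOperation, explicitDistinct: Boolean = false): String = {
    val keyword = s.op match {
      case UnionSetOp     => "UNION"
      case IntersectSetOp => "INTERSECT"
      case ExceptSetOp    => "EXCEPT"
      case UnspecifiedSetOp =>
        throw new IllegalArgumentException("unspecified set operation")
    }
    val distinct = if (explicitDistinct) " DISTINCT" else ""
    s"${s.left} $keyword$distinct ${s.right}"
  }
}
```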
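
For the `VALUES` change (738), a sketch that reproduces the quoted test: `Seq(Seq(1, 2), Seq(3, 4))` renders as `VALUES (1,2), (3,4)`; `Literal` is a simplified stand-in:

```scala
object ValuesSketch {
  case class Literal(value: Int)

  def values(rows: Seq[Seq[Literal]]): String =
    rows
      .map(row => row.map(_.value).mkString("(", ",", ")"))
      .mkString("VALUES ", ", ", "")
}
// import ValuesSketch._
// values(Seq(Seq(Literal(1), Literal(2)), Seq(Literal(3), Literal(4))))
//   => "VALUES (1,2), (3,4)"
```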
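
For the predicate-expression change, a sketch of a `predicate` helper covering the two new comparison nodes, `LessThan` and `LessThanOrEqual`; dispatch and types are simplified:

```scala
object PredicateSketch {
  sealed trait Expression
  case class Column(name: String) extends Expression
  case class LessThan(left: Expression, right: Expression) extends Expression
  case class LessThanOrEqual(left: Expression, right: Expression) extends Expression

  def expression(e: Expression): String = e match {
    case Column(name) => name
    case p            => predicate(p)
  }

  private def predicate(expr: Expression): String = expr match {
    case LessThan(l, r)        => s"${expression(l)} < ${expression(r)}"
    case LessThanOrEqual(l, r) => s"${expression(l)} <= ${expression(r)}"
    case other                 => s"/* unsupported predicate: $other */"
  }
}
// PredicateSketch.expression(
//   PredicateSketch.LessThan(PredicateSketch.Column("a"), PredicateSketch.Column("b"))
// ) => "a < b"
```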
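
For the `dateLiteral`/`timestampLiteral` change, a sketch built on `SimpleDateFormat` with `NULL` for missing values; the exact literal syntax (`DATE '...'`, `TIMESTAMP '...'`) and format patterns are assumptions:

```scala
import java.text.SimpleDateFormat
import java.util.Date

object LiteralSketch {
  private val dateFormat = new SimpleDateFormat("yyyy-MM-dd")
  private val timeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // A missing value renders as the SQL NULL literal.
  def dateLiteral(value: Option[Date]): String =
    value.map(d => s"DATE '${dateFormat.format(d)}'").getOrElse("NULL")

  def timestampLiteral(value: Option[Date]): String =
    value.map(t => s"TIMESTAMP '${timeFormat.format(t)}'").getOrElse("NULL")
}
// LiteralSketch.dateLiteral(None) => "NULL"
```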
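
For the comparison operators added in 545, a sketch of how the non-standard T-SQL operators can be normalized; in T-SQL, `!<` means "not less than" (`>=`) and `!>` means "not greater than" (`<=`):

```scala
object TSqlComparisonSketch {
  def normalize(op: String): String = op match {
    case "!<"  => ">=" // "not less than"
    case "!>"  => "<=" // "not greater than"
    case other => other // "=", "!=", "<", ">", ... pass through unchanged
  }
}
// TSqlComparisonSketch.normalize("!<") => ">="
```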
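
For the `IGNORE NULLS` change (511), a sketch of the `buildNullIgnore` idea: when `IGNORE NULLS` is present, append a trailing boolean argument to the translated windowing call; the rendered call shape is an assumption, not the project's actual output:

```scala
object NullsHandlingSketch {
  def buildNullIgnore(function: String, args: Seq[String], ignoreNulls: Boolean): String = {
    // RESPECT NULLS is the default, so it adds nothing.
    val allArgs = if (ignoreNulls) args :+ "TRUE" else args
    s"$function(${allArgs.mkString(", ")})"
  }
}
// NullsHandlingSketch.buildNullIgnore("LEAD", Seq("col1"), ignoreNulls = true)
//   => "LEAD(col1, TRUE)"
```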
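
For the sign-handling change (547), a sketch of the described behavior: the sign is no longer part of the numeric token, so `-2` becomes `UMinus(Literal(2))` and `+1` becomes `UPlus(Literal(1))`; `parseSigned` is a hypothetical helper, not the project's `buildPrimitive`:

```scala
object SignSketch {
  sealed trait Expression
  case class Literal(value: BigDecimal) extends Expression
  case class UMinus(child: Expression) extends Expression
  case class UPlus(child: Expression) extends Expression

  def parseSigned(text: String): Expression =
    if (text.startsWith("-")) UMinus(Literal(BigDecimal(text.drop(1))))
    else if (text.startsWith("+")) UPlus(Literal(BigDecimal(text.drop(1))))
    else Literal(BigDecimal(text))
}
// SignSketch.parseSigned("-2") => UMinus(Literal(2))
```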
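
For the `DataType` singleton change (705), a sketch showing a few of the named types as case objects; `Cast` is included only to show a consumer:

```scala
object DataTypeSketch {
  sealed trait DataType
  case object NullType extends DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  case object BooleanType extends DataType

  // Case objects are singletons, so every reference shares one instance:
  // (StringType eq StringType) == true
  case class Cast(expr: String, dataType: DataType)
}
```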

Dependency updates:

* Bump sqlglot from 25.1.0 to 25.5.1 ([534](https://github.com/databrickslabs/remorph/pull/534)).
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([555](https://github.com/databrickslabs/remorph/pull/555)).
* Bump sqlglot from 25.5.1 to 25.6.1 ([585](https://github.com/databrickslabs/remorph/pull/585)).
