* Added TSql transpiler ([734](https://github.com/databrickslabs/remorph/issues/734)). In this release, we have developed a new open-source library feature that enhances the transpilation of T-SQL code to Databricks-compatible code. The new TSqlToDatabricksTranspiler class has been added, which extends the Transpiler abstract class and defines the transpile method. This method converts T-SQL code to Databricks-compatible code by creating a lexer, token stream, parser, and parsed tree from the input string using TSqlLexer, CommonTokenStream, and tSqlFile. The parsed tree is then passed to the TSqlAstBuilder's visit method to generate a logical plan, which is optimized using an optimizer object with rules such as PullLimitUpwards and TopPercentToLimitSubquery. The optimized plan is then passed to the LogicalPlanGenerator's generate method to generate the final transpiled code. Additionally, a new class, IsTranspiledFromTSqlQueryRunner, has been added to the QueryRunner object to transpile T-SQL queries into the format expected by the Databricks query runner using the new TSqlToDatabricksTranspiler. The AcceptanceTestRunner class in the com.databricks.labs.remorph.coverage package has been updated to replace TSqlAstBuilder with IsTranspiledFromTSqlQueryRunner in the TSqlAcceptanceSuite class, indicating a change in the code responsible for handling TSql queries. This new feature aims to provide a smooth and efficient way to convert T-SQL code to Databricks-compatible code for further processing and execution.
* Added the missing info for the reconciliation documentation ([520](https://github.com/databrickslabs/remorph/issues/520)). In this release, we have made significant improvements to the reconciliation feature of our open-source library. We have added a configuration folder and provided a template for creating configuration files for specific table sources. The config files will contain necessary configurations for table-specific reconciliation. Additionally, we have included a note in the transformation section detailing the usage of user-defined functions (UDFs) in transformation expressions, with an example UDF called `sort_array_input()` provided. The reconcile configuration sample documentation has also been added, with a JSON configuration for reconciling tables using various operations like drop, join, transformation, threshold, filter, and JDBC ReaderOptions. The commit also includes examples of source and target tables, data overviews, and reconciliation configurations for various scenarios, such as basic config and column mapping, user transformations, explicit select, explicit drop, filters, and thresholds comparison, among others. These changes aim to make it easier for users to set up and execute the reconciliation process for specific table sources and provide clear and concise information about using UDFs in transformation expressions for reconciliation. This commit is co-authored by Vijay Pavan Nissankararao and SundarShankar89.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([555](https://github.com/databrickslabs/remorph/issues/555)). In this pull request, the sigstore/gh-action-sigstore-python dependency is being updated from version 2.1.1 to 3.0.0. This new version includes several changes and improvements, such as the addition of recursive globbing with ** to the inputs and the removal of certain settings like fulcio-url, rekor-url, ctfe, and rekor-root-pubkey. The signature, certificate, and bundle output settings have also been removed. Furthermore, the way inputs are parsed has been changed, and they are now made optional under certain conditions. The default suffix has been updated to .sigstore.json. The 3.0.0 version also resolves various deprecations present in sigstore-python's 2.x series and supports CI runners that use PEP 668 to constrain global package prefixes.
* Bump sqlglot from 25.1.0 to 25.5.1 ([534](https://github.com/databrickslabs/remorph/issues/534)). In the latest release, the `sqlglot` package has been updated from version 25.1.0 to 25.5.1, which includes bug fixes, breaking changes, new features, and refactors for parsing, analyzing, and rewriting SQL queries. The new version introduces optimizations for coalesced USING columns, preserves EXTRACT(date_part FROM datetime) calls, decouples NVL() from COALESCE(), and supports FROM CHANGES in Snowflake. It also provides configurable transpilation of Snowflake VARIANT, and supports view schema binding options for Spark and Databricks. The update addresses several issues, such as the use of timestamp with time zone over timestamptz, switch off table alias columns generation, and parse rhs of x::varchar(max) into a type. Additionally, the update cleans up CurrentTimestamp generation logic for Teradata. The `sqlglot` dependency has also been updated in the 'experimental.py' file of the 'databricks/labs/remorph/snow' module, along with the addition of a new private method `_parse_json` to the `Generator` class for parsing JSON data. Software engineers should review the changes and update their code accordingly, as conflicts with existing code will be resolved automatically by Dependabot, as long as the pull request is not altered manually.
* Bump sqlglot from 25.5.1 to 25.6.1 ([585](https://github.com/databrickslabs/remorph/issues/585)). In this release, the `sqlglot` dependency is updated from version 25.5.1 to 25.6.1 in the 'pyproject.toml' file. This update includes bug fixes, breaking changes, new features, and improvements. Breaking changes consist of updates to the QUALIFY clause in queries and the canonicalization of struct and array inline constructor. New features include support for ORDER BY ALL, FROM ROWS FROM (...), RPAD & LPAD functions, and exp.TimestampAdd. Bug fixes address issues related to the QUALIFY clause in queries, expansion of SELECT * REPLACE, RENAME, transpiling UDFs from Databricks, and more. The pull request also includes a detailed changelog, commit history, instructions for triggering Dependabot actions, and commands for reference, with the exception of the compatibility score for the new version, which is not taken into account in this pull request.
* Feature/reconcile table mismatch threshold ([550](https://github.com/databrickslabs/remorph/issues/550)). This commit enhances the reconciliation process in the open-source library with several new features, addressing issue [#504](https://github.com/databrickslabs/remorph/issues/504). A new `get_record_count` method is added to the `Reconcile` class, providing record count data for source and target tables, facilitating comprehensive analysis of table mismatches. A `CountQueryBuilder` class is introduced to build record count queries for different layers and SQL dialects, ensuring consistency in data processing. The `Thresholds` class is refactored into `ColumnThresholds` and `TableThresholds`, allowing for more granular control over comparisons and customizable threshold settings. New methods `_is_mismatch_within_threshold_limits` and `_insert_into_metrics_table` are added to the `recon_capture.py` file, improving fine-grained control over the reconciliation process and preventing false positives. Additionally, new classes, methods, and data structures have been implemented in the `execute` module to handle reconciliation queries and data more efficiently. These improvements contribute to a more accurate and robust reconciliation system.
* Feature: introduce core transpiler ([715](https://github.com/databrickslabs/remorph/issues/715)). A new core transpiler, `SnowflakeToDatabricksTranspiler`, has been introduced to convert Snowflake queries into Databricks SQL, streamlining integration and compatibility between the two systems. This transpiler is integrated into the coverage test suites for thorough testing, and is used to convert various types of logical plans, handling cases such as `Batch`, `WithCTE`, `Project`, `NamedTable`, `Filter`, and `Join`. The `SnowflakeToDatabricksTranspiler` class tokenizes input Snowflake query strings, initializes a `SnowflakeParser` instance, parses the input Snowflake query, generates a logical plan, and applies the `LogicalPlanGenerator` to the logical plan to generate the equivalent Databricks SQL query. Additionally, the `SnowflakeAstBuilder` class has been updated to alter the way `Batch` logical plans are built and improve overall functionality of the transpiler.
* Fixed LEFT and RIGHT JOIN syntax in Snowflake ANTLR grammar ([526](https://github.com/databrickslabs/remorph/issues/526)). A fix has been implemented to address issues with the Snowflake ANTLR grammar related to the proper parsing of LEFT and RIGHT JOIN statements. Previously, the keywords LEFT and RIGHT were incorrectly allowed as identifiers, but they are hard keywords that must be escaped to be used as column names. This change updates the grammar to escape these keywords in JOIN statements, improving the overall parsing of queries that include LEFT and RIGHT JOINs. Additionally, semantic predicates have been suggested to handle cases where LEFT or RIGHT are used as column names without escaping, although this is not yet implemented. To ensure the correctness of the updated grammar, new tests have been added to the SnowflakeAstBuilderSpec for LEFT and RIGHT JOINs, which check that the Abstract Syntax Tree (AST) is built correctly for these queries.
* Fixed Snowflake Acceptance Testcases Failures ([531](https://github.com/databrickslabs/remorph/issues/531)). In this release, updates have been made to the acceptance testcases for various SQL functions in the open-source library. The DENSE RANK function's testcase has been updated with a window specification and ORDER BY clause in both Snowflake and Databricks SQL syntaxes, ensuring accurate test results. The LAG function's testcase now includes a PARTITION BY and ORDER BY clause, as well as the NULLS LAST keyword in Databricks SQL, for improved accuracy and consistency. The SQL queries in the Snowflake testcase for the `last_value` function have been updated with a window specification, ORDER BY clause, and NULLS LAST directive for Databricks SQL. Test case failures in the Snowflake acceptance testsuite have been addressed with updates to the LEAD function, MONTH_NAME to MONTHNAME renaming, and DATE_FORMAT to TO_DATE conversion, improving reliability and consistency. The ntile function's testcase has been updated with PARTITION BY and ORDER BY clauses, and the NULLS LAST keyword has been added to the Databricks SQL query. The SQL query for null-safe equality comparison has been updated with a conditional expression compatible with Snowflake. The ranking function's testcase has been improved with the appropriate partition and order by clauses, and the NULLS LAST keyword has been added to the Databricks SQL query, enhancing accuracy and consistency. Lastly, updates have been made to the ROW_NUMBER function's testcase, ensuring accurate and consistent row numbering for both Snowflake and Databricks.
* Fixed TSQL transpiler ([735](https://github.com/databrickslabs/remorph/issues/735)). In this release, we have implemented a fix for the TSQL transpiler, addressing the issue [#7](https://github.com/databrickslabs/remorph/issues/7). This enhancement allows the library to accurately convert TSQL code into an equivalent format that is compatible with other databases. The fix resolves reported bugs related to incorrect syntax interpretation, thereby improving the overall reliability and functionality of the transpiler. Software engineers and developers relying on TSQL compatibility for cross-database operations will benefit from this improvement. We encourage users to test and provide feedback on this updated feature.
* Fixed `SELECT TOP X PERCENT` IR translation for TSQL ([733](https://github.com/databrickslabs/remorph/issues/733)). In this release, we have made several enhancements to the open-source library to improve compatibility with T-SQL and Catalyst. We have added a new dependency, `pprint_${scala.binary.version}` version 0.8.1 from the `com.lihaoyi` group, to provide advanced pretty-printing functionality for Scala. We have also fixed the translation of the TSQL `SELECT TOP X PERCENT` feature in the parser for intermediate expressions, addressing the difference in syntax between TSQL and SQL for limiting the number of rows returned by a query. Additionally, we have modified the implementation of the `WITH` clause and added a new expression for the `SELECT TOP` clause in T-SQL, improving the compatibility of the codebase with T-SQL and aligning it with Catalyst. We also introduced a new abstract class `Rule` and a case class `Rules` in the `com.databricks.labs.remorph.parsers.intermediate` package to fix the `SELECT TOP X PERCENT` IR translation for TSQL by adding new rules. Furthermore, we have added a new Scala file, `subqueries.scala`, containing abstract class `SubqueryExpression` and two case classes that extend it, and made changes to the `trees.scala` file to improve the tree string representation for better readability and consistency in the codebase. These changes aim to improve the overall functionality of the library and make it easier for new users to understand and adopt the project.
* Fixed invalid null constraint and FQN ([517](https://github.com/databrickslabs/remorph/issues/517)). In this change, we addressed issues [#516](https://github.com/databrickslabs/remorph/issues/516) and [#517](https://github.com/databrickslabs/remorph/issues/517), which involved resolving an invalid null constraint and correcting a fully qualified name (FQN) in our open-source library. The `read_data` function in the `databricks.py` file was updated to improve null constraint handling and ensure the FQN is valid. Previously, the catalog was always appended to the table name, potentially resulting in an invalid FQN and null constraint issues. Now, the code checks if the catalog exists before appending it to the table name, and if not provided, the schema and table name are concatenated directly. Additionally, we removed the NOT NULL constraint from the catalog field in the source_table and target_table structs in the main SQL file, allowing null values for this field. These changes maintain backward compatibility and enhance the overall functionality and robustness of the project, ensuring accurate query results and avoiding potential errors.
* Generate SQL for arithmetic operators ([726](https://github.com/databrickslabs/remorph/issues/726)). In this release, we have introduced a new private method `arithmetic` to the `ExpressionGenerator` class that generates SQL for arithmetic operations, including unary minus, unary plus, multiplication, division, modulo, addition, and subtraction. This improves the readability and maintainability of the code by separating concerns and making the functionality more explicit. Additionally, we have introduced a new trait named `Arithmetic` to group arithmetic expressions together, which enables easier manipulation and identification of arithmetic expressions in the code. A new test suite has also been added for arithmetic operations in the `ExpressionGenerator` class, which improves test coverage and ensures the correct SQL is generated for these operations. These changes provide a welcome addition for developers looking to extend the functionality of the `ExpressionGenerator` class for arithmetic operations.
* Generate SQL for bitwise operators. The ExpressionGenerator class in the remorph project has been updated to support generating SQL for bitwise operators (OR, AND, XOR, NOT) through the addition of a new private method `bitwise` that converts bitwise operations to equivalent SQL expressions. The `expression` method has also been updated to utilize the new `bitwise` method for any input ir.Expression that is a bitwise operation. To facilitate this change, a new trait called `Bitwise` and updated case classes for bitwise operations, including `BitwiseNot`, `BitwiseAnd`, `BitwiseOr`, and `BitwiseXor`, have been implemented. The updated case classes extend the new `Bitwise` trait and include the `dataType` override method to return the data type of the left expression. A new test case in ExpressionGeneratorTest for bitwise operators has been added to validate the functionality, and the `expression` method in ExpressionGenerator now utilizes the GeneratorContext() instead of new GeneratorContext(). These changes enable the ExpressionGenerator to generate SQL code for bitwise operations, expanding its capabilities.
* Generate `.. LIKE ..` ([723](https://github.com/databrickslabs/remorph/issues/723)). In this commit, the `ExpressionGenerator` class has been enhanced with a new method, `like`, which generates SQL `LIKE` expressions. The method takes a `GeneratorContext` object and an `ir.Like` object as arguments, and returns a string representation of the `LIKE` expression. It uses the `expression` method to generate the left and right sides of the `LIKE` operator, and also handles the optional escape character. Additionally, the `timestampLiteral` and `dateLiteral` methods have been updated to take an `ir.Literal` object and better handle `NULL` values. A new test case has also been added for the `ExpressionGenerator` class, which checks the `like` function and includes examples for basic usage and usage with an escape character. This commit improves the functionality of the `ExpressionGenerator` class, allowing it to handle `LIKE` expressions and better handle `NULL` values for timestamps and dates.
* Generate `DISTINCT`, `*` ([739](https://github.com/databrickslabs/remorph/issues/739)). In this release, we've enhanced the ExpressionGenerator class to support generating `DISTINCT` and `*` (star) expressions in SQL queries. Previously, the class did not handle these cases, resulting in incomplete or incorrect SQL queries. With the introduction of the `distinct` method, the class can now generate the `DISTINCT` keyword followed by the expression to be applied to, and the `star` method produces the `*` symbol, optionally followed by the name of the object (table or subquery) to which it applies. These improvements make the ExpressionGenerator class more robust and compatible with various SQL dialects, resulting in more accurate query outcomes. We've also added new test cases for the `ExpressionGenerator` class to ensure that `DISTINCT` and `*` expressions are generated correctly. Additionally, support for generating SQL `DISTINCT` and `*` (wildcard) has been added to the transpilation of Logical Plans to SQL, specifically in the `ir.Project` class. This ensures that the correct SQL `SELECT * FROM table` syntax is generated when a wildcard is used in the expression list. These enhancements significantly improve the functionality and compatibility of our open-source library.
* Generate `LIMIT` ([732](https://github.com/databrickslabs/remorph/issues/732)). A new method has been added to generate a SQL LIMIT clause for a logical plan in the data transformation tool. A new `case` branch has been implemented in the `generate` method of the `LogicalPlanGenerator` class to handle the `ir.Limit` case, which generates the SQL LIMIT clause. If a percentage limit is specified, the tool will throw an exception as it is not currently supported. The `generate` method of the `ExpressionGenerator` class has been replaced with a new `generate` method in the `ir.Project`, `ir.Filter`, and new `ir.Limit` cases to ensure consistent expression generation. A new case class `Limit` has been added to the `com.databricks.labs.remorph.parsers.intermediate.relations` package, which extends the `UnaryNode` class and has four parameters: `input`, `limit`, `is_percentage`, and `with_ties`. This new class enables limiting the number of rows returned by a query, with the ability to specify a percentage of rows or include ties in the result set. Additionally, a new test case has been added to the `LogicalPlanGeneratorTest` class to verify the transpilation of a `Limit` node to its SQL equivalent, ensuring that the `Limit` node is correctly handled during transpilation.
* Generate `OFFSET` SQL clauses ([736](https://github.com/databrickslabs/remorph/issues/736)). The Remorph project's latest update introduces a new OFFSET clause generation feature for SQL queries in the LogicalPlanGenerator class. This change adds support for skipping a specified number of rows before returning results in a SQL query, enhancing the library's query generation capabilities. The implementation includes a new case in the match statement of the generate function to handle ir.Offset nodes, creating a string representation of the OFFSET clause using the provide offset expression. Additionally, the commit includes a new test case in the LogicalPlanGeneratorTest class to validate the OFFSET clause generation, ensuring that the LogicalPlanGenerator can translate Offset AST nodes into corresponding SQL statements. Overall, this update enables the generation of more comprehensive SQL queries with OFFSET support, providing software engineers with greater flexibility for pagination and other data processing tasks.
* Generate `ORDER BY` SQL clauses ([737](https://github.com/databrickslabs/remorph/issues/737)). This commit introduces new classes and enumerations for sort direction and null ordering, as well as an updated SortOrder case class, enabling the generation of ORDER BY SQL clauses. The LogicalPlanGenerator and SnowflakeExpressionBuilder classes have been modified to utilize these changes, allowing for more flexible and customizable sorting and null ordering when generating SQL queries. Additionally, the TSqlRelationBuilderSpec test suite has been updated to reflect these changes, and new test cases have been added to ensure the correct transpilation of ORDER BY clauses. Overall, these improvements enhance the Remorph project's capability to parse and generate SQL expressions with various sorting scenarios, providing a more robust and maintainable codebase.
* Generate `UNION` / `EXCEPT` / `INTERSECT` ([731](https://github.com/databrickslabs/remorph/issues/731)). In this release, we have introduced support for the `UNION`, `EXCEPT`, and `INTERSECT` set operations in our data processing system's generator. A new `unknown` method has been added to the `Generator` trait to return a `TranspileException` when encountering unsupported operations, allowing for better error handling and more informative error messages. The `LogicalPlanGenerator` class in the remorph project has been extended to support generating `UNION`, `EXCEPT`, and `INTERSECT` SQL operations with the addition of a new parameter, `explicitDistinct`, to enable explicit specification of `DISTINCT` for these set operations. A new test suite has been added to the `LogicalPlanGenerator` to test the generation of these operations using the `SetOperation` class, which now has four possible set operations: `UnionSetOp`, `IntersectSetOp`, `ExceptSetOp`, and `UnspecifiedSetOp`. With these changes, our system can handle a wider range of input and provide meaningful error messages for unsupported operations, making it more versatile in handling complex SQL queries.
* Generate `VALUES` SQL clauses ([738](https://github.com/databrickslabs/remorph/issues/738)). The latest commit introduces a new feature to generate `VALUES` SQL clauses in the context of the logical plan generator. A new case branch has been implemented in the `generate` method to manage `ir.Values` expressions, converting input data (lists of lists of expressions) into a string representation compatible with `VALUES` clauses. The existing functionality remains unchanged. Additionally, a new test case has been added for the `LogicalPlanGenerator` class, which checks the correct transpilation to `VALUES` SQL clauses. This test case ensures that the `ir.Values` method, which takes a sequence of sequences of literals, generates the corresponding `VALUES` SQL clause, specifically checking the input `Seq(Seq(ir.Literal(1), ir.Literal(2)), Seq(ir.Literal(3), ir.Literal(4)))` against the SQL clause `"VALUES (1,2), (3,4)"`. This change enables testing the functionality of generating `VALUES` SQL clauses using the `LogicalPlanGenerator` class.
* Generate predicate expressions. This commit introduces the generation of predicate expressions as part of the SQL ExpressionGenerator in the `com.databricks.labs.remorph.generators.sql` package, enabling the creation of more complex SQL expressions. The changes include the addition of a new private method, `predicate(ctx: GeneratorContext, expr: Expression)`, to handle predicate expressions, and the introduction of two new predicate expression types, LessThan and LessThanOrEqual, for comparing the relative ordering of two expressions. Existing predicate expression types have been updated with consistent naming. Additionally, the commit incorporates improvements to the handling of comparison operators in the SnowflakeExpressionBuilder and TSqlExpressionBuilder classes, addressing bugs and ensuring precise predicate expression generation. The `ParserTestCommon` trait has also been updated to reorder certain operators in a logical plan, maintaining consistent comparison results in tests. New test cases have been added to several test suites to ensure the correct interpretation and generation of predicate expressions involving different data types and search conditions. Overall, these enhancements provide more fine-grained comparison of expressions, enable more nuanced condition checking, and improve the robustness and accuracy of the SQL expression generation process.
* Merge remote-tracking branch 'origin/main'. In this update, the `ExpressionGenerator` class in the `com.databricks.labs.remorph.generators.sql` package has been enhanced with two new private methods: `dateLiteral` and `timestampLiteral`. These methods are designed to generate the SQL literal representation of `DateType` and `TimestampType` expressions, respectively. The introduction of these methods addresses the previous limitations of formatting date and timestamp values directly, which lacked extensibility and required duplicated code for handling null values. By extracting the formatting logic into separate methods, this commit significantly improves code maintainability and reusability, enhancing the overall readability and understandability of the `ExpressionGenerator` class for developers. The `dateLiteral` method handles `DateType` values by formatting them using the `dateFormat` `SimpleDateFormat` instance, returning `NULL` if the value is missing. Likewise, the `timestampLiteral` method formats `TimestampType` values using the `timeFormat` `SimpleDateFormat` instance, returning `NULL` if the value is missing. These methods will enable developers to grasp the code's functionality more easily and make future enhancements to the class.
* Modified dataclass for table threshold and added documentation ([714](https://github.com/databrickslabs/remorph/issues/714)). A series of modifications have been implemented to enhance the threshold configuration and validation for table reconciliation in the open-source library. The `TableThresholds` dataclass has been updated to accept a string for the `model` attribute, replacing the previously used `TableThresholdModel` Enum. Additionally, a new `validate_threshold_model` method has been added to `TableThresholds` to ensure proper validation of the `model` attribute. A new exception class, `InvalidModelForTableThreshold`, has been introduced to handle invalid settings. Column-specific thresholds can now be set using the `ColumnThresholds` configuration option. The `recon_capture.py` and `recon_config.py` files have been updated accordingly, and the documentation has been revised to clarify these changes. These improvements offer greater flexibility and control for users configuring thresholds while also refining validation and error handling.
* Support CTAS in TSQL Grammar and add more comparison operators ([545](https://github.com/databrickslabs/remorph/issues/545)). In this release, we have added support for the CTAS (CREATE TABLE AS) statement in the TSQL (T-SQL) grammar, as well as introduced new comparison operators: !=, !<, and !>. The CTAS statement allows for the creation of a new table by selecting from an existing table or query, potentially improving code readability and performance. The new comparison operators provide alternative ways of expressing inequalities, increasing flexibility for developers. The keyword `REPLICATE` has also been added for creating a full copy of a database or availability group. These changes enhance the overall functionality of the TSQL grammar and improve the user's ability to express various operations in TSQL. The CTAS statement is implemented as a new rule, and the new comparison operators are added as methods in the TSqlExpressionBuilder class. These changes provide increased capability and flexibility for TSQL parsing and query handling. The new methods are not adding any new external dependencies, and the project remains self-contained. The additions have been tested with the TSqlExpressionBuilderSpec test suite, ensuring the functionality and compatibility of the TSQL parser.
* Support translation of TSQL INGORE NULLS clause in windowing functions ([511](https://github.com/databrickslabs/remorph/issues/511)). The latest change introduces support for translating the TSQL IGNORE NULLS and RESPECT NULLS clauses in windowing functions to their equivalents in Databricks SQL. In TSQL, these clauses appear after the function name and before the OVER clause, affecting how the functions handle null values. Databricks SQL represents this functionality with an optional trailing boolean parameter for specific windowing functions. With this update, when the IGNORE NULLS clause is specified in TSQL, a boolean option is appended to the corresponding Databicks SQL windowing functions, with RESPECT NULLS as the default. This enhancement is facilitated by a new private method, `buildNullIgnore`, which adds the boolean parameter to the original expression when IGNORE NULLS is specified in the OVER clause. The alteration is exemplified in new test examples for the TSqlFunctionSpec, which include testing the LEAD function with and without the IGNORE NULLS clause, and updates to the translation of functions with non-standard syntax.
* TSQL: Implement TSQL UPDATE/DELETE statements ([540](https://github.com/databrickslabs/remorph/issues/540)). In this release, we have added support for TSQL UPDATE and DELETE statements in all syntactical forms, including UDF column transformations, in the TSqlParser.g4 file. The implementation of both statements is done in a single file and they share many common clauses. We have also introduced two new case classes, UpdateTable and MergeTables, in the extensions.scala file to implement the TSQL UPDATE and DELETE statements, respectively. Additionally, we have added new methods to handle various clauses and elements associated with these statements in the TSqlErrorStrategy class. A series of tests have been included to ensure the correct translation of various UPDATE and DELETE queries to their respective Abstract Syntax Trees (ASTs). These changes bring TSQL UPDATE and DELETE statement functionality to the project and allow for their use in a variety of contexts, providing developers with more flexibility and control when working with TSQL UPDATE and DELETE statements in the parser.
* TSQL: Implement translation of INSERT statement ([515](https://github.com/databrickslabs/remorph/issues/515)). In this release, we have implemented the TSQL INSERT statement in its entirety, including all target options, optional clauses, and Common Table Expressions (CTEs) in our open-source library. The change includes updates to the TSqlParser.g4 file to support the INSERT statement's various clauses, such as TOP, INTO, WITH TABLE HINTS, outputClause, and optionClause. We have also added new case classes to the TSQL AST to handle various aspects of the INSERT statement, including LocalVarTable, Output, InsertIntoTable, DerivedRows, DefaultValues, and Default. The TSqlExpressionBuilder and TSqlRelationBuilder classes have been updated to support the new INSERT statement, including handling output column lists, aliases, and JSON clauses. We have added specification tests to TSqlAstBuilderSpec.scala to demonstrate the various ways that the INSERT statement can be written and to ensure the correct translation of TSQL INSERT statements into the remorph project.
* TSQL: Remove the SIGN fragment from numerical tokens ([547](https://github.com/databrickslabs/remorph/issues/547)). In this release, we have made changes to the TSQL expression builder in the remorph project that affect how negative and positive numerical literals are parsed. Negative literals, such as -2, will now be parsed as UMinus(Literal(2)) instead of Literal(-2), and positive literals, such as +1, will be parsed as UPlus(Literal(1)) instead of Literal(1). This change was made to address issue [#546](https://github.com/databrickslabs/remorph/issues/546), but it is not an ideal solution as it may cause inconvenience in downstream processes. The affected numerical tokens include INT, HEX, FLOAT, REAL, and MONEY, which have been simplified by removing the SIGN fragment. We have updated the buildPrimitive method to handle INT, REAL, and FLOAT token types and ensure that numerical tokens are parsed correctly. We intend to keep issue [#546](https://github.com/databrickslabs/remorph/issues/546) open for further exploration of a better solution. The tests have been updated to reflect these changes.
* TSQL: Simplifies named table tableSource, implements columnAlias list ([512](https://github.com/databrickslabs/remorph/issues/512)). This change introduces significant updates to the TSqlParser's grammar for tableSource, simplifying and consolidating rules related to table aliases and column alias lists. A new Relation called TableWithHints has been added to collect and process table hints, some of which have direct counterparts in the Catalyst optimizer or can be used as comments for migration purposes. The TSQLExpressionBuilder and TSqlRelationBuilder classes have been modified to handle table hints and column aliases, and the TSqlAstBuilderSpec test suite has been updated to include new tests for table hints in T-SQL SELECT statements. These changes aim to improve parsing, handling, and optimization of table sources, table hints, and column aliases in TSQL queries.
* TSQL: Support generic FOR options ([525](https://github.com/databrickslabs/remorph/issues/525)). In this release, we have added support for parsing T-SQL (Transact-SQL) options that contain the keyword `FOR` using the standard syntax `[FOR]`. This change is necessary as `FOR` cannot be used directly as an identifier without escaping, as it would otherwise be seen as a table alias in a `SELECT` statement. The ANTLR rule for parsing generic options has been expanded to handle these special cases correctly, allowing for the proper parsing of options such as `OPTIMIZE FOR UNKNOWN` in a `SELECT` statement. Additionally, a new case has been added to the OptionBuilder class to handle the `FOR` keyword and elide it, converting `OPTIMIZE FOR UNKNOWN` to `OPTIMIZE` with an id of 'UNKNOWN'. This ensures the proper handling of options containing `FOR` and avoids any conflicts with the `FOR` clause in T-SQL statements. This change was implemented by Valentin Kasas and involves adding a new alternative to the `genericOption` rule in the TSqlParser.g4 file, but no new methods have been added.
* Updated Dialect Variable Name ([535](https://github.com/databrickslabs/remorph/issues/535)). In this release, the `source` variable name in the `QueryBuilder` class, which refers to the `Dialect` instance, has been updated to `engine` to accurately reflect its meaning as referring to either `Source` or `Target`. This change includes updating the usage of `source` to `engine` in the `build_query`, `_get_with_clause`, and `build_threshold_query` methods, as well as removing unnecessary parentheses in a list. These changes improve the code's clarity, accuracy, and readability, while maintaining the overall functionality of the affected methods.
* Use Oracle library only if the recon source is Oracle ([532](https://github.com/databrickslabs/remorph/issues/532)). In this release, we have added a new `ReconcileConfig` configuration object and a `SourceType` enumeration in the `databricks.labs.remorph.config` and `databricks.labs.remorph.reconcile.constants` modules, respectively. These objects are introduced to determine whether to include the Oracle JDBC driver library in a reconciliation job's task libraries. The `deploy_job` method and the `_job_recon_task` method have been updated to use the `_recon_config` attribute to decide whether to include the Oracle JDBC driver library. Additionally, the `_deploy_reconcile_job` method in the `install.py` file has been modified to include a new parameter called `reconcile`, which is passed as an argument from the `_config` object. This change enhances the flexibility and customization of the reconcile job deployment. Furthermore, new fixtures `oracle_recon_config` and `snowflake_reconcile_config` have been introduced for `ReconcileConfig` objects with Oracle and Snowflake specific configurations, respectively. These fixtures are used in the test functions for deploying jobs, ensuring that the tests are more focused and better reflect the actual behavior of the code.
* [chore] Make singletons for the relevant `DataType` instances ([705](https://github.com/databrickslabs/remorph/issues/705)). This commit introduces case objects for various data types, such as NullType, StringType, and others, effectively making them singletons. This change simplifies the creation of data type instances and ensures that each type has a single instance throughout the application. The new case objects are utilized in building SQL expressions, affecting functions such as Cast and TRY_CAST in the TSqlExpressionBuilder class. Additionally, the test file TSqlExpressionBuilderSpec.scala has been updated to include the new case objects and handle errors by returning null. The data types tested include integer types, decimal types, date and time types, string types, binary types, and JSON. The primary goal of this change is to improve the management and identification of data types in the codebase, as well as to enhance code readability and maintainability.
Dependency updates:
* Bump sqlglot from 25.1.0 to 25.5.1 ([534](https://github.com/databrickslabs/remorph/pull/534)).
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([555](https://github.com/databrickslabs/remorph/pull/555)).
* Bump sqlglot from 25.5.1 to 25.6.1 ([585](https://github.com/databrickslabs/remorph/pull/585)).