Starrocks

Latest version: v1.2.0

3.1.0

Release date: August 7, 2023

New Features

Shared-data cluster

- Added support for Primary Key tables, on which persistent indexes cannot be enabled.
- Supports the [AUTO_INCREMENT](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/auto_increment) column attribute, which enables a globally unique ID for each data row and thus simplifies data management.
- Supports [automatically creating partitions during loading and using partitioning expressions to define partitioning rules](https://docs.starrocks.io/en-us/3.1/table_design/automatic_partitioning), thereby making partition creation easier to use and more flexible.

Data Lake analytics

- Supports accessing views created on tables within [Hive catalogs](https://docs.starrocks.io/en-us/3.1/data_source/catalog/hive_catalog).
- Supports accessing Parquet-formatted Iceberg v2 tables.
- [Preview] Supports sinking data to Parquet-formatted Iceberg tables.
- [Preview] Supports accessing data stored in Elasticsearch by using [Elasticsearch catalogs](https://docs.starrocks.io/en-us/3.1/data_source/catalog/elasticsearch_catalog). This simplifies the creation of Elasticsearch external tables.
- [Preview] Supports performing analytics on streaming data stored in Apache Paimon by using [Paimon catalogs](https://docs.starrocks.io/en-us/3.1/data_source/catalog/paimon_catalog).

Storage engine, data ingestion, and query

- Upgraded automatic partitioning to [expression partitioning](https://docs.starrocks.io/en-us/3.1/table_design/expression_partitioning). Users only need to use a simple partition expression (either a time function expression or a column expression) to specify a partitioning method at table creation, and StarRocks will automatically create partitions based on the data characteristics and the rule defined in the partition expression during data loading. This method of partition creation is suitable for most scenarios and is more flexible and user-friendly.
- Supports [list partitioning](https://docs.starrocks.io/en-us/3.1/table_design/list_partitioning). Data is partitioned based on a list of values predefined for a particular column, which can accelerate queries and manage clearly categorized data more efficiently.
- Added a new table named `loads` to the `Information_schema` database. Users can query the results of [Broker Load](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-manipulation/BROKER%20LOAD) and [Insert](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-manipulation/insert) jobs from the `loads` table.
- Supports logging the unqualified data rows that are filtered out by [Stream Load](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-manipulation/STREAM%20LOAD), [Broker Load](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-manipulation/BROKER%20LOAD), and [Spark Load](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-manipulation/SPARK%20LOAD) jobs. Users can use the `log_rejected_record_num` parameter in their load job to specify the maximum number of data rows that can be logged.
- Supports [random bucketing](https://docs.starrocks.io/en-us/3.1/table_design/Data_distribution#choose-bucketing-columns). With this feature, users do not need to configure bucketing columns at table creation, and StarRocks will randomly distribute the data loaded into it to buckets. Using this feature together with the capability of automatically setting the number of buckets (`BUCKETS`) that StarRocks has provided since v2.5.7, users no longer need to consider bucket configurations, and table creation statements are greatly simplified. In big data and high performance-demanding scenarios, however, we recommend that users continue using hash bucketing, because this way they can use bucket pruning to accelerate queries.
- Supports using the table function FILES() in [INSERT INTO](https://docs.starrocks.io/en-us/3.1/loading/InsertInto) to directly load the data of Parquet- or ORC-formatted data files stored in AWS S3. The FILES() function can automatically infer the table schema, which relieves the need to create external catalogs or file external tables before data loading and therefore greatly simplifies the data loading process.
- Supports [generated columns](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/generated_columns). With the generated column feature, StarRocks can automatically generate and store the values of column expressions and automatically rewrite queries to improve query performance.
- Supports loading data from Spark to StarRocks by using [Spark connector](https://docs.starrocks.io/en-us/3.1/loading/Spark-connector-starrocks). Compared to [Spark Load](https://docs.starrocks.io/en-us/3.1/loading/SparkLoad), the Spark connector provides more comprehensive capabilities. Users can define a Spark job to perform ETL operations on the data, and the Spark connector serves as the sink in the Spark job.
- Supports loading data into columns of the [MAP](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-types/Map) and [STRUCT](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-types/STRUCT) data types, and supports nesting Fast Decimal values in ARRAY, MAP, and STRUCT.
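The expression partitioning item above can be pictured with a small Python model. This is only a sketch of the idea (partitions appear on demand as rows arrive), not StarRocks code: `date_trunc_day`, `load_rows`, the `event_time` field, and the `pYYYYMMDD` partition naming are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime


def date_trunc_day(ts: datetime) -> str:
    """Hypothetical stand-in for a time function partition expression
    (truncate a timestamp to its day and use that as the partition key)."""
    return ts.strftime("p%Y%m%d")


def load_rows(rows, partition_expr):
    """Route each row to a partition; a partition is created the first
    time its key is seen, so nothing has to be declared in advance."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_expr(row["event_time"])].append(row)
    return dict(partitions)


rows = [
    {"event_time": datetime(2023, 8, 7, 9, 30), "user": "a"},
    {"event_time": datetime(2023, 8, 7, 18, 5), "user": "b"},
    {"event_time": datetime(2023, 8, 8, 0, 1), "user": "c"},
]
partitions = load_rows(rows, date_trunc_day)
print(sorted(partitions))  # ['p20230807', 'p20230808']
```

Swapping `date_trunc_day` for a function that returns a column value unchanged would model the column-expression variant in the same way.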

SQL reference

- Added the following storage volume-related statements: [CREATE STORAGE VOLUME](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/Administration/CREATE%20STORAGE%20VOLUME), [ALTER STORAGE VOLUME](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/Administration/ALTER%20STORAGE%20VOLUME), [DROP STORAGE VOLUME](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/Administration/DROP%20STORAGE%20VOLUME), [SET DEFAULT STORAGE VOLUME](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/Administration/SET%20DEFAULT%20STORAGE%20VOLUME), [DESC STORAGE VOLUME](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/Administration/DESC%20STORAGE%20VOLUME), [SHOW STORAGE VOLUMES](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/Administration/SHOW%20STORAGE%20VOLUMES).

- Supports altering table comments using [ALTER TABLE](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-definition/ALTER%20TABLE). [#21035](https://github.com/StarRocks/starrocks/pull/21035)

- Added the following functions:

- Struct functions: [struct (row)](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/struct-functions/row), [named_struct](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/struct-functions/named_struct)
- Map functions: [str_to_map](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/string-functions/str_to_map), [map_concat](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/map_concat), [map_from_arrays](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/map_from_arrays), [element_at](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/element_at), [distinct_map_keys](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/distinct_map_keys), [cardinality](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/cardinality)
- Higher-order Map functions: [map_filter](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/map_filter), [map_apply](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/map_apply), [transform_keys](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/transform_keys), [transform_values](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/transform_values)
- Array functions: [array_agg](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/array_agg) supports `ORDER BY`, [array_generate](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/array_generate), [element_at](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/element_at), [cardinality](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/cardinality)
- Higher-order Array functions: [all_match](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/all_match), [any_match](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/any_match)
- Aggregate functions: [min_by](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/aggregate-functions/min_by), [percentile_disc](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/aggregate-functions/percentile_disc)
- Table functions: [generate_series](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/table-functions/generate_series), [FILES](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/table-functions/files)
- Date functions: [next_day](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/date-time-functions/next_day), [previous_day](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/date-time-functions/previous_day), [last_day](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/date-time-functions/last_day), [makedate](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/date-time-functions/makedate), [date_diff](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/date-time-functions/date_diff)
- Bitmap functions: [bitmap_subset_limit](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/bitmap-functions/bitmap_subset_limit), [bitmap_subset_in_range](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/bitmap-functions/bitmap_subset_in_range)
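For readers unfamiliar with the new date functions, the following Python sketch models what `next_day`, `previous_day`, and `last_day` are generally expected to return. It is not the StarRocks implementation. In particular, it assumes that "next" and "previous" mean strictly after and strictly before the input date, and that `last_day` refers to the last day of the month; check the linked function pages for the exact behavior and argument formats.

```python
import calendar
from datetime import date, timedelta

# date.weekday() numbers Monday as 0 and Sunday as 6.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_day(d: date, dow: str) -> date:
    """First date strictly after d that falls on the given weekday (assumed semantics)."""
    target = WEEKDAYS.index(dow.lower())
    return d + timedelta(days=(target - d.weekday() - 1) % 7 + 1)


def previous_day(d: date, dow: str) -> date:
    """Last date strictly before d that falls on the given weekday (assumed semantics)."""
    target = WEEKDAYS.index(dow.lower())
    return d - timedelta(days=(d.weekday() - target - 1) % 7 + 1)


def last_day(d: date) -> date:
    """Last day of the month that contains d."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


release = date(2023, 8, 7)               # a Monday
print(next_day(release, "Friday"))       # 2023-08-11
print(previous_day(release, "Friday"))   # 2023-08-04
print(last_day(release))                 # 2023-08-31
```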

Privileges and security

Added [privilege items](https://docs.starrocks.io/en-us/3.1/administration/privilege_item#storage-volume) related to storage volumes and [privilege items](https://docs.starrocks.io/en-us/3.1/administration/privilege_item#catalog) related to external catalogs, and supports using [GRANT](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/account-management/GRANT) and [REVOKE](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/account-management/REVOKE) to grant and revoke these privileges.

Improvements

Shared-data cluster

Optimized the data cache in StarRocks shared-data clusters. The optimized data cache allows for specifying the range of hot data. It can also prevent queries against cold data from occupying the local disk cache, thereby ensuring the performance of queries against hot data.

Materialized view

- Optimized the creation of an asynchronous materialized view:
  - Supports random bucketing. If users do not specify bucketing columns, StarRocks adopts random bucketing by default.
  - Supports using `ORDER BY` to specify a sort key.
  - Supports specifying attributes such as `colocate_group`, `storage_medium`, and `storage_cooldown_time`.
  - Supports using session variables. Users can configure these variables by using the `properties("session.<variable_name>" = "<value>")` syntax to flexibly adjust view refreshing strategies.
  - Enables the spill feature for all asynchronous materialized views and implements a query timeout duration of 1 hour by default.
  - Supports creating materialized views based on views. This makes materialized views easier to use in data modeling scenarios, because users can flexibly use views and materialized views based on their varying needs to implement layered modeling.
- Optimized query rewrite with asynchronous materialized views:
  - Supports Stale Rewrite, which allows materialized views that are not refreshed within a specified time interval to be used for query rewrite regardless of whether the base tables of the materialized views are updated. Users can specify the time interval by using the `mv_rewrite_staleness_second` property at materialized view creation.
  - Supports rewriting View Delta Join queries against materialized views that are created on Hive catalog tables (a primary key and a foreign key must be defined).
  - Optimized the mechanism for rewriting queries that contain union operations, and supports rewriting queries that contain joins or functions such as COUNT DISTINCT and time_slice.
- Optimized the refreshing of asynchronous materialized views:
  - Optimized the mechanism for refreshing materialized views that are created on Hive catalog tables. StarRocks can now perceive partition-level data changes, and refreshes only the partitions with data changes during each automatic refresh.
  - Supports using the `REFRESH MATERIALIZED VIEW WITH SYNC MODE` syntax to synchronously invoke materialized view refresh tasks.
- Enhanced the use of asynchronous materialized views:
  - Supports using `ALTER MATERIALIZED VIEW {ACTIVE | INACTIVE}` to enable or disable a materialized view. Materialized views that are disabled (in the `INACTIVE` state) cannot be refreshed or used for query rewrite, but can be directly queried.
  - Supports using `ALTER MATERIALIZED VIEW SWAP WITH` to swap two materialized views. Users can create a new materialized view and then perform an atomic swap with an existing materialized view to implement schema changes on the existing materialized view.
- Optimized synchronous materialized views:
  - Supports direct queries against synchronous materialized views using the SQL hint `[_SYNC_MV_]`, which works around the rare cases in which some queries cannot be properly rewritten.
  - Supports more expressions, such as `CASE-WHEN`, `CAST`, and mathematical operations, which make materialized views suitable for more business scenarios.
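The Stale Rewrite rule described above can be read as a simple eligibility check. The Python sketch below is one interpretation under stated assumptions, not the optimizer's actual logic: the function name `usable_for_rewrite` and its timestamp arguments are hypothetical, and only the property name `mv_rewrite_staleness_second` comes from the release note.

```python
from datetime import datetime, timedelta
from typing import Optional


def usable_for_rewrite(last_refresh: datetime,
                       base_last_update: datetime,
                       now: datetime,
                       mv_rewrite_staleness_second: Optional[int] = None) -> bool:
    """Decide whether a materialized view may be used for query rewrite."""
    # The view already reflects every base-table change, so it is usable.
    if last_refresh >= base_last_update:
        return True
    # Without a staleness tolerance, an outdated view is skipped.
    if mv_rewrite_staleness_second is None:
        return False
    # With a tolerance, the view stays usable while its last refresh is
    # recent enough, regardless of base-table updates.
    return now - last_refresh <= timedelta(seconds=mv_rewrite_staleness_second)


now = datetime(2023, 8, 7, 12, 0, 0)
refreshed = now - timedelta(minutes=5)   # view refreshed 5 minutes ago
updated = now - timedelta(minutes=1)     # base table changed 1 minute ago
print(usable_for_rewrite(refreshed, updated, now))        # False
print(usable_for_rewrite(refreshed, updated, now, 600))   # True
```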

Data Lake analytics

- Optimized metadata caching and access for Iceberg to improve Iceberg data query performance.
- Optimized the data cache to further improve data lake analytics performance.

Storage engine, data ingestion, and query

- Announced the general availability of the [spill](https://docs.starrocks.io/en-us/3.1/administration/spill_to_disk) feature, which supports spilling the intermediate computation results of some blocking operators to disk. With the spill feature enabled, when a query contains aggregate, sort, or join operators, StarRocks can cache the intermediate computation results of the operators to disk to reduce memory consumption, thereby minimizing query failures caused by memory limits.
- Supports pruning on cardinality-preserving joins. If users maintain a large number of tables which are organized in the star schema (for example, SSB) or the snowflake schema (for example, TPC-H) but they query only a small number of these tables, this feature helps prune unnecessary tables to improve the performance of joins.
- Supports partial updates in column mode. Users can enable the column mode when they perform partial updates on Primary Key tables by using the [UPDATE](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-manipulation/UPDATE) statement. The column mode is suitable for updating a small number of columns but a large number of rows, and can improve the updating performance by up to 10 times.
- Optimized the collection of statistics for the CBO. This reduces the impact of statistics collection on data ingestion and increases statistics collection performance.
- Optimized the merge algorithm to increase the overall performance by up to 2 times in permutation scenarios.
- Optimized the query logic to reduce dependency on database locks.

SQL reference

- Conditional functions case, coalesce, if, ifnull, and nullif support the ARRAY, MAP, STRUCT, and JSON data types.
- The following Array functions support nested types MAP, STRUCT, and ARRAY:
  - array_agg
  - array_contains, array_contains_all, array_contains_any
  - array_slice, array_concat
  - array_length, array_append, array_remove, array_position
  - reverse, array_distinct, array_intersect, arrays_overlap
  - array_sortby
- The following Array functions support the Fast Decimal data type:
  - array_agg
  - array_append, array_remove, array_position, array_contains
  - array_length
  - array_max, array_min, array_sum, array_avg
  - arrays_overlap, array_difference
  - array_slice, array_distinct, array_sort, reverse, array_intersect, array_concat
  - array_sortby, array_contains_all, array_contains_any

Bug Fixes

Fixed the following issues:

- Requests to reconnect to Kafka for Routine Load jobs cannot be properly processed. [23477](https://github.com/StarRocks/starrocks/issues/23477)
- For SQL queries that involve multiple tables and contain a `WHERE` clause, if these SQL queries have the same semantics but the order of the tables in each SQL query is different, some of these SQL queries may fail to be rewritten to benefit from the related materialized views. [22875](https://github.com/StarRocks/starrocks/issues/22875)
- Duplicate records are returned for queries that contain a `GROUP BY` clause. [19640](https://github.com/StarRocks/starrocks/issues/19640)
- Invoking the lead() or lag() function may cause BE crashes. [22945](https://github.com/StarRocks/starrocks/issues/22945)
- Rewriting partial partition queries based on materialized views that are created on external catalog tables fails. [19011](https://github.com/StarRocks/starrocks/issues/19011)
- SQL statements that contain both a backward slash (`\`) and a semicolon (`;`) cannot be properly parsed. [16552](https://github.com/StarRocks/starrocks/issues/16552)
- A table cannot be truncated if a materialized view created on the table is removed. [19802](https://github.com/StarRocks/starrocks/issues/19802)

Behavior Change

- The `storage_cache_ttl` parameter is deleted from the table creation syntax used for StarRocks shared-data clusters. Now the data in the local cache is evicted based on the LRU algorithm.
- The BE configuration items `disable_storage_page_cache` and `alter_tablet_worker_count` and the FE configuration item `lake_compaction_max_tasks` are changed from immutable parameters to mutable parameters.
- The default values of the BE configuration items `block_cache_checksum_enable` and `enable_new_load_on_memory` are changed from `true` to `false`.
- The default value of the FE configuration item `max_running_txn_num_per_db` is changed from `100` to `1000`.
- The default value of the FE configuration item `http_max_header_size` is changed from `8192` to `32768`.
- The default value of the FE configuration item `tablet_create_timeout_second` is changed from `1` to `10`.
- The default value of the FE configuration item `max_routine_load_task_num_per_be` is changed from `5` to `16`, and error information will be returned if a large number of Routine Load tasks are created.
- The FE configuration item `quorom_publish_wait_time_ms` is renamed as `quorum_publish_wait_time_ms`, and the FE configuration item `async_load_task_pool_size` is renamed as `max_broker_load_job_concurrency`.
- The BE configuration item `routine_load_thread_pool_size` is deprecated. Now the routine load thread pool size per BE node is controlled only by the FE configuration item `max_routine_load_task_num_per_be`.
- The BE configuration item `txn_commit_rpc_timeout_ms` and the system variable `tx_visible_wait_timeout` are deprecated. Now the `time_out` parameter is used to specify the transaction timeout duration.
- The FE configuration items `max_broker_concurrency` and `load_parallel_instance_num` are deprecated.
- The FE configuration item `max_routine_load_job_num` is deprecated. Now StarRocks dynamically infers the maximum number of Routine Load tasks supported by each individual BE node based on the `max_routine_load_task_num_per_be` parameter and provides suggestions on task failures.
- Two new Routine Load job properties, `task_consume_second` and `task_timeout_second`, are added to control the maximum amount of time to consume data and the timeout duration for individual load tasks within a Routine Load job, making job adjustment more flexible. If users do not specify these two properties in their Routine Load job, the FE configuration items `routine_load_task_consume_second` and `routine_load_task_timeout_second` prevail.
- Two new reserved keywords, COMPACTION and TEXT, are added.
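The first item in this list says that local cache data is now evicted based on the LRU algorithm instead of a TTL. For reference, here is a generic least-recently-used cache in Python. It illustrates the eviction policy only; the class `LRUCache`, its capacity counted in entries, and the segment names are illustrative assumptions and say nothing about how StarRocks sizes or stores its cache.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read marks the entry as recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def keys(self):
        return list(self._items)


cache = LRUCache(capacity=2)
cache.put("segment_a", b"...")
cache.put("segment_b", b"...")
cache.get("segment_a")           # segment_a becomes the most recently used
cache.put("segment_c", b"...")   # capacity exceeded, segment_b is evicted
print(cache.keys())              # ['segment_a', 'segment_c']
```

Unlike a TTL, this policy never expires an entry because of its age alone; an entry leaves the cache only when space is needed and it is the least recently touched one.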

3.1.0rc01

Release date: July 7, 2023

New Features

Shared-data cluster
* Added support for Primary Key tables, on which persistent indexes cannot be enabled.
* Supports the [AUTO_INCREMENT](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/auto_increment) column attribute, which enables a globally unique ID for each data row and thus simplifies data management.
* Supports [automatically creating partitions during loading and using partitioning expressions to define partitioning rules](https://docs.starrocks.io/en-us/3.1/table_design/automatic_partitioning), thereby making partition creation easier to use and more flexible.
* [Preview] Supports storing data on Azure Blob Storage.

Data Lake analytics
* Supports accessing Parquet-formatted Iceberg v2 tables.
* [Preview] Supports sinking data to Iceberg tables in Parquet format.
* Supports accessing data stored in Elasticsearch by using [Elasticsearch catalogs](https://docs.starrocks.io/en-us/3.1/data_source/catalog/elasticsearch_catalog). This simplifies the creation of Elasticsearch external tables.

Storage engine, data ingestion, and query
* Supports [random bucketing](https://docs.starrocks.io/en-us/3.1/table_design/Data_distribution#choose-bucketing-columns), which relieves the need to configure bucketing columns at table creation. In big data and high performance-demanding scenarios, we recommend that you continue using hash bucketing.
* Supports using the table function FILES() in [INSERT INTO](https://docs.starrocks.io/en-us/3.1/loading/InsertInto) to directly load the data of Parquet- or ORC-formatted data files stored in AWS S3.
* Supports [generated columns](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/generated_columns). With the generated column feature, StarRocks can automatically generate and store the values of column expressions and automatically rewrite queries to improve query performance.
* Supports loading data into columns of the [MAP](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-types/Map) and [STRUCT](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-types/STRUCT) data types, and supports nesting Fast Decimal values in ARRAY, MAP, and STRUCT.

SQL reference
* Struct functions: [struct (row)](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/struct-functions/row), [named_struct](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/struct-functions/named_struct)
* Map functions: [str_to_map](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/string-functions/str_to_map), [map_concat](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/map_concat), [map_from_arrays](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/map_from_arrays), [element_at](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/element_at), [distinct_map_keys](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/distinct_map_keys), [cardinality](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/cardinality)
* Higher-order Map functions: [map_filter](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/map_filter), [map_apply](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/map_apply), [transform_keys](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/transform_keys), [transform_values](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/map-functions/transform_values)
* Array functions: [array_agg](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/array_agg) supports ORDER BY, [array_generate](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/array_generate), [element_at](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/element_at), [cardinality](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/cardinality)
* Higher-order Array functions: [all_match](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/all_match), [any_match](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/array-functions/any_match)
* Aggregate functions: [min_by](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/aggregate-functions/min_by), [percentile_disc](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/aggregate-functions/percentile_disc)
* Table functions: [generate_series](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-functions/table-functions/generate_series)

Improvements

Shared-data cluster
* Optimized the data cache in StarRocks shared-data clusters. The optimized data cache allows for specifying the range of hot data. It can also prevent queries against cold data from occupying the local disk cache, thereby ensuring the performance of queries against hot data.

Materialized view
* Optimized the creation of an asynchronous materialized view:
  * Supports random bucketing. If users do not specify bucketing columns, StarRocks adopts random bucketing by default.
  * Supports using `ORDER BY` to specify a sort key.
  * Supports specifying attributes such as `colocate_group`, `storage_medium`, and `storage_cooldown_time`.
  * Supports using session variables. Users can configure these variables by using the `properties("session.<variable_name>" = "<value>")` syntax to flexibly adjust view refreshing strategies.
  * Supports creating materialized views based on views. This makes materialized views easier to use in data modeling scenarios, because users can flexibly use views and materialized views based on their varying needs to implement layered modeling.

* Optimized query rewrite with asynchronous materialized views:
  * Supports Stale Rewrite, which allows materialized views that are not refreshed within a specified time interval to be used for query rewrite regardless of whether the base tables of the materialized views are updated. Users can specify the time interval by using the `mv_rewrite_staleness_second` property at materialized view creation.
  * Supports rewriting View Delta Join queries against materialized views that are created on Hive catalog tables (a primary key and a foreign key must be defined).
  * Optimized the mechanism for rewriting queries that contain union operations, and supports rewriting queries that contain joins or functions such as COUNT DISTINCT and time_slice.

* Optimized the refreshing of asynchronous materialized views:
  * Optimized the mechanism for refreshing materialized views that are created on Hive catalog tables. StarRocks can now perceive partition-level data changes, and refreshes only the partitions with data changes during each automatic refresh.
  * Supports using the `REFRESH MATERIALIZED VIEW WITH SYNC MODE` syntax to synchronously invoke materialized view refresh tasks.

* Enhanced the use of asynchronous materialized views:
  * Supports using `ALTER MATERIALIZED VIEW {ACTIVE | INACTIVE}` to enable or disable a materialized view. Materialized views that are disabled (in the `INACTIVE` state) cannot be refreshed or used for query rewrite, but can be directly queried.
  * Supports using `ALTER MATERIALIZED VIEW SWAP WITH` to swap two materialized views. Users can create a new materialized view and then perform an atomic swap with an existing materialized view to implement schema changes on the existing materialized view.

* Optimized synchronous materialized views:
  * Supports direct queries against synchronous materialized views using the SQL hint `[_SYNC_MV_]`, which works around the rare cases in which some queries cannot be properly rewritten.
  * Supports more expressions, such as `CASE-WHEN`, `CAST`, and mathematical operations, which make materialized views suitable for more business scenarios.

Data Lake analytics
* Optimized metadata caching and access for Iceberg to improve Iceberg data query performance.
* Optimized the data cache to further improve data lake analytics performance.

Storage engine, data ingestion, and query
* Supports partial updates in column mode. Users can enable the column mode when they perform partial updates on Primary Key tables by using the [UPDATE](https://docs.starrocks.io/en-us/3.1/sql-reference/sql-statements/data-manipulation/UPDATE) statement. The column mode is suitable for updating a small number of columns but a large number of rows, and can improve the updating performance by up to 10 times.
* Optimized the collection of statistics for the CBO. This reduces the impact of statistics collection on data ingestion and increases statistics collection performance.
* Optimized the merge algorithm to increase the overall performance by up to 2 times in permutation scenarios.
* Optimized the query logic to reduce dependency on database locks.

SQL reference
* Conditional functions case, coalesce, if, ifnull, and nullif support the ARRAY, MAP, STRUCT, and JSON data types.
* The following Array functions support nested types MAP, STRUCT, and ARRAY:
  * array_agg
  * array_contains, array_contains_all, array_contains_any
  * array_slice, array_concat
  * array_length, array_append, array_remove, array_position
  * reverse, array_distinct, array_intersect, arrays_overlap
  * array_sortby
* The following Array functions support the Fast Decimal data type:
  * array_agg
  * array_append, array_remove, array_position, array_contains
  * array_length
  * array_max, array_min, array_sum, array_avg
  * arrays_overlap, array_difference
  * array_slice, array_distinct, array_sort, reverse, array_intersect, array_concat
  * array_sortby, array_contains_all, array_contains_any

Bug Fixes
Fixed the following issues:

* Requests to reconnect to Kafka for Routine Load jobs cannot be properly processed. [23477](https://github.com/StarRocks/starrocks/issues/23477)
* For SQL queries that involve multiple tables and contain a WHERE clause, if these SQL queries have the same semantics but the order of the tables in each SQL query is different, some of these SQL queries may fail to be rewritten to benefit from the related materialized views. [22875](https://github.com/StarRocks/starrocks/issues/22875)
* Duplicate records are returned for queries that contain a GROUP BY clause. [19640](https://github.com/StarRocks/starrocks/issues/19640)
* Invoking the lead() or lag() function may cause BE crashes. [22945](https://github.com/StarRocks/starrocks/issues/22945)
* Rewriting partial partition queries based on materialized views that are created on external catalog tables fails. [19011](https://github.com/StarRocks/starrocks/issues/19011)
* SQL statements that contain both a backward slash (\) and a semicolon (;) cannot be properly parsed. [16552](https://github.com/StarRocks/starrocks/issues/16552)
* A table cannot be truncated if a materialized view created on the table is removed. [19802](https://github.com/StarRocks/starrocks/issues/19802)

Behavior Change
* The storage_cache_ttl parameter is deleted from the table creation syntax used for StarRocks shared-data clusters. Now the data in the local cache is evicted based on the LRU algorithm.

3.0.9

Release date: January 2, 2024

New features

- Added the [percentile_disc](../sql-reference/sql-functions/aggregate-functions/percentile_disc.md) function. [36352](https://github.com/StarRocks/starrocks/pull/36352)
- Added a new metric `max_tablet_rowset_num` for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". [36539](https://github.com/StarRocks/starrocks/pull/36539)

Improvements

- A new value option `GROUP_CONCAT_LEGACY` is added to the session variable [sql_mode](../reference/System_variable.md#sql_mode) to provide compatibility with the implementation logic of the [group_concat](../sql-reference/sql-functions/string-functions/group_concat.md) function in versions earlier than v2.5. [36150](https://github.com/StarRocks/starrocks/pull/36150)
- When using JDK, the default GC algorithm is changed to G1. [37386](https://github.com/StarRocks/starrocks/pull/37386)
- The `be_tablets` view in the `information_schema` database provides a new field `INDEX_DISK`, which records the disk usage (measured in bytes) of persistent indexes. [35615](https://github.com/StarRocks/starrocks/pull/35615)
- Queries on MySQL external tables and the external tables within JDBC catalogs support including keywords in the WHERE clause. [35917](https://github.com/StarRocks/starrocks/pull/35917)
- Supports updates onto the specified partitions of an automatically partitioned table. If the specified partitions do not exist, an error is returned. [34777](https://github.com/StarRocks/starrocks/pull/34777)
- The Primary Key table size returned by the [SHOW DATA](../sql-reference/sql-statements/data-manipulation/SHOW_DATA.md) statement includes the sizes of **.cols** files (these are files related to partial column updates and generated columns) and persistent index files. [34898](https://github.com/StarRocks/starrocks/pull/34898)
- Optimized the performance of persistent index update when compaction is performed on all rowsets of a Primary Key table, which reduces disk read I/O. [36819](https://github.com/StarRocks/starrocks/pull/36819)
- When the string on the right side of the LIKE operator within the WHERE clause does not include `%` or `_`, the LIKE operator is converted into the `=` operator. [37515](https://github.com/StarRocks/starrocks/pull/37515)
- Optimized the logic used to compute compaction scores for Primary Key tables, thereby aligning the compaction scores for Primary Key tables within a more consistent range with the other three table types. [36534](https://github.com/StarRocks/starrocks/pull/36534)
- The result returned by the [SHOW ROUTINE LOAD](../sql-reference/sql-statements/data-manipulation/SHOW_ROUTINE_LOAD.md) statement now includes the timestamps of consumption messages from each partition. [36222](https://github.com/StarRocks/starrocks/pull/36222)
- Optimized the performance of some Bitmap-related operations, including:
  - Optimized nested loop joins. [34804](https://github.com/StarRocks/starrocks/pull/34804) [#35003](https://github.com/StarRocks/starrocks/pull/35003)
  - Optimized the `bitmap_xor` function. [34069](https://github.com/StarRocks/starrocks/pull/34069)
  - Supports Copy on Write to optimize Bitmap performance and reduce memory consumption. [34047](https://github.com/StarRocks/starrocks/pull/34047)
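
The LIKE-to-`=` conversion listed among the improvements above can be sketched as a simple rewrite rule. This is an illustration only, not the optimizer's actual code: the function name `simplify_like` is hypothetical, and the sketch ignores escape sequences and quoting of the literal.

```python
def simplify_like(column, pattern):
    """Render a predicate, using `=` when the pattern has no LIKE wildcards.

    Assumption: `%` and `_` are the only wildcards, and escaped wildcards
    are not handled (they conservatively keep the LIKE form).
    """
    has_wildcard = "%" in pattern or "_" in pattern
    operator = "LIKE" if has_wildcard else "="
    return f"{column} {operator} '{pattern}'"
```

For example, `simplify_like("name", "alice")` yields an equality predicate, while a pattern such as `al%` is left as a LIKE predicate.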

Compatibility Changes

Behavior Change
- Added the session variable `enable_materialized_view_for_insert`, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is `false`. [37505](https://github.com/StarRocks/starrocks/pull/37505)
- Changed the FE configuration item `enable_new_publish_mechanism` to a static parameter from a dynamic one. You must restart the FE after you modify the parameter settings. [35338](https://github.com/StarRocks/starrocks/pull/35338)
- The default retention period of trash files is changed to 1 day from the original 3 days. [37113](https://github.com/StarRocks/starrocks/pull/37113)

Parameters

Session variables

- Added session variable `cbo_decimal_cast_string_strict`, which controls how the CBO converts data from the DECIMAL type to the STRING type. If this variable is set to `true`, the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). If this variable is set to `false`, the logic built in versions earlier than v2.5.x prevails and the system processes all valid digits to generate a string. The default value is `true`. [34208](https://github.com/StarRocks/starrocks/pull/34208)
- Added session variables `transaction_read_only` and `tx_read_only` to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. [37249](https://github.com/StarRocks/starrocks/pull/37249)
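
The two conversion modes described for `cbo_decimal_cast_string_strict` can be pictured with the rough sketch below. It is an interpretation of the wording above, not StarRocks code: the function name is hypothetical, strict mode is modeled as truncating to the declared scale and padding with zeros, and non-strict mode is assumed to mean that only the significant digits are emitted.

```python
from decimal import Decimal, ROUND_DOWN


def cast_decimal_to_string(value, scale, strict=True):
    """Illustrative DECIMAL-to-STRING conversion (hypothetical helper).

    strict=True : truncate to `scale` fractional digits and pad with zeros.
    strict=False: assumed to drop trailing zeros and keep significant digits only.
    """
    if strict:
        quantum = Decimal(1).scaleb(-scale)  # scale=4 gives Decimal("0.0001")
        return format(value.quantize(quantum, rounding=ROUND_DOWN), "f")
    text = format(value, "f")
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text
```

Under these assumptions, a value of 1.5 in a column with scale 4 is rendered as `1.5000` in strict mode and as `1.5` otherwise.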

FE configurations
- Added the FE configuration item `routine_load_unstable_threshold_second`. [36222](https://github.com/StarRocks/starrocks/pull/36222)
- Added the FE configuration item `http_worker_threads_num`, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is `0`. If the value for this parameter is set to a negative value or 0, the actual thread number is twice the number of CPU cores. [37530](https://github.com/StarRocks/starrocks/pull/37530)
- Added the FE configuration item `default_mv_refresh_immediate`, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is `true`. [37093](https://github.com/StarRocks/starrocks/pull/37093)
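
The fallback rule stated for `http_worker_threads_num` can be written out as a short sketch. The helper name and the optional `cpu_cores` argument are hypothetical and exist only for illustration; the FE itself is not implemented this way.

```python
import os


def effective_http_worker_threads(configured, cpu_cores=None):
    """Return the thread count implied by the documented rule.

    A value of 0 or below falls back to twice the number of CPU cores;
    any positive value is used as configured.
    """
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if configured <= 0:
        return 2 * cpu_cores
    return configured
```

On an 8-core machine, the default value `0` would therefore correspond to 16 threads.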

BE configurations
- Added the BE configuration item `enable_stream_load_verbose_log`. The default value is `false`. With this parameter set to `true`, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. [36113](https://github.com/StarRocks/starrocks/pull/36113)
- Added the BE configuration item `pindex_major_compaction_limit_per_disk` to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is `2`. [36681](https://github.com/StarRocks/starrocks/pull/36681)
- Added BE configuration items to specify the timeout duration for connecting to object storage:
  - `object_storage_connect_timeout_ms`: Timeout duration to establish socket connections with object storage. The default value is `-1`, which means to use the default timeout duration of the SDK configurations.
  - `object_storage_request_timeout_ms`: Timeout duration to establish HTTP connections with object storage. The default value is `-1`, which means to use the default timeout duration of the SDK configurations.

Bug Fixes

Fixed the following issues:

- In some cases, BEs may crash when a Catalog is used to read ORC external tables. [27971](https://github.com/StarRocks/starrocks/pull/27971)
- The BEs crash if users create persistent indexes in the event of data corruption. [30841](https://github.com/StarRocks/starrocks/pull/30841)
- BEs occasionally crash after a Bitmap index is added. [26463](https://github.com/StarRocks/starrocks/pull/26463)
- Failures in replaying replica operations may cause FEs to crash. [32295](https://github.com/StarRocks/starrocks/pull/32295)
- Setting the FE parameter `recover_with_empty_tablet` to `true` may cause FEs to crash. [33071](https://github.com/StarRocks/starrocks/pull/33071)
- Queries fail during hash joins, causing BEs to crash. [32219](https://github.com/StarRocks/starrocks/pull/32219)
- In a StarRocks shared-nothing cluster, queries against Iceberg or Hive tables may cause BEs to crash. [34682](https://github.com/StarRocks/starrocks/pull/34682)
- The error "get_applied_rowsets failed, tablet updates is in error state: tablet:18849 actual row size changed after compaction" is returned for queries. [33246](https://github.com/StarRocks/starrocks/pull/33246)
- Running `show proc '/statistic'` may cause a deadlock. [34237](https://github.com/StarRocks/starrocks/pull/34237/files)
- The FE performance plunges after the FE configuration item `enable_collect_query_detail_info` is set to `true`. [35945](https://github.com/StarRocks/starrocks/pull/35945)
- Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. [34352](https://github.com/StarRocks/starrocks/pull/34352)
- After StarRocks is upgraded from v2.4 or earlier to a later version, compaction scores may rise unexpectedly. [34618](https://github.com/StarRocks/starrocks/pull/34618)
- If `INFORMATION_SCHEMA` is queried by using the database driver MariaDB ODBC, the `CATALOG_NAME` column returned in the `schemata` view holds only `null` values. [34627](https://github.com/StarRocks/starrocks/pull/34627)
- FEs crash when abnormal data is loaded, and then cannot restart. [34590](https://github.com/StarRocks/starrocks/pull/34590)
- If schema changes are being executed while a Stream Load job is in the **PREPARED** state, a portion of the source data to be loaded by the job is lost. [34381](https://github.com/StarRocks/starrocks/pull/34381)
- Including two or more slashes (`/`) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. [34601](https://github.com/StarRocks/starrocks/pull/34601)
- The `partition_live_number` property added by using the ALTER TABLE statement does not take effect. [34842](https://github.com/StarRocks/starrocks/pull/34842)
- The [array_distinct](https://docs.starrocks.io/docs/sql-reference/sql-functions/array-functions/array_distinct/) function occasionally causes the BEs to crash. [#36377](https://github.com/StarRocks/starrocks/pull/36377)
- Deadlocks may occur when users refresh materialized views. [35736](https://github.com/StarRocks/starrocks/pull/35736)
- Global Runtime Filter may cause BEs to crash in certain scenarios. [35776](https://github.com/StarRocks/starrocks/pull/35776)
- In some cases, `bitmap_to_string` may return incorrect results due to data type overflow. [37405](https://github.com/StarRocks/starrocks/pull/37405)

3.0.5

Release date: August 16, 2023

New Features
- Supports aggregate functions [COVAR_SAMP](https://docs.starrocks.io/en-us/3.0/sql-reference/sql-functions/aggregate-functions/covar_samp), [COVAR_POP](https://docs.starrocks.io/en-us/3.0/sql-reference/sql-functions/aggregate-functions/covar_pop), and [CORR](https://docs.starrocks.io/en-us/3.0/sql-reference/sql-functions/aggregate-functions/corr).
- Supports the following [window functions](https://docs.starrocks.io/en-us/3.0/sql-reference/sql-functions/Window_function): COVAR_SAMP, COVAR_POP, CORR, VARIANCE, VAR_SAMP, STD, and STDDEV_SAMP.
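
As a reminder of what the newly supported aggregates compute, the sketch below implements the standard textbook definitions of sample covariance, population covariance, and the Pearson correlation coefficient. It is not StarRocks code, the lowercase function names are chosen for illustration, and SQL details such as NULL handling are not modeled.

```python
from math import sqrt


def covar_pop(xs, ys):
    """Population covariance: mean of the products of deviations (divide by n)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covar_samp(xs, ys):
    """Sample covariance: same sum of products, divided by n - 1."""
    n = len(xs)
    return covar_pop(xs, ys) * n / (n - 1)


def corr(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covar_pop(xs, ys) / sqrt(covar_pop(xs, xs) * covar_pop(ys, ys))
```

For `xs = [1, 2, 3, 4]` and `ys = [2, 4, 6, 8]`, the population covariance is 2.5, the sample covariance is 10/3, and the correlation is 1 because the two series are perfectly linearly related.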

Improvements
- Added more prompts in the error message "xxx too many versions xxx". [28397](https://github.com/StarRocks/starrocks/pull/28397)
- Dynamic partitioning now supports year as the partitioning unit. [28386](https://github.com/StarRocks/starrocks/pull/28386)
- The partitioning field is case-insensitive when expression partitioning is used at table creation and [INSERT OVERWRITE is used to overwrite data in a specific partition](https://docs.starrocks.io/en-us/3.0/table_design/expression_partitioning#load-data-into-partitions). [28309](https://github.com/StarRocks/starrocks/pull/28309)

Bug Fixes
Fixed the following issues:

- Incorrect table-level scan statistics in FE cause inaccurate metrics for table queries and loading. [27779](https://github.com/StarRocks/starrocks/pull/27779)
- The query result is not stable if the sort key is modified for a partitioned table. [27850](https://github.com/StarRocks/starrocks/pull/27850)
- The version number for a tablet is inconsistent between the BE and FE after data is restored. [26518](https://github.com/StarRocks/starrocks/pull/26518/files)
- If the bucket number is not specified when users create a Colocation table, the number will be inferred as 0, which causes failures in adding new partitions. [27086](https://github.com/StarRocks/starrocks/pull/27086)
- When the SELECT result set of INSERT INTO SELECT is empty, the load job status returned by SHOW LOAD is CANCELED. [26913](https://github.com/StarRocks/starrocks/pull/26913)
- BEs may crash when the input values of the sub_bitmap function are not of the BITMAP type. [27982](https://github.com/StarRocks/starrocks/pull/27982)
- BEs may crash when the AUTO_INCREMENT column is being updated. [27199](https://github.com/StarRocks/starrocks/pull/27199)
- Outer join and Anti join rewrite errors for materialized views. [28028](https://github.com/StarRocks/starrocks/pull/28028)
- Inaccurate estimation of average row size causes Primary Key partial updates to occupy excessively large memory. [27485](https://github.com/StarRocks/starrocks/pull/27485)
- Activating an inactive materialized view may cause a FE to crash. [27959](https://github.com/StarRocks/starrocks/pull/27959)
- Queries cannot be rewritten to materialized views created based on external tables in a Hudi catalog. [28023](https://github.com/StarRocks/starrocks/pull/28023)
- The data of a Hive table can still be queried even after the table is dropped and the metadata cache is manually updated. [28223](https://github.com/StarRocks/starrocks/pull/28223)
- Manually refreshing an asynchronous materialized view via a synchronous call results in multiple INSERT OVERWRITE records in the information_schema.task_runs table. [28060](https://github.com/StarRocks/starrocks/pull/28060)
- FE memory leak caused by blocked LabelCleaner threads. [28311](https://github.com/StarRocks/starrocks/pull/28311)

3.0.4

Release date: July 18, 2023

New Feature
- Queries can be rewritten even when the queries contain a different type of join than the materialized view. [25099](https://github.com/StarRocks/starrocks/pull/25099)

Improvements
- If the queried fields are not included in the output columns of a materialized view but are included in the predicate of the materialized view, the query can still be rewritten to benefit from the materialized view. [23028](https://github.com/StarRocks/starrocks/issues/23028)
- [When the SQL dialect (sql_dialect) is set to trino](https://docs.starrocks.io/en-us/3.0/reference/System_variable), table aliases are not case-sensitive, and the json_array function is supported in queries. [#26094](https://github.com/StarRocks/starrocks/pull/26094) [#25282](https://github.com/StarRocks/starrocks/pull/25282)
- Added a new field table_id to the table Information_schema.tables_config. You can join the table tables_config with the table be_tablets on the column table_id in the database Information_schema to query the names of the database and table to which a tablet belongs. [24061](https://github.com/StarRocks/starrocks/pull/24061)
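
The join on `table_id` described above might look like the following sketch. To keep it runnable without a cluster, it uses an in-memory SQLite database as a stand-in for `Information_schema`. Apart from `table_id`, the column names (`TABLE_SCHEMA`, `TABLE_NAME`, `TABLET_ID`), the sample rows, and the helper names are assumptions made for illustration.

```python
import sqlite3


def demo_connection():
    """Build an in-memory stand-in for the two views (sample data is made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tables_config (TABLE_SCHEMA TEXT, TABLE_NAME TEXT, TABLE_ID INTEGER);
        CREATE TABLE be_tablets (TABLET_ID INTEGER, TABLE_ID INTEGER);
        INSERT INTO tables_config VALUES ('sales', 'orders', 10001);
        INSERT INTO be_tablets VALUES (20001, 10001);
        """
    )
    return conn


def tablet_owner(conn, tablet_id):
    """Return (database name, table name) for a tablet, or None if it is unknown."""
    return conn.execute(
        """
        SELECT c.TABLE_SCHEMA, c.TABLE_NAME
        FROM be_tablets AS t
        JOIN tables_config AS c ON t.TABLE_ID = c.TABLE_ID
        WHERE t.TABLET_ID = ?
        """,
        (tablet_id,),
    ).fetchone()
```

Against a real cluster, the same SELECT would presumably be issued with both views qualified by the `information_schema` database.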

Bug Fixes
Fixed the following issues:
- If a query that contains the sum aggregate function is rewritten to directly obtain query results from a single-table materialized view, the values in sum() field may be incorrect due to type inference issues. [25512](https://github.com/StarRocks/starrocks/pull/25512)
- An error occurs when SHOW PROC is used to view information about tablets in a StarRocks shared-data cluster.
- The INSERT operation hangs when the length of CHAR data in a STRUCT to be inserted exceeds the maximum length. [25942](https://github.com/StarRocks/starrocks/pull/25942)
- Some data rows queried fail to be returned for INSERT INTO SELECT with FULL JOIN. [26603](https://github.com/StarRocks/starrocks/pull/26603)
- An error ERROR xxx: Unknown table property xxx occurs when the ALTER TABLE statement is used to modify the table's property default.storage_medium. [25870](https://github.com/StarRocks/starrocks/issues/25870)
- An error occurs when Broker Load is used to load empty files. [26212](https://github.com/StarRocks/starrocks/pull/26212)
- Decommissioning a BE sometimes hangs. [26509](https://github.com/StarRocks/starrocks/pull/26509)

3.0.3

Release date: June 28, 2023

Improvements
- Metadata synchronization of StarRocks external tables has been changed to occur during data loading. [24739](https://github.com/StarRocks/starrocks/pull/24739)
- Users can specify partitions when they run INSERT OVERWRITE on tables whose partitions are automatically created. For more information, see [Automatic partitioning](https://docs.starrocks.io/en-us/3.0/table_design/automatic_partitioning). [#25005](https://github.com/StarRocks/starrocks/pull/25005)
- Optimized the error message reported when partitions are added to a non-partitioned table. [25266](https://github.com/StarRocks/starrocks/pull/25266)

Bug Fixes
Fixed the following issues:
- The min/max filter gets the wrong Parquet field when the Parquet file contains complex data types. [23976](https://github.com/StarRocks/starrocks/pull/23976)
- Load tasks are still queuing even when the related database or table has been dropped. [24801](https://github.com/StarRocks/starrocks/pull/24801)
- There is a low probability that an FE restart may cause BEs to crash. [25037](https://github.com/StarRocks/starrocks/pull/25037)
- Load and query jobs occasionally freeze when the variable enable_profile is set to true. [25060](https://github.com/StarRocks/starrocks/pull/25060)
- An inaccurate error message is displayed when INSERT OVERWRITE is executed on a cluster with fewer than three alive BEs. [25314](https://github.com/StarRocks/starrocks/pull/25314)
