Koalas

Latest version: v1.8.2

0.18.0

Multi-index columns support

We continue improving multi-index columns support (793, 776). We made the following APIs support multi-index columns:

- `applymap` (793)
- `shift` (793)
- `diff` (793)
- `fillna` (793)
- `rank` (793)

Also, Series and Index names can now be set to a tuple or None (776).

```python
>>> import databricks.koalas as ks
>>> kser = ks.Series([1, 2, 3])
>>> kser.name = ('a', 'b')
>>> kser
0    1
1    2
2    3
Name: (a, b), dtype: int64
```

Plots

We also continue adding plot APIs as follows:

For Series:

- `plot.kde()` (767)

For DataFrame:

- `plot.hist()` (780)

Options

In addition, we added support for namespace access to options (785).

```python
>>> import databricks.koalas as ks
>>> ks.options.display.max_rows
1000
>>> ks.options.display.max_rows = 10
>>> ks.options.display.max_rows
10
```

See also the [User Guide](https://koalas.readthedocs.io/en/latest/user_guide/options.html) in our project docs.
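With namespace access, a dotted option key such as `display.max_rows` can be read and assigned as a chain of nested attributes. The following is a minimal plain-Python sketch of that idea. It assumes options are kept in a flat dict keyed by their dotted names. `_Namespace` is a hypothetical helper, not the Koalas implementation.

```python
from typing import Any, Dict


class _Namespace:
    """Hypothetical helper: exposes dotted option keys as nested attributes."""

    def __init__(self, store: Dict[str, Any], prefix: str = "") -> None:
        # Bypass our own __setattr__ when storing internal state.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_prefix", prefix)

    def _key(self, name: str) -> str:
        return f"{self._prefix}.{name}" if self._prefix else name

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        key = self._key(name)
        if key in self._store:
            return self._store[key]
        # If any option lives under this prefix, return a deeper namespace.
        if any(k.startswith(key + ".") for k in self._store):
            return _Namespace(self._store, key)
        raise AttributeError(f"No such option: {key}")

    def __setattr__(self, name: str, value: Any) -> None:
        key = self._key(name)
        if key not in self._store:
            raise AttributeError(f"No such option: {key}")
        self._store[key] = value  # all namespaces share the same store


options = _Namespace({"display.max_rows": 1000, "compute.max_rows": 1000})
print(options.display.max_rows)  # 1000
options.display.max_rows = 10
print(options.display.max_rows)  # 10
```

All nested namespaces share one flat store. An assignment through `options.display` is therefore visible to every later lookup.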

Other new features and improvements

We added the following new features:

koalas.DataFrame:

- `aggregate` (796)
- `agg` (796)
- `items` (787)

koalas.indexes.Index/MultiIndex:

- `is_boolean` (795)
- `is_categorical` (795)
- `is_floating` (795)
- `is_integer` (795)
- `is_interval` (795)
- `is_numeric` (795)
- `is_object` (795)

Along with the following improvements:

- Add `index_col` for `read_json` (797)
- Add `index_col` for Spark IO reads (769, 775)
- Add `sep` parameter to `read_csv` (777)
- Add `axis` parameter to `DataFrame.diff` (774)
- Add `read_json` and let `to_json` use `spark.write.json` (753)
- Use `spark.write.csv` in `to_csv` of Series and DataFrame (749)
- Handle `TimestampType` separately when converting to pandas' dtype (798)
- Fix `spark_df` when `set_index(.., drop=False)` (792)

Backward compatibility

- We removed some parameters from `DataFrame.to_csv` and `DataFrame.to_json` to allow distributed writing (749, 753)

0.17.0

Options

We started using options to configure Koalas' behavior. We now have the following options:

- `display.max_rows` (714, 742)
- `compute.max_rows` (721, 736)
- `compute.shortcut_limit` (717)
- `compute.ops_on_diff_frames` (725)
- `compute.default_index_type` (723)
- `plotting.max_rows` (728)
- `plotting.sample_ratio` (737)

The full list of options and their descriptions is available in the [User Guide](https://koalas.readthedocs.io/en/latest/user_guide/options.html) of our project docs.

Plots

We continue adding plot APIs as follows:

For Series:

- `plot.area()` (704)

For DataFrame:

- `plot.line()` (686)
- `plot.bar()` (695)
- `plot.barh()` (698)
- `plot.pie()` (703)
- `plot.area()` (696)
- `plot.scatter()` (719)

Multi-index columns support

We also continue improving multi-index columns support. We made the following APIs support multi-index columns:

- `koalas.concat()` (680)
- `koalas.get_dummies()` (695)
- `DataFrame.pivot_table()` (635)

Other new features and improvements

We added the following new features:

koalas:

- `read_sql_table()` (741)
- `read_sql_query()` (741)
- `read_sql()` (741)

koalas.DataFrame:

- `style` (712)

Along with the following improvements:

- `GroupBy.apply` should return a Koalas DataFrame instead of a pandas DataFrame (731)
- Fix `rpow` and `rfloordiv` to use proper operators in Series (735)
- Fix `rpow` and `rfloordiv` to use proper operators in DataFrame (740)
- Add schema inference support to `DataFrame.transform` (732)
- Add `Option` class to support type checks and value checks in options (739)
- Add missing tests (687, 692, 694, 709, 711, 730, 729, 733, 734)
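The following sketch illustrates what item 739 describes: an option that checks both the type and the value of a new setting before accepting it. The constructor fields (`key`, `default`, `types`, `check_func`) and the error messages are illustrative assumptions. They are not necessarily the signature of Koalas' `Option` class.

```python
from typing import Any, Callable, Tuple


class Option:
    """Hypothetical sketch of an option with a type check and a value check."""

    def __init__(
        self,
        key: str,
        default: Any,
        types: Tuple[type, ...],
        check_func: Tuple[Callable[[Any], bool], str],
    ) -> None:
        self.key = key
        self.types = types
        self.check_func = check_func
        self.validate(default)  # the default must pass its own checks
        self.default = default

    def validate(self, value: Any) -> None:
        # Type check: reject values whose type is not allowed.
        if not isinstance(value, self.types):
            raise ValueError(
                f"The value for option '{self.key}' was {type(value)}; "
                f"expected one of {self.types}"
            )
        # Value check: apply the predicate and raise its message on failure.
        predicate, message = self.check_func
        if not predicate(value):
            raise ValueError(message)


max_rows = Option(
    key="display.max_rows",
    default=1000,
    types=(int, type(None)),
    check_func=(lambda v: v is None or v >= 0,
                "'display.max_rows' should be greater than or equal to 0."),
)

max_rows.validate(10)  # passes silently
try:
    max_rows.validate(-1)
except ValueError as e:
    print(e)  # 'display.max_rows' should be greater than or equal to 0.
```

The type check runs before the value check. This ordering keeps the predicate from ever receiving a value of an unexpected type, such as the string `"10"`.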

Backward compatibility

- We renamed two of the default index names from `one-by-one` and `distributed-one-by-one` to `sequence` and `distributed-sequence` respectively. (679)
- We moved the configuration for enabling operations on different DataFrames from the environment variable to the option. (725)
- We moved the configuration for the default index from the environment variable to the option. (723)

0.16.0

Firstly, we introduced a new mode that enables operations on different DataFrames (633). This mode can be enabled by setting the `OPS_ON_DIFF_FRAMES` environment variable to `true`, as below:

```python
>>> import databricks.koalas as ks
>>>
>>> kdf1 = ks.range(5)
>>> kdf2 = ks.DataFrame({'id': [5, 4, 3]})
>>> (kdf1 - kdf2).sort_index()
    id
0 -5.0
1 -3.0
2 -1.0
3  NaN
4  NaN
```
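When an operation involves two different DataFrames, rows are aligned by index label before the computation runs. In the example, labels 0 to 2 exist in both frames. Labels 3 and 4 exist only in `kdf1`, so their results are NaN. The sketch below reproduces those semantics in plain Python, with dicts standing in for the two `id` columns. `aligned_sub` is a hypothetical helper. Koalas itself performs the alignment in Spark, not like this.

```python
import math
from typing import Dict


def aligned_sub(left: Dict[int, float], right: Dict[int, float]) -> Dict[int, float]:
    """Subtract two label->value mappings after aligning them on their labels.

    Labels present on only one side produce NaN, mirroring pandas-style alignment.
    """
    result = {}
    # Union of labels, sorted like sort_index().
    for label in sorted(set(left) | set(right)):
        if label in left and label in right:
            result[label] = float(left[label] - right[label])
        else:
            result[label] = math.nan
    return result


kdf1_id = {i: i for i in range(5)}  # like the 'id' column of ks.range(5)
kdf2_id = {0: 5, 1: 4, 2: 3}        # like ks.DataFrame({'id': [5, 4, 3]})['id']
print(aligned_sub(kdf1_id, kdf2_id))
# {0: -5.0, 1: -3.0, 2: -1.0, 3: nan, 4: nan}
```

Unmatched labels produce NaN, which is why the result is float even though both inputs are integers.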

0.15.1

Apache Arrow 0.15.0 did not work well with PySpark 2.4, so it was disabled in the previous version.
With Arrow 0.15.1, it now works in Koalas (902).

Expanding and Rolling

We also added `expanding()` and `rolling()` APIs to `groupby()`, Series, and DataFrame, with the following functions (985, 991, 990, 1015, 996, 1034, 1037):

- `min`
- `max`
- `sum`
- `mean`
- `std`
- `var`
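`rolling(window)` aggregates over a trailing window of fixed size, while `expanding()` aggregates over every value seen so far. The plain-Python sketch below illustrates those semantics. It assumes the pandas defaults (`min_periods` equal to the window for rolling, 1 for expanding) and input without missing values. It is not how Koalas computes these functions in Spark.

```python
import math
from typing import Callable, List, Sequence


def rolling(values: Sequence[float], window: int,
            func: Callable[[List[float]], float]) -> List[float]:
    """Apply func over a trailing window; positions before a full window are NaN."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)  # not enough values yet (min_periods == window)
        else:
            out.append(func(list(values[i + 1 - window:i + 1])))
    return out


def expanding(values: Sequence[float],
              func: Callable[[List[float]], float]) -> List[float]:
    """Apply func over all values seen so far (min_periods == 1)."""
    return [func(list(values[:i + 1])) for i in range(len(values))]


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


data = [1.0, 2.0, 3.0, 4.0]
print(rolling(data, 2, sum))   # [nan, 3.0, 5.0, 7.0]
print(expanding(data, mean))   # [1.0, 1.5, 2.0, 2.5]
```

The same two helpers cover `min`, `max`, `sum`, and `mean` by swapping `func`. `std` and `var` would need a sample-variance function.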

Multi-index columns support

We continue improving multi-index columns support. We made the following APIs support multi-index columns:

- `median` (995)
- `at` (1049)

Documentation

We added a "Best Practices" section to the documentation (1041) for Koalas users to read and follow. Please see https://koalas.readthedocs.io/en/latest/user_guide/best_practices.html

Other new features and improvements

We added the following new features:

koalas.DataFrame:

- `quantile` (984)
- `explain` (1042)

koalas.Series:

- `between` (997)
- `update` (923)
- `mask` (1017)

koalas.MultiIndex:

- `from_tuples` (970)
- `from_arrays` (1001)

Along with the following improvements:

- Introduce `column_scols` in `InternalFrame` as a substitute for `data_columns` (956)
- Fix different index level assignment when `compute.ops_on_diff_frames` is enabled (1045)
- Fix the `DataFrame.melt` function and add a doctest case for it (987)
- Enable creating an Index from a list, like `Index([1, 2, 3])` (986)
- Fix `combine_frames` to handle the case where the right-hand side arguments are modified Series (1020)
- `setup.py` should support Python 2 to show a proper error message. (1027)
- Remove `Series.schema`. (993)

0.15

Apache Arrow 0.15.0, which Koalas depends on to execute pandas UDFs, was released on the 5th of October, 2019, but the Spark community reported [an issue](https://issues.apache.org/jira/browse/SPARK-29367) with PyArrow 0.15.

We decided to set an upper bound on the pyarrow version to avoid such issues until we are sure that Koalas works with it.

- Set an upper bound for the pyarrow version. (918)

Multi-index columns support

We continue improving multi-index columns support. We made the following APIs support multi-index columns:

- `pivot_table` (908)
- `melt` (920)

Other new features and improvements

We added the following new features:

koalas.DataFrame:

- `xs` (892)

koalas.Series:

- `drop_duplicates` (896)
- `replace` (903)

koalas.GroupBy:

- `shift` (910)

Along with the following improvements:

- Implement nested renaming for groupby agg (904)
- Add `index_col` parameter to `DataFrame.to_spark` (906)
- Add more options to `read_csv` (916)
- Add `NamedAgg` (911)
- Enable setting DataFrame values with a list of labels (905)
