Multi-index columns support
We continue to improve multi-index columns support (793, 776). The following APIs now support multi-index columns:
- `applymap` (793)
- `shift` (793)
- `diff` (793)
- `fillna` (793)
- `rank` (793)
Also, the name of a Series or Index can now be set to a tuple or None (776).
```python
>>> import databricks.koalas as ks
>>> kser = ks.Series([1, 2, 3])
>>> kser.name = ('a', 'b')
>>> kser
0    1
1    2
2    3
Name: (a, b), dtype: int64
```
Plots
We also continue adding plot APIs as follows:
For Series:
- `plot.kde()` (767)
For DataFrame:
- `plot.hist()` (780)
Options
In addition, we added support for namespace-style access to options (785).
```python
>>> import databricks.koalas as ks
>>> ks.options.display.max_rows
1000
>>> ks.options.display.max_rows = 10
>>> ks.options.display.max_rows
10
```
See also the [User Guide](https://koalas.readthedocs.io/en/latest/user_guide/options.html) in our project docs.
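To illustrate the idea behind namespace-style access, here is a minimal pure-Python sketch of how attribute access could be layered over a flat store of dotted option keys. It is not the Koalas implementation: the `OptionNamespace` class and the `store` dict are hypothetical names introduced only for this example, and the single option it holds mirrors the `display.max_rows` value shown above.

```python
class OptionNamespace:
    """Attribute-style view over a flat dict of dotted option keys.

    Hypothetical helper for illustration; not part of Koalas.
    """

    def __init__(self, store, prefix=""):
        # Bypass our own __setattr__ so the internals live on the instance.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_prefix", prefix)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        key = self._prefix + name
        if key in self._store:
            return self._store[key]
        # If other keys live under "key.", expose them as a nested namespace.
        if any(k.startswith(key + ".") for k in self._store):
            return OptionNamespace(self._store, key + ".")
        raise AttributeError(f"no such option: {key}")

    def __setattr__(self, name, value):
        key = self._prefix + name
        # Reject unknown options instead of silently creating them.
        if key not in self._store:
            raise AttributeError(f"no such option: {key}")
        self._store[key] = value


# Assumed flat store with one option, matching the example above.
store = {"display.max_rows": 1000}
options = OptionNamespace(store)

options.display.max_rows = 10
print(options.display.max_rows)
```

In this sketch, assignments write through to the underlying flat key, so the dotted-key form and the attribute form always agree.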
Other new features and improvements
We added the following new features:
koalas.DataFrame:
- `aggregate` (796)
- `agg` (796)
- `items` (787)
koalas.indexes.Index/MultiIndex:
- `is_boolean` (795)
- `is_categorical` (795)
- `is_floating` (795)
- `is_integer` (795)
- `is_interval` (795)
- `is_numeric` (795)
- `is_object` (795)
Along with the following improvements:
- Add `index_col` for `read_json` (797)
- Add `index_col` for Spark IO reads (769, 775)
- Add `sep` parameter for `read_csv` (777)
- Add `axis` parameter to `DataFrame.diff` (774)
- Add `read_json` and let `to_json` use `spark.write.json` (753)
- Use `spark.write.csv` in `to_csv` of Series and DataFrame (749)
- Handle `TimestampType` separately when converting to pandas dtypes (798)
- Fix `spark_df` when `set_index(.., drop=False)` (792)
Backward compatibility
- We removed some parameters from `DataFrame.to_csv` and `DataFrame.to_json` to allow distributed writing (749, 753)