`loc` and `iloc` indexer improvements
We improved the `loc` and `iloc` indexers. `loc` now supports scalar values as indexers (1172).
```python
>>> import databricks.koalas as ks
>>>
>>> df = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df.loc['sidewinder']
max_speed    7
shield       8
Name: sidewinder, dtype: int64
>>> df.loc['sidewinder', 'max_speed']
7
```
In addition, a Series derived from a different DataFrame can be used as an indexer, provided the `compute.ops_on_diff_frames` option is enabled (1155).
```python
>>> import databricks.koalas as ks
>>>
>>> ks.options.compute.ops_on_diff_frames = True
>>>
>>> df1 = ks.DataFrame({'A': [0, 1, 2, 3, 4], 'B': [100, 200, 300, 400, 500]},
...                    index=[20, 10, 30, 0, 50])
>>> df2 = ks.DataFrame({'A': [0, -1, -2, -3, -4], 'B': [-100, -200, -300, -400, -500]},
...                    index=[20, 10, 30, 0, 50])
>>> df1.A.loc[df2.A > -3].sort_index()
10    1
20    0
30    2
Name: A, dtype: int64
```
Lastly, when given a slice, `loc` now follows the natural order of the index, matching pandas' behavior (1159, 1174, 1179). See the example below.
```python
>>> df = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64
```
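For comparison, the pandas behavior that this slice is meant to match can be checked with a small sketch. It uses plain pandas rather than Koalas (an assumption made so the snippet runs without a Spark session), and the variable names `pdf` and `sliced` are illustrative only.

```python
import pandas as pd

# Same data as the Koalas example, built as a plain pandas DataFrame.
pdf = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                   index=['cobra', 'viper', 'sidewinder'],
                   columns=['max_speed', 'shield'])

# A label slice in pandas includes both endpoints and follows the order in
# which the labels appear in the index, not their sorted order.
sliced = pdf.loc['cobra':'viper', 'max_speed']
print(sliced)
```

Running the Koalas example above and this snippet side by side should give the same two rows, `cobra` and `viper`, in that order.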
Other new features and improvements
We added the following new features:
koalas.Series:
- `get` (1153)
koalas.Index:
- `drop` (1117)
- `len` (1161)
- `set_names` (1134)
- `argmin` (1162)
- `argmax` (1162)
koalas.MultiIndex:
- `from_product` (1144)
- `drop` (1117)
- `len` (1161)
- `set_names` (1134)
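The methods listed above mirror existing pandas methods. As a rough illustration of the intended semantics, the sketch below calls the pandas counterparts; it assumes the Koalas versions behave the same way, which these notes do not spell out, and every variable name is made up for the example.

```python
import pandas as pd

# Series.get: look up a label, falling back to a default when it is absent.
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
found = ser.get('b')
missing = ser.get('z', default=-1)

# Index: drop labels, rename, take the length, and locate the extremes.
idx = pd.Index([3, 1, 2], name='old')
dropped = idx.drop([1])         # removes the label 1
renamed = idx.set_names('new')  # returns a renamed copy
size = len(idx)
lowest = idx.argmin()           # position of the smallest value
highest = idx.argmax()          # position of the largest value

# MultiIndex.from_product: Cartesian product of the given iterables.
midx = pd.MultiIndex.from_product([[0, 1], ['x', 'y']], names=['num', 'char'])
midx_dropped = midx.drop([(0, 'x')])
midx_renamed = midx.set_names(['n', 'c'])

print(found, missing, list(dropped), renamed.name, size, lowest, highest)
print(len(midx), list(midx_dropped), list(midx_renamed.names))
```

On Koalas the same calls would go through `ks.Series`, `ks.Index`, and `ks.MultiIndex` and run on Spark, so results should be checked against this pandas baseline rather than assumed identical.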
Other improvements
- Add `from_pandas` support for Index/MultiIndex. (1170)
- Add a hidden column `__natural_order__`. (1146)
- Introduce `_LocIndexerLike` and consolidate some logic. (1149)
- Refactor `LocIndexerLike.__getitem__`. (1152)
- Remove sort in `GroupBy._reduce_for_stat_function`. (1147)
- Randomize index in tests and fix some window-like functions. (1151)
- Explicitly mark `Index.duplicated` as unsupported. (1131)
- Fix `DataFrame._repr_html_()`. (1177)