New Features
- PR 722 Add bzip2 decompression support to `read_csv()`
- PR 693 add ZLIB-based GZIP/ZIP support to `read_csv_strings()`
- PR 411 added null support to gdf_order_by (new API) and cudf_table::sort
- PR 525 Added GitHub Issue templates for bugs, documentation, new features, and questions
- PR 501 CSV Reader: Add support for user-specified decimal point and thousands separator to read_csv_strings()
- PR 455 CSV Reader: Add support for user-specified decimal point and thousands separator to read_csv()
- PR 439 add `DataFrame.drop` method similar to pandas
- PR 356 add `DataFrame.transpose` method and `DataFrame.T` property similar to pandas
- PR 505 CSV Reader: Add support for user-specified boolean values
- PR 350 Implemented Series replace function
- PR 490 Added print_env.sh script to gather relevant environment details when reporting cuDF issues
- PR 474 add ZLIB-based GZIP/ZIP support to `read_csv()`
- PR 547 Added melt similar to `pandas.melt()`
- PR 491 Add CI test script to check for updates to CHANGELOG.md in PRs
- PR 550 Add CI test script to check for style issues in PRs
- PR 558 Add CI scripts for cpu-based conda and gpu-based test builds
- PR 524 Add Boolean Indexing
- PR 564 Update python `sort_values` method to use updated libcudf `gdf_order_by` API
- PR 509 CSV Reader: Input CSV file can now be passed in as a text or a binary buffer
- PR 607 Add `__iter__` and iteritems to DataFrame class
- PR 643 Added a new API, `gdf_replace_nulls`, that allows a user to replace nulls in a column
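
Several of the CSV reader entries above (compressed input, buffer input, user-specified decimal point and thousands separator) are easier to picture with a small example. The sketch below uses only the Python standard library and does not call cuDF. The helper names `parse_number` and `read_csv_buffer` and their parameters are hypothetical, chosen only to mirror the options named in the entries.

```python
import bz2
import csv
import gzip
import io


def parse_number(text, decimal=".", thousands=None):
    """Convert a numeric string that uses custom separators to a float."""
    if thousands:
        # Remove the grouping character first so it cannot be
        # confused with the decimal point after substitution.
        text = text.replace(thousands, "")
    if decimal != ".":
        text = text.replace(decimal, ".")
    return float(text)


def read_csv_buffer(buffer, compression=None, delimiter=",",
                    decimal=".", thousands=None):
    """Decompress a CSV byte buffer if needed and parse it into columns."""
    if compression == "gzip":
        buffer = gzip.decompress(buffer)
    elif compression == "bz2":
        buffer = bz2.decompress(buffer)
    elif compression is not None:
        raise ValueError(f"unsupported compression: {compression}")

    reader = csv.reader(io.StringIO(buffer.decode("utf-8")), delimiter=delimiter)
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            columns[name].append(parse_number(cell, decimal, thousands))
    return columns


# European-style numbers: ',' is the decimal point, '.' groups thousands.
raw = b"price;volume\n1.234,5;10\n2,25;1.000\n"

for codec, packed in (("gzip", gzip.compress(raw)),
                      ("bz2", bz2.compress(raw)),
                      (None, raw)):
    result = read_csv_buffer(packed, compression=codec, delimiter=";",
                             decimal=",", thousands=".")
    print(codec, result)
```

Each of the three inputs (gzip, bzip2, uncompressed) parses to the same columns, which is the behavior the compression and buffer-input entries describe at a high level.
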
Improvements
- PR 426 Removed sort-based groupby and refactored existing groupby APIs. Also improves C++/CUDA compile time.
- PR 461 Add `CUDF_HOME` variable in README.md to replace relative pathing.
- PR 472 RMM: Created centralized rmm::device_vector alias and rmm::exec_policy
- PR 500 Improved the concurrent hash map class to support partitioned (multi-pass) hash table building.
- PR 454 Improve CSV reader docs and examples
- PR 465 Added templated C++ API for RMM to avoid explicit cast to `void**`
- PR 513 `.gitignore` tweaks
- PR 521 Add `assert_eq` function for testing
- PR 502 Simplify Dockerfile for local dev, eliminate old conda/pip envs
- PR 549 Adds `-rdynamic` compiler flag to nvcc for Debug builds
- PR 577 Added external C++ API for scatter/gather functions
- PR 583 Updated `gdf_size_type` to `int`
- PR 617 Added .dockerignore file. Prevents adding stale cmake cache files to the docker container
- PR 658 Reduced `JOIN_TEST` time by isolating overflow test of hash table size computation
- PR 664 Added Debugging instructions to README
- PR 651 Remove noqa marks in `__init__.py` files
- PR 671 CSV Reader: uncompressed buffer input can be parsed without explicitly specifying compression as None
- PR 684 Make RMM a submodule
- PR 718 Ensure sum, product, min, max methods pandas compatibility on empty datasets
- PR 720 Refactored Index classes to make them more Pandas-like, added CategoricalIndex
- PR 749 Improve to_arrow and from_arrow Pandas compatibility
- PR 766 Remove TravisCI references, remove unused variables from CMake, fix ARROW_VERSION in CMake
- PR 773 Add build-args back to Dockerfile and handle dependencies based on environment yml file
- PR 781 Move thirdparty submodules to root and symlink in /cpp
- PR 843 Fix broken cudf/python API examples, add new methods to the API index
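
As an illustration of the PR 718 entry above, the sketch below shows what pandas-compatible reductions on empty data look like. It assumes the pandas defaults: an empty sum is 0, an empty product is 1, and an empty min or max is NaN. The function `reduce_values` is hypothetical and uses only the standard library (`math.prod` needs Python 3.8 or later). It is not cuDF code.

```python
import math


def reduce_values(values, op):
    """Reduce a sequence of numbers with defined results for empty input."""
    values = list(values)
    if op == "sum":
        return sum(values)          # empty input gives 0
    if op == "product":
        return math.prod(values)    # empty input gives 1
    if op in ("min", "max"):
        if not values:
            return math.nan         # no element to return, so use NaN
        return min(values) if op == "min" else max(values)
    raise ValueError(f"unknown reduction: {op}")


for op in ("sum", "product", "min", "max"):
    print(op, reduce_values([], op), reduce_values([3, 1, 2], op))
```
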
Bug Fixes
- PR 569 CSV Reader: Fix days being off-by-one when parsing some dates
- PR 531 CSV Reader: Fix incorrect parsing of quoted numbers
- PR 473 Added missing <random> include
- PR 478 CSV Reader: Add api support for auto column detection, header, mangle_dupe_cols, usecols
- PR 495 Updated README to correct where cffi pytest should be executed
- PR 501 Fix the intermittent segfault caused by the `thousands` and `compression` parameters in the csv reader
- PR 512 fix bug for `on` parameter in `DataFrame.merge` to allow for None or single column name
- PR 511 Updated python/cudf/bindings/join.pyx to fix cudf merge printing out dtypes
- PR 537 Fix CMAKE_CUDA_STANDARD_REQURIED typo in CMakeLists.txt
- PR 447 Fix silent failure in initializing DataFrame from generator
- PR 545 Temporarily disable csv reader thousands test to prevent segfault (test re-enabled in PR 501)
- PR 559 Fix Assertion error while using `applymap` to change the output dtype
- PR 575 Update `print_env.sh` script to better handle missing commands
- PR 612 Prevent an exception from occurring with true division on integer series.
- PR 630 Fix deprecation warning for `pd.core.common.is_categorical_dtype`
- PR 622 Fix Series.append() behaviour when appending values with different numeric dtype
- PR 603 Fix error while creating an empty column using None.
- PR 673 Fix array of strings not being caught in from_pandas
- PR 644 Fix return type and column support of dataframe.quantile()
- PR 634 Fix creating a DataFrame with `DataFrame.from_pandas()` when column names are numeric
- PR 654 Add resolution check for GDF_TIMESTAMP in Join
- PR 648 Enforce one-to-one copy required when using `numba>=0.42.0`
- PR 645 Fix cmake build type handling not setting debug options when CMAKE_BUILD_TYPE=="Debug"
- PR 669 Fix GIL deadlock when launching multiple python threads that make Cython calls
- PR 665 Reworked the hash map to add a way to report the destination partition for a key
- PR 670 CMAKE: Fix env include path taking precedence over libcudf source headers
- PR 674 Check for gdf supported column types
- PR 677 Fix 'gdf_csv_test_Dates' gtest failure due to missing nrows parameter
- PR 604 Fix the parsing errors while reading a csv file using `sep` instead of `delimiter`.
- PR 686 Fix converting nulls to NaT values when converting Series to Pandas/Numpy
- PR 689 CSV Reader: Fix behavior with skiprows+header to match pandas implementation
- PR 691 Fixes Join on empty input DFs
- PR 706 CSV Reader: Fix broken dtype inference when whitespace is in data
- PR 717 CSV reader: fix behavior when parsing a csv file with no data rows
- PR 724 CSV Reader: fix build issue due to parameter type mismatch in a std::max call
- PR 734 Prevents reading undefined memory in gpu_expand_mask_bits numba kernel
- PR 747 CSV Reader: fix an issue where CUDA allocations fail with some large input files
- PR 750 Fix race condition for handling NVStrings in CMake
- PR 719 Fix merge column ordering
- PR 770 Fix issue where RMM submodule pointed to wrong branch and pin other submodules to correct branches
- PR 778 Fix hard coded ABI off setting
- PR 784 Update RMM submodule commit-ish and pip paths
- PR 794 Update `rmm::exec_policy` usage to fix segmentation faults when used as temporary allocator.
- PR 800 Point git submodules to branches of forks instead of exact commits
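
The PR 689 entry above concerns how `skiprows` and `header` interact. A minimal sketch of the assumed pandas-style semantics: the first `skiprows` lines are dropped, `header` is then counted from the first remaining line, and any lines before the header are discarded. The function `read_with_skiprows` is hypothetical and uses only the standard library. It illustrates the intended behavior and is not the cuDF implementation.

```python
import csv


def read_with_skiprows(text, skiprows=0, header=0):
    """Return (column_names, data_rows) after skipping leading lines."""
    lines = text.splitlines()[skiprows:]   # drop the first `skiprows` lines
    rows = list(csv.reader(lines))
    names = rows[header]                   # header index counts from the first kept line
    data = rows[header + 1:]               # lines before the header are discarded
    return names, data


text = "generated by tool\nunits: m\nx,y\n1,2\n3,4\n"

# Skipping two lines makes "x,y" line 0 of what remains.
names, data = read_with_skiprows(text, skiprows=2, header=0)
print(names, data)

# Skipping one line leaves "units: m" at index 0, so the header is at index 1.
names_alt, data_alt = read_with_skiprows(text, skiprows=1, header=1)
print(names_alt, data_alt)
```

Both calls select the same header and data rows, since `skiprows + header` points at the same line in each case.
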