-------------------------
This is the first major release of csvkit in a very long time. The entire backend has been rewritten to leverage the `agate <https://agate.rtfd.io>`_ data analysis library, which was itself inspired by csvkit. The new backend provides better type detection accuracy, as well as some new features.
Because of the long and complex cycle behind this release, the list of changes should not be considered exhaustive. In particular, the output format of some tools may have changed in small ways. Any existing data pipelines using csvkit should be tested as part of the upgrade.
Much of the credit for this release goes to `James McKinney <https://github.com/jpmckinney>`_, who has almost single-handedly kept the csvkit fire burning for a year. Thanks, James!
Backwards-incompatible changes:
- :doc:`/scripts/csvjoin` renames duplicate columns with integer suffixes to prevent collisions in output.
- :doc:`/scripts/csvsql` generates ``DateTime`` columns instead of ``Time`` columns.
- :doc:`/scripts/csvsql` generates ``Decimal`` columns instead of ``Integer``, ``BigInteger``, and ``Float`` columns.
- :doc:`/scripts/csvsql` no longer generates max-length constraints for text columns.
- The ``--doublequote`` long flag is gone, and the ``-b`` short flag is an alias for ``--no-doublequote``.
- When using the ``--columns`` or ``--not-columns`` options, do not put spaces around the commas that separate column names, unless the column names themselves contain spaces.
- When sorting, null values are greater than other values instead of less than.
- ``CSVKitReader``, ``CSVKitWriter``, ``CSVKitDictReader``, and ``CSVKitDictWriter`` have been removed. Use ``agate.csv.reader``, ``agate.csv.writer``, ``agate.csv.DictReader``, and ``agate.csv.DictWriter`` instead.
- Drop Python 2.6 support (end-of-life was October 29, 2013).
- Drop support for older versions of PyPy.
- If ``--no-header-row`` is set, the output has column names ``a``, ``b``, ``c``, etc. instead of ``column1``, ``column2``, ``column3``, etc.
- :doc:`/scripts/csvlook` renders a simpler, markdown-compatible table.
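
Two of the changes above alter output in ways that are easy to miss in an existing pipeline: the default column names used with ``--no-header-row``, and the position of null values after sorting. The following is a minimal pure-Python sketch of both behaviors, not csvkit's or agate's actual code. The helper names ``default_column_names`` and ``sort_nulls_last`` are hypothetical, and naming beyond the 26th column is deliberately left out because these notes do not describe it.

```python
import string


def default_column_names(count):
    """Letter names for a headerless file: a, b, c, ... (first 26 columns only)."""
    if count > len(string.ascii_lowercase):
        raise ValueError("this sketch only covers the first 26 columns")
    return list(string.ascii_lowercase[:count])


def sort_nulls_last(values):
    """Sort so that None compares greater than every other value."""
    # The first tuple element pushes None to the end; the second orders the rest.
    return sorted(values, key=lambda v: (v is None, 0 if v is None else v))


names = default_column_names(3)
ordered = sort_nulls_last([3, None, 1, 2])
print(names)
print(ordered)
```

Pipelines that refer to ``column1``, ``column2``, and so on, or that expect empty cells at the top of sorted output, are the ones most likely to need adjusting.
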
Improvements:
- csvkit is tested against Python 3.6. (702)
- ``import csvkit as csv`` defers to agate readers/writers.
- :doc:`/scripts/csvgrep` supports ``--no-header-row``.
- :doc:`/scripts/csvjoin` supports ``--no-header-row``.
- :doc:`/scripts/csvjson` streams input and output if the ``--stream`` and ``--no-inference`` flags are set.
- :doc:`/scripts/csvjson` supports ``--snifflimit`` and ``--no-inference``.
- :doc:`/scripts/csvlook` adds ``--max-rows``, ``--max-columns`` and ``--max-column-width`` options.
- :doc:`/scripts/csvlook` supports ``--snifflimit`` and ``--no-inference``.
- :doc:`/scripts/csvpy` supports ``--agate`` to read a CSV file into an agate table.
- :doc:`/scripts/csvsql` supports custom `SQLAlchemy dialects <https://docs.sqlalchemy.org/en/latest/dialects/>`_.
- :doc:`/scripts/csvstat` supports ``--names``.
- :doc:`/scripts/in2csv` CSV-to-CSV conversion streams input and output if the ``--no-inference`` flag is set.
- :doc:`/scripts/in2csv` CSV-to-CSV conversion uses ``agate.Table``.
- :doc:`/scripts/in2csv` GeoJSON conversion adds columns for geometry type, longitude and latitude.
- Documentation: Update tool usage, remove shell prompts, document connection string, correct typos.
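
The streaming improvements above depend on ``--no-inference``. A plausible reason is that guessing column types requires looking at many rows before any output can be written, whereas untyped values can be passed through one row at a time. The sketch below illustrates that idea with the standard library only. It is not how :doc:`/scripts/csvjson` is implemented, the function name ``stream_csv_to_ndjson`` is hypothetical, and the one-object-per-line output format is an assumption made for the example.

```python
import csv
import io
import json


def stream_csv_to_ndjson(infile, outfile):
    """Write one JSON object per input row, holding only one row in memory."""
    reader = csv.DictReader(infile)
    for row in reader:
        # Without inference every value stays a string, so no look-ahead is needed.
        outfile.write(json.dumps(row) + "\n")


source = io.StringIO("id,name\n1,Ada\n2,Grace\n")
sink = io.StringIO()
stream_csv_to_ndjson(source, sink)
lines = sink.getvalue().splitlines()
print(lines)
```

Note that ``id`` comes out as the string ``"1"`` rather than a number, which is the trade-off of skipping inference.
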
Fixes:
- Fixed numerous instances of open files not being closed before utilities exit.
- Change ``-b``, ``--doublequote`` to ``--no-doublequote``, as doublequote is True by default.
- :doc:`/scripts/in2csv` DBF conversion works with Python 3.
- :doc:`/scripts/in2csv` correctly guesses format when file has an uppercase extension.
- :doc:`/scripts/in2csv` correctly interprets ``--no-inference``.
- :doc:`/scripts/in2csv` again supports nested JSON objects (fixes regression).
- :doc:`/scripts/in2csv` with ``--format geojson`` prints a JSON object instead of ``OrderedDict([(...)])``.
- :doc:`/scripts/csvclean` with standard input works on Windows.
- :doc:`/scripts/csvgrep` returns the input file's line numbers if the ``--linenumbers`` flag is set.
- :doc:`/scripts/csvgrep` can match multiline values.
- :doc:`/scripts/csvgrep` correctly operates on ragged rows.
- :doc:`/scripts/csvsql` correctly escapes ``%`` characters in SQL queries.
- :doc:`/scripts/csvsql` adds standard input only if explicitly requested.
- :doc:`/scripts/csvstack` supports stacking a single file.
- :doc:`/scripts/csvstat` always reports frequencies.
- The ``any_match`` argument of ``FilteringCSVReader`` works correctly.
- All tools handle empty files without error.
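
As an illustration of the uppercase-extension fix listed above, format guessing can be made case-insensitive by normalizing the extension before looking it up. This is a minimal sketch under stated assumptions: ``guess_format_sketch`` and ``KNOWN_EXTENSIONS`` are hypothetical names, and the mapping is an illustrative subset rather than the set of formats :doc:`/scripts/in2csv` actually supports.

```python
import os

# Illustrative subset only; not the list of formats in2csv actually supports.
KNOWN_EXTENSIONS = {
    "csv": "csv",
    "dbf": "dbf",
    "json": "json",
    "xls": "xls",
    "xlsx": "xlsx",
}


def guess_format_sketch(filename):
    """Return a format name from the file extension, ignoring case, or None."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return KNOWN_EXTENSIONS.get(extension)


print(guess_format_sketch("DATA.CSV"))
print(guess_format_sketch("report.XLSX"))
```

A file with no extension, or with one outside the mapping, yields ``None`` in this sketch, leaving the caller to require an explicit format.
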