Highlights
Jinja
Use Jinja to write data using templates.
yaml
wrangles:
  - create.jinja:
      output: description
      template:
        string: <directly in the recipe>  # or
        file: <from a file>  # or
        column: <from a column - allows different templates per row>
By default, the column names are used as the template keys. You can also set input to a single column, in which case that column is expected to contain a dictionary whose keys are used instead.
There's also a jinja connector that allows running an equivalent before or after a recipe.
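To make the default behaviour concrete, here is a minimal sketch of rendering one template per row with the column names as keys. It uses the `jinja2` package directly rather than the wrangle itself, and the template text, column names, and rows are made up for illustration.

```python
from jinja2 import Template

# Hypothetical template: each placeholder is a column name.
template = Template("{{ colour }} {{ product }}, size {{ size }}")

# Hypothetical rows, each represented as a dict of column name -> value.
rows = [
    {"colour": "Red", "product": "Hammer", "size": "Large"},
    {"colour": "Blue", "product": "Wrench", "size": "Small"},
]

# Render the template once per row, passing the columns as template variables.
descriptions = [template.render(**row) for row in rows]

for description in descriptions:
    print(description)
```

With the `column` option, the template string would presumably be read from the row itself instead of being shared across all rows.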
Write filtering
All write connectors support columns, not_columns and where.
yaml
write:
  - file:
      name: file.csv
      columns: <a list, wildcards, or regex>
      not_columns: <a list, wildcards, or regex>
      where: category = 'something'  # SQL syntax
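The three options can be pictured as a row filter followed by a column filter. The sketch below is an illustration of those semantics using only the standard library, not the library's implementation: `match_columns` and `filter_for_write` are hypothetical helpers, the wildcard matching is assumed to behave like `fnmatch`, the `regex:` prefix is assumed to require a full match, and the `where` clause is evaluated by an in-memory SQLite table.

```python
import fnmatch
import re
import sqlite3


def match_columns(all_columns, patterns):
    """Return columns matched by exact names, wildcards, or 'regex:' patterns."""
    selected = []
    for pattern in patterns:
        if pattern.startswith("regex:"):
            regex = re.compile(pattern[len("regex:"):].strip())
            matches = [c for c in all_columns if regex.fullmatch(c)]
        else:
            matches = [c for c in all_columns if fnmatch.fnmatchcase(c, pattern)]
        for column in matches:
            if column not in selected:
                selected.append(column)
    return selected


def filter_for_write(rows, columns=None, not_columns=None, where=None):
    """Apply where, then columns / not_columns, to a list of row dicts."""
    if not rows:
        return []
    all_columns = list(rows[0].keys())

    if where:
        # Load the rows into SQLite so the clause is evaluated as real SQL.
        conn = sqlite3.connect(":memory:")
        quoted = ", ".join(f'"{c}"' for c in all_columns)
        conn.execute(f"CREATE TABLE data ({quoted})")
        placeholders = ", ".join("?" for _ in all_columns)
        conn.executemany(
            f"INSERT INTO data VALUES ({placeholders})",
            [tuple(row[c] for c in all_columns) for row in rows],
        )
        cursor = conn.execute(f"SELECT * FROM data WHERE {where}")
        rows = [dict(zip(all_columns, values)) for values in cursor.fetchall()]
        conn.close()

    keep = match_columns(all_columns, columns) if columns else all_columns
    if not_columns:
        drop = set(match_columns(all_columns, not_columns))
        keep = [c for c in keep if c not in drop]
    return [{c: row[c] for c in keep} for row in rows]


# Hypothetical data for illustration only.
rows = [
    {"sku": "A1", "category": "something", "price_usd": 10, "price_eur": 9},
    {"sku": "B2", "category": "other", "price_usd": 20, "price_eur": 18},
]
result = filter_for_write(
    rows,
    columns=["sku", "price_*"],
    not_columns=["regex: .*_eur"],
    where="category = 'something'",
)
print(result)  # [{'sku': 'A1', 'price_usd': 10}]
```

Only rows matching the clause are kept, and `not_columns` is applied after `columns`, so a column matched by both is dropped in this sketch.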
New custom function syntax
A simpler syntax for custom functions that are applied per row, so you no longer have to loop through the dataframe or call apply yourself.
yaml
wrangles:
  - custom.add_columns:
      output: column3
py
def add_columns(column1, column2):
    """
    Parameter names must correspond to column names
    """
    return column1 + column2
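One way to picture how parameter names can be tied to column names is to inspect the function signature and pass only the matching values for each row. The sketch below is an assumption about the general approach, not the library's code; `apply_per_row` is a hypothetical helper and rows are represented as plain dicts.

```python
import inspect


def apply_per_row(rows, func, output):
    """Call func once per row, passing the columns named by its parameters."""
    param_names = list(inspect.signature(func).parameters)
    results = []
    for row in rows:
        # Pick out only the columns the function asks for by name.
        kwargs = {name: row[name] for name in param_names}
        results.append({**row, output: func(**kwargs)})
    return results


def add_columns(column1, column2):
    return column1 + column2


# Hypothetical rows for illustration only.
rows = [{"column1": 1, "column2": 2}, {"column1": 10, "column2": 5}]
result = apply_per_row(rows, add_columns, output="column3")
print(result)
```

Each output row keeps its original columns and gains `column3` holding the function's return value for that row.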
CKAN
Supports read and write, plus upload and download under run (e.g. to download multiple files before the recipe starts).
yaml
read:
  - ckan:
      host: https://ckan.example.com
      dataset: data
      file: file.csv
      api_key: ${API_KEY}
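For readers unfamiliar with CKAN, the parameters above map naturally onto CKAN's action API, where `package_show` returns a dataset and its list of resources (files). The sketch below only builds the request and picks a resource from a response; it is an assumption about how such a lookup might work, not the connector's implementation. `package_show_request` and `find_resource_url` are hypothetical helpers, and the sample response is made up.

```python
import os
from urllib.parse import urlencode, urljoin


def package_show_request(host, dataset, api_key=None):
    """Build the URL and headers for CKAN's package_show action."""
    base = urljoin(host.rstrip("/") + "/", "api/3/action/package_show")
    url = base + "?" + urlencode({"id": dataset})
    # CKAN accepts the API key in the Authorization header.
    headers = {"Authorization": api_key} if api_key else {}
    return url, headers


def find_resource_url(package, filename):
    """Return the download URL of the resource with the given name, if any."""
    for resource in package["resources"]:
        if resource.get("name") == filename:
            return resource["url"]
    return None


url, headers = package_show_request(
    "https://ckan.example.com", "data", api_key=os.environ.get("API_KEY")
)
print(url)

# Made-up example of the "result" part of a package_show response.
sample_package = {
    "name": "data",
    "resources": [
        {"name": "file.csv", "url": "https://ckan.example.com/download/file.csv"},
    ],
}
print(find_resource_url(sample_package, "file.csv"))
```

Keeping the key in an environment variable mirrors the `${API_KEY}` placeholder in the recipe, so the secret stays out of the recipe file.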
Other
- Added **convert.fraction_to_decimal**.
- Added **use_labels** option for **extract.custom** to group extracted entities.
- Added **format.pad** to pad text to a fixed length.
- Added **replace**, equivalent to a single standardize using regex.
- Added **extract.regex** allowing a single extract using regex.
- Added option to write data using a connector when using **log**.
- Improved consistency of behaviour when specifying a single or multiple input/outputs.
- Wildcard columns now support regex by specifying "regex: " as a prefix.
- Added **where** option for **filter** to filter rows using a SQL-like clause.
- Improved and expanded tests.
- General code tidying and refactoring.