Wrangles

Latest version: v1.12.0

1.3.0

Highlights
Row-Based Custom Function Improvements
Row-based custom functions now accept kwargs, can receive parameters set in the recipe, and can use input to limit which columns are available to the function.
```yml
wrangles:
  - custom.my_function:
      # If input is specified, only the columns set here
      # will be available to the function. Otherwise,
      # all columns will be available.
      input: column*
      output: output column
      my_parameter: my_value
```


```py
def my_function(column1, my_parameter, **kwargs):
    # kwargs will contain any columns or parameters not explicitly referenced
    # my_parameter contains my_value from the recipe above
    # column1 contains the value from the column named column1
    ...  # function body goes here
```
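
The calling convention described above can be sketched in plain Python. This is not the library's implementation: call_row_function and the sample row are hypothetical, and the sketch only mimics how named arguments, recipe parameters and kwargs might be filled for a single row.

```python
import inspect


def call_row_function(func, row, params):
    """Hypothetical helper: call func for one row (a dict of column -> value)."""
    available = {**row, **params}
    signature = inspect.signature(func)
    accepts_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        for p in signature.parameters.values()
    )
    if accepts_kwargs:
        # Everything is passed; columns and parameters that the function
        # does not name explicitly end up in **kwargs.
        return func(**available)
    # Without **kwargs, pass only the arguments the function names.
    named = {k: v for k, v in available.items() if k in signature.parameters}
    return func(**named)


def my_function(column1, my_parameter, **kwargs):
    return f"{column1}-{my_parameter}-{sorted(kwargs)}"


row = {"column1": "a", "column2": "b"}  # assumed sample data
result = call_row_function(my_function, row, {"my_parameter": "my_value"})
print(result)
```

Here column2 is not named in the signature, so it is the only key that lands in kwargs.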


Pandas functions
Additional Pandas functions are available natively from a recipe.
- Copy
- Drop
- Transpose
- Round
- Reindex

Other Changes
- Allow setting default values for select.list_element and select.dictionary_element.
- Allow multiple input columns for extract.address.
- Updated API requests to use the latest endpoints.
- Extract.custom: moved use_labels logic to the backend API call.
- Better error messages and schema validation when input/output lengths are inconsistent or invalid.
- Bugfix: Extract.attributes - allow any attribute type permitted by the backend + improved schema.
- Bugfix: fabric version caused issues when installing.
- Bugfix: MongoDB connector - ensure credentials using non-url safe characters are encoded correctly.
- Bugfix: Correct datatype for Salesforce connector schema.
- Bugfix: Preserve column ordering if using not_columns for a write.
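
As an illustration of the MongoDB credentials fix listed above, the sketch below shows the general technique of percent-encoding a username and password before placing them in a connection URI. It uses only the Python standard library; the credentials and host are made-up placeholders, and this is not necessarily how the connector builds its URI internally.

```python
from urllib.parse import quote_plus

# Hypothetical credentials containing characters that are not URL-safe.
user = "data@team"
password = "p@ss:word/1"
host = "mongodb.example.com"  # placeholder host

# Encode each part separately so '@', ':' and '/' inside the credentials
# are not mistaken for URI delimiters.
uri = f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}"
print(uri)
```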

1.2.1

- Bugfix: *Extract.custom* - improved behaviour when using use_labels and first_element.
- Cleaner stack trace when errors are raised.
- Custom functions for run/on_failure can use the parameter *error* to get the Exception.

1.2.0

- Allow referencing columns with *create.jinja* by replacing spaces with underscores.
- Added *huggingface* wrangle to use models from Hugging Face.
- Added Akeneo connector.
- Don't include 'unlabeled' key if it is empty when using labels for *extract.custom*.
- Added *first_element* boolean to only get 1 result from *extract.custom*.
- Various schema updates.
- Bugfix: Fixed an issue where using filter could affect wrangles downstream that use the dataframe index.
- Bugfix: translate failed when source_language was omitted instead of falling back to AUTO language detection.

1.1.0

Highlights

Jinja
Use Jinja to write data using templates.
```yaml
wrangles:
  - create.jinja:
      output: description
      template:
        # Provide one of the following:
        string: <directly in the recipe>
        file: <from a file>
        column: <from a column - allows different templates per row>
```

By default, the template uses the column names as keys. You can also set input to a column, in which case that column is expected to contain a dictionary.
There's also a jinja connector that allows running an equivalent before or after a recipe.

Write filtering
All write connectors support columns, not_columns and where.
```yaml
write:
  - file:
      name: file.csv
      columns: <a list, wildcards, or regex>
      not_columns: <a list, wildcards, or regex>
      where: category = 'something'  # SQL syntax
```
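
To make the semantics of the three options more concrete, here is a rough sketch using only the standard library. It is an assumption-laden illustration rather than the connector code: filter_for_write is a hypothetical helper, it takes exact column names only (no wildcards or regex), and it uses an in-memory SQLite table simply as a convenient way to evaluate a SQL where clause.

```python
import sqlite3

rows = [  # assumed sample data
    {"id": 1, "category": "something", "internal_note": "x"},
    {"id": 2, "category": "other", "internal_note": "y"},
    {"id": 3, "category": "something", "internal_note": "z"},
]


def filter_for_write(rows, columns=None, not_columns=None, where=None):
    """Hypothetical helper mimicking columns / not_columns / where."""
    all_columns = list(rows[0])
    # Keep the original column order while applying both column filters.
    keep = [c for c in all_columns if columns is None or c in columns]
    keep = [c for c in keep if not_columns is None or c not in not_columns]

    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in all_columns)
    conn.execute(f"CREATE TABLE data ({col_defs})")
    placeholders = ", ".join("?" for _ in all_columns)
    conn.executemany(
        f"INSERT INTO data VALUES ({placeholders})",
        [tuple(r[c] for c in all_columns) for r in rows],
    )
    select = ", ".join(f'"{c}"' for c in keep)
    query = f"SELECT {select} FROM data"
    if where:
        query += f" WHERE {where}"  # raw SQL clause, as written in the recipe
    query += " ORDER BY rowid"  # preserve the original row order
    result = [dict(zip(keep, r)) for r in conn.execute(query)]
    conn.close()
    return result


filtered = filter_for_write(
    rows, not_columns=["internal_note"], where="category = 'something'"
)
print(filtered)
```

Because the where clause is interpolated as raw SQL, a sketch like this is only suitable for trusted recipe input.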


New custom function syntax
Custom functions can now use a simpler syntax that is applied per row, so there is no need to loop through the dataframe or call apply yourself.
```yaml
wrangles:
  - custom.add_columns:
      output: column3
```

```py
def add_columns(column1, column2):
    """
    Parameter names must correspond to column names
    """
    return column1 + column2
```


CKAN
Supports read, write and upload/download (for run - e.g. download multiple files before the recipe starts).
```yaml
read:
  - ckan:
      host: https://ckan.example.com
      dataset: data
      file: file.csv
      api_key: ${API_KEY}
```


Other
- Added **convert.fraction_to_decimal**.
- Added **use_labels** option for **extract.custom** to group extracted entities.
- Added **format.pad** to pad text to a fixed length.
- Added **replace**, equivalent to a single standardize using regex.
- Added **extract.regex** allowing a single extract using regex.
- Added option to write data using a connector when using **log**.
- Improved consistency of behaviour when specifying a single or multiple input/outputs.
- Wildcard columns now support regex by specifying "regex: " as a prefix.
- Added **where** option for **filter** allowing specifying a SQL-like clause.
- Improved and expanded tests.
- General code tidying and refactoring.
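
The wildcard and "regex: " column selection mentioned in the list above could look roughly like the following. This is a sketch under assumptions, not the library's code: expand_columns is a hypothetical helper, it assumes the regex must match the whole column name, and fnmatch also treats ? and [] as special characters, which may differ from the real behaviour.

```python
import fnmatch
import re


def expand_columns(pattern, columns):
    """Hypothetical helper: return the columns matched by a pattern."""
    prefix = "regex:"
    if pattern.startswith(prefix):
        # Assumption: the regex has to match the full column name.
        regex = re.compile(pattern[len(prefix):].strip())
        return [c for c in columns if regex.fullmatch(c)]
    # Otherwise treat the pattern as a shell-style wildcard.
    return [c for c in columns if fnmatch.fnmatchcase(c, pattern)]


columns = ["column1", "column2", "column_a", "other"]  # assumed sample columns
wildcard_match = expand_columns("column*", columns)
regex_match = expand_columns("regex: column\\d", columns)
print(wildcard_match)
print(regex_match)
```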

1.0.1

Bug fixes:
- Fixed issues when using variables in recipes nested within other strings.
- Fixed extract.brackets for round and angled brackets.
- Fixed extract.date_range schema validation.
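
For readers unfamiliar with the first fix, "nested within other strings" refers to a ${VARIABLE} placeholder appearing in the middle of a longer value. The sketch below shows the idea with the standard library's string.Template, which happens to use the same ${NAME} syntax. The variable names and values are made up, and the library's own substitution logic may differ.

```python
from string import Template

variables = {"API_KEY": "abc123", "ENV": "prod"}  # hypothetical values

# A placeholder embedded inside a longer string, not the whole value.
recipe_fragment = "output_${ENV}.csv?key=${API_KEY}"

# safe_substitute leaves unknown placeholders untouched instead of raising.
resolved = Template(recipe_fragment).safe_substitute(variables)
print(resolved)
```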

1.0

New Wrangles
- Create.bins: segment and sort data values into bins
- Extract.date_range: Extract date range between two dates
- Extract.date_properties: Extract date properties from a date (day, month, year, etc...)
- Extract.brackets: Extract text properties in brackets from the input
- Date_calculator: Add or Subtract time from a date
- Merge.dictionaries: Take dictionaries in multiple columns and merge them to a single dictionary.
- Format.prefix and format.suffix
- Convert.from_json: Convert a JSON representation into an object

Updated Wrangles
- Standardize: allows a single find/replace to be applied in a recipe without requiring creating a DIY wrangle
- Extract: allows a single extract to be applied in a recipe without requiring creating a DIY wrangle
- Extract.custom: now also allows a list of models
- Extract.properties: type is now case insensitive (e.g. type: ShaPEs)
- Extract.properties: error if property type selected is not in predefined list of properties
- Extract.properties: parameter return_data_type. Allow the return value to be a string or a list (default)
- Extract.remove_words: added parameter ignore_case
- Translate: Allow full language name or code to be used. e.g. English or En
- Translate: Added parameter case. Text case can be changed before translating for better accuracy
- Log: Allow for usage of wildcards and escape character (\*)
- Rename: improved error information if rename column does not exist
- Split.text: Added parameter element. Allows selecting a single element after splitting the text

Connectors
- New Train Connector: train DIY wrangles from python recipes.
- S3 Connector: Additional options for downloading and uploading files within run section.
- File connector: If directory is not found then create directory rather than failing
- HTTP: Allow passing params and json.

Misc
- Improved variable logic. Variables are now replaced after the recipe is read and converted to an object.
- Recipes can be read from a http(s) url.
- Wildcard expansion is now performed globally and not in individual wrangles (escape characters included).
- Custom functions are now passed to sub-recipes.
- Improved error information if a user's custom function is not found.
- Wrangles.recipe terminal command no longer fails if non-function variables are present in functions.py file.
- Removed Geography Wrangle (duplicate of address).
- Custom Write functions no longer require a return statement.
