This release updates the `Step` class `__init__` process and adds methods for switching the filepaths in the step manifest between relative and absolute paths.
- More keyword arguments were added to set configuration such as `step.filepath_columns` and `step.metadata_columns`.
- Options to set the project name and the step name to something other than what is inferred from the directory tree were added for future use, but they are currently nonfunctional.
- The new init defaults are:

      def __init__(
          self,
          clean_before_run=True,
          filepath_columns=["filepath"],
          metadata_columns=[],
          step_name=None,
          package_name=None,
          direct_upstream_tasks: List["Step"] = [],
          config: Optional[Union[str, Path, Dict[str, str]]] = None,
      ):

- There are now a lot of keyword args, but you only have to set them in your step classes if you want them to be different from the default values.
- For example, in the `Raw` step below,

      class Raw(Step):
          def __init__(self, filepath_columns=["col1", "col2"]):
              super().__init__(filepath_columns=filepath_columns)

  I need two filepath columns, so I set that in the init, but don't set anything else, since I'm fine with the defaults. I also pass `filepath_columns` to `super().__init__`, so that my `Raw` class gets to use the initialization process that's already defined in `Step`.
- The methods for switching between relative and absolute paths are `step.manifest_filepaths_rel2abs` and `step.manifest_filepaths_abs2rel`. For example, if I had a `QC` step after `Raw` that needed data from `Raw`, I could use

      class QC(Step):
          def __init__(self):
              super().__init__()

          def run(self):
              raw = Raw()
              # Switch the manifest's filepaths from relative to absolute
              raw.manifest_filepaths_rel2abs()
              # iterate through raw.manifest and do some QC

  to move from the relative paths in `Raw`, which are needed for uploading to Quilt, to absolute paths, which are easier for local file access.
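
To give a rough idea of what switching a manifest between relative and absolute paths involves, here is a minimal sketch that uses only the standard library. It does not use `Step` and is not how `manifest_filepaths_rel2abs` or `manifest_filepaths_abs2rel` are implemented. The helper names `rel2abs` and `abs2rel`, the list-of-dicts manifest, and the `base_dir` argument are all assumptions made up for this example.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical stand-in for a manifest: one dict per row.
Manifest = List[Dict[str, str]]


def rel2abs(manifest: Manifest, filepath_columns: List[str], base_dir: Path) -> Manifest:
    """Return a copy of the manifest with relative paths joined onto base_dir."""
    converted = []
    for row in manifest:
        new_row = dict(row)
        for col in filepath_columns:
            path = Path(row[col])
            if not path.is_absolute():
                new_row[col] = str(base_dir / path)
        converted.append(new_row)
    return converted


def abs2rel(manifest: Manifest, filepath_columns: List[str], base_dir: Path) -> Manifest:
    """Return a copy of the manifest with absolute paths made relative to base_dir.

    Path.relative_to raises ValueError if a path does not live under base_dir.
    """
    converted = []
    for row in manifest:
        new_row = dict(row)
        for col in filepath_columns:
            path = Path(row[col])
            if path.is_absolute():
                new_row[col] = str(path.relative_to(base_dir))
        converted.append(new_row)
    return converted


# Example: only the columns listed in filepath_columns are touched.
base = Path.cwd() / "raw_data"  # assumed base directory for this example
manifest = [{"filepath": str(Path("images") / "a.tiff"), "label": "a"}]
absolute = rel2abs(manifest, ["filepath"], base)
roundtrip = abs2rel(absolute, ["filepath"], base)
```

Both helpers return a new list instead of modifying the input, which keeps the round trip easy to check. The real methods are called on the step itself, as in the `QC` example above.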