# Alpha release - v0.16.0a2 (almost final)
We've been hard at work over the holidays putting this together. This will be the final alpha release for the new natively typed Flytekit. Please see the [earlier](https://github.com/lyft/flytekit/releases/tag/v0.16.0a0) [two](https://github.com/lyft/flytekit/releases/tag/v0.16.0a1) releases, as well as the [proposal doc](https://docs.google.com/document/d/17rNKg6Uvow8CrECaPff96Tarr87P2fn4ilf_Tv2lYd4/), which will be updated again and finalized for the coming beta release; we expect to cut that release by the end of next week.
## Changes

### Potentially Breaking
First, the updates in this release that may break your existing code, and what changes you'll need to make.
* Task settings have been condensed into `TaskConfig` subclasses rather than being passed as loose `kwargs`. For instance:
  ```python
  from flytekit.taskplugins.hive.task import HiveConfig

  HiveTask(
      cluster_label="flyte",                          # old
      task_config=HiveConfig(cluster_label="flyte"),  # new
      ...
  )
  ```
* Spark session functionality has been nested inside the Spark context:

  ```python
  import flytekit

  sess = flytekit.current_context().spark_session
  count = sess.parallelize(...)               # old
  count = sess.sparkContext.parallelize(...)  # new
  ```
* Tasks that take explicit metadata should now use the `TaskMetadata` dataclass instead of the `metadata()` function:
  ```python
  from flytekit import metadata      # old
  from flytekit import TaskMetadata  # new
  ```

  and in a task declaration:

  ```python
  WaitForObjectStoreFile(
      metadata=metadata(retries=2),      # old
      metadata=TaskMetadata(retries=2),  # new
      ...
  )
  ```
* Types have been moved to a subfolder so your imports may need to change. This was done because previously importing one custom type (say `FlyteFile`) would trigger imports of all the custom types in flytekit.
  ```python
  from flytekit.types import FlyteFile, FlyteSchema  # old

  from flytekit.types.file import FlyteFile          # new
  from flytekit.types.schema import FlyteSchema
  ```
* Flytekit tasks and workflows assign default names (`o0`, `o1`, etc.) to outputs. If you want to explicitly name your outputs, you can use a `typing.NamedTuple`, like:
  ```python
  import typing

  rankings = typing.NamedTuple("Rankings", order=int)

  def t1() -> rankings:
      ...
  ```
  Previously, though, flytekit was accidentally de-tuple-tizing single-output named tuples. That has been fixed (full sketch after this list):
  ```python
  r = t1()
  read_ordering_task(r=r)        # old
  read_ordering_task(r=r.order)  # new
  ```
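Putting that last change together, here's a minimal end-to-end sketch of the fixed behavior (the task bodies are illustrative):

```python
import typing

from flytekit import task, workflow

rankings = typing.NamedTuple("Rankings", order=int)

@task
def t1() -> rankings:
    return rankings(order=1)

@task
def read_ordering_task(r: int):
    print(r)

@workflow
def wf():
    r = t1()  # a single-output named tuple is no longer silently unwrapped
    read_ordering_task(r=r.order)  # so access the named field explicitly
```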
### Process Changes
Registration of Flyte entities (that is, translating your Python tasks, workflows, and launch plans into something that Flyte understands) has always been a two-step process, even if it looks like one step. The first step is compilation (aka serialization), where users' Python code is compiled down to protobuf files, and the second step is sending those files over the wire to the Flyte control plane.
In this release, we've further isolated the two steps by moving where certain settings are read and applied. That is, when you call
```bash
pyflyte --config /code/sandbox.config serialize workflows -f /tmp/output
```
the `project`, `domain`, and `version` values are no longer baked into the compiled protobuf files. The `kubernetes-service-account`, `assumable-iam-role`, and `output-location-prefix` settings have also been removed. If you inspect the protobuf files via `flyte-cli parse-proto -f serialized_task.pb -p flyteidl.admin.task_pb2.TaskSpec`, you should see that those values are now missing. Instead, they will be filled in during the second step, in `flyte-cli`. Your registration command should now look like:
```bash
flyte-cli register-files -p <project> -d <domain> -v <your version, we recommend still the git sha> --kubernetes-service-account <account> OR --assumable-iam-role --output-location-prefix s3://some/bucket -h flyte.host.com serialized_protos/*
```
Note, however, that the container image tasks run in is still specified at serialization time (and we suggest that the image version also be the same git sha).
This change makes serialized protos completely portable. You can serialize your code and hand the files to someone else, and as long as they have access to the container images specified in the task specifications, they can register them under a completely different AWS account or cloud provider.
In the near future, we hope to add automation to our [examples](https://github.com/lyft/flytesnacks) repo so that with each release of flytekit, we also publish the docker image for the cookbook (Second Edition) along with all the serialized artifacts. Users will then be able to run a single registration command to pick them all up and play around with them.
### Other Changes
* Resources, auth, and custom environment variables are now piped through correctly in the task decorator (see the first sketch after this list)
* Schedules and notifications can now be attached to launch plans (sketched below)
* Reference Entities refactored and Reference Launch Plans added
* Added `FlyteDirectory` as a parallel to `FlyteFile` (example below)
* Minor cleanup of some mypy warnings
* Shift operators (aka the `runs_before` function) along with an explicit node-creation call have been introduced (sketched below). For those of you familiar with the existing Flytekit (master), this style should feel familiar.
* Additional task types ported over:
  * PyTorch
  * SageMaker task and custom training task
  * SageMaker HPO
* The default node and output names that Flytekit assigns have been shortened: outputs from `out0`, `out1`, etc. to `o0`, `o1`, and nodes from `node-0` to `n0`. This just saves on disk, network, and compute.
* Workflow metadata settings like failure policy and interruptibility have been added to the workflow decorator (sketched last below)
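A minimal sketch of resources and environment variables on the task decorator; the specific amounts and the `MY_ENV_VAR` name are placeholders, and the keyword names reflect our reading of the new API:

```python
from flytekit import Resources, task

@task(
    requests=Resources(cpu="1", mem="400Mi"),  # resource requests now flow through
    limits=Resources(cpu="2", mem="600Mi"),
    environment={"MY_ENV_VAR": "some-value"},  # custom env vars for the task container
)
def double(a: int) -> int:
    return a * 2
```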
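For schedules and notifications on launch plans, a sketch using the names flytekit exposes (`CronSchedule`, `Email`, `LaunchPlan.create`); the exact constructors in this alpha may differ, and the cron expression and email address are placeholders:

```python
from flytekit import CronSchedule, Email, LaunchPlan, task, workflow
from flytekit.models.core.execution import WorkflowExecutionPhase

@task
def say_hello() -> str:
    return "hello"

@workflow
def hello_wf() -> str:
    return say_hello()

# run hourly, and email on failure
scheduled_lp = LaunchPlan.create(
    "scheduled_hello_wf",
    hello_wf,
    schedule=CronSchedule(cron_expression="0 * * * ? *"),  # AWS-style cron
    notifications=[
        Email(
            phases=[WorkflowExecutionPhase.FAILED],
            recipients_email=["oncall@example.com"],
        )
    ],
)
```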
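For `FlyteDirectory`, a minimal sketch that mirrors the new `flytekit.types` import layout described above (the file contents are illustrative):

```python
import os
import tempfile

from flytekit import task
from flytekit.types.directory import FlyteDirectory

@task
def write_reports() -> FlyteDirectory:
    out_dir = tempfile.mkdtemp()
    with open(os.path.join(out_dir, "report.txt"), "w") as f:
        f.write("hello")
    # Returning a FlyteDirectory uploads the whole directory,
    # just as FlyteFile does for a single file.
    return FlyteDirectory(out_dir)
```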
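For the shift operators and explicit node creation, a sketch; the `create_node` import path is an assumption based on current flytekit and may differ in this alpha:

```python
from flytekit import task, workflow
from flytekit.core.node_creation import create_node  # import path is an assumption

@task
def setup():
    print("setting up")

@task
def run():
    print("running")

@workflow
def ordered_wf():
    n1 = create_node(setup)  # explicitly create nodes...
    n2 = create_node(run)
    n1 >> n2  # ...then order them; equivalently, n1.runs_before(n2)
```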
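And for the new workflow metadata settings, a sketch; `WorkflowFailurePolicy` and the keyword names match current flytekit and are our best guess for this alpha:

```python
from flytekit import WorkflowFailurePolicy, task, workflow

@task
def step():
    ...

@workflow(
    # keep running whatever can still run if a node fails
    failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE,
    interruptible=True,  # nodes may be scheduled on preemptible machines
)
def resilient_wf():
    step()
```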