Flytekit


0.16.0a8

0.16.0a3

This release introduces distributed training support and new serialization changes, along with clearer error messages and miscellaneous bug fixes. See the [proposal doc](https://docs.google.com/document/d/17rNKg6Uvow8CrECaPff96Tarr87P2fn4ilf_Tv2lYd4/), which will be updated and finalized for the coming beta release, expected by the end of next week.

Distributed Training support
Flyte task executions must produce outputs if the task's interface declares them. In distributed training, however, we may run multiple instances of the same task, and usually only one of them produces an output. The plugin therefore needs a way to tell the platform that only one output is expected. This is further complicated by the fact that the primary execution that produces the output may change; for example, with MPI it could be any of `rank0, rank1, … rankN`.

Flytekit now supports an `IgnoreOutputs` exception, analogous to `StopIteration`. If raised, it forces flytekit to exit gracefully without writing outputs. If all nodes in the task execution raise this exception, FlytePropeller will eventually error out, because the post-condition of outputs is not met by at least one node. For SageMaker distributed training, the user can specify a [function](https://github.com/lyft/flytekit/blob/annotations/flytekit/taskplugins/sagemaker/training.py#L40) that indicates whether the output should be persisted; whenever the function returns false, the outputs are ignored. The input to the function is a [DistributedContext](https://github.com/lyft/flytekit/blob/annotations/flytekit/taskplugins/sagemaker/distributed_training.py#L26). By default, for distributed training, rank0 instances are expected to produce the output.
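Below is a minimal sketch of how a non-primary worker might skip writing outputs. The import path for `IgnoreOutputs` and the `RANK` environment variable check are illustrative assumptions, not part of any specific plugin.


import os

from flytekit import task
from flytekit.core.base_task import IgnoreOutputs  # import path assumed; may differ in this alpha

@task
def train(step_count: int) -> float:
    loss = float(step_count)  # stand-in for real training work
    # In a distributed run, only the primary instance should persist outputs.
    # The RANK env var is just an illustrative convention here, not a flytekit API.
    if os.environ.get("RANK", "0") != "0":
        raise IgnoreOutputs("non-primary instance, skipping outputs")
    return loss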

Serialization changes
[Optional] You no longer need to enter your Flyte project container in order to serialize entities during the registration process.


pyflyte -c flyte.config --pkgs recipes serialize --in-container-config-path /root/flyte.config \
--local-source-root ${CURDIR} --image somedocker.com/myimage:someversion123 workflows -f _pb_output/


After serializing out of the container, proceed to `register-files` with flyte-cli as you previously have; this, too, can optionally be run outside of your container.

Fast serialization
You can use out-of-container fast serialization to avoid rebuilding your docker container between code changes as you iterate on your workflows. As long as your changes do not alter dependencies or otherwise require rebuilding your docker container, you can use the following:


pyflyte -c flyte.config --pkgs recipes serialize --in-container-config-path /root/flyte.config \
--local-source-root `pwd` --image somedocker.com/myimage:someversion123 fast workflows -f _pb_output/


And fast-register using:


flyte-cli fast-register-files -p <project> -d <domain> -h flyte.example.com \
--additional-distribution-dir s3://my-s3-bucket/fast/ --dest-dir /root/recipes _pb_output/*

0.16.0a2

Alpha release - v0.16.0a2 (almost final)

We've been hard at work over the holidays putting this together. This will be the final alpha release for the new natively typed Flytekit. Please see the [earlier](https://github.com/lyft/flytekit/releases/tag/v0.16.0a0) [two](https://github.com/lyft/flytekit/releases/tag/v0.16.0a1) releases as well as the [proposal doc](https://docs.google.com/document/d/17rNKg6Uvow8CrECaPff96Tarr87P2fn4ilf_Tv2lYd4/), which will be updated and finalized for the coming beta release, expected by the end of next week.

Changes
Potentially Breaking
First, the updates in this release that may break your existing code, and what changes you'll need to make.
* Task settings have been condensed into `TaskConfig` subclasses as opposed to just indiscriminate `kwargs`. For instance,

from flytekit.taskplugins.hive.task import HiveConfig
HiveTask(
    cluster_label="flyte",                          # old
    task_config=HiveConfig(cluster_label="flyte"),  # new
    ...
)

* The Spark context is now accessed through the Spark session (`spark_session`)

sess = flytekit.current_context().spark_session
count = sess.parallelize(...)               # old
count = sess.sparkContext.parallelize(...)  # new

* Tasks that take explicit metadata should be changed to use the `TaskMetadata` dataclass instead of the `metadata()` function.

from flytekit import metadata      # old
from flytekit import TaskMetadata  # new

# in a task declaration
WaitForObjectStoreFile(
    metadata=metadata(retries=2),      # old
    metadata=TaskMetadata(retries=2),  # new
    ...
)

* Types have been moved to a subfolder so your imports may need to change. This was done because previously importing one custom type (say `FlyteFile`) would trigger imports of all the custom types in flytekit.

from flytekit.types import FlyteFile, FlyteSchema  # old

from flytekit.types.file import FlyteFile      # new
from flytekit.types.schema import FlyteSchema  # new

* Flytekit tasks and workflows by default assign names to the outputs, `o0`, `o1`, etc. If you want to explicitly name your outputs, you can use a `typing.NamedTuple`, like

rankings = typing.NamedTuple("Rankings", order=int)

def t1() -> rankings:
    ...

Previously though, flytekit was accidentally de-tuple-tizing single-output named tuples. That has been fixed.

r = t1()
read_ordering_task(r=r)        # old
read_ordering_task(r=r.order)  # new


Process Changes
Registration of Flyte entities (that is, translating your Python tasks, workflows, and launch plans from Python code to something that Flyte understands) has always been a two-step process, even if it looks like one step. The first step is compilation (aka serialization), where users' Python code is compiled down to protobuf files, and the second step is sending those files over the wire to the Flyte control plane.

In this release, we've further isolated where certain settings are read and applied. That is, when you call

pyflyte --config /code/sandbox.config serialize workflows -f /tmp/output

the `project`, `domain`, and `version` values are no longer in the compiled protobuf files. `kubernetes-service-account`, `assumable-iam-role`, and `output-location-prefix` have also been removed. If you inspect the protobuf files via `flyte-cli parse-proto -f serialized_task.pb -p flyteidl.admin.task_pb2.TaskSpec`, you should see that those values are now missing. Instead, they will be filled in during the second step, in `flyte-cli`. Your registration command should now look like:

flyte-cli register-files -p <project> -d <domain> -v <your version, we recommend still the git sha> --kubernetes-service-account <account> OR --assumable-iam-role --output-location-prefix s3://some/bucket -h flyte.host.com serialized_protos/*


Note, however, that the container image that tasks run is still specified at serialization time (and we suggest that the image version also be the same git sha).

This change was made so that serialized protos are completely portable. You can serialize your code, hand the protos to someone else, and as long as they have access to the container images specified in the task specifications, they can register them under a completely different AWS account or cloud provider.

In the near future, we hope to add some automation to our [examples](https://github.com/lyft/flytesnacks) repo so that with each release of flytekit, we also publish the docker image for the cookbook (Second Edition) along with all the serialized artifacts so that users can just run one registration command to pick them all up and play around with them.

Other Changes
* Resources, auth, and custom environment variables are now piped through correctly in the task decorator
* Schedules and Notifications added to launch plans
* Reference Entities refactored and Reference Launch Plans added
* Added `FlyteDirectory` as a parallel to `FlyteFile`
* Minor cleanup of some mypy warnings
* Shift operators (aka the `runs_before` function), along with an explicit node-creation call, have been introduced; see the sketch after this list. For those of you familiar with the existing Flytekit (master), this style should feel familiar.

* Additional task types ported over
  * Pytorch
  * Sagemaker task and custom training task
  * Sagemaker HPO
* The default node and output names that Flytekit assigns have been condensed from `out0`, `out1`, etc, to `o0`, `o1`... Nodes have been shortened from `node-0` to `n0`. This is just to save on disk, network, and compute.
* Workflow metadata settings like failure policy and interruptible have been added to the workflow decorator.
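
A rough sketch of the shift operators and explicit node creation mentioned above (the task bodies are illustrative, and the import path for `create_node` is an assumption for this alpha):


from flytekit import task, workflow
from flytekit.core.node_creation import create_node  # import path assumed

@task
def prepare() -> str:
    return "ready"

@task
def announce():
    print("prepare finished")

@workflow
def ordered_wf() -> str:
    prep = create_node(prepare)
    note = create_node(announce)
    prep >> note    # equivalent to prep.runs_before(note): announce runs only after prepare
    return prep.o0  # node outputs use the auto-assigned names (o0, o1, ...)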

0.16.0a1

Changes

Since the last alpha release...

Multi-image support
Many users have complained that it doesn’t make sense to force all tasks to use the same image, especially when workflows can span spark tasks and GPU-based ML tasks. For instance, if you publish a separate spark image with the tag `spark-<sha>`, you can get a task to use it instead of the default by adding the following to the task decorator.

container_image='{{.image.default.fqn}}:spark-{{.image.default.version}}'
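
For example, a sketch of where this goes in the decorator (the task name and body are illustrative):


from flytekit import task

@task(container_image='{{.image.default.fqn}}:spark-{{.image.default.version}}')
def heavy_step(n: int) -> int:
    # runs in the spark-tagged image instead of the default one
    return n * 2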


Branches
Branching logic has been fully implemented and a minor bug patched in flytepropeller, so you can now call conditionals in your workflow. See the [cookbook](https://flytecookbook.readthedocs.io/en/latest/auto_recipes/02_intermediate/run_conditions.html) for an example.
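
A minimal sketch of what a conditional might look like (task bodies and names are illustrative, and the `conditional` import path is assumed; see the cookbook link above for the canonical example):


from flytekit import conditional, task, workflow  # import path assumed for this alpha

@task
def double(n: float) -> float:
    return n * 2

@task
def square(n: float) -> float:
    return n * n

@workflow
def branching(my_input: float) -> float:
    # branch is chosen at run time based on the workflow input
    return (
        conditional("double_or_square")
        .if_(my_input < 10.0)
        .then(double(n=my_input))
        .else_()
        .then(square(n=my_input))
    )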

Tasks
[Hive tasks](https://flytecookbook.readthedocs.io/en/latest/auto_recipes/02_intermediate/hive.html) were ported to this new flytekit API, and backend plugin changes were made to increase the customizability of Hive queries. Please see the [issue](https://github.com/lyft/flyte/issues/631) for the full discussion of the changes. Sidecar tasks have also been ported over; please see the [examples](https://flytecookbook.readthedocs.io/en/latest/auto_recipes/02_intermediate/sidecar.html). The ability to specify task resources has also been added.

Spark
Spark is a supported task type. [This example](https://flytecookbook.readthedocs.io/en/latest/auto_recipes/02_intermediate/pyspark_pi.html#sphx-glr-auto-recipes-02-intermediate-pyspark-pi-py) provides more details about how to use spark in flytekit. In spark-enabled clusters, simply decorate a function like `hello_spark` with

from flytekit import task
from flytekit.taskplugins.spark import Spark

# this configuration is applied to the spark cluster
@task(spark_conf={
    ...
})
def hello_spark(partitions: int) -> float:
    ...

Within this function, users can access the Spark session as follows:

session = flytekit.current_context().spark_session

Spark tasks can also simply return a `pyspark.DataFrame` as a valid return object for `FlyteSchema` types; this only works when `FlyteSchema` is declared as the return type. Refer to this [example](https://flytecookbook.readthedocs.io/en/latest/auto_recipes/02_intermediate/dataframe_passing.html) for more.
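
A sketch of the idea, reusing the decoration shown above for this alpha (later releases move spark settings into a task config object; the configuration keys and column names here are illustrative):


import flytekit
from flytekit import task
from flytekit.types.schema import FlyteSchema

@task(spark_conf={"spark.driver.memory": "1g"})
def create_table(n: int) -> FlyteSchema:
    sess = flytekit.current_context().spark_session
    # Returning a pyspark DataFrame works because FlyteSchema is the declared return type.
    return sess.createDataFrame([(i, i * i) for i in range(n)], ["x", "x_squared"])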

Dataclasses
Oftentimes flytekit users want to use custom data objects in their tasks and workflows. Flyte supports both dictionaries and JSON natively, but there was no simple interface for returning this kind of information. Users can now use Python **dataclasses** in their tasks and workflows. Refer to this [example](https://flytecookbook.readthedocs.io/en/latest/auto_recipes/02_intermediate/custom_objects.html) for more information.
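
A minimal sketch, assuming the dataclass is also decorated with `dataclasses_json` (as in the linked example) so flytekit can serialize it to and from JSON; the class and task names are illustrative:


from dataclasses import dataclass

from dataclasses_json import dataclass_json
from flytekit import task

@dataclass_json
@dataclass
class Result:
    count: int
    label: str

@task
def summarize(n: int) -> Result:
    return Result(count=n, label=f"saw {n} items")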

Minor changes
When returning files from tasks, the filename now gets preserved instead of becoming a random string of characters. Upon download of an incoming file, the filename is also preserved. Handling of empty dicts and of functions with no outputs has also been improved.
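
For instance, a sketch of returning a file whose name is now kept (the path and contents are illustrative):


from flytekit import task
from flytekit.types.file import FlyteFile

@task
def write_report() -> FlyteFile:
    path = "/tmp/report.csv"
    with open(path, "w") as f:
        f.write("value\n1\n")
    # The local name ("report.csv") is preserved on upload instead of a random string.
    return FlyteFile(path)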

Cookbook
Some of the links above lead to the new cookbook ("2nd edition") we've been working on. Please keep in mind that content on there isn't complete - we'll be pushing to it frequently. Documentation is always a work in progress. Please let us know if you like or dislike the new format. We are moving to it because the plugins used enable literate programming, and we feel that style is especially well suited for this repo, but let us know if you disagree.

Setup
Please continue to refer to the instructions from the initial [alpha0 release](https://github.com/lyft/flytekit/releases/tag/v0.16.0a0) for the iteration steps.

As always, please give us feedback on this release, or anything else Flyte-related, in the open source Slack (the flytekit channel or any other).

Thanks!

0.16.0a0

_Please refer to the [design doc](https://docs.google.com/document/d/17rNKg6Uvow8CrECaPff96Tarr87P2fn4ilf_Tv2lYd4/edit) for additional background information._

Native Typing Support

Take a look at the examples in our [cookbook](https://github.com/lyft/flytesnacks/tree/annotated/cookbook/recipes/native_typing). Because of changes to the Dockerfile, this PR has not been merged to master.

You should probably play around with the new interface, using that repo as the starting point.
Here is a list of the salient features of the new version (a short sketch follows the list):
1. Write code in Python using Python 3.7+ typing annotations. Once happy with the Python function, decorate it with `task`. There is no need to use `inputs`, `outputs`, or `spark_task`.
1. A Workflow is just another python function. Use the `workflow` decorator on a python function that accepts inputs and produces outputs. The inputs to the workflow function will become inputs to the workflow and the outputs will become outputs of the workflow.
* Syntactically, workflows and tasks look the same, but semantically the contents of a workflow function run at compile time while the contents of tasks run at runtime. The Flyte serialization process converts the workflow function into the graph representation understood by Flyte Admin.
* For this reason, please abstain from using `random()`, `datetime.now()`, etc. in the workflow function. Constants are valid in the workflow, however.
* Currently, both tasks and workflows can only be invoked with keyword arguments (`t1(in1=3)` as opposed to `t1(3)`). This ensures that we bind the correct variables.
* Native inputs to the workflow function are transformed into Flytekit objects. Tasks can accept them as inputs, but they cannot be directly manipulated or accessed using Python-native functions (like `range` for an int); you should expect an error if you try.
* Within the workflow, tasks can be called directly, and their outputs can be captured as variables and passed on to the next task:

a, b = t1(x=y)
return t2(x=a)


* You can print the outputs of a task within a workflow when run locally, but they may look a little different (stay tuned)
3. The workflows and tasks can be run locally, just as if you were invoking a Python function
1. The Flyte interpreter today supports translating most Python-native objects, like
* `int, float, str, datetime.datetime, datetime.timedelta, os.PathLike, TextIO, Dict[type, type], List[type]`, as well as untyped dicts
* It also supports advanced types like `pandas.DataFrame` (more coming)
* Flyte introduced 2 new types, `FlyteFile` and `FlyteSchema`
* Most interestingly, the type system is completely extensible: new types that require translation can be added (more on this later)
5. The Flyte Task plugins have also been greatly simplified. Extending Tasks and adding new types is easier (more on this later)
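
Putting a few of these together, here is a small sketch of the new style (the task and workflow names are illustrative, not from the cookbook):


from flytekit import task, workflow

@task
def greet(name: str, excited: bool) -> str:
    return "hello, " + name + ("!" if excited else "")

@task
def shout(s: str) -> str:
    return s.upper()

@workflow
def greeting_wf(name: str) -> str:
    g = greet(name=name, excited=True)  # keyword arguments are required
    return shout(s=g)

if __name__ == "__main__":
    # workflows and tasks can also be invoked locally, like ordinary python functions
    print(greeting_wf(name="flyte"))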


Setup

1. If you don't already have one, create a virtualenv with Python 3.7 or higher.
1. `pip install flytekit[all]==0.16.0a0`
1. Upgrade your requirements(3).in/txt to this version as well and run pip compile if that is part of your process.

Iteration
If you look through examples in the `native_typing` folder you'll see that each file has a main section `__main__`. That means you can run each of these examples locally. (We’ll be adding unit tests for each later.)


python recipes/native_typing/simple.py
Running recipes/native_typing/simple.py main...
Running parent_wf(a=3) (5, 'world', 'world')
Running parent_wf_with_subwf_default(a=30) (32, 'world', 'world')
Running parent_wf_with_lp_with_default(a=40) (42, 'world', 'world')
Running parent_wf_with_lp_overriding_input(a=50) (52, 'world', 'world')


Registration
Note:
One key difference besides the obvious interface changes is that the instructions below run two commands rather than one to register tasks/workflows with the control plane. This process has actually always been around and we would like to steadily move to it as the flytectl project gains momentum. As such, you'll need to use `flyte-cli` in addition to `pyflyte`.

From here, note that the `sandbox.config` file's `workflow_packages` setting has been scoped down to only the new `native_typing` folder. This is just to make the following steps quicker.

As seen from above, to play around locally, you will not need to build any containers. When the time comes to register your workflows with Flyte, use the following commands to serialize and register. The instructions below assume you have a Flyte installation running.
1. If you haven't used `flyte-cli` before, run `flyte-cli setup-config -h flyte.lyft.net` (only if your Flyte admin has authentication turned on, not for local deployments). See more information [here](https://lyft.github.io/flyte/user/features/flytecli.html?highlight=setup%20config#flyte-cli-user-configuration).
1. `make serialize_sandbox`. This will create protobuf files in the `_pb_output` directory of the repo.
1. `flyte-cli register-files _pb_output/*`

At Lyft, additional instructions will be sent out.

0.15.6

Just a version bump; we forgot to manually change the hardcoded version. In 0.16.x and later, this is done automatically by GitHub workflows.
