FedML
0.8.9

FEDML Open Source: A Unified and Scalable Machine Learning Library for Running Training and Deployment Anywhere at Any Scale

Backed by FEDML Nexus AI: Next-Gen Cloud Services for LLMs & Generative AI (https://nexus.fedml.ai)

<div align="center">
<img src="https://github.com/FedML-AI/FedML/blob/master/doc/img/fedml_logo_light_mode.png" width="400px">
</div>

FedML Documentation: https://doc.fedml.ai

FedML Homepage: https://fedml.ai/ \
FedML Blog: https://blog.fedml.ai/ \
FedML Medium: https://medium.com/FedML \
FedML Research: https://fedml.ai/research-papers/

Join the Community: \
Slack: https://join.slack.com/t/fedml/shared_invite/zt-havwx1ee-a1xfOUrATNfc9DFqU~r34w \
Discord: https://discord.gg/9xkW8ae6RV


FEDML® stands for Foundational Ecosystem Design for Machine Learning. [FEDML Nexus AI](https://nexus.fedml.ai) is the next-gen cloud service for LLMs & Generative AI. It helps developers to *launch* complex model *training*, *deployment*, and *federated learning* anywhere on decentralized GPUs, multi-clouds, edge servers, and smartphones, *easily, economically, and securely*.

Highly integrated with the [FEDML open source library](https://github.com/fedml-ai/fedml), FEDML Nexus AI provides holistic support for three interconnected AI infrastructure layers: user-friendly MLOps, a well-managed scheduler, and high-performance ML libraries for running any AI jobs across GPU clouds.

![drawing](https://github.com/FedML-AI/FedML/blob/master/doc/en/_static/image/fedml-nexus-ai-overview.png)

A typical workflow is shown in the figure above. When a developer wants to run a pre-built job in Studio or the Job Store, FEDML® Launch swiftly pairs the AI job with the most economical GPU resources, auto-provisions them, and runs the job, eliminating complex environment setup and management. While a job is running, FEDML® Launch orchestrates the compute plane across different cluster topologies and configurations, so that any complex AI job is supported, whether model training, deployment, or even federated learning. FEDML® Open Source is a unified and scalable machine learning library for running these AI jobs anywhere at any scale.

In the MLOps layer of FEDML Nexus AI:
- **FEDML® Studio** embraces the power of Generative AI! Access popular open-source foundational models (e.g., LLMs), fine-tune them seamlessly with your specific data, and deploy them scalably and cost-effectively using FEDML Launch on the GPU marketplace.
- **FEDML® Job Store** maintains a list of pre-built jobs for training, deployment, and federated learning. Developers are encouraged to run them directly with customized datasets or models on cheaper GPUs.

In the scheduler layer of FEDML Nexus AI:
- **FEDML® Launch** swiftly pairs AI jobs with the most economical GPU resources, auto-provisions, and effortlessly runs the job, eliminating complex environment setup and management. It supports a range of compute-intensive jobs for generative AI and LLMs, such as large-scale training, serverless deployments, and vector DB searches. FEDML Launch also facilitates on-prem cluster management and deployment on private or hybrid clouds.

In the compute layer of FEDML Nexus AI:
- **FEDML® Deploy** is a model serving platform for high scalability and low latency.
- **FEDML® Train** focuses on distributed training of large and foundational models.
- **FEDML® Federate** is a federated learning platform backed by the most popular federated learning open-source library and the world’s first FLOps (federated learning Ops), offering on-device training on smartphones and cross-cloud GPU servers.
- **FEDML® Open Source** is a unified and scalable machine learning library for running these AI jobs anywhere at any scale; a minimal usage sketch follows this list.
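As a reference point for how little code such a job needs, here is a minimal sketch of the canonical entry script from the FedML examples, based on the documented `fedml.init` / `FedMLRunner` pattern (the exact module layout may vary across versions):

```python
import fedml
from fedml import FedMLRunner

if __name__ == "__main__":
    # Initialize FedML from the command line and fedml_config.yaml.
    args = fedml.init()

    # Pick the compute device (CPU/GPU) for this process.
    device = fedml.device.get_device(args)

    # Load the dataset and build the model named in the config.
    dataset, output_dim = fedml.data.load(args)
    model = fedml.model.create(args, output_dim)

    # Run the job; whether this process acts as a client or the
    # aggregation server is decided by the config.
    fedml_runner = FedMLRunner(args, device, dataset, model)
    fedml_runner.run()
```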

Contributing
FedML embraces and thrives through open source. We welcome all kinds of contributions from the community. Kudos to all of <a href="https://github.com/fedml-ai/fedml/graphs/contributors" target="_blank">our amazing contributors</a>!
FedML has adopted the [Contributor Covenant](https://github.com/FedML-AI/FedML/blob/master/CODE_OF_CONDUCT.md).

0.8.7

What's Changed
New Features
- [CoreEngine/MLOps] Supported LLM record logging.
- [Serving] Made the DeepSpeed inference backend work.
- [CoreEngine/DevOps] Enabled scheduling the public cloud server onto specific nodes.
- [DevOps] Added the fedml light docker image and related documents.
- [DevOps] Built and pushed light docker images and related pipelines.
- [CoreEngine] Added timestamp when reporting system metrics.
- [DevOps] Made the serving k8s cluster work with the latest images and updated related chart files.
- [CoreEngine] Added the skip_log_model_net option for llm training.
- [CoreEngine/CrossSilo] Supported customized hierarchical cross-silo.
- [Serving] Created the default model config and readme file if the user did not provide any model config and readme options when creating a model card.
- [Serving] Allowed users to customize their token for endpoints and inference.
Bug Fixes
- [CoreEngine] Improved compatibility when opening subprocesses on Windows.
- [CoreEngine] Fixed the issue where MPI mode did not have client rank -1.
- [CoreEngine] Set the python interpreter based on the current running python version.
- [CoreEngine] Fixed a failure to verify the pip SSL certificate when checking OTA versions.
- [CrossDevice] Fixed issues where the test metrics are reported twice to MLOps and loss metrics are clipped to integers on the Beehive platform.
- [App] Fixed issues when installing flamby on the heart-disease app.
- [CoreEngine] Added a handler for when UTF-8 cannot decode the output and error strings.
- [App] Fixed scripts and requirements on the FedNLP app.
- [CoreEngine] Fixed issues where FileExistsError was triggered by os.makedirs calls.
- [Serving] Changed the model url to open.fedml.ai.
- [Serving] Fixed the issue for OnnxExporterError and added Onnx as default dependent library when installing fedml.
- [Serving] Fixed the issue where the local package name is different from MLOps UI.
Enhancements
- [Serving] Established containers based on the user's config and improved code readability.

0.8.4

At FedML, our mission is to remove the friction and pain points of converting your ML & AI models from R&D into production-scale distributed and federated training & serving via our no-code MLOps platform.
FedML is happy to announce update 0.8.4. This release is filled with new capabilities, bug fixes, and enhancements. A key announcement is the launch of FedLLM, which simplifies and reduces the costs of training and serving large language models. You can read more about it in our [blog post](https://blog.fedml.ai/releasing-fedllm-build-your-own-large-language-models-on-proprietary-data-using-the-fedml-platform/).
<img width="655" alt="Screenshot 2023-06-21 at 01 42 07" src="https://github.com/FedML-AI/FedML/assets/68619812/6da6c1de-d649-4b7b-bb32-daf074fc05cc">

New Features

- [CoreEngine/MLOps] Launched FedLLM (Federated Large Language Model) for training and serving [GitHub](https://github.com/FedML-AI/FedML/tree/master/python/app/fedllm) [Blog](https://blog.fedml.ai/releasing-fedllm-build-your-own-large-language-models-on-proprietary-data-using-the-fedml-platform/)

- [CoreEngine] Deployed Helm Charts to our repository for packaging and ease of deploying on Kubernetes https://github.com/FedML-AI/FedML/blob/master/installation/install_on_k8s/fedml-edge-client-server/fedml-server-deployment-latest.tgz https://github.com/FedML-AI/FedML/blob/master/installation/install_on_k8s/fedml-edge-client-server/fedml-client-deployment-latest.tgz

- [Documents] Refactored the devops and installation structures (devops for internal pipelines, installation for external users). https://github.com/FedML-AI/FedML/tree/master/installation

- [DevOps] Deployed a new fedml-light Docker image and related documents. [DockerHub](https://hub.docker.com/r/fedml/fedml/tags) [GitHub doc](https://github.com/FedML-AI/FedML/blob/master/doc/en/starter/installation.md)

- [DevOps] Built the light Docker image to deploy to the K8s cluster and refined the K8s-related installation sections in the documentation. https://hub.docker.com/r/fedml/fedml-edge-client-server-light/tags

- [CoreEngine] Added support for multiple simultaneous training jobs when using our open source MLOps commands.

- [CoreEngine] Improved training health monitoring and properly reported failed statuses.

- [CoreEngine] Added APIs for enabling, disabling, and querying the client agent status. The APIs are as follows:

```
curl -XPOST http://localhost:40800/fedml/api/v2/disableAgent -d '{}'
curl -XPOST http://localhost:40800/fedml/api/v2/enableAgent -d '{}'
curl -XPOST http://localhost:40800/fedml/api/v2/queryAgentStatus -d '{}'
```
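The same APIs can be called from Python; a minimal sketch, assuming the agent is listening on its default local port:

```python
import requests

# Query the local FedML client agent status (same endpoint as the
# curl examples above).
resp = requests.post(
    "http://localhost:40800/fedml/api/v2/queryAgentStatus", json={}
)
print(resp.status_code, resp.text)
```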
Bug Fixes

- [CoreEngine] Created distinct device IDs when running multiple Docker containers to simulate multiple clients or silos on one machine; the device ID is now the product ID plus a random ID.

- [CoreEngine] Fixed a device assignment issue in get_torch_device in the distributed training mode.

- [Serving] Fixed the exceptions that occurred when recovering at startup after upgrading.

- [CoreEngine] Fixed the device ID issue when running in Docker on macOS.

- [App] Fixed an issue in the fedprox + sage graph regression and graph classification apps.

- [App] Fixed an issue with the heart disease app failing when running in MLOps.

- [App] Fixed an issue with the heart disease app’s performance curve.

- [App/Android] Enhanced the Android starting/stopping mechanism and fixed the following issues:

  - Fixed status displays after stopping a run.
  - When stopping a run during an unfinished round, the [MNN](https://github.com/alibaba/MNN) process now remains in the IDLE state (it previously went OFFLINE).
  - When stopping after a round is done, the training now stops.
  - Fixed an incorrect Python server TAG in the logs; the server is now easy to find in the logs.

Enhancements

- [Serving] Tested the inference backend and checked the response after the model deployment is finished.

- [CoreEngine/Serving] Set the GPU option based on the availability of CUDA when running the inference backend (see the sketch after this list), and optimized the MQTT connection checking.

- [CoreEngine] Stored model caches in the user's home directory when running federated learning.

- [CoreEngine] Added the device ID to the monitor message when processing inference requests.

- [CoreEngine] Reported runner exceptions, and ignored exceptions when the bootstrap section is missing from fedml_config.yaml.
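As a minimal sketch of the CUDA-availability check mentioned in the enhancement above (the helper name here is ours, not part of FedML's API):

```python
import torch

def select_inference_device(prefer_gpu: bool = True) -> torch.device:
    # Use the GPU for the inference backend only when CUDA is actually
    # available; otherwise fall back to the CPU.
    if prefer_gpu and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")
```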

0.8.3

What's Changed
New Features
- [CoreEngine/MLOps] Introduced the FedML OTA (Over-the-Air) upgrade mechanism for the training and serving platforms.
- [Documents] Added guidance for the OTA mechanism in the user guide document.
Bug Fixes
- [Serving] Fixed an issue where exceptions occurred when activating the model inference.
- [CoreEngine] Fixed an issue where aggregator exceptions occurred when running MPI scripts.
- [Documents] Fixed broken links in the user guide document.
- [CoreEngine] Checked whether the current job is empty in the get_current_job_status API.
- [CoreEngine] Fixed a high CPU usage issue when the reload option was enabled in the client API.
Enhancements
- [Serving] Improved data syncing between Redis server and Sqlite database.
- [Serving] Implemented the use of triple elements (endpoint name / model name / model version) to identify each inference API request; see the illustrative sketch after this list.
- [DevOps] Updated Jenkinsfile to automate the building and deployment of the model serving Docker to the K8s cluster.
- [Serving] Implemented the model monitor stop functionality when deactivating and deleting the model deployment.
- [Serving] Checked the status of the endpoint when recovering on startup.
- [CoreEngine] Refactored the OTA upgrade process for improved robustness.
- [CoreEngine] Attached logs to the new run ID when initiating a new run or deploying a model.
- [CoreEngine] Refined upgrade status messages for enhanced clarity.
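To make the triple-element identification above concrete, each inference request can be keyed by a tuple of (endpoint name, model name, model version); the structure below is our illustration, not FedML's internal schema:

```python
# Hypothetical illustration: key inference state by the triple
# (end_point_name, model_name, model_version).
request_key = ("text-classifier-endpoint", "distilbert", "v3")

# Keyed this way, two versions of the same model deployed behind
# different endpoints can never be confused with each other.
inference_state = {request_key: {"status": "DEPLOYED"}}
print(inference_state[request_key])
```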

0.8.2

What's Changed
New Features
- [CoreEngine/MLOps] Refactored the entire serving platform so that it runs more smoothly on Kubernetes clusters.
Bug Fixes
- [Training] Fixed an issue where the training status remained "running" after training finished.
- [Training] Fixed the issue that the Parrot platform could not collect and analyze metrics, events, and logs.
- [CoreEngine] Made the device ID unique in Docker containers.
Enhancements
- [CoreEngine/MLOps] Ensured print logs show up on the MLOps distributed logging platform.
- [CoreEngine/MLOps] Used the bootstrap script to upgrade the FedML version when publishing a new pip package is unnecessary.

0.8.0

**FedML Open and Collaborative AI Platform**

Train, deploy, monitor, and improve machine learning models anywhere (edge/cloud), powered by collaboration on combined data, models, and computing resources.
![image](https://user-images.githubusercontent.com/7238845/227211348-87a9fa62-1ef0-4c7d-9e5f-12e3dd2fa154.png)

**What's Changed**
**Feature Overview**
1. Supporting MLOps (https://open.fedml.ai)
2. Multiple scenarios:
- FedML Octopus: Cross-silo Federated Learning
- FedML Beehive: Cross-device Federated Learning
- FedML Parrot: FL Simulation with Single Process or Distributed Computing, smooth migration from research to production
- FedML Spider: Federated Learning on Web Browsers

3. Support for any machine learning framework: PyTorch, TensorFlow, JAX (with Haiku), and MXNet.
4. Diverse communication backends (MPI, gRPC, PyTorch RPC, MQTT + S3)
5. Differential privacy (CDP: central DP; LDP: local DP)
6. Attacker (API: fedml.core.FedMLAttacker); README: [python/fedml/core/security/readme.md](https://github.com/FedML-AI/FedML/blob/master/python/fedml/core/security/readme.md)
7. Defender (API: fedml.core.FedMLDefender); README: [python/fedml/core/security/readme.md](https://github.com/FedML-AI/FedML/blob/master/python/fedml/core/security/readme.md) (an illustrative defense sketch follows this list)
8. Secure Aggregation (multi-party computation): [cross_silo/light_sec_agg_example](https://github.com/FedML-AI/FedML/blob/master/python/examples/cross_silo/light_sec_agg_example)
9. In [FedML/python/app](https://github.com/FedML-AI/FedML/blob/master/python/app) folder, we provide applications in real-world settings.
10. Enable federated model inference at MLOps (https://open.fedml.ai)
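For a concrete sense of what a defense can do, here is a minimal, self-contained sketch of norm-diff clipping applied to a client update before aggregation; the function is our illustration and not part of the fedml.core.FedMLDefender API:

```python
import torch

def clip_client_update(global_weights, client_weights, clip_norm=5.0):
    """Illustrative norm-diff clipping defense: bound how far a single
    client update can move each tensor of the global model."""
    clipped = {}
    for name, w_global in global_weights.items():
        diff = client_weights[name] - w_global
        norm = float(torch.norm(diff))
        # Shrink the update if its norm exceeds the clip threshold.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped[name] = w_global + diff * scale
    return clipped
```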

For more detailed instructions, please refer to https://doc.fedml.ai/

**New Features**
- [Serving] Made all serving pipelines work: device login, model creation, model packaging, model pushing, model deployment, and model monitoring.
- [Serving] Made all three entry points for creating model cards work: from the trained model list, from the model-card creation web page, and from the related `fedml model` CLI.
- [OpenSource] Formally released all of the previous work as v0.8.0: training, security, aggregator, communication backends, MQTT optimization, metrics tracing, event tracing, and realtime logs.
**Bug Fixes**
- [CoreEngine] Fixed a CLI engine error when running simulations.
- [Serving] Adjusted the training code to conform to the ONNX sequence rule.
- [Serving] Fixed a URL error in the model serving platform.
**Enhancements**
- [CoreEngine/MLOps][log] Formatted log timestamps as NTP time.
- [CoreEngine/MLOps] Showed a progress bar and the size of the transferred data in the log while the client downloads and uploads the model.
- [CoreEngine] Optimized the client for weak or disconnected networks.

fedml_v0.6_before_fundraising
