Raphtory


0.5.1

**Full Changelog**: https://github.com/Pometry/Raphtory/compare/v0.5.0...v0.5.1

0.5.0

Since our move from Java to Rust and the release of Raphtory 0.3.0, the team has been hard at work perfecting the ways in which you can interact with your graph data. As usual there are more updates than you can shake a stick at, so below are a few of the key highlights!

Highlights 🏆

API Makeover 🎮

- **Pandas Connectors:** To make the ingestion process easier, Raphtory graphs can now automatically ingest Pandas dataframes, fully handling the merging of complex, property-rich data.
- **Property Overhaul:** We have overhauled graph properties from the inside out, allowing you to store values on nodes, edges and the graph itself. These can now also be much more complex values, such as dicts, lists and even whole Raphtory graphs, allowing for mind-melting hierarchical modelling 🤯
- **Unified Property API:** These new property features are joined by a unified property API, supporting fine-grained manipulation and aggregation.
- **Vectorised Exploration:** Instead of performing operations on individual edges or vertices, you can now call the same functions on collections of `Vertices` and `Edges` and the operation will be performed on all of them simultaneously.
- **Standardised Algorithmic Output:** To top it off, algorithms now return a new result object which provides group-bys, sorting, top_k, and translation back into a Pandas dataframe, drastically reducing the complexity of pipelines involving Raphtory (a short sketch follows this list).
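As a rough, hedged illustration of these highlights: names such as `load_from_pandas`, `vertices.name` and `top_k`/`to_df` follow the feature descriptions above, but have not been checked against the exact 0.5.0 signatures.

```python
import pandas as pd
from raphtory import Graph, algorithms

# Edge list with a property column; Raphtory merges properties on ingestion.
edges = pd.DataFrame({
    "time": [1, 2, 3],
    "src": ["Alice", "Bob", "Carol"],
    "dst": ["Bob", "Carol", "Alice"],
    "weight": [1.0, 2.5, 0.7],
})

# Pandas connector (assumed entry point): build a graph from the dataframe.
g = Graph.load_from_pandas(edges, src="src", dst="dst", time="time")

# Vectorised exploration: one call runs over every vertex at once.
print(g.vertices.name())
print(g.vertices.degree())

# Standardised output: the result object converts straight back to Pandas.
ranks = algorithms.pagerank(g)
print(ranks.top_k(3))
print(ranks.to_df().head())
```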

New Views 👀

- **Edge Deletions:** Deletions are back in Raphtory, and they have their own semantics! With the new `GraphWithDeletions` class you can model relationships with duration, and graph views (such as windowing) will include all prior entities unless they have been explicitly deleted.
- **Subgraphs:** With the new subgraph API you can subset the nodes in your graph based on any criteria you like, hiding the rest along with their associated edges. This lets you apply `k-core` filters, investigate the importance of individual nodes by 'removing' them from the analysis, or look at individual communities without ever having to recreate the graph.
- **Multi-Layer Graphs:** When adding edges to a Raphtory graph you can now pass an additional parameter setting the `layer` to which the edge belongs. This allows you to model entities in your data that share several relationship types, or nodes that exist in multiple networks simultaneously.
- **Composable Views:** All of these new views, along with those already available, are fully composable, allowing you to combine them in any way your heart desires!
- **Materialisable Views:** Last but not least, any view you create via a combination of all of the above can be materialised into a new graph, or saved to disk in our binary format for later analysis (sketched below).
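A hedged sketch of composing these views: `layer`, `window`, `subgraph`, `materialize`, `save_to_file` and `delete_edge` mirror the feature names above, but the exact 0.5.0 signatures are assumptions.

```python
from raphtory import Graph, GraphWithDeletions

g = Graph()
g.add_edge(1, "Alice", "Bob", layer="transfers")    # multi-layer edges
g.add_edge(2, "Bob", "Carol", layer="messages")
g.add_edge(3, "Carol", "Alice", layer="transfers")

# Composable views: one layer, restricted to a window, over a vertex subset.
view = g.layer("transfers").window(0, 3).subgraph(["Alice", "Bob"])
print(view.num_edges())

# Materialisable views: freeze the view into a standalone graph and save it.
view.materialize().save_to_file("/tmp/transfers.bin")

# Edge deletions get their own graph type with duration semantics.
gd = GraphWithDeletions()
gd.add_edge(1, "Alice", "Bob")
gd.delete_edge(5, "Alice", "Bob")  # the relationship now spans [1, 5)
```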

Raphtory as a service 🌐

- **Running a Raphtory Service:** Adding to the many ways Raphtory can be used, you can now run Raphtory as a GraphQL server! For this you can either load graphs directly from a Python dict or from a directory of saved graphs on disk. The GraphQL schema then provides exactly the same functionality as the Python APIs (example below).
- **Indexed Graphs and Searching:** In addition to the standard Raphtory APIs, the GraphQL server also indexes all graphs it is hosting via `tantivy`, allowing for fuzzy searching of node/edge properties as part of your queries!
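A minimal sketch of serving a graph over GraphQL, assuming a `raphtory.graphql` module with a `RaphtoryServer` class; both names are assumptions based on the notes above rather than the verified 0.5.0 API.

```python
from raphtory import Graph
from raphtory.graphql import RaphtoryServer  # assumed module path and class

g = Graph()
g.add_edge(1, "Alice", "Bob")

# Graphs can be passed as a dict (or loaded from a directory of saved graphs);
# node/edge properties are indexed via tantivy for fuzzy searching.
RaphtoryServer(graphs={"example": g}).run()
```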

Where to go next ⏭️

If you would like to give this new version a go, check out the [docs](https://docs.raphtory.com/) as well as our [examples](https://github.com/Pometry/Raphtory/tree/master/examples), available in both Python and Rust. A good place to start is the Jupyter notebook for the [Reddit](https://github.com/Pometry/Raphtory/blob/master/examples/py/reddit/demo.ipynb) SNAP dataset.

If you have any other suggestions, please do let us know via GitHub [issues](https://github.com/Pometry/Raphtory/issues), [discussions](https://github.com/Pometry/Raphtory/discussions) or on [Slack](https://join.slack.com/t/raphtory/shared_invite/zt-xbebws9j-VgPIFRleJFJBwmpf81tvxA)!

**Full Changelog**: https://github.com/Pometry/Raphtory/compare/v0.3.1...v0.5.0

0.4.3

- [x] Publish to crates.io
- [x] Publish to PyPi
- [x] Make Tag
- [x] Release to Github
- Auto-generated by [create-pull-request] triggered by release action [1]
[1]: https://github.com/peter-evans/create-pull-request

0.4.2

- [x] Publish to crates.io
- [x] Publish to PyPi
- [x] Make Tag
- [x] Release to Github
- Auto-generated by [create-pull-request] triggered by release action [1]
[1]: https://github.com/peter-evans/create-pull-request

0.4.1

- [x] Publish to crates.io
- [x] Publish to PyPi
- [x] Make Tag
- [x] Release to Github
- Auto-generated by [create-pull-request] triggered by release action [1]
[1]: https://github.com/peter-evans/create-pull-request

0.4.0

1. A new analysis engine with a clearer, more intuitive API;
2. A full rework of Raphtory orchestration, allowing for clean local and distributed deployment;
3. Batching and backpressure added between all components, improving stability and providing a 10-100x speed-up in ingestion and analysis;
4. Integration tests for all update types, different analysis tasks and windowed perspectives;
5. CI/CD built on top of the tests for automated testing of branches, publishing of nightly builds and release tagging.

Analysis Control Overhaul
- Added the `Query Manager` and `Query Handler` to deal with the organisation of a query and replace the Analysis Manager/Task Handlers.
- Added `PerspectiveControllers` which look after all `perspectives` (combinations of timestamp + windows) within a job.
- Added the `QueryExecutor` inside of each Partition to run submitted algorithms - replacing the `AnalyserExecutor`.
- Centralised `safetime` checking within the watermarker instead of asking all the Executors independently.
- Removed the REST API and replaced it with a programmatic `RaphtoryClient`, which can be run via a compiled Jar or via the Scala REPL.

New Graph Algorithm Structure
- Added `Graph Algorithm`, `Graph Perspective` and `Table` traits as the components of our new algorithm API.
- Graph algorithms consist of calling `Step` and `Iterate` on the graph perspective; these take a function to run on each vertex. This massively expands the analytical facilities of Raphtory, allowing the chaining of multiple algorithmic steps.
- Once an algorithm has been defined via step/iterate, the user may call `select`, which converts each vertex to a row and returns a table abstraction. This table may be `filtered`, `exploded` (turning one row into N rows as defined by a user-given function) and then written to disk via `writeTo`.
- Global aggregation has been removed for now, as it was causing several issues in the previous analytical model. Elements such as counting, groupBy, topK, etc. will be added in the next minor version.

New Analysis Features
- Added Explode Edges and helpers to view Temporal Edges as singular entities (one for each update).
- Added a global vertex count within Graph Algorithms via `graph.nodeCount` which changes based upon the perspective.
- Added `checkID`, an equivalent to `assignID` inside the graph algorithm: a helper function which allows the user to feed queries with the strings that exist in the raw data and converts these to the internal long IDs.
- Changed property access to take a class tag (e.g. `getState[String]()`) instead of requiring `.asInstanceOf[T]`.
- Added `getStateOrElse` to the vertex visitor.

Deployment Overhaul
- Moved fully away from Docker, allowing Raphtory to be compiled into a Jar and deployed on bare metal.
- Organised deployment classes into `RaphtoryGraph` for local deployment and `RaphtoryService` for distributed deployments.
- Added a `RaphtoryClient` for users to submit jobs to either Graph or Service.
- Converted the Raphtory Component (the top-level Akka handler of spout, builder, partition, etc.) into a Component Factory Object used by all deployment classes, i.e. the `RaphtoryGraph` and `RaphtoryService`.
- Moved creation of Spout and Builder away from Scala reflection to allow them to have multiple parameters.

Message Batching
- Batched the messages between the spout and the builders to minimise the amount of akka messages sent between them. The size of this is configurable by `RAPHTORY_BUILDER_BATCH_SIZE`.
- Added an outgoing queue map for the builder (one queue for each partition) which is pulled when the partition is ready for more data.
- Added `RAPHTORY_BUILDER_MAX_CACHE` for the total amount of messages a builder will hold before it stops pulling data from the spout (to stop it becoming memory overloaded itself).
- Added `RAPHTORY_PARTITION_MIN_QUEUE`, the number of messages in the queue of each partition actor below which the Partition Manager will request more data from the builders.
- Batched effect sync messages between partitions which are pushed out every second.
- Batched vertex messages between Readers during analysis, which are flushed after each superstep has concluded on all vertices in the perspective.
- Changed the ChannelID between the builder/partition to be an `Int` instead of a `String`, and only send one per batch.

Message Handling
- Swapped to Twitter's Chill library for Kryo serialisation instead of altoo-ag. This removed the need to specify each case class manually in the conf and allows users to send their own types.
- Moved all actors onto the large message queue, allowing heartbeats etc. to use the normal queue unopposed; this shrinks the queue size to 100k but increases each message frame to 10MB.
- Bumped up heartbeat monitoring to stop Akka complaining when actors are looping through a batch of messages.
- Added a new custom mailbox which tracks the number of messages in each actor's queue to get a better idea of workload. This is how the Partition Manager knows how much data each partition has to process.
- Swapped the vertex message queue to use `ArrayBuffer` instead of `ListBuffer`.
- Swapped the watermarker to ask the watchdog whether the cluster is ready before probing for timestamps, to stop fake dead letters.

Testing
- Added a new version of the all commands test, which is more extensible than the previous one and works with the new API. It also doesn't require the `golden standard` data to be available, instead comparing hashes. The data has also been removed from the repo and is now pulled into /tmp on a user's first run.
- Added a Raphtory pseudo-distributed deployment class (`RaphtoryPD`) which simulates a real distributed environment for testing.
- Added speed logging to the all commands test for ingestion and queries.
- Added speed logging to the Query Manager for all jobs.

CI/CD
- Added a CI pipeline workflow via GitHub Actions which will:
  - run the all commands test on push to any branch that isn't master;
  - run a nightly build of the develop branch, publishing a tag and release on a successful build;
  - run a build on push to the master branch, bumping the semantic version and creating a tag and release on a successful build.
- Added badges to the README showing the latest build status for push and scheduled events.
- Added badges to the README showing the latest tag and release versions (SemVer only, so as to show published releases rather than nightly builds).

Partition Overhaul
- Abstracted the graph partition so that storage, analysis and ingestion can be worked on separately; they were very intertwined before.
- Removed the concept of a local partition/partitioned shared state, meaning partitions can be moved around far more easily and other non-object-based partitions (e.g. Arrow) can be supported.
- Turned the object-based entity visitors and graph lens into interfaces, plus wrapper functions, so that we can easily see what a user will have access to. This also further separates the storage and analysis.
- The current version of the entity storage implements these and has been renamed `POJOGraph` to make it distinct from later implementations in frameworks such as Arrow.
- Reworked all the `worker classes` (router/writer/reader) to have a unique name and not reference the machine they are on.
- Turned the actor lookup functions within the RaphtoryActor into lazy vals so that they do not reinitialise on every call.
- Removed all ParTrieMaps (our last parallel data structure) as these were causing resource contention.

Partition Memory Management
- Swapped all `mutable.TreeMap`s to `ArrayBuffer`s and moved to using arrays in analysis where possible, as these are more efficient based on recent benchmarks (https://www.lihaoyi.com/post/BenchmarkingScalaCollections.html).
- Added a state deduplication step, run periodically in the partitions, to remove any state which is stored more than once (say, when a vertex is added at the same time as some of its edges).
- Removed the HistoricOrdering object: it is no longer used without the trees, and one was being instantiated for each entity in the graph, taking huge amounts of memory.

Spout
- Deleted the multiline file spout as it was unused.
- Added a MongoDB spout.
- Added a Parquet spout.
- Fixed the Kafka spout to work with the overhaul from 0.3.0.

Config
- Added `RAPHTORY_LEADER_ADDRESS`, `RAPHTORY_LEADER_PORT`, `RAPHTORY_BIND_ADDRESS`, `RAPHTORY_BIND_PORT` for specifying where a Raphtory service should be binding to and where to look for the leader of a deployment.
- Added a `RAPHTORY_DATA_HAS_DELETIONS` flag to only run the extra steps for handling deletions when such elements exist in the data (illustrated below).
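For illustration, a deployment might export these variables before launching the compiled Jar described above. A hedged Python sketch; the addresses and the main class are hypothetical placeholders, only the `RAPHTORY_*` names come from this release:

```python
import os
import subprocess

# Hypothetical addresses/ports; variable names are taken from the notes above.
env = dict(
    os.environ,
    RAPHTORY_LEADER_ADDRESS="10.0.0.1",
    RAPHTORY_LEADER_PORT="1600",
    RAPHTORY_BIND_ADDRESS="0.0.0.0",
    RAPHTORY_BIND_PORT="1600",
    RAPHTORY_DATA_HAS_DELETIONS="false",  # skip deletion handling if unused
)

# "com.raphtory.Service" is a placeholder main class for this sketch.
subprocess.run(["java", "-cp", "raphtory.jar", "com.raphtory.Service"], env=env)
```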

Misc clean up
- Removed all env variables throughout the code, notably in spouts and algorithms; these are all now taken as class arguments.
- Tidied up the Raphtory components and placed previously duplicated Akka code, such as the mediator, into the top-level Raphtory Actor.
- Added a `Windows` type as a wrapper around `List[Long]` to better explain what is happening when you submit a query.
- Shortened Raphtory job names, removing the full algorithm path.
- Deleted env-setter as we are no longer using the Raphtory Docker image.
- Deleted old Docker settings inside `build.sbt` as we no longer build straight into Docker.
- Removed all the old compile-at-runtime code as it is now deprecated.
- Removed old Kamon logging code which caused deadlocking issues.
- Removed the Router Manager and fully renamed the router to graph builder throughout.
- Deleted unneeded utils and actors, including the original seed actor.
- Deleted the old Snapshot Manager.
- Increased the frequency of watermarking.
- Added logging of the IP of each Partition joining a Raphtory cluster to better discern where an issue arises from.

raphtory-akka-0.3.0
A large number of changes have occurred in dev, causing it to diverge significantly from master and the current documentation. Before any larger changes are completed (notably snapshotting, breaking away from Docker and the creation of a new analysis API) we are releasing 0.3.0. The changes are listed below:

**Major Changes - Raphtory Management**
- Raphtory has been upgraded to use Akka 2.6 and swapped from Netty to Artery.
- All messaging is now handled through the Kryo serialiser instead of the default Java serialiser.
- The Watchdog, SeedNode and Watermark Manager are now combined into an orchestration actor group that manages the whole cluster.
- All Raphtory Components are now managed by a Raphtory Component Connector which ensures cluster startup by reporting to the watchdog.
- Writer Workers can now marshal and unmarshal the state of their allocated entity storage to/from Parquet.

**Major Changes - Analysis Management**
- Raphtory’s logic for handling analysis has been totally rewritten. The Analysis Manager now spawns a task that contains the full control logic for each submitted query. This task requests the Reader Workers to create a separate actor for the analysis (the AnalysisSubtaskWorker) which contains all vertex visitors. Once the analysis is completed (across all flattenings) both the Task and SubtaskWorkers can be killed, removing all analytical states and stopping the build-up of visitors/analysis properties over time.
- The above drastically simplified the VertexMultiQueue which now only needs an odd and an even mailbox instead of storing timestamp and windowsize as well.
- PubSub was removed as a communication method between the Analysis Task and SubtaskWorkers in favour of direct actor messaging. This was done for performance and practical reasons (spawning new actors requires gossiping of their location, which is slow and causes intermittent errors).
- VertexMessageHandler was created to track all vertex messages between SubtaskWorkers.
- ViewLens and WindowedLens were compressed into one class (GraphLens).

**Major Changes - Analysis API**
- Analysers now require the user to return a map of results which can then be serialised in a variety of ways. This makes the analyser more general and removes the need to edit the code when the user wants to swap from saving to a text file to saving to Mongo, etc.
- Raphtory queries may now be submitted with a serialiser class which contains the logic to save the results in the user's desired format.
- Raphtory Serialisers handle both windowed and unwindowed flattenings, therefore, ProcessResults and ProcessWindowResults have been replaced with extractResults, removing A LOT of duplicated code.
- The old serialiser type which extended Analyser is now removed as redundant.
- Analysers are now typed, specifying what each subtask worker is returning and, therefore, what the analysis task is aggregating. This removes the need for unpleasant casting inside of extractResults.
- The Query API has had the explicit window type arguments removed (true, false, batched). A user may now simply submit a window set (which can include one window) and Raphtory will handle it internally.
- Vertex Visitor and Edge Visitor were renamed to Vertex and Edge for user clarity.
- Double args-array submission is no longer possible with the RaphtoryGraph, removing confusion.
- Example algorithms have been updated to the new API.
- Example Algorithms have been given named param alternatives to the args array.

**Test Changes**
- The All Commands Test was changed to use a fixed, pre-generated file, as all Scala versions seem to do something different in utils.Random (see testUpdates.txt in dev/allcommands).
- Added a Datablast Analyser which throws large arrays of data from all subtask workers to see how well Raphtory handles it.
- The all commands test was converted to fully automated unit tests, making comparisons between versions a lot simpler.

**Minor Changes**
- build.sbt has been cleaned up and organised
- A large amount of package refactoring ahead of breaking the project into core + modules
- The router has been officially renamed GraphBuilder
- Initial SnapshotManager included, but currently a stub.
- Initial GraphAlgorithm included, but currently a stub
- AnalysisUtils created for misc runtime compilation code.
- Swapped from 32 bit murmur3 hash to 64 bit xxHash to remove chances of collisions.

raphtory-akka-0.2.0
