Raphtory

Latest version: v0.14.0

Safety actively analyzes 723650 Python packages for vulnerabilities to keep your Python projects secure.

Page 6 of 7

0.4.0

1. A new analysis engine with a clearer more intuitive API;
2. A full rework of Raphtory orchestration, allowing for clean local and distributed deployment;
3. Batching and backpressure added between all components, improving stability and providing 10-100x speed up in ingestion and analysis conversion;
4. Integration tests for all update types, different analysis tasks and windowed perspectives;
5. CI/CD built on top of the tests for automated testing of branches, publishing of nightly builds and release tagging.

Analysis Control Overhaul
- Added the `Query Manager` and `Query Handler` to deal with the organisation of a query and replace the Analysis Manager/Task Handlers.
- Added `PerspectiveControllers` which look after all `perspectives` (combinations of timestamp + windows) within a job.
- Added the `QueryExecutor` inside of each Partition to run submitted algorithms - replacing the `AnalyserExecutor`.
- Centralised `safetime` checking within the watermarker instead of asking all the Executors independently.
- Removed all REST api and replaced with a programatic `RaphtoryClient` which can be run via a compiled Jar or via the scala REPL.

New Graph Algorithm Structure
- Added `Graph Algorithm`, `Graph Perspective` and `Table` traits as the components of our new algorithm API.
- Graph algorithms consist of calling `Step` and `Iterate` on the graph perspective - these take a function to run on each vertex. This massively expands the analytical facilities of Raphtory allowing the chaining of multiple algorithmic steps.
- Once an algorithm has been defined via step/iterate the user may call `select` which converts each vertex to a row and returns a table abstraction. This table may be `filtered`, `exploded` (turning one row into N rows as defined by a user given function) and then written to disk via `writeTo`.
- Global aggregation is currently removed as it was causing several issues in the previous analytical model. Elements such as counting, groupBy, topK, etc. will be added in the next minor version.

New Analysis Features
- Added Explode Edges and helpers to view Temporal Edges as singular entities (one for each update).
- Added a global vertex count within Graph Algorithms via `graph.nodeCount` which changes based upon the perspective.
- Added an equivalent function to assignID inside of the graphAlgorithm called checkID this is a helper function which allows the user to feed queries with the strings that exist in the raw data and convert these to the internal long ID.
- Changed property access to take a class tag (i.e. `getState[String]()`) instead of requiring `.asInstanceOf[T]`
- Added `getStateOrElse` to vertex visitor.

Deployment Overhaul
- Moved away from docker fully, allowing Raphtory to be compiled into a Jar and deployed on bare metal.
- Organised deployment classes into `RaphtoryGraph` for local deployment and `RaphtoryService` for distributed deployments.
- Added a `RaphtoryClient` for users to submit jobs to either Graph or Service.
- Converted the Raphtory Component (top level akka handler of spout, builder, partition, etc) into a Component Factory Object used by all deployment classes i.e. the `RaphtoryGraph` and `RaphtoryService`
- Moved creation of Spout and Builder away from Scala reflection to allow them to have multiple parameters.

Message Batching
- Batched the messages between the spout and the builders to minimise the amount of akka messages sent between them. The size of this is configurable by `RAPHTORY_BUILDER_BATCH_SIZE`.
- Added a outgoing queue map for the builder (one queue for each partition) which will be pulled when the partition is ready for more data.
- Added `RAPHTORY_BUILDER_MAX_CACHE` for the total amount of messages a builder will hold before it stops pulling data from the spout (to stop it becoming memory overloaded itself).
- Added `RAPHTORY_PARTITION_MIN_QUEUE` which is number of messages in the queue of each partition actor below which the Partition Manager will request more data from the builders.
- Batched effect sync messages between partitions which are pushed out every second.
- Batched vertex messages between Readers during analysis, which are flushed after each superstep has concluded on all vertices in the perspective.
- Changed ChannelID between the builder/partition to be Ints instead of strings and only send 1 per batch.

Message Handling
- Swapped to Twitter Chill library for kryo serialisation instead of altoo-ag. This removed the need to specify each case class manually in the conf and allows users to send their own types.
- Moved all actors onto the large message queue - allowing heartbeats etc to use the normal queue unopposed - shrinking the queue size to 100k, but increasing each message frame to 10MB.
- Bumped up heartbeat monitoring to stop akka complaining when actors are looping through a batch of messages.
- Added a new custom mailbox which tracks the amount of messages in each actors queue to get a better idea of workload. This is how the Partition Manager knows how much data each partition has to process.
- Swapped the vertex message queue to use ArrayBuffer instead of ListBuffer
- Swapped the watermarker to ask the watchdog when the cluster is ready before probing for timestamps to stop fake dead letters.

Testing:
- Added a new version of the all commands test, which is more extensible than the prior and works with the new API. This also doesn't require the `golden standard` data to be available, instead comparing hashes. The data has also been removed from the repo and is now pulled into /tmp within a users first run.
- Added a Raphtory pseudo distributed deployment class (`RaphtoryPD`) which simulates a real distributed environment for testing.
- Adding speed logging to All commands test for ingestion and queries
- Added speed logging to the Query Manager for all jobs.

CI/CD
- Added CI pipeline workflow via GitHub actions which will - Run the all commands test on push to any branch that isn’t master - Run a nightly build of develop branch and publish a tag and release on successful build - Run a build on push to master branch, bump semantic version, create a tag and release on successful build
- Added in badges to readme to show latest build run status for push and scheduled events
- Added in badges to readme to show latest tag and release versions (SemVer only as to only show published release rather than nightly builds)

Partition Overhaul
- Have abstracted the graph partition so that we can work on the storage analysis and ingestion separately. They were very intertwined before.
- Have removed the concept of a local partition/partitioned shared state as this means we can move partitions round a lot easier and support other non-object based partitions i.e. arrow.
- Turned the object based entity visitors and graph lens into interfaces so that we can easily see what a user will have access to + wrapper functions. This also further separates the storage and analysis.
- Current version of the entity storage implements these and has been renamed as `POJOGraph` to make it distinct from later implementation in frameworks such as arrow.
- Reworked all the `worker classes` (router/writer/reader) to have a unique name and not reference the machine they are on.
- Turned the actor lookup functions within the RaphtoryActor into lazy vals so that they do not reinitialise every call.
- Removed all ParTrieMaps (our last parallel data structure) as these were causing resource contention.

Partition Memory Management:
- Swapped all multiple.treemaps to ArrayBuffers and moved to using arrays in analysis were possible as these are apparently more efficient based on recent benchmarks (https://www.lihaoyi.com/post/BenchmarkingScalaCollections.html).
- Added a state deduplication step run periodically in the partitions to remove any state which is stored more than once (say in the instance of a vertex added at the same time as some of its edges).
- Removed the HistoricOrdering object as it is no longer used without the trees and one was being instantiated for each entity in the graph taking huge amounts of memory.

Spout
- Deleted the multiline file spout as not used.
- Added a mongoDB spout.
- Added a parquet spout.
- Fixed the Kafka spout to work with the overhaul from 0.3.0

Config
- Added `RAPHTORY_LEADER_ADDRESS`, `RAPHTORY_LEADER_PORT`, `RAPHTORY_BIND_ADDRESS`, `RAPHTORY_BIND_PORT` for specifying where a Raphtory service should be binding to and where to look for the leader of a deployment.
- Added `RAPHTORY_DATA_HAS_DELETIONS` flag to only run the extra steps for handling deletions which such elements exist in the data.

Misc clean up
- Removed all env variables throughout the code, notably in spouts and algorithms - these are all now taken as class arguments.
- Tided up the Raphtory components and placed previously duplicated akka code such as the mediator into the top level Raphtory Actor.
- Added a `Windows` type as a wrapper to List[Long] to better explain what is happening when you submit a query
- Shortened Raphtory job names, removing the full algorithm path.
- Deleted env-setter as no longer using raphtory docker image.
- Deleted old docker settings inside of `build.sbt` as we no longer build straight into docker.
- Removed all the old compile at runtime code as now depreciated.
- Removed old Kamon logging code which caused deadlocking issues.
- Removed the Router Manager and fully renamed the router to graph builder throughout.
- Deleted unneeded utils and and actors, including the original seed actor.
- Deleted the old Snapshot Manager
- Increased the frequency of watermarking
- Added logging info of the IP of each Partition joining a Raphtory cluster to better discern where an issue arrises from.

raphtory-akka-0.3.0
A large number of changes have occurred in dev causing it to diverge largely from master and current documentation. Before any larger changes are completed (notably snapshotting, breaking away from docker and the creation of a new analysis API) we are releasing 0.3.0. The changes for this are listed below:

**Major Changes - Raphtory Management**
- Raphtory have been upgraded to use Akka 2.6 and swapped from Netty to Artery
- All messaging is now handled through Kryo serialiser instead of the default Java serialiser
- The Watchdog SeedNode and Watermark Manager are now combined into an orchestration actor group that manages the whole cluster.
- All Raphtory Components are now managed by a Raphtory Component Connector which ensures cluster startup by reporting to the watchdog.
- Writer Workers can now martial and un-martial the state of their allocated entity storage to/from parquet.

**Major Changes - Analysis Management**
- Raphtory’s logic for handling analysis has been totally rewritten. The Analysis Manager now spawns a task that contains the full control logic for each submitted query. This task requests the Reader Workers to create a separate actor for the analysis (the AnalysisSubtaskWorker) which contains all vertex visitors. Once the analysis is completed (across all flattenings) both the Task and SubtaskWorkers can be killed, removing all analytical states and stopping the build-up of visitors/analysis properties over time.
- The above drastically simplified the VertexMultiQueue which now only needs an odd and an even mailbox instead of storing timestamp and windowsize as well.
- PubSub was removed as a communication method between Analysis Task and SubtaskWorkers in favour of direct actor messaging. Completed for performance and practical reasons (new actors spawning requires gossiping of their location which is slow and causes intermittent errors).
- VertexMessageHandler was created to track all vertex messages between SubtaskWorkers.
- ViewLens and WindowedLens compressed into one class (GraphLens)

**Major Changes - Analysis API**
- Analysers now require the user to return a map of results which can then be serialised in a variety of ways. This makes the analyser more general and removes the need to edit the code when the user wants to swap from saving to a text file to saving to mongo etc.
- Raphtory queries may now be submitted with a serialiser class which contains the logic to save the results in the users desired format.
- Raphtory Serialisers handle both windowed and unwindowed flattenings, therefore, ProcessResults and ProcessWindowResults have been replaced with extractResults, removing A LOT of duplicated code.
- The old serialiser type which extended Analyser is now removed as redundant.
- Analysers are now typed, specifying what each subtask worker is returning and, therefore, what the analysis task is aggregating. This removes the need for unpleasant casting inside of extractResults.
- The Query API has had the explicit window type arguments removed (true, false, batched). A user may now simply submit a window set (which can include one window) and raphtory will handle it internally.
- Vertex Visitor and Edge Visitor were renamed to Vertex and Edge for user clarity.
- Double args array submission is now no longer possible with the RaphtoryGraph removing confusion.
- Example algorithms have been updated with the new API
- Example Algorithms have been given named param alternatives to the args array.

**Test Changes**
- All Commands Test changed to use set generated file as all scala versions seem to do something different in utils.Random (see testUpdates.txt in dev/allcommands)
- Datablast Analyser added which throws large arrays of data from all subtask workers to see how well Raphtory handles it.
- All commands test converted to unit tests which are fully automated to make comparisons between versions a lot simpler.

**Minor Changes**
- build.sbt has been cleaned up and organised
- A large amount of package refactoring ahead of breaking the project into core + modules
- The router has been officially renamed GraphBuilder
- Initial SnapshotManager included, but currently stub
- Initial GraphAlgorithm included, but currently a stub
- AnalysisUtils created for misc runtime compilation code.
- Swapped from 32 bit murmur3 hash to 64 bit xxHash to remove chances of collisions.

raphtory-akka-0.2.0

0.3.2

- [x] Publish to crates.io
- [x] Publish to PyPi
- [x] Make Tag
- [x] Release to Github
- Auto-generated by [create-pull-request] triggered by release action [1]
[1]: https://github.com/peter-evans/create-pull-request

0.3.1

0.3.0

Thanks to a whole bunch of feedback on our Python alpha, over the last 6 months Raphtory has been through its second full make over with a total rewrite of the core engine in Rust! There are WAY too many changes to list here, but below you can see a couple of highlights.

If you would like to give this new version a go, you can check out the [docs](https://docs.raphtory.com/en/v0.3.1/) as well as our [examples](https://github.com/Pometry/Raphtory/tree/master/examples) - available in both Python and Rust. Good places to start are the Jupyter notebooks for the [Reddit](https://github.com/Pometry/Raphtory/blob/master/examples/py/reddit/demo.ipynb) snap dataset and [Enron](https://github.com/Pometry/Raphtory/blob/master/examples/py/enron/enron.ipynb) emails.

Highlights 🏆

Revamped API
The first thing you will notice about the new Raphtory is that the API is A LOT more friendly. The example notebooks above show this off nicely, but as a snippet this is how you could grab Gandalf from our Lord of the Rings Graph and plot his degree every 1000 time increments:

python
timestamps = []
degree = []
for gandalf in g.vertex("Gandalf").rolling(window=1000):
timestamps.append(gandalf.latest_time())
degree.append(gandalf.degree())

sns.set_context()
ax = plt.gca()
plt.xticks(rotation=45)
ax.set_xlabel("Time")
ax.set_ylabel("Interactions")
sns.lineplot(x = timestamps, y = degree,ax=ax)

<img width="580" alt="image" src="https://github.com/Pometry/Raphtory/assets/6665739/974d06a4-a432-4654-8ace-984ec73ff582">

Performance 📈
- The memory usage of Rust is over an order of magnitude less than Scala, with one test graph (~11 million nodes, ~33 million edges) previously requiring ~120GB of Ram now only needing 7GB! This means that datasets which required spinning up a EC2 instance or cluster can now easily be run on your laptop 💪🏻
- Similarly, complex algorithms like `Temporal Triadic Motifs` run in seconds instead of hours - even when running locally instead of on a beefy box 🍖
- Querying for specific vertices/edges runs 1000x faster and can be done directly on the graph via the new APIs instead of having to write a full algorithm.

Python 🐍
- Raphtory can now be easily installed with all dependencies via `pip install raphtory`.
- Via [PyO3](pyo3.rs) Raphtory can now be run and called directly in python without secretly running a Java Runtime Environment in the background 🥷. This also means it now works perfectly with all other python libraries!
- The vast majority of the heavy lifting is done in Rust, meaning you get all the ease of Python, with very little slow down!

Whats next ⏭️

We are currently tinkering away with several parts of the new APIs, adding more helper functions, and expanding the algorithmic suite. However, alongside this we have some larger pieces of work in the pipeline:

- We are building a GraphQL extension to Raphtory, codename [flux-capacitor](https://github.com/Pometry/flux-capacitor), alongside the functionality for compiling Raphtory to [WASM](https://webassembly.org) - making it possible to interact with Raphtory through Javascript/web UIs.
- We are working on out-of-core analysis, allowing Raphtory to work on Graphs many times the size of the memory of the machine it is running on.
- Finally, we are running a suite of standard benchmarks on this new version to give some concrete comparable numbers to other Graph systems.

If you have an other suggestions please do let us know via Github [issues](https://github.com/Pometry/Raphtory/issues), [discussions](https://github.com/Pometry/Raphtory/discussions) or on [Slack](https://join.slack.com/t/raphtory/shared_invite/zt-xbebws9j-VgPIFRleJFJBwmpf81tvxA)!

**Full Changelog**: https://github.com/Pometry/Raphtory/compare/v0.2.0...v0.3.1

Raphtory

Page 6 of 7

0.4.0

0.3.2

0.3.1

0.3.0

0.2.2

0.2.1

Page 6 of 7

Links

Releases