@dbernal you can, but you need two properties. 1. You need to be able to read in order from storage; otherwise things can come back out of order when a peer fails and you recover. 2. You need some data on the message that lets you use onyx/group-by-key in a consistent way. Group by key preserves order by making sure data always ends up on the same peers in a consistent way. For Kafka, you could use the Kafka partition to do your group by.
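A rough sketch of property 2, assuming segments carry the Kafka partition they were read from and that a downstream peer is chosen with a simple `partition % n` rule. `route_by_partition` and `assign` are hypothetical helpers for illustration, not Onyx's API or its actual hashing:

```python
from collections import defaultdict


def route_by_partition(kafka_partition: int, n_downstream_peers: int) -> int:
    # Hypothetical routing rule: a given partition number always maps to the same peer index.
    return kafka_partition % n_downstream_peers


def assign(segments, n_downstream_peers):
    """Collect (partition, offset) pairs per downstream peer, in the order they were read."""
    per_peer = defaultdict(list)
    for seg in segments:
        peer = route_by_partition(seg["partition"], n_downstream_peers)
        per_peer[peer].append((seg["partition"], seg["offset"]))
    return dict(per_peer)


# Segments read in order (property 1) from 4 Kafka partitions, offsets 0..2 each.
segments = [{"partition": p, "offset": o} for o in range(3) for p in range(4)]
routed = assign(segments, 3)
for peer, items in sorted(routed.items()):
    print(peer, items)
```

Each partition only ever lands on one downstream peer, so its offsets stay in order there. This only holds if a recovered peer re-reads storage in the same order, which is why property 1 matters.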
We could probably do it implicitly by auto-partitioning on the input peer’s slot-id
That’d make it so the input peer’s data always gets sent to the same peers all the way downstream
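The slot-id idea can be sketched as follows, assuming each segment is tagged with the slot-id of the input peer that read it and every downstream task routes on that tag. `downstream_path` is a hypothetical helper; this is not how Onyx currently routes:

```python
from typing import List


def downstream_path(slot_id: int, peers_per_task: List[int]) -> List[int]:
    """Hypothetical: peer index chosen at each downstream task for segments from one input peer."""
    # Routing on the same slot-id at every task pins the whole chain for that input peer.
    return [slot_id % n_peers for n_peers in peers_per_task]


# Workflow A -> B -> C with 3 peers on B and 2 peers on C; input peers have slot-ids 0..3.
for slot_id in range(4):
    print(slot_id, downstream_path(slot_id, [3, 2]))
```

Because the path depends only on the slot-id, segments from one input peer always follow the same peers downstream.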
@lucasbradstreet just a few follow-up questions. How does the system ensure that peers process in order? For a workflow like A->B->C where B has 3 peers, how do those peers send to downstream task C in order? I haven't fully grasped how sending consistently to the same peers all the way downstream ensures that order is maintained from the source input.
@dbernal it’s essentially a ring, where each virtual peer knows its “follower”. a virtual peer can only be assigned a single task, so this is fairly simple to orchestrate upon job creation
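A toy model of the ring, assuming virtual peers are ordered by id, each one's follower is the next id (wrapping around), and tasks are handed out one per virtual peer at job creation. `build_ring` and `assign_tasks` are hypothetical and do not reproduce Onyx's scheduler:

```python
from typing import Dict, List, Tuple


def build_ring(virtual_peers: List[str]) -> Dict[str, str]:
    """Map each virtual peer to its follower; the last peer wraps around to the first."""
    ordered = sorted(virtual_peers)
    return {p: ordered[(i + 1) % len(ordered)] for i, p in enumerate(ordered)}


def assign_tasks(virtual_peers: List[str], task_counts: List[Tuple[str, int]]) -> Dict[str, str]:
    """Give each virtual peer at most one task, walking the peers in ring order."""
    ordered = sorted(virtual_peers)
    slots = [task for task, n in task_counts for _ in range(n)]
    if len(slots) > len(ordered):
        raise ValueError("not enough virtual peers for the requested tasks")
    return dict(zip(ordered, slots))


peers = [f"vp{i}" for i in range(5)]
ring = build_ring(peers)
tasks = assign_tasks(peers, [("A", 1), ("B", 3), ("C", 1)])
print(ring)
print(tasks)
```

Since a virtual peer holds exactly one task, the assignment is a simple walk over the ring.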
you are also able to tweak the precise coordination of which virtual peer gets assigned what task by using different job schedulers http://www.onyxplatform.org/docs/user-guide/0.12.x/#scheduling
for example, you might want to favor colocating different tasks on virtual peers that share the same host
Hi everyone! Distributed Masonry has been working on something new for a while, and I'm happy to announce it here first. Today we're unveiling Pyrostore, a new streaming storage product that complements Kafka with inexpensive, virtually limitless storage. http://pyrostore.io/blog/2018/05/10/kafka-potential-past-present.html
We're super excited about how Pyrostore changes the landscape of streaming. Thanks for being part of the community. We build awesome products like this because we have a big group of great people to learn from.
!!
We could use some upvotes on news.yc when we post it shortly, if you all don’t mind 🙂
sure :)
congrats on the launch!
Thanks! This has been the biggest thing @lucasbradstreet, myself, and the rest of the team have worked on since Onyx!
Awesome!
congrats!
@lmergen thanks for the info
We’ve posted it to news.yc. Would appreciate upvotes via https://news.ycombinator.com/newest
you can search for pyrostore
@dbernal finally a little time to respond. Crazy day! Anyway, Onyx maintains ordering within a peer, and between one peer and another. So say you have messages going from task A, peer1, to task B, peer2: all of the messages between them will be processed by peer2 in the order they were sent by peer1. This works nicely, except when task A's peer1 starts sending some segments to one task B peer and some to another. That’s why you need something like onyx/group-by to always route messages with a given partition key from one peer to the same next peer. Otherwise we can’t ensure that they will be re-interleaved correctly downstream.
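A small simulation of the point above. It assumes one task A peer fanning out to 3 task B peers, each link being FIFO, and task C receiving the channels in an arbitrary (here: last-channel-first) order. `by_key` stands in for keyed routing in the spirit of onyx/group-by; all names are hypothetical:

```python
import zlib
from collections import deque


def by_key(index, seg, n_peers):
    # Keyed routing: a stable hash of the key always picks the same task B peer.
    return zlib.crc32(seg["key"].encode("utf-8")) % n_peers


def round_robin(index, seg, n_peers):
    # Unkeyed routing: spread segments across task B peers regardless of key.
    return index % n_peers


def simulate(segments, n_peers, router):
    # One FIFO channel per task B peer: order is preserved within each channel only.
    channels = [deque() for _ in range(n_peers)]
    for index, seg in enumerate(segments):
        channels[router(index, seg, n_peers)].append(seg)
    # A legal but unlucky arrival order at task C: drain the last channel first.
    received = []
    for channel in reversed(channels):
        received.extend(channel)
    return received


def in_order_per_key(received):
    """True if each key's sequence numbers arrive at task C in increasing order."""
    last_seq = {}
    for seg in received:
        if seg["seq"] < last_seq.get(seg["key"], -1):
            return False
        last_seq[seg["key"]] = seg["seq"]
    return True


segments = [{"key": k, "seq": s} for s in range(3) for k in ("a", "b")]
print(in_order_per_key(simulate(segments, 3, round_robin)))
print(in_order_per_key(simulate(segments, 3, by_key)))
```

With round-robin, the segments for key "a" are split across channels and arrive at C out of order. With keyed routing, each key stays on one FIFO channel, so its order survives however the channels interleave.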
Pyrostore sounds pretty cool