onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
2018-01-04T00:11:08.000110Z

Hi All

2018-01-04T00:11:53.000142Z

Sorry to be jumping straight to a question, I need some advice about using onyx. I will drop it here hopefully someone can help me out.

2018-01-04T00:13:33.000190Z

I want to poll multiple rest APIs every second, process the results and send them to a Kafka topic. I know I can submit a task with a job but is there a more robust way of generating jobs? Or is this what I one should be doing (using a timer to submit jobs)

michaeldrogalis 2018-01-04T00:39:31.000263Z

@mehdi.avdi I'd use a single job with a plugin that repeatedly polls on your timer.

michaeldrogalis 2018-01-04T00:40:08.000073Z

The nature of the input is a little awkward in general -- hence why I suggest writing your own input plugin. After that everything should work smoothly with a single job routing data to Kafka.

2018-01-04T00:42:38.000132Z

A plugin that repeatedly sends segments downstream on a timer config?

2018-01-04T00:44:08.000034Z

Segments practically just containing a timestamp and then take it from there

michaeldrogalis 2018-01-04T00:45:04.000063Z

@mehdi.avdi I'd have the plugin poll the rest API, yeah. You can specify how frequently the plugin should poll for input and use that to control the rate.

michaeldrogalis 2018-01-04T00:45:20.000127Z

If you write a solid input plugin, the rest of the system can turn into a nice, unidirectional streaming design.

2018-01-04T00:48:08.000005Z

Thank you, I will have some reading to do but this is very helpful.

michaeldrogalis 2018-01-04T00:49:10.000048Z

@mehdi.avdi No problem, feel free to ask more questions here post-reading.

2018-01-04T00:49:18.000167Z

Cheers

pfeodrippe 2018-01-04T02:38:04.000045Z

About the Jason talk at Clojure eXchange 2017 "How I Bled All Over Onyx" (starting at 11 min) (https://skillsmatter.com/skillscasts/10939-how-i-bled-all-over-onyx), was the app he was doing really not at the Onyx scope? Or there is some kind of misconception by him of what part of the configuration should be tuned? I'm studying to use Onyx for our IoT sensors data processing and be scalable enough... I just think Kafka stream is very verbose but I like the not-a-separate-cluster-needed approach of it

michaeldrogalis 2018-01-04T02:53:49.000064Z

@pfeodrippe Processing huge messages with Onyx isn't a great idea. There's nothing inherent to Onyx's design that makes it weak under high loads. There are some knobs to tune as volume grows, but it's design is solid high perf/low latency applications like IoT processing.

michaeldrogalis 2018-01-04T02:54:32.000062Z

Having highly volatile message sizes is bad news regardless of what technology you're doing streaming with.

2018-01-04T06:25:39.000061Z

dumb question: I'm going thru the learn-onyx exercises (https://github.com/onyx-platform/learn-onyx/blob/master/src/workshop/challenge_4_1.clj) and in the lifecycle i'm writing, i'm encountering a whole bunch of event maps without

:onyx.core/batch
key, even though i specified it as :lifecycle/after-batch. Is this expected, or is something wrong with my code? Relevant code:
(defn store-max [event lifecycle]
  "Work in progress..."
  (println "Event:" (keys event))
  {})

(def store-max-lifecycle
  {:lifecycle/after-batch store-max})

(defn build-lifecycles []
  [{:lifecycle/task :identity
    :lifecycle/calls ::store-max-lifecycle
    :onyx/doc "Stores the highest encountered value in state atom"}

;; I/O boilerplate from previous exercises...
Thanks for your help!

jasonbell 2018-01-04T08:48:36.000097Z

@pfeodrippe “was the app he was doing really not at the Onyx scope” Was it out of scope, not really, it could have been done. Was the initial design wrong, yes, in terms of how Onyx works. Onyx would be perfect for IoT as @michaeldrogalis has already pointed out. Message volatility will never help matters. Kafka streams is verbose and compact but if the design is wrong you’ll still hit issues. Hindsight is a wonderful thing though…..

ninja 2018-01-04T09:54:33.000032Z

Hi, folks! I have been working with onyx for a while now and my workflow grew quite large. At this point the dashboard's static illustration of the workflow is insufficient. It would be much appreciated to have something like a recorder that tracks the path of a segment through the workflow graph to enable some kind of introspection after a job finished without the need of putting log-functions at several locations in the code. I thought of something that does some kind of workflow and catalog rewriting to create intermediate nodes that can store whole segments as well as information regarding their source and target node within the original graph. This information could then be queried and visualized (either as a simple table, an annotated graph etc.) So here is the question: Is there already a tool/library that does something similar? Or is there a common practice that should be applied instead to achieve the same? Note, that this should be for dev-purposes only.

👍 1
❓ 1
pfeodrippe 2018-01-04T13:03:26.000079Z

@mablack @jasonbell Understood, very much thank you, guys 😃

👍 1
jasonbell 2018-01-04T13:21:43.000275Z

@pfeodrippe The main crux of the talk is that Jase tried to do this mad customer request with Onyx. To be honest I learned a lot more about the Onyx internals that I bargained for and both @michaeldrogalis and @lucasbradstreet were gracious with their time and patience as I went through the scenarios. It also taught us about the Docker/DCOS things and Kafka doesn’t always behave how you want it to, schedulers detatch and resassign randomly never a good thing for assigned brokers etc. The talk was really the journey, the coal face and the eventual solution. I’ve still an Onyx talk under consideration for Strata London in May but I’m fairly sure that sadly won’t happen as I’m confirmed to talk on self training AI with Kafka and DeepLearning4J.

😃 1
jasonbell 2018-01-04T13:22:00.000114Z

And as you might have noticed @pfeodrippe I don’t take my talks too seriously 🙂

jasonbell 2018-01-04T13:22:13.000367Z

Thanks for asking a thoughtful question, very much appreciated.

pfeodrippe 2018-01-04T13:31:29.000141Z

@jasonbell Hope you give a talk here at Brazil some day heheaheehu I'll be there. Thanks o/

jasonbell 2018-01-04T13:34:06.000143Z

👍 Never say never 🙂

michaeldrogalis 2018-01-04T16:12:12.000535Z

@quan.ngoc.nguyen Ahh - this is our bad. The latest release of Onyx removed that key from the event map. I'm not sure how our build of learn-onyx didn't red flag that commit. Call keys on the event map to see the additions.

michaeldrogalis 2018-01-04T16:12:23.000042Z

Ill make a note to circle back later today and update it. Thanks for the report.

💯 1
michaeldrogalis 2018-01-04T16:14:23.000014Z

@atwrdik I don't know of anything out there at the moment for Onyx specifically, but I always wanted to see what it would be like to integrate with Zipkin. Same idea as what Erlang supports natively -- using tracer IDs to correlate distributed message paths.

👍 2
lucasbradstreet 2018-01-04T23:02:43.000100Z

@michaeldrogalis that’s a bit strange. We didn’t remove onyx.core/batch

michaeldrogalis 2018-01-04T23:03:12.000176Z

Didn't we? I'm misremembering what we removed. onyx.core/results, then?

lucasbradstreet 2018-01-04T23:03:16.000022Z

Yes

lucasbradstreet 2018-01-04T23:03:38.000113Z

Turned that into onyx.core/write-batch

michaeldrogalis 2018-01-04T23:03:58.000095Z

Weird. @quan.ngoc.nguyen Can you invoke (keys) on those maps and check out what's there? Perhaps you're using the wrong param?

2018-01-04T23:09:51.000103Z

@michaeldrogalis I'm not at home right now but I can tell u that I remember seeing several event maps with :onyx.core/batch key in them (and when printed out they gave the expected 100 numbers). However after processing all the segments, the lifecycle got called a bunch more times. I think the event maps in this case were the same as the previous ones , except they were just missing :onyx.core/batch. I can attach my code tonight, maybe that will help?

michaeldrogalis 2018-01-04T23:10:44.000311Z

@quan.ngoc.nguyen Oh -- Im not certain that that key is supplied if no segments were read, I'd need to check on that

2018-01-04T23:13:46.000022Z

@michaeldrogalis if my understanding is correct, the after-batch lifecycle should only be called if the task receives segments right? In this case it seems like it gets called a bunch of extra times for some reason (might just be something wrong with my code so I'll take another look tonight)

michaeldrogalis 2018-01-04T23:16:37.000183Z

No, it'll still be called regardless.

michaeldrogalis 2018-01-04T23:16:52.000238Z

The batch just might be empty, or the key may be absent.