onyx

FYI: alternative Onyx :onyx: chat is at https://gitter.im/onyx-platform/onyx; the log can be found at https://clojurians-log.clojureverse.org/onyx/index.html
niamu 2017-12-06T00:43:17.000105Z

When using plugins such as onyx-http or onyx-kafka, the segment is expected to have certain values in it, like :message ... when using the kafka output. We have been creating additional tasks to come before the plugin tasks to transform the segment to the expected structure for the plugin. Is that the expected convention or is there a better way?

niamu 2017-12-06T00:45:59.000081Z

I almost expect to be able to do segment transformation as part of a lifecycle before the output task execution to transform the segment into the expected structure, but I don’t think lifecycles can manipulate segment data if I understand correctly.

lucasbradstreet 2017-12-06T00:48:19.000173Z

If you use an output plugin that expects segments shaped like that, you can always wrap them via an :onyx/fn on the output task that has the plugin

lucasbradstreet 2017-12-06T00:48:32.000115Z

It’s really up to you whether you wrap them in the task before or on the final task

lucasbradstreet 2017-12-06T00:48:52.000234Z

Lifecycles can manipulate segment data, but that gets more into the internals, so :onyx/fn is the better option.

niamu 2017-12-06T00:50:47.000293Z

So having :onyx/fn on the output task defined will execute that function on the segment before the rest of the output task is called?

lucasbradstreet 2017-12-06T00:51:49.000159Z

Yes
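
A minimal sketch of that approach, assuming onyx-kafka's output task expects segments shaped like `{:message ...}` (as niamu describes above). The namespace `my.app.tasks`, the function `wrap-for-kafka`, the task name, topic and ZooKeeper address are hypothetical placeholders, and other keys onyx-kafka needs (such as the serializer) are omitted; check the onyx-kafka README for your version.

```clojure
(ns my.app.tasks)

;; Hypothetical wrapper: referenced via :onyx/fn on the output task, so it
;; runs on each segment before the plugin writes it, producing the
;; {:message ...} shape the Kafka output expects.
(defn wrap-for-kafka [segment]
  {:message segment})

;; Sketch of the output catalog entry. The point is :onyx/fn; the Kafka
;; keys are illustrative placeholders, not a complete plugin config.
(def write-messages-entry
  {:onyx/name :write-messages
   :onyx/plugin :onyx.plugin.kafka/write-messages
   :onyx/type :output
   :onyx/medium :kafka
   :onyx/fn ::wrap-for-kafka          ; resolves to :my.app.tasks/wrap-for-kafka
   :kafka/topic "events"
   :kafka/zookeeper "127.0.0.1:2181"
   :onyx/batch-size 10
   :onyx/doc "Wraps each segment as {:message ...} and writes it to Kafka"})
```

For example, `(wrap-for-kafka {:user-id 1})` returns `{:message {:user-id 1}}`. This removes the need for a separate transformation task in front of the output task.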

niamu 2017-12-06T00:53:03.000190Z

That’s great. I don’t think I noticed that explained anywhere in the User Guide. That’s much better than what we’ve been doing so far.

lucasbradstreet 2017-12-06T00:53:34.000100Z

Yeah, I can see how you could miss it. I just updated the description in http://www.onyxplatform.org/docs/cheat-sheet/latest/

lucasbradstreet 2017-12-06T00:53:47.000021Z

If you see somewhere in the user guide you could add it, I would love to merge a PR about it.

niamu 2017-12-06T00:54:55.000177Z

I’ll certainly think that over and open a pull request for that, thanks.

lucasbradstreet 2017-12-06T00:56:58.000209Z

The batch processing phases are described here: http://www.onyxplatform.org/docs/cheat-sheet/latest/#task-states/:process-batch

niamu 2017-12-06T01:00:16.000179Z

I guess there’s a lot of information in the cheat sheet that isn’t necessarily explicitly described as part of the user guide. I think I made the mistake of assuming the cheat sheet was going to be a subset of information in the guide.

niamu 2017-12-06T01:00:46.000255Z

They’re far more complementary than I thought.

lucasbradstreet 2017-12-06T01:04:15.000055Z

Yeah, I need to change the name from cheat sheet

lucasbradstreet 2017-12-06T01:04:22.000293Z

It’s really the number one documentation source at this point

lucasbradstreet 2017-12-06T01:04:51.000111Z

This is what generates our validation, error messages, and the cheat sheet now https://github.com/onyx-platform/onyx/blob/0.12.x/src/onyx/information_model.cljc

lellis 2017-12-06T11:37:24.000099Z

Hi! I'm running Onyx with checkpoints saved to S3, and today I got this exception. Any tips?

curl "http://localhost:8081/job/exception?job-id=c0455d65-1780-56f0-26fe-f622e2f65d70"
{:status :success, :result #error {
 :cause "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 67A4CAB11516672E)"
 :data {:original-exception :com.amazonaws.services.s3.model.AmazonS3Exception}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 67A4CAB11516672E)"
   :data {:original-exception :com.amazonaws.services.s3.model.AmazonS3Exception}
   :at [com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1545]}]
 :trace
 [[com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1545]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeOneRequest "AmazonHttpClient.java" 1183]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeHelper "AmazonHttpClient.java" 964]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor doExecute "AmazonHttpClient.java" 676]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeWithTimer "AmazonHttpClient.java" 650]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor execute "AmazonHttpClient.java" 633]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor access$300 "AmazonHttpClient.java" 601]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl execute "AmazonHttpClient.java" 583]
  [com.amazonaws.http.AmazonHttpClient execute "AmazonHttpClient.java" 447]
  [com.amazonaws.services.s3.AmazonS3Client invoke "AmazonS3Client.java" 4031]
  [com.amazonaws.services.s3.AmazonS3Client putObject "AmazonS3Client.java" 1585]
  [com.amazonaws.services.s3.transfer.internal.UploadCallable uploadInOneChunk "UploadCallable.java" 131]
  [com.amazonaws.services.s3.transfer.internal.UploadCallable call "UploadCallable.java" 123]
  [com.amazonaws.services.s3.transfer.internal.UploadMonitor call "UploadMonitor.java" 139]
  [com.amazonaws.services.s3.transfer.internal.UploadMonitor call "UploadMonitor.java" 47]
  [java.util.concurrent.FutureTask run "FutureTask.java" 266]
  [java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1149]
  [java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 624]

2017-12-06T12:25:47.000133Z

congrats on 0.12!

2017-12-06T12:27:26.000376Z

I take it that reduce is meant to replace/improve the current way of doing windows, so you don't have to both emit downstream and trigger at the same time?

michaeldrogalis 2017-12-06T16:09:25.000235Z

@lmergen Correct 🙂
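
A rough sketch of what that looks like, assuming the `:onyx/type :reduce` task type from 0.12: the windowed task is a leaf in the workflow and its results leave through a trigger's sync function instead of being emitted downstream. All task, window and trigger names, and `write-results!`, are hypothetical; the input task's catalog entry is omitted, and the sync function's argument list is assumed from the 0.10+ trigger API, so verify the keys against the cheat sheet for your version.

```clojure
(ns my.app.jobs)

;; Hypothetical trigger sync fn: with a :reduce task, window results are
;; delivered here rather than as segments sent to a downstream task.
(defn write-results! [event window trigger state-event state]
  (println "window" (:window/id window) "state:" state))

;; :count-words is a leaf task: nothing downstream consumes its segments.
(def workflow
  [[:read-input :count-words]])

;; The windowed task declared as a reduce task.
(def count-words-entry
  {:onyx/name :count-words
   :onyx/type :reduce
   :onyx/fn :clojure.core/identity
   :onyx/batch-size 10})

;; A global window that counts every segment the task sees.
(def windows
  [{:window/id :word-count
    :window/task :count-words
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/count}])

;; Fire every 5 segments and hand the window state to write-results!.
(def triggers
  [{:trigger/window-id :word-count
    :trigger/id :sync-every-5
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [5 :elements]
    :trigger/sync ::write-results!}])
```

Before 0.12, the same job would typically also need the windowed task to emit segments to some output task, even if nothing meaningful was done with them.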

michaeldrogalis 2017-12-06T16:10:47.000663Z

@lellis Hm, not sure at a first glance

michaeldrogalis 2017-12-06T16:10:58.000733Z

I'll dig in a little later today and get an answer for you.

lellis 2017-12-06T16:11:09.000274Z

Ty! @michaeldrogalis

michaeldrogalis 2017-12-06T16:15:49.000548Z

@lellis Are you seeing that with only one job?

michaeldrogalis 2017-12-06T16:16:00.000756Z

I'm wondering if your S3 endpoint is misconfigured. Just a guess, though.

lellis 2017-12-06T16:16:17.000291Z

I have only one datomic-input type job.

michaeldrogalis 2017-12-06T16:16:38.000049Z

Has that endpoint ever worked for you? We use that endpoint regularly

lellis 2017-12-06T16:16:38.000819Z

It was working fine, and still is after resubmitting the job.

michaeldrogalis 2017-12-06T16:16:56.000097Z

That's really strange.

lellis 2017-12-06T16:21:11.000375Z

I read something about a wrong Content-Length: S3 waits for more data and throws a timeout because no more data arrives. But that's just a superficial look at this exception.

lellis 2017-12-06T16:22:09.000396Z

I have checkpointing working in all 3 of my environments.

lucasbradstreet 2017-12-06T17:19:35.000600Z

@lellis do you have any idea how big the checkpoints are? Which version of Onyx?

lellis 2017-12-06T17:28:45.000797Z

Hi @lucasbradstreet, Onyx "0.10.0", and I have no idea how big they are. How can I check this?

lucasbradstreet 2017-12-06T18:57:11.000691Z

@lellis if you use onyx-peer-http-query you can query /metrics and view checkpoint_size_Value
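
A small sketch for pulling that metric out, assuming onyx-peer-http-query is serving on localhost:8081 (the port used in the curl command above) and that /metrics returns plain text in which the metric name appears verbatim on each relevant line. `checkpoint-size-lines` and `fetch-checkpoint-sizes` are hypothetical helpers, and the sample input in the usage note is made up.

```clojure
(ns my.app.metrics
  (:require [clojure.string :as str]))

;; Hypothetical helper: keep only the lines of a metrics dump that
;; mention checkpoint_size_Value.
(defn checkpoint-size-lines [metrics-text]
  (->> (str/split-lines metrics-text)
       (filter #(str/includes? % "checkpoint_size_Value"))
       vec))

;; Fetch /metrics from a peer's HTTP query server and filter it.
;; Assumes localhost:8081; pass another URL to match your peer config.
(defn fetch-checkpoint-sizes
  ([] (fetch-checkpoint-sizes "http://localhost:8081/metrics"))
  ([url] (checkpoint-size-lines (slurp url))))
```

For example, `(checkpoint-size-lines "other_metric 1\ncheckpoint_size_Value 1024")` returns `["checkpoint_size_Value 1024"]`.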

lucasbradstreet 2017-12-06T18:57:50.000319Z

We recently changed checkpoint recovery to load the checkpoint more asynchronously, which will mean that it no longer times out. You may have a better experience with 0.12
