When using plugins such as onyx-http or onyx-kafka, the segment is expected to have certain values in it, like :message ...
when using the Kafka output. We have been creating additional tasks that come before the plugin tasks to transform the segment into the structure the plugin expects. Is that the expected convention, or is there a better way?
I almost expect to be able to do segment transformation as part of a lifecycle before the output task execution to transform the segment into the expected structure, but I don’t think lifecycles can manipulate segment data if I understand correctly.
If you use an output plugin that expects the segments like that, you can always wrap them via an :onyx/fn on the output task that has the plugin
It’s really up to you whether you wrap them in an earlier task or on the final task itself
Lifecycles can manipulate segment data, but that gets into the internals more, so :onyx/fn is the better fit here.
So having :onyx/fn defined on the output task will execute that function on each segment before the rest of the output task is called?
Yes
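For readers following along, a minimal sketch of what that catalog entry might look like. The task names, the ::wrap-message function, and the batch size are all illustrative assumptions, not taken from this thread:

```clojure
;; Hypothetical wrapping fn: puts the original segment under :message,
;; the shape the Kafka output plugin expects.
(defn wrap-message [segment]
  {:message segment})

;; Hypothetical catalog entry for the output task. :onyx/fn runs on each
;; segment before the plugin writes it, so no separate wrapping task is needed.
{:onyx/name :write-messages
 :onyx/plugin :onyx.plugin.kafka/write-messages
 :onyx/type :output
 :onyx/medium :kafka
 :onyx/fn ::wrap-message
 :onyx/batch-size 50}
```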
That’s great. I don’t think I noticed that explained anywhere in the User Guide. That’s much better than what we’ve been doing so far.
Yeah, I can see how you could miss it. I just updated the description in http://www.onyxplatform.org/docs/cheat-sheet/latest/
If you see somewhere in the user guide you could add it, I would love to merge a PR about it.
I’ll certainly think that over and open a pull request for that, thanks.
The batch processing phases are described here: http://www.onyxplatform.org/docs/cheat-sheet/latest/#task-states/:process-batch
I guess there’s a lot of information in the cheat sheet that isn’t necessarily explicitly described as part of the user guide. I think I made the mistake of assuming the cheat sheet was going to be a subset of information in the guide.
They’re far more complementary than I thought.
Yeah, I need to change the name from cheat sheet
It’s really the number one documentation source at this point
This is what generates our validation, error messages, and the cheat sheet now https://github.com/onyx-platform/onyx/blob/0.12.x/src/onyx/information_model.cljc
Hi! I’m running Onyx with checkpoints saved to S3, and today I got this exception. Any tips?
curl http://localhost:8081/job/exception?job-id=c0455d65-1780-56f0-26fe-f622e2f65d70
{:status :success, :result #error {
:cause "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 67A4CAB11516672E)"
:data {:original-exception :com.amazonaws.services.s3.model.AmazonS3Exception}
:via
[{:type clojure.lang.ExceptionInfo
:message "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 67A4CAB11516672E)"
:data {:original-exception :com.amazonaws.services.s3.model.AmazonS3Exception}
:at [com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1545]}]
:trace
[[com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1545]
[com.amazonaws.http.AmazonHttpClient$RequestExecutor executeOneRequest "AmazonHttpClient.java" 1183]
[com.amazonaws.http.AmazonHttpClient$RequestExecutor executeHelper "AmazonHttpClient.java" 964]
[com.amazonaws.http.AmazonHttpClient$RequestExecutor doExecute "AmazonHttpClient.java" 676]
[com.amazonaws.http.AmazonHttpClient$RequestExecutor executeWithTimer "AmazonHttpClient.java" 650]
[com.amazonaws.http.AmazonHttpClient$RequestExecutor execute "AmazonHttpClient.java" 633]
[com.amazonaws.http.AmazonHttpClient$RequestExecutor access$300 "AmazonHttpClient.java" 601]
[com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl execute "AmazonHttpClient.java" 583]
[com.amazonaws.http.AmazonHttpClient execute "AmazonHttpClient.java" 447]
[com.amazonaws.services.s3.AmazonS3Client invoke "AmazonS3Client.java" 4031]
[com.amazonaws.services.s3.AmazonS3Client putObject "AmazonS3Client.java" 1585]
[com.amazonaws.services.s3.transfer.internal.UploadCallable uploadInOneChunk "UploadCallable.java" 131]
[com.amazonaws.services.s3.transfer.internal.UploadCallable call "UploadCallable.java" 123]
[com.amazonaws.services.s3.transfer.internal.UploadMonitor call "UploadMonitor.java" 139]
[com.amazonaws.services.s3.transfer.internal.UploadMonitor call "UploadMonitor.java" 47]
[java.util.concurrent.FutureTask run "FutureTask.java" 266]
[java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1149]
[java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 624]
congrats on 0.12!
I take it that reduce is meant to replace/improve the current way of doing windows, so you don’t have to both emit downstream and trigger at the same time?
@lmergen Correct 🙂
@lellis Hm, not sure at first glance
I’ll dig in a little later today and get an answer for you.
Ty! @michaeldrogalis
@lellis Are you seeing that with only one job?
I’m wondering if your S3 endpoint is misconfigured? Just a guess though
I have only one datomic-input type job.
Has that endpoint ever worked for you? We use that endpoint regularly
It was working fine, and the error persists after resubmitting the job
That's really strange.
I read something about a wrong content-length causing S3 to wait for more data and throw a timeout when nothing more arrives. But that’s just from a superficial look at the exception.
I have checkpointing working in all 3 of my environments.
@lellis do you have any idea how big the checkpoints are? Which version of Onyx?
Hi @lucasbradstreet, Onyx "0.10.0", and I have no idea how big they are. How can I check?
@lellis if you use onyx-peer-http-query you can query /metrics
and view checkpoint_size_Value
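A sketch of what that query might look like from the command line, assuming onyx-peer-http-query is serving on port 8081 (the port and the exact metric name filter are illustrative):

```shell
# Query the peer's metrics endpoint and pick out the checkpoint size gauge.
curl -s http://localhost:8081/metrics | grep checkpoint_size
```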
We recently changed checkpoint recovery to load the checkpoint more asynchronously, which should mean it no longer times out. You may have a better experience with 0.12