Interesting conversation. Confluent do a Kafka Connect plugin for writing a topic directly to S3, but I'm not 100% sure of the configuration options. I've only used the basic file sinks for a talk I'm doing next week.
Previously I had Clojure-based Kafka Streams apps to persist to S3 and, if the message sizes were huge, Onyx to do the work instead. Got to say that Pyrostore looks very interesting, but I've not had time to review it yet. cc @lmergen @eoliphant
I would say that doing a direct connection from Kafka to S3 is perfectly fine. You typically don’t want any post-processing anyway, so it makes sense.
Is there something special required to configure Aeron when running in a docker container?
I have to run, so I can’t answer this in depth, but you have to mount a memory volume that you share between the onyx and aeron containers, and you need to open the port it uses on the aeron container
Just found the docker stuff in onyx-template. Hopefully that gets me pointed in the right direction.
Trying to run with :onyx.messaging.aeron/embedded-driver? true in an effort to minimize the variables while I test some other aspects of deployment. Everything starts up fine but eventually shuts down; it looks like no heartbeats are being received from tasks. Not sure how to configure messaging in this case. Or maybe I shouldn't be taking this approach...
That’s an ok approach, though not always recommended long term. It’s probably a problem either communicating its external IP address to the other peers, or you don’t have the right ports open
The main reason it’s not recommended is that having them both in the same process increases the chance of GC pauses disrupting the media driver. Other than that it’s pretty fine
I was trying to run all peers in one container, just to test connectivity. Setting :onyx.messaging/bind-addr to localhost results in the heartbeat issue, while using the docker gateway throws "Channel error: Cannot assign requested address : aeron:udp?endpoint=host.docker.internal:40200|term-length=4194304". I'm missing some fiddly docker detail here; the whole thing runs fine from the command line.
You need messaging external addr to be set too
To distinguish between the two
Bind addr prob has to be the ip of the interface
We grab it with a script I think
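For anyone following along, here is a rough sketch of what the messaging part of the peer-config might look like with both addresses set. The env var names BIND_ADDR and EXTERNAL_ADDR are made up for illustration (something like the script mentioned above would have to export them), and the port is simply the one from the error message earlier in the thread:

```clojure
;; Sketch only: the messaging-related keys of an Onyx peer-config.
;; BIND_ADDR and EXTERNAL_ADDR are hypothetical env var names that a
;; container start-up script would need to export.
(def peer-config-messaging
  {:onyx.messaging/impl :aeron
   ;; port taken from the Aeron channel in the error message above
   :onyx.messaging/peer-port 40200
   ;; IP of the interface inside the container that Aeron binds to
   :onyx.messaging/bind-addr (System/getenv "BIND_ADDR")
   ;; address the other peers should use to reach this peer
   :onyx.messaging/external-addr (System/getenv "EXTERNAL_ADDR")})
```

This map would be merged into the rest of the peer-config (tenancy id, ZooKeeper address and so on), which is left out here.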
Got it working with --net="host" using 127.0.0.1 as the address. I didn't have to set the external address. Anyway, on to AWS...
Right, that works too if that’s an option
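As a rough sketch, the single-host test setup described above would correspond to messaging settings along these lines. This assumes the embedded driver was still enabled and that the peer port is the 40200 from the earlier error message; neither is confirmed in the thread:

```clojure
;; Sketch only: messaging keys for an all-in-one local test run
;; with the container on the host network.
(def local-test-messaging
  {:onyx.messaging/impl :aeron
   ;; assumed: same port as in the earlier error message
   :onyx.messaging/peer-port 40200
   ;; loopback works here because all peers share the host network
   :onyx.messaging/bind-addr "127.0.0.1"
   ;; assumed: media driver still runs inside the peer process
   :onyx.messaging.aeron/embedded-driver? true})
```

No :onyx.messaging/external-addr is set, matching the report that it wasn't needed in this setup.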