I’m learning Onyx, and it looks like the only way to get windowing to work properly is to let it write checkpoints to S3. Is this correct?
Yes. However, if your windows are small and you’re using a test/ephemeral ZK server, you can enable this peer-config option instead: http://www.onyxplatform.org/docs/cheat-sheet/latest/#peer-config/:onyx.peer/storage.zk.insanely-allow-windowing-QMARK
Note that it will fail once you write out large windows.
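For a local or test setup, the option from the cheat sheet would go into the peer config roughly like this. This is a minimal sketch: the `:onyx.peer/storage` key with the value `:zookeeper` is an assumption about how the checkpoint backend is selected, so check it against the cheat sheet for your Onyx version. The failure with large windows is presumably ZooKeeper’s per-node data limit (`jute.maxbuffer`, about 1 MB by default).

```clojure
;; Sketch only: merge these keys into an existing peer config (test/dev use).
;; Assumption: :onyx.peer/storage selects the checkpoint backend.
{:onyx.peer/storage :zookeeper
 ;; Lets window checkpoints be written to ZooKeeper.
 ;; Large window state can exceed ZooKeeper's node size limit and fail.
 :onyx.peer/storage.zk.insanely-allow-windowing? true}
```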
Is that documented anywhere? I may have missed it, but that’s a pretty big dependency, and unfortunately I can’t use S3 (I doubt I can even get IT to open the firewall for it). How hard would it be to write another storage backend?
Others have done it for Google Cloud Storage (Google Cloud’s S3 equivalent), and it worked out OK. If you’re going to run on-prem, building an HDFS plugin would probably be your best bet.
You basically have to implement this: https://github.com/onyx-platform/onyx/blob/0.12.x/src/onyx/storage/s3.clj
The Google Cloud Storage backend was pretty much a straight port: https://github.com/tenaciousjzh/onyx/blob/0.11.x/src/onyx/storage/gcs.clj
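To give a rough idea of the shape of the work, here is a hypothetical sketch of a backend that stores checkpoint bytes by key in a local directory (for example, a mounted shared filesystem on-prem). The `CheckpointStore` protocol, `LocalDirStore`, and the key layout are invented for illustration and are not Onyx’s checkpoint API; the real functions to implement are the ones in `s3.clj` linked above.

```clojure
;; Hypothetical sketch: CheckpointStore is NOT Onyx's checkpoint API.
;; It only illustrates the "put bytes / get bytes by key" core of a backend.
(require '[clojure.java.io :as io])

(defprotocol CheckpointStore
  (write-bytes! [store k data] "Store byte array data under string key k.")
  (read-bytes [store k] "Return the byte array stored under k, or nil if absent."))

;; Backend rooted at a local (or mounted shared) directory.
(defrecord LocalDirStore [root]
  CheckpointStore
  (write-bytes! [_ k data]
    (let [f (io/file root k)]
      ;; Create parent directories so nested keys like "job-1/epoch-3" work.
      (io/make-parents f)
      (with-open [out (io/output-stream f)]
        (.write out ^bytes data))
      k))
  (read-bytes [_ k]
    (let [f (io/file root k)]
      (when (.exists f)
        ;; Copy the whole file into memory and return it as a byte array.
        (with-open [in (io/input-stream f)
                    out (java.io.ByteArrayOutputStream.)]
          (io/copy in out)
          (.toByteArray out))))))
```

A production backend would also need to care about things this sketch ignores, such as atomic writes and cleaning up old checkpoints.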
Thanks!
Yeah, our Google Cloud Storage port has worked out so far. Hopefully we can get it merged soon 🙂
What would be useful on our end to split it out into a plugin? I guess the main things would be the information model entries and any schema checks?
Then we could just move gcs.clj into its own project, along with the GCS dependencies.
I think we currently just added it to onyx core, alongside the S3 schemas and information model. I’m sure we can pull it out.
Yeah, I saw. It looks like it should be easy to pull out. Mostly I want to keep the schema checks and maybe add a clear error message if the dependency hasn’t been included.
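One way to get that clear error is to `require` the backend namespace at startup and turn a missing-classpath failure into a descriptive `ex-info`. This is a sketch: `require-storage-ns!` and the dependency coordinate passed to it are hypothetical, and where Onyx would call such a check is left open.

```clojure
;; Hypothetical helper: load a storage plugin's namespace, or fail with a
;; message telling the user which dependency to add.
(defn require-storage-ns!
  [ns-sym dep-coordinate]
  (try
    (require ns-sym)
    true
    ;; require throws FileNotFoundException when the namespace's source or
    ;; class file is not on the classpath (i.e. the plugin jar is missing).
    (catch java.io.FileNotFoundException e
      (throw (ex-info (str "Storage backend namespace " ns-sym
                           " was not found on the classpath. Add "
                           dep-coordinate " to your project dependencies.")
                      {:namespace ns-sym :dependency dep-coordinate}
                      e)))))
```

Catching only `FileNotFoundException` keeps genuine compile errors inside the plugin namespace visible instead of masking them as a missing dependency.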