clojure-uk

A place for people in the UK, near the UK, visiting the UK, planning to visit the UK or just vaguely interested, to randomly chat about things (often vi and emacs, occasionally clojure). More general than #ldnclj.
alexlynham 2020-11-10T07:14:32.415400Z

morning

alexlynham 2020-11-10T07:14:56.415500Z

think i was the opposite, quite liked ruby, came to resent rails

alexlynham 2020-11-10T07:15:21.415700Z

either way i just mean in terms of 'becomes the one way to do things, then becomes obsolete'

alexlynham 2020-11-10T07:15:37.415900Z

but i spose that happens to everything

dharrigan 2020-11-10T08:02:28.416200Z

Good Morning!

2020-11-10T08:41:31.416400Z

Morning

Russ Anderson 2020-11-10T08:48:51.416600Z

'Mornin all

agile_geek 2020-11-10T08:53:59.416800Z

Bore da :welsh_flag:

2020-11-10T09:37:58.417200Z

…a bore da hefyd! :flag-wales:

mccraigmccraig 2020-11-10T09:43:43.417600Z

måning

thomas 2020-11-10T10:12:46.417900Z

mogge

maleghast 2020-11-10T11:47:18.418400Z

Hello everyone 🙂

rlj 2020-11-10T11:59:00.418900Z

Mornin

mccraigmccraig 2020-11-10T17:11:03.419500Z

anyone had a good or bad experience with aws athena ?

dominicm 2020-11-11T08:20:18.423500Z

We had a few queries that just broke it (a NullPointerException or something). And then we had to wait for AWS support to tell us what was broken so we could stop doing that... But we kinda needed to do that.

mccraigmccraig 2020-11-11T08:52:31.423800Z

@joetague was it getting expensive with plain csv or json, or with parquet ?

mccraigmccraig 2020-11-11T08:55:59.424300Z

i guess i'll try it out and see... i've got a kafka topic with telemetry data - it looks easy enough to dump that to parquet on s3 with kafka-connect, and if that turns out to lead to criminally expensive queries then i'll dump it to CSV and load into redshift
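The kafka-connect route mentioned above can be sketched with Confluent's S3 sink connector writing Parquet. This is a minimal, hypothetical config (topic, bucket, and connector names are made up); note that `ParquetFormat` needs a schema-aware converter (e.g. Avro with a Schema Registry), so a plain-JSON topic would need schemas attached first:

```json
{
  "name": "telemetry-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "telemetry",
    "s3.bucket.name": "my-telemetry-bucket",
    "s3.region": "eu-west-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "path.format": "'dt'=YYYY-MM-dd",
    "partition.duration.ms": "86400000",
    "locale": "en-GB",
    "timezone": "UTC",
    "flush.size": "10000"
  }
}
```

The time-based partitioner gives date-partitioned S3 prefixes (`dt=2020-11-10/...`), which is what keeps Athena from scanning the whole bucket on every query.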

joetague 2020-11-11T15:33:08.426900Z

Just had a peek in the S3 bucket, we were copying GA/BigQuery data from GCS -> S3 as JSON and left it in the Standard-IA storage class

joetague 2020-11-11T15:34:36.427100Z

Guesstimate/ballpark: most of the files were about 600-800MB in size, they weren't well partitioned so we ended up having to scan a few GB of data a day

2020-11-10T17:21:49.419600Z

both: overall it’s a super useful and easy to use service. Occasionally it has latency issues

2020-11-10T17:23:56.419800Z

as in: queries stay in starting state and AFAIK there is little you can do with it. Happened to me just once

mccraigmccraig 2020-11-10T17:26:56.420Z

cool, thanks

mccraigmccraig 2020-11-10T17:27:07.420200Z

did you convert your data to parquet before dumping to S3 ?

2020-11-10T17:35:56.420400Z

yes

2020-11-10T17:36:24.420600Z

for other reasons than performance as well, eg handling of multiline strings

2020-11-10T17:37:54.420800Z

if you need it just for performance reasons and csv/json serde works fine for you, there is an option to do the conversion within athena as well
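That in-Athena conversion is a CTAS (CREATE TABLE AS SELECT) statement: Athena reads the existing CSV/JSON table, writes the result back to S3 as Parquet, and registers a new table over it. A rough sketch, with hypothetical table, column, and bucket names:

```sql
-- Convert a JSON-backed Athena table to Parquet via CTAS.
-- Partition columns (here dt) must come last in the SELECT list.
CREATE TABLE telemetry_parquet
WITH (
  format = 'PARQUET',
  external_location = 's3://my-telemetry-bucket/parquet/',
  partitioned_by = ARRAY['dt']
) AS
SELECT device_id, payload, dt
FROM telemetry_json;
```

Subsequent queries hit `telemetry_parquet` instead, paying only for the (much smaller, columnar) Parquet scan.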

joetague 2020-11-10T20:00:13.421500Z

+1 to all the points above.

joetague 2020-11-10T20:01:30.422300Z

For our usage it started to get expensive as well