I remember having to do this in a past project, but I'm not sure where now. Basically, if you compare the shape of your JSON file (with jq or something similar) to the GeoJSON file used in one of the Vega-Lite/Vega example visualizations, you may see that it has an obviously different shape. For example, the actual GeoJSON part of the data might be nested under some parent entity in a different way. So, just as with data where the attributes you want to specify are nested, you have to modify your field specifications to match.
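If jq isn't handy, here is a rough Python sketch of the same shape check. The wrapped document, its `meta`/`result`/`geo` keys, and the helper names `shape` and `find_feature_collection` are all made up for illustration; your file will nest things differently. It prints a key skeleton and then looks for wherever the FeatureCollection actually lives.

```python
import json

def shape(value, depth=0, max_depth=3):
    """Summarize nested keys and leaf types down to max_depth levels."""
    if depth >= max_depth:
        return type(value).__name__
    if isinstance(value, dict):
        return {k: shape(v, depth + 1, max_depth) for k, v in value.items()}
    if isinstance(value, list):
        # Summarize a list by its first element only.
        return [shape(value[0], depth + 1, max_depth)] if value else []
    return type(value).__name__

def find_feature_collection(value, path=()):
    """Return the key path to the first GeoJSON FeatureCollection, or None."""
    if isinstance(value, dict):
        if value.get("type") == "FeatureCollection" and "features" in value:
            return path
        for k, v in value.items():
            found = find_feature_collection(v, path + (k,))
            if found is not None:
                return found
    elif isinstance(value, list):
        for i, v in enumerate(value):
            found = find_feature_collection(v, path + (i,))
            if found is not None:
                return found
    return None

# Hypothetical file where the GeoJSON is wrapped in a parent object.
wrapped = json.loads("""
{"meta": {"source": "example"},
 "result": {"geo": {"type": "FeatureCollection",
                    "features": [{"type": "Feature",
                                  "properties": {"name": "A"},
                                  "geometry": {"type": "Point",
                                               "coordinates": [0, 0]}}]}}}
""")

print(json.dumps(shape(wrapped), indent=2))

path = find_feature_collection(wrapped)
# Dotted path to the features array, relative to the document root.
vega_property = ".".join(str(p) for p in path + ("features",))
print(vega_property)
```

With a dotted path like that in hand, the Vega-Lite data source can be pointed at the nested array through the `property` setting of its JSON `format` (something like `"format": {"type": "json", "property": "result.geo.features"}`). I have only thought this through for plain object nesting; if the path goes through a list index, you probably need to reshape the file instead.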
Anyone looked at TileDB? https://twitter.com/tiledb/status/1257721692476116992
They supposedly have a Java API. Would be interesting to see how well this meshes with our Clojure abstractions.
It is pretty low level, which I like for sure, but it will take a lot of learning to get anything going: https://docs.tiledb.com/main/quickstart That is their quickstart... Reminds me of TVM in that sense. Harder learning curve up front, plus I would have to push some level of context management to users in a lot of cases. I would invest in this before I would invest in Arrow, personally. But I wouldn't invest in either until there was a really solid reason. We are moving towards jdbc-next for http://tech.ml.dataset as the first sort of external integration. Great find, thanks for bringing it up. Will keep it on the radar.
Word; Thanks for your feedback/perspective on it! I'm not as solid on the lower level details, but aside from that I have a pretty similar reaction re: why not postgres (or whatever)? I can imagine they may have optimized for certain use cases, but I'm sure there's a cost to that as well. Definitely worth keeping an eye on.
Thanks, that makes sense. I will carve out some time to experiment.