data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
Anthony Khong 2020-12-28T05:41:36.131200Z

Hi David, author of Geni here. It uses Spark Datasets (see https://www.baeldung.com/java-spark-dataframe-dataset-rdd for a discussion), so it's a typed view of DataFrames. However, the type information only comes in when the schema is loaded, so you'll get type errors at runtime.

2020-12-28T08:19:40.133600Z

Does it have an impact when you use datasets? Do you feel the burden of types in comparison to handling a collection of open Clojure maps?

2020-12-28T08:19:55.134200Z

Thanks for your answer and the library!

Anthony Khong 2020-12-28T08:52:58.134400Z

> Do you feel the burden of types in comparison to handling a collection of open Clojure maps?

Not really. To me, it still feels like a dynamic language (or library, in this case), because it all happens at runtime. But, just like Clojure, it's strongly typed, so you get type errors at runtime.

Also, I wouldn't compare it to handling Clojure maps. Geni is for a different use case. If your data is small enough, using a collection of maps is probably better, because the reader of your code doesn't have to learn Spark. But once you're dealing with millions or billions of rows, you'd want to use Spark or a similar library.
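
For readers unfamiliar with the "dynamic but strongly typed" point, here is a minimal sketch in plain Clojure (no Spark or Geni involved). The `rows` data and the names `cheap-names` and `error-class` are made up for illustration. It shows that a collection of open maps accepts any value types when it is built, and that a type mismatch only surfaces as an exception once an operation runs.

```clojure
;; Illustrative data only; the keys and values are assumptions for this sketch.
(def rows
  [{:name "apple"  :price 1.5}
   {:name "banana" :price 2.0}
   {:name "cherry" :price "n/a"}]) ; a string where a number is expected

;; Open maps: nothing checks value types when the data is built,
;; so the code has to guard against bad values itself.
(def cheap-names
  (->> rows
       (filter #(number? (:price %)))
       (filter #(< (:price %) 1.8))
       (mapv :name)))

;; Strong typing at runtime: summing a number and a string throws
;; instead of silently coercing one into the other.
(def error-class
  (try
    (reduce + (map :price rows))
    nil
    (catch ClassCastException e
      (class e))))
```

With a small dataset like this, the map-based approach stays easy to read. The same kind of late failure is what the message above describes for Spark schemas, only there it happens inside Spark rather than in plain Clojure.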

2020-12-28T22:22:45.134800Z

Thanks a lot for your explanations!
