data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
2021-03-27T12:30:03.014600Z

In our scenario, we guess the contents of an Excel file and present the guess in a UI as a suggestion for later parsing. I have solved the Excel date problem by first reading the file with ->dataset as usual. Then I take samples from the dataset's float64 columns and make an educated guess whether each column holds reasonable dates, using the Apache POI DateUtil/getLocalDateTime function. I collect the names of the columns that are candidates for Excel dates, build a parser-fn map, and run ->dataset again with those parser functions. There's an extra roundtrip, but it could work 😄 Let's see what my colleagues think of it 😅
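
For reference, the sampling check could look roughly like the sketch below. It is plain Java with no POI dependency, so `toLocalDateTime` is only a stand-in for DateUtil/getLocalDateTime, not the real thing. The class name, method names, and the 1950 to 2100 plausibility window are all made up for illustration, and it assumes the workbook uses Excel's 1900 date system.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public class ExcelDateGuess {
    // In the 1900 date system Excel counts a nonexistent 1900-02-29, so for
    // serials >= 61 the effective "day zero" is 1899-12-30.
    private static final LocalDateTime EPOCH = LocalDate.of(1899, 12, 30).atStartOfDay();

    // Largest serial Excel accepts as a date (9999-12-31).
    private static final double MAX_SERIAL = 2_958_465;

    // Assumption: only years in this window count as "reasonable" dates.
    private static final int MIN_YEAR = 1950;
    private static final int MAX_YEAR = 2100;

    // Stand-in for POI's conversion: whole part is days, fraction is time of day.
    public static LocalDateTime toLocalDateTime(double serial) {
        long days = (long) Math.floor(serial);
        long seconds = Math.round((serial - days) * 86_400);
        return EPOCH.plusDays(days).plusSeconds(seconds);
    }

    // A single value is plausible if it converts to a year inside the window.
    public static boolean plausible(double serial) {
        if (Double.isNaN(serial) || serial < 61 || serial > MAX_SERIAL) {
            return false;
        }
        int year = toLocalDateTime(serial).getYear();
        return year >= MIN_YEAR && year <= MAX_YEAR;
    }

    // A column is a date candidate only if every sampled value is plausible.
    public static boolean looksLikeDateColumn(double[] sample) {
        if (sample.length == 0) {
            return false;
        }
        for (double value : sample) {
            if (!plausible(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] dates = {44282.0, 44282.5, 43831.25}; // serials from 2020 and 2021
        double[] prices = {12.5, 0.99, 104.0};         // ordinary numbers
        System.out.println(looksLikeDateColumn(dates));  // true
        System.out.println(looksLikeDateColumn(prices)); // false
    }
}
```

It is only a guess, of course: a numeric column whose values happen to sit around 40000 to 50000 would also be flagged, which is why we show the result as a suggestion in the UI rather than applying it silently.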

chrisn 2021-03-27T12:47:45.015Z

I like that approach, honestly. The file probably parses more or less instantly, and you get the entire dataset to run your type heuristics on.

chrisn 2021-03-27T13:00:22.015200Z

You can re-parse a column though: https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.column.html#var-parse-column Then you can just re-parse the columns using the same syntax as your parser-fn.

chrisn 2021-03-27T13:07:06.015400Z

I should have directed you to https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-column-cast - that is more general and not specific to string columns.

2021-03-27T13:26:29.015600Z

I read about column-cast and will give that one a try too. I think I may have misunderstood its API yesterday, but it looks like a good approach. Thank you!

1👍