code-reviews

jaihindhreddy 2018-12-17T10:15:31.006900Z

Wrote this fn to parse a line from our log files. Idiomatic?

(require '[java-time :as t])
(defn parse-log-entry [line]
  (let [log-format (array-map :timestamp "\\[([^\\]]+)\\]"
                              :file      "\\[Fl:\\s*([^\\]]+)\\]"
                              :class     "\\[Cl:\\s*([^\\]]+)\\]"
                              :method    "\\[Fn:\\s*([^\\]]+)\\]"
                              :line      "\\[Ln:\\s*([^\\]]+)\\]"
                              :level     "\\[([^\\]]+)\\]"
                              :summary   "\\[([^\\]]+)\\]"
                              :thread-id "([^_]+_[^\\s]+)\\s"
                              :server-ip "([\\d.]+)"
                              :client-ip "([\\d.]+)"
                              :msg       "\\[(.+)$")
        log-entry-regex (->> (concat "^" (vals log-format) "$")
                             (clojure.string/join "\\s*")
                             re-pattern)
        entry (->> (re-find log-entry-regex line)
                   rest
                   (map clojure.string/trim)
                   (map vector (keys log-format))
                   (into {}))
        parse-timestamp #(->> (concat % (repeat \0))
                              (take 24)
                              (apply str)
                              (t/local-date-time (t/formatter "dd-MM-yyyy HH:mm:ss.SSSS")))]
    (-> entry
        (update :line #(Integer/parseInt %))
        (update :timestamp parse-timestamp))))

jumar 2018-12-17T10:39:02.007300Z

It seems that you might want a real parser - take a look instaparse

jaihindhreddy 2018-12-17T12:13:11.007500Z

Yeah, instaparse seems like overkill, and bashing strings together to create regexps seems like underkill. Do you reckon instaparse is a better fit here?

dominicm 2018-12-17T12:17:16.007700Z

Look into named group regexes in java.

dominicm 2018-12-17T12:17:52.008Z

They might make the code a little more obvious.

jumar 2018-12-17T12:21:01.008200Z

Performance might have to be taken into consideration too (if your logs are large) - apart from readability and sensitivity to subtle errors, regexes aren't terribly performant. However, instaparse is also known to be quite slow (antlr is much better, but as you say that's probably an overkill in your case)

jaihindhreddy 2018-12-17T12:21:02.008400Z

Thanks Dominic!

dominicm 2018-12-17T12:23:56.008600Z

Regex is fast by a lot of measures. Especially if you don't use lookahead,

jaihindhreddy 2018-12-17T12:24:24.008800Z

Yeah @jumar this is just for a script that helps me analyze logs easily on my machine in the REPL. Will use a proper parser generator if we decide to put this somewhere at scale.

cvic 2018-12-17T13:02:54.009Z

hmm, what type of logs are you trying to parse?

jaihindhreddy 2018-12-17T15:42:41.009200Z

Logs of an application (Apache + PHP) at work.

val_waeselynck 2018-12-17T17:03:00.009400Z

array-map is fragile - when you want order, you usually don't want maps. I'd use a vector of tuples instead. I would also do some initialization outside of the function body - here you're compiling a big regex for each line of log!

👍 1
val_waeselynck 2018-12-17T17:04:25.009600Z

Likewise, it seems that zipmap is what you want for computing entry.

jaihindhreddy 2018-12-17T17:24:06.009800Z

TIL zipmap. I initially defined log-entry-regex as a var, but moved it in here later. Thank you for your insight.