this is probably a typo, right? https://github.com/cognitect/transit-clj#usage
(def reader (transit/reader in :json))
I don’t see the typo. What do you mean specifically?
greetings! what is the fastest way to reduce size of seq of maps by comparing only subset of keys, eg: [{:id (uuid) :a 1 :b 1} {:id (uuid) :a 1 :b 1}] => ;;just [{:id (uuid) :a 1 :b 1}]
seq is millions of items. the only thing I can think of - is to put those extra unique fields into meta, and accumulate maps into set from the get go
looking into it, thanks
specifically the in symbol. i had a flashback to coffeescript
clojure.core/distinct
Or do you mean comparing only by :a and :b?
If so, which of the two :id keys will be used?
the task itself is a game simulation, where you start from initial state, and given some rules, generate possible next states from it, and iterate until out of memory :) so with each iteration set of next states grows exponentially in addition to already calculated ones
compare by equivalent of (dissoc m :id) or (select-keys m [:a :b])
But what ID will you use? Or are you OK with a random ID?
I mean the result, not for comparison.
If dissoc + distinct + adding a new ID (all via transducers) is not fast enough, then I would look into tries.
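A rough sketch of the transducer idea, in case it helps. distinct-by is a hypothetical helper (not part of clojure.core), and the sample states are made up. It keeps the first map seen for each comparison key, so the surviving maps keep their original :id. This assumes deduplication happens before the next states are generated, so nothing refers to a dropped id.

```clojure
(defn distinct-by
  "Hypothetical helper: returns a transducer that keeps only the first
  item seen for each distinct value of (keyfn item)."
  [keyfn]
  (fn [rf]
    (let [seen (volatile! #{})] ; comparison keys encountered so far
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result item]
         (let [k (keyfn item)]
           (if (contains? @seen k)
             result ; duplicate by key: skip it
             (do (vswap! seen conj k)
                 (rf result item)))))))))

;; Made-up sample data: ids 1 and 2 differ only in :id.
(def states
  [{:id 1 :a 1 :b 1}
   {:id 2 :a 1 :b 1}
   {:id 3 :a 2 :b 1}])

;; Compare by everything except :id, keep the original maps.
(def deduped
  (into [] (distinct-by #(dissoc % :id)) states))
```

Here deduped holds the maps with ids 1 and 3. The seen set still grows with the number of distinct states, so memory stays the limiting factor for millions of items.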
id format is not important, but what is important - next states will have :prev-state-id
That’s the input arg to read from
so I 1) cannot assign just random ids after everything is done. 2) given millions of items = just scanning through them and assigning ids after some step - takes very long too
(def in (ByteArrayInputStream. (.toByteArray out)))
(def reader (transit/reader in :json))
ever stare at something for so long that you don't see the forest through the trees? 😳
happens all the time, it's ok
You could dissoc the id, put the original value in the metadata, use distinct, then retrieve the original data from the metadata. I have no idea if this will be suitable performance-wise, but it avoids the 2 issues you listed.
(defn distinct-ignore-id [items]
  (map #(:original-item (meta %))
       (distinct
        (map (fn [item]
               (with-meta (dissoc item :id)
                 {:original-item item}))
             items))))
Sorry, I just noticed this is basically the same solution you suggested in the top level comment. Nevermind.
hi, if I want to take a string like this
this is a [test](test.md) of inline [links](<https://example.com>)
and remove the links but keep track of them, like this:
{:text "this is a test of inline links" :links [{:name "test" :path "test.md"} {:name "links" :path "<https://example.com>"}]}
what would be the best way to do this? (I am parsing a markdown file, but a full markdown parser is unavailable)
One way (not necessarily the best) is to use a parser like instaparse and write a small grammar for the things you want to recognize differently from the rest of the text. That might be tricky for handling arbitrary GitHub-flavored markdown, since I know that some of their constructs are dependent on what comes first on the line, and other constructs like [link to this text](something.md) can have the contents between [] spanning multiple lines.
This StackOverflow question has some answers that might lead you to a full parser for Github-flavored markdown, but perhaps not written in Clojure.
https://stackoverflow.com/questions/39560644/what-library-does-github-use-for-parsing-markdown
I know that I'm only going to have link contents on one line, and all I need to do is extract links rather than do proper markdown parsing
This sundown library is mentioned: https://github.com/vmg/sundown Its README says it has bindings for many languages, including Python, Ruby, JavaScript, Haskell, and Go, but I don't see Java there. The JavaScript library might be easily callable from ClojureScript.
I'm also running in a babashka environment, so the libraries that would otherwise be available aren't available here
If you know such links are always going to be within a single line, you could attempt to use regex matching.
Is there existing tooling that would take function docstrings from a project to populate an API section of its README? The idea being to avoid having to maintain documentation in two places
Do you ever expect the text to have comments in parentheses or square brackets, that aren't links? e.g. could someone write "the foo (which is variant of a bar) can be blurg [see reference 5]"
yeah
I know how to regex match to find if a line contains a link but not what the link is or how to then remove it
I'm not sure how you'd do that with regex
Something like this https://cljdoc.org or https://github.com/weavejester/codox ?
(let [s "this is a [test](test.md) of inline [links](<https://example.com>)"
      ;; group 1 = link text (no "]"), group 2 = link target (no ")")
      re #"\[([^\]]*)\]\(([^)]*)\)"]
  {:links (->> (re-seq re s) (map (fn [[_ n p]] {:name n :path p})))
   :text (clojure.string/replace s re "$1")})
maybe something like that?
ah, thanks
didn't know re-seq existed
Codox might be able to target the README, using a custom writer. It does seem rather over-featured for what I had in mind. I'm not sure how easy it would be to use it through clj
I had in mind a light program, using probably clj-kondo under the hood, to just lift some docstrings from .clj files to the README and using an api like this imaginary one:
clj -A:readme "./src/api.clj" --output README.md
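For what it's worth, here is a minimal sketch of the docstring-lifting part. It does not use clj-kondo: it loads the namespace and reads var metadata at runtime, which is a different technique from the static analysis mentioned above. api-section is a hypothetical name, and clojure.set stands in for a project namespace.

```clojure
(require '[clojure.string :as str])

(defn api-section
  "Hypothetical helper: returns a Markdown string listing the public vars
  of the namespace named by ns-sym, with their arglists and docstrings."
  [ns-sym]
  (require ns-sym) ; load the namespace so its vars and metadata exist
  (->> (ns-publics ns-sym)
       (sort-by key)
       (map (fn [[sym v]]
              (let [{:keys [arglists doc]} (meta v)]
                (str "### " sym "\n\n"
                     (when arglists
                       (str "Arglists: " (pr-str arglists) "\n\n"))
                     (or doc "No docstring.")
                     "\n"))))
       (str/join "\n")))

;; Print the generated section for a namespace that ships with Clojure.
(println (api-section 'clojure.set))
```

The returned string could then be written into the README with spit, or spliced between marker comments. How the imaginary clj -A:readme alias would wire that up is left open here.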