@alexmiller @emccue Ah thanks. Was doing some testing with claypoole and one of the tests calls shutdown-agents, which makes all the other tests go boo boo
yeah, you have to be very careful about where you call that
> SFTP is also supported.
My colleagues probably used this library to monitor file drops on a file server. Long time ago, so I don't remember if this was the same lib, but it looks similar enough :)
Does anyone have a library suggestion for parsing HTML?
There's a Clojureverse thread about it. I think this particular post is very useful: https://clojureverse.org/t/best-library-for-querying-html/1103/18
jsoup is really good
(if you don't mind doing interop)
yeah, I was searching for a Clojure library but there's no good one. I'm unfamiliar with the Java ecosystem, so thanks for the suggestion, I'll try it out!
yep, jsoup
enlive uses jsoup doesn't it? https://github.com/cgrand/enlive
Hi all
What about Hickory?
So I see that reitit supports malli (of course), and reitit supports swagger, and malli supports swagger. But... can I use malli on my reitit routes and produce swagger docs in one fell swoop?
@jmckitrick best to ask in #malli or #reitit
Will do
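(For reference, the short answer is yes: reitit's malli coercion and its swagger support compose. A rough sketch along the lines of reitit's own coercion examples; details may vary by version, and the route, schemas, and handler here are made up:)

(require '[reitit.ring :as ring]
         '[reitit.coercion.malli]
         '[reitit.ring.coercion :as rrc]
         '[reitit.ring.middleware.parameters :as parameters]
         '[reitit.swagger :as swagger])

(def app
  (ring/ring-handler
    (ring/router
      [["/swagger.json"
        {:get {:no-doc  true
               :handler (swagger/create-swagger-handler)}}]   ; serves the generated swagger docs
       ["/plus"
        {:get {:parameters {:query [:map [:x :int] [:y :int]]} ; malli schemas on the route
               :responses  {200 {:body [:map [:total :int]]}}
               :handler    (fn [{{{:keys [x y]} :query} :parameters}]
                             {:status 200 :body {:total (+ x y)}})}}]]
      {:data {:coercion   reitit.coercion.malli/coercion
              :middleware [swagger/swagger-feature
                           parameters/parameters-middleware
                           rrc/coerce-request-middleware
                           rrc/coerce-response-middleware]}})))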
We use Jsoup (cleaning) + hickory (traversing, parsing, converting to other formats) - both are great
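(A minimal interop sketch of that combo, with a made-up HTML string: Jsoup parses, CSS-style selectors query, and from there you can hand the tree to hickory if you want plain Clojure data.)

(import '(org.jsoup Jsoup))

;; parse an HTML string and pull out links with jsoup's CSS selectors
(def doc (Jsoup/parse "<html><body><a href=\"https://clojure.org\">Clojure</a></body></html>"))

(->> (.select doc "a[href]")
     (map (fn [el] {:text (.text el) :href (.attr el "href")})))
;; => ({:text "Clojure", :href "https://clojure.org"})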
Heyo, I'm trying to wrap a given DB-worker-queue implementation in a way that allows me to do some intermediate mapping / side-effecting steps in a sane manner. So basically I have a method that yields a queue-item + context (containing the tx), some item-specific stuff and a consumer and I want to interject something between. First thing coming to mind would be a lazy-seq of sorts, i.e.
(loop []
  (let [new-queue-items (->> queue-items
                             (map generic-preparation)
                             (map specific-stuff)
                             (map generic-post-processing))
        item (first new-queue-items)]
    (enqueue item)
    (recur)))
this however doesn't work because I always take the first realized item as it seems
I know how to do this in imperative-mess style, but I figure there ought to be a way to do it in more or less idiomatic Clojure
if queue-items is immutable, this is an infinite loop on the first item. if queue-items is mutable, I'm not sure where you block, wait, and remove from it. you're using this very much like an async channel. if you are already using core.async you can put a transducer on a channel (comp (map generic-preparation) (map ...))
and this would handle your coordination
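(A small sketch of that suggestion, reusing the placeholder names from the snippet above: generic-preparation, specific-stuff, generic-post-processing, queue-item, and enqueue are all the poster's stand-ins.)

(require '[clojure.core.async :as a])

;; items are transformed as they pass through the channel's buffer,
;; so the taker only ever sees fully processed items
(def work-chan
  (a/chan 10 (comp (map generic-preparation)
                   (map specific-stuff)
                   (map generic-post-processing))))

;; producer side: put raw queue items on the channel
(a/>!! work-chan queue-item)

;; consumer side: take a processed item and hand it on
(enqueue (a/<!! work-chan))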
the idea was something like have the "queue" be a producer that yields when someone takes from it
Core.async is a big jump. PersistentQueue might be all you need: https://admay.github.io/queues-in-clojure/
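(For completeness, PersistentQueue in a nutshell: it's an immutable FIFO with no blocking or coordination of its own.)

;; conj adds at the rear, peek/pop work at the front
(def q (into clojure.lang.PersistentQueue/EMPTY [:a :b :c]))

(peek q)          ;; => :a
(seq (pop q))     ;; => (:b :c)
(seq (conj q :d)) ;; => (:a :b :c :d)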
what makes it mutable though? I was testing the idea with
(import 'java.util.UUID)

(defn uuid-seq
  []
  (lazy-seq
   (cons (str (UUID/randomUUID))
         (uuid-seq))))

(def us (uuid-seq))
(take 10 us)
which apparently is not mutable.
the deletion part effectively happens via the enqueue, and blocking and waiting before the recur.
the problem I had with async was that I couldn't come up with something that would not require a dedicated thread for filling up the input channel and another one running the loop.
lest I put onto the channel in the same loop that takes from it?
it's not mutable. so recurring and doing all the mapping and taking 1 will get the same result from your queue-items
ok, really dense question then, but what would make a lazy sequence mutable then?
you can't make a clojure sequence mutable. you can look at java.util types like blocking queue and some others. this can block waiting on an item. or you can look at core.async channels
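(A sketch of the blocking-queue route; process-item is a placeholder, and in real code you'd also want a way to stop the consumer loop.)

(import '(java.util.concurrent LinkedBlockingQueue))

(def q (LinkedBlockingQueue.))

;; one thread of execution fills the queue...
(future
  (doseq [item ["a" "b" "c"]]
    (.put q item)))

;; ...another blocks in .take until an item arrives
(future
  (loop []
    (process-item (.take q))
    (recur)))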
k guess async it is then
probably now is as good a time as ever to wrap my head around it
as to the dedicated thread stuff, I don't think it's true that you need two dedicated threads. using async will use whatever thread is available in the async pool, I believe. but you certainly need two concurrent threads of execution: you have a loop just watching the queue, and I don't see how you would keep watching the queue and also populate it with anything less than two threads of execution
My software is storing longer text segments in a database. Sometimes a text already exists in the database, and I don't want to store it again in this case. I'm playing with the idea of hashing the texts and storing the hash in the DB, too. In this case, I can just compare the hashes and don't have to send 20,000 characters to the database and compare them. Are there any ways to create (relatively) non-colliding hashes in Clojure without adding a library?
I guess MessageDigest via Java Interop is a good start?
Yes (we do something similar)
a counterpoint, you may want to add a library. because you need these hashes to never change and i doubt you can depend on that from anything that's included already. Clojure has changed its hashing library in the past in which case you would be pretty sunk, right?
MessageDigest comes from the JDK; AFAIR it doesn't relate in any way to how Clojure did its underlying hashing
Will that be identical across newer JVMs? I don't know. Just wondering how you could ensure that things hash consistently as time goes by
I believe that depends on the type of the hashing algorithm used.
the thing with databases is you usually want to index things
so the hash is a way to reduce the size of what gets indexed (index the hash instead of all the text)
but any hash function can have collisions, so the best thing to do is to basically build a hash table in the database
index on the hash, but don't enforce a unique constraint
and search by hash, and then scan through possible results there to see if the text actually exists
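(A sketch of the lookup side of that idea, here with next.jdbc; the table/column names, ds, and sha256-hex are all placeholders, and text_hash gets a plain, non-unique index.)

(require '[next.jdbc :as jdbc])

(defn text-already-stored? [ds text]
  (let [h    (sha256-hex text)
        ;; narrow candidates via the indexed hash column...
        rows (jdbc/execute! ds ["SELECT body FROM texts WHERE text_hash = ?" h])]
    ;; ...then compare the full text in code to rule out collisions
    (boolean (some #(= text (:texts/body %)) rows))))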
Good point about consistency, but I'd assume that e.g. SHA-256 is implemented in a consistent way across JVMs. Good point to research, though, assuming is not enough
Good point about the collisions, @hiredman. I'd almost tend to accept the probability of a SHA-256 collision, but implementing a way to handle collisions is relatively easy, so there is no reason not to do it.
If you are willing to accept collisions you can also use smaller/weaker hash functions too https://ankane.org/large-text-indexes
the probability of a collision with SHA256 is for all intents and purposes non-existent, and can be safely ignored
@javahippie you can certainly rely on SHA256 to be consistent across JVMs, and using MessageDigest with SHA2-256 sounds like a good solution to me :thumbsup:
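(For reference, a minimal version of that interop; the algorithm name and the UTF-8 encoding are the parts worth pinning down so hashes stay stable over time.)

(import '(java.security MessageDigest))

(defn sha256-hex [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s "UTF-8"))]
    ;; hex-encode the 32-byte digest
    (apply str (map #(format "%02x" %) digest))))

(count (sha256-hex "some long text..."))
;; => 64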
Hello everyone! Any guides/tutorials on using GitHub Actions + deps.edn for running tests automatically? Or maybe CircleCI + deps.edn
@matthewlisp https://github.com/seancorfield/next-jdbc/blob/develop/.github/workflows/test.yml is an example in the wild.
Most of my projects use GitHub Actions for CI and some use CircleCI as well.
Thanks a lot!
https://github.com/seancorfield/next-jdbc/blob/develop/.circleci/config.yml
Hi I have a question. I want to take some source texts [target data set A] and remove some of the words from each sentence [training set] and then train a neural network to add the missing words back in. I am looking at Cortex. Anybody have any recommendations?
Sounds like you have a lot of text @sova! You might also look at Spark. Do you know what sort of model you want to train at scale? Have you tried it in the small?
I recommend prototyping your ML model until you see results you like, before trying to distribute it
In the "official" guide about Programming at the REPL, there is an ending note about how things can go wrong if you switch to a namespace without first loading it (https://clojure.org/guides/repl/navigating_namespaces#_how_things_can_go_wrong). However, there is no mention there of how to fix it. Does anyone know what the fix would be?
(clojure.core/refer-clojure)
(which is exactly what ns does for you)
Thanks. So just call that and things will be good?
You will have clojure.core symbols referred, not any code defined in the namespace
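(A sketch of what that recovery looks like at the REPL; my.app.core is a made-up namespace that is assumed to exist on the classpath.)

user=> (in-ns 'my.app.core)       ; switched without loading it first
my.app.core=> (defn f [x] x)
;; Syntax error ... Unable to resolve symbol: defn

my.app.core=> (clojure.core/refer-clojure)
nil
my.app.core=> (defn f [x] x)      ; clojure.core is referred again
#'my.app.core/f

;; the namespace's own definitions still aren't loaded; require does that
my.app.core=> (require 'my.app.core)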
@blak3mill3r thanks! i am just getting started. will check out spark. Yes, presumably it would be quite a lot of text. Would be good to train on teeny tiny samples beforehand. I want to use it as a pre-processing step in language translation.
I suggest reading about how others are tackling similar problems. https://cs224d.stanford.edu/reports/ManiArathi.pdf
To do deep learning that understands time (or sequential stuff, like words) I think https://en.wikipedia.org/wiki/Recurrent_neural_network are pretty useful. Not really my area of expertise though, but maybe this gives you some ideas
There's a lot of NLP stuff that does not use deep learning, too, but I think the state of the art is some form of RNN
thanks, it's been about a decade since I looked at RNN stuff!