clojure

New to Clojure? Try the #beginners channel. Official docs: https://clojure.org/ Searchable message archives: https://clojurians-log.clojureverse.org/
aratare 2021-03-10T00:22:11.405400Z

@alexmiller @emccue Ah thanks. Was doing some testing with claypool and one of the tests calls shutdown-agents, which causes all the other tests to go boo boo 😅

alexmiller 2021-03-10T02:10:26.406Z

yeah, you have to be very careful about where you call that

viesti 2021-03-10T06:04:50.406200Z

> SFTP is also supported.
My colleagues probably used this library to monitor file drops on a file server. It was a long time ago, so I don't remember if it was the same lib, but it looks similar enough :)

scythx 2021-03-10T09:31:44.407400Z

Does anyone have a library suggestion for parsing HTML?

p-himik 2021-03-10T09:35:16.407500Z

There's a Clojureverse thread about it. I think this particular post is very useful: https://clojureverse.org/t/best-library-for-querying-html/1103/18

❤️ 1
dharrigan 2021-03-10T09:36:09.407900Z

jsoup is really good

❤️ 2
dharrigan 2021-03-10T09:36:23.408100Z

(if you don't mind doing interop)

scythx 2021-03-10T09:38:23.408300Z

yeah, i was searching for a clojure library but there's no good one. I'm unfamiliar with the java ecosystem, so thanks for the suggestion, I'll try it out!

borkdude 2021-03-10T09:44:54.408600Z

yep, jsoup

Ed 2021-03-10T11:09:49.409200Z

enlive uses jsoup doesn't it? https://github.com/cgrand/enlive

Tomaz Bracic 2021-03-10T12:46:17.409800Z

Hi all

👋 6
2021-03-10T14:00:55.411Z

What about Hickory?

jmckitrick 2021-03-10T14:30:42.413900Z

So I see that reitit supports malli (of course), and reitit supports swagger, and malli supports swagger. But... can I use malli on my reitit routes and produce swagger docs in one fell swoop?

borkdude 2021-03-10T14:40:39.414300Z

@jmckitrick best to ask in #malli or #reitit

jmckitrick 2021-03-10T14:41:18.414600Z

Will do

lukasz 2021-03-10T15:02:22.414700Z

We use Jsoup (cleaning) + hickory (traversing, parsing, converting to other formats) - both are great

Elso 2021-03-10T16:14:57.423700Z

Heyo, I'm trying to wrap a given DB-worker-queue implementation in a way that allows me to do some intermediate mapping / side-effecting steps in a sane manner. So basically I have a method that yields a queue-item + context (containing the tx), some item-specific stuff and a consumer and I want to interject something between. First thing coming to mind would be a lazy-seq of sorts, i.e.

(loop []
  ;; queue-items is never rebound, so every iteration sees the same first item
  (let [new-queue-items (->> queue-items (map generic-preparation) (map specific-stuff) (map generic-post-processing))
        item (take 1 new-queue-items)]
    (enqueue item)
    (recur)))
This, however, doesn't work because I always take the first realized item, as it seems. I know how to do this in imperative-mess style, but I figure there ought to be a way to do it in more or less idiomatic Clojure.

dpsutton 2021-03-10T16:21:14.425500Z

if queue-items is immutable this is an infinite loop on the first item. if queue-items is mutable i'm not sure where you block and wait and remove from it. you're using this very much like an async channel. if you are already using core.async you can put a transducer on a channel (comp (map generic-preparation) (map ...)) and this would handle your coordination
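
A minimal sketch of the composed transducer suggested above. The three step functions are hypothetical stand-ins for the ones named in the question, and the core.async call appears only in a comment because it assumes the org.clojure/core.async dependency is on the classpath.

```clojure
;; Hypothetical stand-ins for the three processing steps from the question.
(defn generic-preparation [item] (assoc item :prepared true))
(defn specific-stuff [item] (update item :n inc))
(defn generic-post-processing [item] (assoc item :done true))

;; One transducer that runs the three steps in order for each item.
;; With transducers, comp applies the leftmost step first.
(def process-xf
  (comp (map generic-preparation)
        (map specific-stuff)
        (map generic-post-processing)))

;; The same transducer works on a plain collection...
(def processed (into [] process-xf [{:n 1} {:n 2}]))

;; ...and, assuming core.async is available, on a channel:
;; (require '[clojure.core.async :as a])
;; (def in (a/chan 10 process-xf)) ; items are transformed as they pass through
```

Defining the pipeline once as a transducer keeps the processing steps independent of where the items come from.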

Elso 2021-03-11T12:22:09.474100Z

the idea was something like have the "queue" be a producer that yields when someone takes from it

John Conti 2021-03-17T18:05:20.069200Z

Core.async is a big jump. PersistentQueue might be all you need: https://admay.github.io/queues-in-clojure/
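
A small sketch of the PersistentQueue suggestion, using only what ships with Clojure. The keywords are placeholder items; note that this queue is immutable and does not block, so it only helps if a single loop both fills and drains it.

```clojure
;; clojure.lang.PersistentQueue is an immutable FIFO queue.
(def q (-> clojure.lang.PersistentQueue/EMPTY
           (conj :a)
           (conj :b)
           (conj :c)))

(def head (peek q))       ; the oldest item
(def remaining (pop q))   ; a new queue without the head; q itself is unchanged

;; Because the queue is immutable, "consuming" means carrying the popped
;; queue into the next iteration.
(def drained
  (loop [q q, seen []]
    (if (empty? q)
      seen
      (recur (pop q) (conj seen (peek q))))))
```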

Elso 2021-03-10T16:39:49.425600Z

what makes it mutable though? I was testing the idea with

(defn uuid-seq
  []
  (lazy-seq
    (cons (str (UUID/randomUUID))
          (uuid-seq))))

(def us (uuid-seq))

(take 10 us)
which apparently is not. the deletion part effectively happens via the enqueue, and blocking and waiting before the recur. the problem I had with async was that I couldn't come up with something that would not require a dedicated thread for filling up the input channel and another one running the loop. lest I put onto the channel in the same loop that takes from it?

dpsutton 2021-03-10T16:40:54.425800Z

it's not mutable. so recurring and doing all the mapping and taking 1 will get the same result from your queue-items
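
To illustrate the point, here is a sketch based on the uuid-seq test from the question. Taking from the same lazy seq twice yields the same cached element; to advance, the remainder of the seq has to be carried through the loop. The names same? and first-three are made up for this example.

```clojure
(import '(java.util UUID))

(defn uuid-seq []
  (lazy-seq (cons (str (UUID/randomUUID)) (uuid-seq))))

(def us (uuid-seq))

;; Realized elements are cached, so taking from the same seq twice
;; returns the same values.
(def same? (= (take 1 us) (take 1 us)))

;; To move forward, pass the rest of the seq to the next iteration.
(def first-three
  (loop [items us, out []]
    (if (= 3 (count out))
      out
      (recur (rest items) (conj out (first items))))))
```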

Elso 2021-03-10T16:41:55.426Z

ok really dense question then but what makes a mutable lazy sequence then?

dpsutton 2021-03-10T16:45:55.426200Z

you can't make a clojure sequence mutable. you can look at java.util types like blocking queue and some others. this can block waiting on an item. or you can look at core.async channels

Elso 2021-03-10T16:46:20.426400Z

k guess async it is then

Elso 2021-03-10T16:46:36.426600Z

probably now as good as ever to wrap my head around it

dpsutton 2021-03-10T16:48:05.427Z

as to the dedicated thread stuff, i don't think it's true you need two dedicated threads. using async will use whatever thread is available in the async pool, i believe. but you certainly need two concurrent threads of execution. you have a loop just watching the queue. without two threads of execution, i don't see how you would keep watching the queue and also populate it
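
A sketch of the java.util blocking-queue alternative mentioned earlier, showing the two threads of execution. The ::done sentinel and the multiply-by-ten step are assumptions for illustration; the producer runs on a daemon thread so a script does not wait on it at exit.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

(def ^LinkedBlockingQueue queue (LinkedBlockingQueue.))

;; Producer: a second thread of execution that fills the queue.
;; ::done is a hypothetical sentinel telling the consumer to stop.
(def producer
  (doto (Thread. (fn []
                   (doseq [n (range 5)]
                     (.put queue n))
                   (.put queue ::done)))
    (.setDaemon true)
    (.start)))

;; Consumer: runs on the calling thread; .take blocks until an item arrives.
(def consumed
  (loop [out []]
    (let [item (.take queue)]
      (if (= ::done item)
        out
        (recur (conj out (* 10 item)))))))
```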

javahippie 2021-03-10T19:32:06.436100Z

My software is storing longer text segments in a database. Sometimes a text already exists in the database, and I don’t want to store it again in this case. I'm playing with the idea of hashing the texts and storing the hash in the DB, too. In this case, I can just compare the hashes, and don’t have to send 20,000 characters to the database and compare them. Are there any ways to create (relatively) non-colliding hashes in Clojure without adding a library?

javahippie 2021-03-10T19:33:46.436200Z

I guess MessageDigest via Java Interop is a good start?

lukasz 2021-03-10T19:36:46.436700Z

Yes (we do something similar)

dpsutton 2021-03-10T19:40:13.436900Z

a counterpoint, you may want to add a library. because you need these hashes to never change and i doubt you can depend on that from anything that's included already. Clojure has changed its hashing library in the past in which case you would be pretty sunk, right?

lukasz 2021-03-10T19:46:10.437100Z

MessageDigest comes from the JDK. AFAIR it isn't related in any way to how Clojure does its underlying hashing

dpsutton 2021-03-10T19:49:25.438300Z

Will that be identical across newer jvms? I don’t know. Just wondering how you could ensure that things hash consistently as time goes by

lukasz 2021-03-10T19:51:17.438500Z

I believe that depends on the type of the hashing algorithm used.

2021-03-10T19:56:12.438700Z

the thing with databases is you usually want to index things

2021-03-10T19:57:07.438900Z

so the hash is a way to reduce the size of what gets indexed (index the hash instead of all the text)

2021-03-10T19:57:30.439100Z

but any hash function can have collisions, so the best thing to do is to basically build a hash table in the database

2021-03-10T19:57:47.439300Z

index on the hash, but don't enforce a unique constraint

2021-03-10T19:58:14.439500Z

and search by hash, and then scan through possible results there to see if the text actually exists

➕ 1
javahippie 2021-03-10T20:06:24.440Z

Good point about consistency, but I’d assume that e.g. SHA256 is implemented in a consistent way across JVMs. Good point to research, assuming is not enough 😉

javahippie 2021-03-10T20:07:46.440200Z

Good point about the collisions, @hiredman. I’d almost tend to accept the probability of a SHA256 collision, but implementing a way to handle the collision is relatively easy, so there is no reason not to do it.
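
The scheme discussed above (index on the hash without a unique constraint, then compare the full text among the matches) can be sketched in memory as follows. The map stands in for the database table, and weak-hash is a deliberately poor, hypothetical hash chosen so that collisions are easy to see.

```clojure
;; A deliberately weak, hypothetical hash (text length) to force collisions.
;; A real system would use something like SHA-256 here.
(defn weak-hash [text] (count text))

;; The "table": hash -> vector of texts sharing that hash (no unique constraint).
(defn store [table text]
  (let [h      (weak-hash text)
        bucket (get table h [])]
    (if (some #(= text %) bucket)
      table                                   ; already stored, nothing to do
      (assoc table h (conj bucket text)))))   ; new text, possibly a colliding one

;; Look up by hash first, then scan the candidates for an exact match.
(defn stored? [table text]
  (boolean (some #(= text %) (get table (weak-hash text) []))))

(def table (reduce store {} ["abc" "xyz" "abc" "hello"]))
```

Here "abc" and "xyz" collide on purpose and end up in the same bucket, while the duplicate "abc" is stored only once.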

2021-03-10T20:19:13.440400Z

If you are willing to accept collisions you can also use smaller/weaker hash functions too https://ankane.org/large-text-indexes

👍 1
schmee 2021-03-10T21:28:43.440900Z

the probability of a collision with SHA256 is for all intents and purposes non-existent, and can be safely ignored

schmee 2021-03-10T21:29:30.441100Z

@javahippie you can certainly rely on SHA256 to be consistent across JVMs, and using MessageDigest with SHA2-256 sounds like a good solution to me :thumbsup:
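
A minimal sketch of hashing a text with MessageDigest via interop, with no extra library. The function name sha256-hex is made up for this example; "SHA-256" is the algorithm name the JDK expects. Encoding the text explicitly as UTF-8 is an assumption added here so the digest does not depend on the platform's default charset.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn sha256-hex
  "Returns the SHA-256 digest of text as a 64-character lowercase hex string."
  [^String text]
  (let [digest (MessageDigest/getInstance "SHA-256")
        raw    (.digest digest (.getBytes text StandardCharsets/UTF_8))]
    ;; Format each byte as two hex digits and join them.
    (apply str (map #(format "%02x" %) raw))))

(sha256-hex "abc")
;; => "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
```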

MatthewLisp 2021-03-10T21:31:44.442100Z

Hello everyone 👋 Any guides/tutorials on using Github actions + deps.edn for running tests automatically? Or maybe CircleCI + deps.edn

seancorfield 2021-03-10T21:34:14.442600Z

@matthewlisp https://github.com/seancorfield/next-jdbc/blob/develop/.github/workflows/test.yml is an example in the wild.

seancorfield 2021-03-10T21:34:37.443200Z

Most of my projects use GitHub Actions for CI and some use CircleCI as well.

MatthewLisp 2021-03-10T21:34:50.443500Z

Thanks a lot! 😄

sova-soars-the-sora 2021-03-10T21:42:01.445600Z

Hi I have a question. I want to take some source texts [target data set A] and remove some of the words from each sentence [training set] and then train a neural network to add the missing words back in. I am looking at Cortex. Anybody have any recommendations?
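
The data-preparation step described above could be sketched like this, independent of any neural network library. Dropping every nth word is a hypothetical, deterministic rule chosen only for illustration, and the function names are made up.

```clojure
(require '[clojure.string :as str])

(defn drop-every-nth
  "Removes every nth word from sentence (a hypothetical masking rule)."
  [n sentence]
  (->> (str/split sentence #"\s+")
       (keep-indexed (fn [i word] (when (pos? (mod (inc i) n)) word)))
       (str/join " ")))

;; Pairs the damaged sentence (network input) with the original (target).
(defn training-pair [n sentence]
  {:input  (drop-every-nth n sentence)
   :target sentence})

(training-pair 3 "the quick brown fox jumps over the lazy dog")
;; => {:input "the quick fox jumps the lazy"
;;     :target "the quick brown fox jumps over the lazy dog"}
```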

blak3mill3r 2021-03-10T21:48:07.446700Z

Sounds like you have a lot of text @sova! You might also look at Spark. Do you know what sort of model you want to train at scale? Have you tried it in the small?

blak3mill3r 2021-03-10T21:48:40.447200Z

I recommend prototyping your ML model until you see results you like, before trying to distribute it

pez 2021-03-10T21:52:14.450300Z

In the “official” guide about Programming at the REPL, there is an ending note about how things can go wrong if you switch to a namespace without first loading it (https://clojure.org/guides/repl/navigating_namespaces#_how_things_can_go_wrong). However, there is no mention there of how to fix it. Does anyone know what the fix would be?

alexmiller 2021-03-10T21:52:36.450600Z

(clojure.core/refer-clojure)

alexmiller 2021-03-10T21:53:17.450900Z

(which is exactly what ns does for you)

pez 2021-03-10T21:55:14.451Z

Thanks. So just call that and things will be good?

blak3mill3r 2021-03-10T21:56:22.451200Z

You will have clojure.core symbols referred, not any code defined in the namespace
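
A sketch of what the fix does, written so it can run outside an interactive REPL. The namespace name demo.unloaded is hypothetical. At the REPL the equivalent would be (in-ns 'demo.unloaded) followed by (clojure.core/refer-clojure); as noted above, code defined in the namespace's own file would still have to be loaded separately, for example with require.

```clojure
;; create-ns makes an empty namespace, much like in-ns does for an unknown name.
(def fresh-ns (create-ns 'demo.unloaded))

;; Nothing from clojure.core is referred yet, so inc does not resolve.
(def before (ns-resolve fresh-ns 'inc))

;; refer-clojure acts on the current namespace, so bind *ns* around the call.
(binding [*ns* fresh-ns]
  (clojure.core/refer-clojure))

;; Now inc resolves to the clojure.core var.
(def after (ns-resolve fresh-ns 'inc))
```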

sova-soars-the-sora 2021-03-10T21:58:19.452500Z

@blak3mill3r thanks! i am just getting started. will check out spark. Yes, presumably it would be quite a lot of text. Would be good to train on teeny tiny samples beforehand. I want to use it as a pre-processing step in language translation.

blak3mill3r 2021-03-10T22:01:41.452700Z

I suggest reading about how others are tackling similar problems. https://cs224d.stanford.edu/reports/ManiArathi.pdf

blak3mill3r 2021-03-10T22:03:31.453Z

To do deep learning that understands time (or sequential stuff, like words) I think https://en.wikipedia.org/wiki/Recurrent_neural_network are pretty useful. Not really my area of expertise though, but maybe this gives you some ideas

blak3mill3r 2021-03-10T22:04:22.453400Z

There's a lot of NLP stuff that does not use deep learning, too, but I think the state of the art is some form of RNN

sova-soars-the-sora 2021-03-10T22:08:17.453600Z

thanks, it's been about a decade since I looked at RNN stuff! 😃