off-topic

https://github.com/clojurians/community-development/blob/master/Code-of-Conduct.md Clojurians Slack Community Code of Conduct. Searchable message archives are at https://clojurians-log.clojureverse.org/
2020-12-16T15:10:00.183400Z

Someone on Hacker News has asked for advice about learning Lisp: https://news.ycombinator.com/item?id=25441664#25443184 … can you believe that the majority of respondents are not recommending Clojure? 😱

simongray 2020-12-17T12:42:54.201800Z

Yup. Try saying something nice about Clojure on /r/lisp and you will either be told that Clojure is not a Lisp or that Common Lisp can do everything Clojure does and more.

2020-12-17T15:48:13.203100Z

Heh I did just learn from the Hacker News comments about a whole load of stuff that Common Lisp does that Clojure doesn't (mainly around interactive programming). Maybe one day, I'll find the energy to learn Common Lisp just so I can experience those things :-)

euccastro 2020-12-26T23:32:49.272500Z

depending on their intended focus, to someone newly learning Lisp I may recommend Racket rather than Clojure, because it's way more beginner-friendly and has more learning resources (e.g. https://htdp.org/2020-8-1/Book/index.html)

👍 1
Joe 2020-12-16T15:22:21.186Z

I'd say depending on what they want to learn Lisp for there's an argument for suggesting SICP/Scheme. But less so if their goal is to use a lisp to build practical production software.

borkdude 2020-12-16T16:03:49.186600Z

SICP is pretty easy to read if you know Clojure, so there's an argument for learning Clojure if you want to read SICP too ;)

bherrmann 2020-12-16T16:48:18.188400Z

Is there a way to convert a let binding into a map ?

let [ name "joe"
      greeting (str "hi " name) ]
-- convert to 
    { :name "joe", :greeting "hi joe" }
in particular, I like how you can reference name from the definition of greet.

Renato Alencar 2020-12-16T17:11:04.188700Z

What are you trying to accomplish by removing the let binding?

Dane Filipczak 2020-12-16T17:47:16.189100Z

you can directly reference name in the expression you’re binding to greeting. Lets are evaluated sequentially and symbols defined previously in the let are available in later expressions

souenzzo 2020-12-16T18:00:55.189300Z

This:

(defmacro named [& vs]
  (let [ks (map keyword vs)]
    `~(into {} (map vector
                    ks vs))))
(defmacro qnamed [& vs]
  (let [ks (map (fn [n]
                  (keyword (str *ns*) (str n)))
                vs)]
    `~(into {} (map vector
                    ks vs))))
(let [a 42]
  {:named  (named a)
   :qnamed (qnamed a)})
=> #'user/named
=> #'user/qnamed
=> {:named {:a 42}, :qnamed {:user/a 42}}

2020-12-16T18:49:53.189500Z

(cmd)user=> (defmacro locals [] (into {} (map (juxt keyword identity) (keys &env))))
#'user/locals
(ins)user=> (let [name "joe" greeting (str "hi " name)] (locals))
{:name "joe", :greeting "hi joe"}

2020-12-16T18:50:41.189700Z

one gotcha with locals is it captures all let bindings, loop bindings, and fn args recursively - exactly what I want for debugging, a no go in app logic

2020-12-16T18:51:08.189900Z

(also captures for bindings etc. etc. - all locals)

2020-12-16T18:51:53.190100Z

oh wait, this isn't what you wanted at all and let already lets you do what you want, my bad

2020-12-16T18:52:00.190300Z

(also this isn't at all off topic :D)

Eric Ihli 2020-12-16T19:37:51.191Z

I've got a computer sciency question that's a bit off-topic from Clojure. It's related to "variable-length encodings" and data structures. I'm in need of a very memory efficient data structure that maps integers to strings. I'm encoding integers as bytes such that the most significant bit signifies whether or not the following byte is part of the current integer. The next 7 bits are part of the integer.

0 -> 10 00 00 00
1 -> 10 00 00 01
2 -> 10 00 00 10
...
127 -> 11 11 11 11
128 -> 00 00 00 01  10 00 00 00
129 -> 00 00 00 01  10 00 00 01
These encoded integers represent identifiers for strings. The identifiers are sorted. I'm imagining the data structure like:
<header-byte-length-of-data>
<variable-length-encoded-integer-identifier-0>
<byte-length-of-following-characters>
<utf-8-encoded-characters> 
<variable-length-encoded-integer-identifier-1> 
<byte-length-of-following-characters>
<utf-8-encoded-characters>
...
The problem I can't solve is efficient indexing. "Find the string with identifier 12345678". They are ordered, so I could binary search. But they aren't fixed width, so a binary search could drop me smack dab in the middle of anything. Is this impossible or am I just not clever enough?

Timur Latypoff 2020-12-17T08:42:19.199900Z

@ericihli if you're free to choose the memory layout, could I suggest you use not one but two continuous regions of memory? One region is just all your fixed-size structs (no variable-length ints) without the actual string data which you can easily binary search. Instead of storing the string data inside your struct, you store an array index where the string starts in your second region (which consists of all concatenated variable-length strings). With some extra effort, you could even do variable-length ints in the first region (if it's a major space saving), since you can more easily identify where the structs begin and end without the string data — for example, you could separate them with a single zero byte while ensuring no other bytes would be zero. Actually, while thinking about it, I think even your original data structure does not have zero bytes in it, so you could use a zero byte as a record separator for a random binary search indexing.

💯 1
❤️ 1
dpsutton 2020-12-16T19:46:08.192800Z

There was a really good blog post about rewriting grep in Haskell. They had to face this problem when facing code points I think. They had a clever way of dealing with it if you want to see.

🙏 1
dpsutton 2020-12-16T19:46:45.193200Z

Sorry it was wc not grep. https://chrispenner.ca/posts/wc

2020-12-16T19:51:29.194100Z

(Slightly related: GNU’s yes is ridiculously fast because people appear to have been code golfing it for speed)

nate 2020-12-16T20:20:57.194600Z

https://twitter.com/djrrb/status/1339244410387750920

😆 1
nate 2020-12-16T20:21:10.194800Z

totally worth it

val_waeselynck 2020-12-16T20:52:37.195Z

> I'm encoding integers as bytes such that the most significant bit signifies whether or not the following byte is part of the current integer @ericihli this problem seems interesting but I really don't understand that sentence

val_waeselynck 2020-12-16T20:58:49.195200Z

It's also not clear to me what you're after in "memory-efficient": fast access or low footprint?

bherrmann 2020-12-16T21:00:47.195400Z

@noisesmith Thanks, the locals macro does exactly what I wanted.

Eric Ihli 2020-12-16T21:08:16.195600Z

Low footprint. What I mean by that sentence is: To decode an integer from an array of bytes: Look at the first byte. If the left-most bit is a 1, then the integer is the next 7 bits. If the left-most bit is a 0, then the integer is the next 7 bits left-shifted 7 places and bit-or with the same logic on the next byte.

val_waeselynck 2020-12-16T21:21:35.195900Z

I don't see a direct solution to your problem, but this might help: https://nlp.stanford.edu/IR-book/html/htmledition/index-compression-1.html

🙏 1
2020-12-16T21:36:45.196200Z

I really do need to read SICP again, now that I know one Lisp. Going through the book and typing stuff out in Python was alright back in the day but I bet Clojure will be better.

2020-12-16T21:51:16.196400Z

So a common answer for efficient indexing is to define a decent hash function on the values that are your keys (in this case I believe those are your variable-length encoded integers), and then store elements in hash buckets indexed by the hash function result.

2020-12-16T21:52:26.196600Z

You can use binary search on any kinds of keys that you can define a total order on. The total order on variable-length encoded integers could simply be normal < between the integer values, but there are other total orders you could use.

2020-12-16T21:53:46.196800Z

e.g. a different total order on such an encoding would be "shorter byte sequences are smaller than longer byte sequences, and if two byte sequences are the same length, then use lexicographic ordering between those equal length byte sequences". I am not sure there is any practical advantage to using that total order over the one above, though.

2020-12-16T21:57:10.197Z

Hmm, rereading your description, I guess when you say "I'm imagining the data structure like:" you mean that the data structure would be a contiguous sequence of bytes in memory that starts with &lt;header-byte-length-of-data&gt; and continues as in your example?

👍 1
2020-12-16T21:58:41.197200Z

If there is no separate 'index' to this sequence of bytes, because you are trying to avoid the memory required to store such an index, then a question for you is: can you encode that sequence of bytes such that by starting at an arbitrary byte position in that sequence, you can scan forwards and/or backwards and determine where the start of the next record is, without having to scan from the beginning?

2020-12-16T21:59:38.197400Z

e.g. the simplest way to do that would be to have a unique byte value that could not be used within each 'record', but that probably isn't possible with your data.

2020-12-16T22:01:07.197600Z

If there is a way to find the next record boundary starting anywhere in the middle and scanning forward, you could do binary search on the position of a byte in that byte sequence, e.g. start at the middle byte, scan forward for the next record start. Compare the search key to the key of that record, stopping if you have a match. If your key is less than that one, take the half-way position, in bytes, between the current [start,end] range you've narrowed it down to, and scan forward to find the next beginning-of-record.

2020-12-16T22:01:36.197800Z

Whether that lookup is efficient enough for your purposes or not is up to your application.

2020-12-16T22:02:50.198Z

When people care more about lookup efficiency and are willing to trade it off for some amount of extra storage, they create a separate index with keys and pointers to starts of records.

2020-12-16T22:05:47.198300Z

UTF-8 encoding has some similarities to the variable length integer encoding you mention, and has a "synchronization" property where from an arbitrary position within a sequence of bytes, you can scan forward (and maybe also backwards?) to find the next start of "variable-length Unicode code point": https://en.wikipedia.org/wiki/UTF-8. (look for occurrences of the word "synchronization" on that page)

Eric Ihli 2020-12-16T22:21:15.198800Z

I thought briefly about whether I could use a flag bit to detect the start of a record. My instinct was that since UTF-8 uses a flag bit where the MSB is either 0 or 1 and the rest could be anything, that it kind of killed that possibility (unless I padded every byte and used the extra space for my own personal flags).

Eric Ihli 2020-12-16T22:22:54.199Z

And yeah, I was imagining a contiguous sequence of bytes. Got the overall idea from https://www.aclweb.org/anthology/W09-1505.pdf and just struggling with implementation. Just came across what I think will be a valuable resource. https://hdevalence.ca/blog/2014-02-13-fun-with-ngrams-part-ii-tightly-packed-tries

Ben Sless 2020-12-16T22:46:59.199500Z

There is a certain snobbery among CLers which makes them look down their noses on their cousins. They also have a reputation for generally not being nice

☝️ 1