clojure

New to Clojure? Try the #beginners channel. Official docs: https://clojure.org/ Searchable message archives: https://clojurians-log.clojureverse.org/
p-himik 2020-09-27T03:41:48.210200Z

Huh, so "Cartesian product of functions" != "Cartesian product of collections of functions and arguments". TIL

(defn X [f1 f2]
  (fn [x y]
    [(f1 x) (f2 y)]))

2020-09-27T15:42:36.214900Z

If I have a websocket component that has an internal clojure.core.async/mult to distribute all incoming messages, does it make sense to have that component extend the clojure.core.async/Mult protocol? Is that the intended use of the protocol? or is it better to just make my own tap functions in my own namespace

vemv 2020-09-27T15:58:07.215500Z

Given N sets, how do I find out the largest possible set (or sets) that is a subset of all N sets?

(magic [#{1 1} #{1 2} #{1 3}]) ;; => #{1}
Brute-force solutions wouldn't cut it for the size of my inputs

p-himik 2020-09-27T16:08:07.215700Z

clojure.set/intersection.

vemv 2020-09-27T16:13:44.217Z

That's not what I want, as the inputs can be disparate and most likely the intersection will be #{} 99% of the times So I should reword: of all N sets -> of the most sets as possible

p-himik 2020-09-27T16:15:55.217600Z

I still don't understand. What would your desired function return for [#{1 2} #{3 4}], for example?

p-himik 2020-09-27T16:17:58.218100Z

Or better yet, can you describe the algorithm? Because I cannot parse that sentence with "->". :)

chucklehead 2020-09-27T16:22:44.223800Z

something like the union of the results of intersecting of each subset with the union of all subsets?

๐Ÿ‘€ 1
p-himik 2020-09-27T16:24:22.225400Z

Uhm, that would be just the union of the input, no?

vemv 2020-09-27T16:24:31.225700Z

> What would your desired function return forย [#{1 2} #{3 4}], for example? that's a pretty degenerate case (for my needs) since it's very small and there's nothing in common. I don't care if it returns #{} or something else I'm trying to find the largest possible subsets given N sets of size 1000 to 100000 each, which are presumed to have quite a lot in common. There will be many possible answers, and what a 'good' anwer is can be tweaked. e.g. maybe for me a set with 300 elements that is a subset of 40 other sets (but not a subset of 20 other sets), is interesting enough

2020-09-27T16:25:55.229200Z

Combine the sets into a bag having all elements. Then use frequencies on the bag to find the largest count?

๐Ÿ‘ 1
p-himik 2020-09-27T16:27:45.230700Z

What would that return for [#{1 2} #{1 2} #{3} #{3}]? The frequencies are the same, yet there's no #{1 2 3} subset.

chucklehead 2020-09-27T16:29:20.230800Z

yeah, oops, I guess you could intersect each subset with union of all the other subsets?

p-himik 2020-09-27T16:30:58.232200Z

I don't know. I think the task is not defined clearly enough for us to come up with any robust solution.

p-himik 2020-09-27T16:31:40.233Z

But it feels like a task of turning a 2D metric into a 1D one, and that always requires some particular approach to dimensionality reduction.

vemv 2020-09-27T16:37:21.236800Z

#{1 2} seems a good answer, because its size is large, even if it's not a subset of all inputs. Ultimately I'm not seeking an unequivocal mathematical function, but something that gives good answers to humans for the desired size range (1000-100000). at sizes 1-10 it's probably easy to get lost because the differences are too small to be evident. for example, for an input of 100 sets sized 10000 each: * an answer of a set sized 5000 which is a subset of 80 sets is very valuable * an answer of a set sized 9999 which is a subset of 1 set is not very valuable * an answer of a set sized 2 which is a subset of 80 sets is not very valuable

vemv 2020-09-27T16:39:07.238200Z

...maybe my problem is too specific, sorry for that. at first I thought it could be a commonly solved problem.

vemv 2020-09-27T16:41:52.238300Z

yes, counting seems a good first step :) I hope it doesn't lead to an exhaustive search

p-himik 2020-09-27T16:44:41.240Z

The main issue is that you have 2 numbers (subset size and the amount of its supersets) that you have to reduce to just one ("score" of a particular subset). It's a dimensionality reduction problem, and there never is a cookie-cutter solution. You will have to somehow write that function that takes two numbers and outputs just one.

vemv 2020-09-27T16:47:18.240100Z

thanks! dimensionality reduction seems a good read

p-himik 2020-09-27T16:48:15.240300Z

And a deep rabbit hole. :)

1
p-himik 2020-09-27T16:49:17.240600Z

This is a quick informative read: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907