Huh, so "Cartesian product of functions" != "Cartesian product of collections of functions and arguments". TIL
(defn X [f1 f2]
(fn [x y]
[(f1 x) (f2 y)]))
If I have a websocket component that has an internal clojure.core.async/mult
to distribute all incoming messages, does it make sense to have that component extend the clojure.core.async/Mult
protocol? Is that the intended use of the protocol? or is it better to just make my own tap
functions in my own namespace
Given N sets, how do I find out the largest possible set (or sets) that is a subset of all N sets?
(magic [#{1 1} #{1 2} #{1 3}]) ;; => #{1}
Brute-force solutions wouldn't cut it for the size of my inputsclojure.set/intersection
.
That's not what I want, as the inputs can be disparate and most likely the intersection will be #{}
99% of the times
So I should reword: of all N sets
-> of the most sets as possible
I still don't understand. What would your desired function return for [#{1 2} #{3 4}]
, for example?
Or better yet, can you describe the algorithm? Because I cannot parse that sentence with "->". :)
something like the union of the results of intersecting of each subset with the union of all subsets?
Uhm, that would be just the union of the input, no?
> What would your desired function return forย [#{1 2} #{3 4}], for example?
that's a pretty degenerate case (for my needs) since it's very small and there's nothing in common. I don't care if it returns #{} or something else
I'm trying to find the largest possible subsets given N sets of size 1000 to 100000 each, which are presumed to have quite a lot in common.
There will be many possible answers, and what a 'good' anwer is can be tweaked. e.g. maybe for me a set with 300 elements that is a subset of 40 other sets (but not a subset of 20 other sets), is interesting enough
Combine the sets into a bag having all elements. Then use frequencies on the bag to find the largest count?
What would that return for [#{1 2} #{1 2} #{3} #{3}]
? The frequencies are the same, yet there's no #{1 2 3}
subset.
yeah, oops, I guess you could intersect each subset with union of all the other subsets?
I don't know. I think the task is not defined clearly enough for us to come up with any robust solution.
But it feels like a task of turning a 2D metric into a 1D one, and that always requires some particular approach to dimensionality reduction.
#{1 2}
seems a good answer, because its size is large, even if it's not a subset of all inputs. Ultimately I'm not seeking an unequivocal mathematical function, but something that gives good answers to humans for the desired size range (1000-100000).
at sizes 1-10 it's probably easy to get lost because the differences are too small to be evident.
for example, for an input of 100 sets sized 10000 each:
* an answer of a set sized 5000 which is a subset of 80 sets is very valuable
* an answer of a set sized 9999 which is a subset of 1 set is not very valuable
* an answer of a set sized 2 which is a subset of 80 sets is not very valuable
...maybe my problem is too specific, sorry for that. at first I thought it could be a commonly solved problem.
yes, counting seems a good first step :) I hope it doesn't lead to an exhaustive search
The main issue is that you have 2 numbers (subset size and the amount of its supersets) that you have to reduce to just one ("score" of a particular subset). It's a dimensionality reduction problem, and there never is a cookie-cutter solution. You will have to somehow write that function that takes two numbers and outputs just one.
thanks! dimensionality reduction
seems a good read
And a deep rabbit hole. :)
This is a quick informative read: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907