I'm interested in Clojure for AI projects. So far a lot of the work seems to be done with wrappers for existing frameworks. I wonder why people here use Clojure instead of Python, R, Julia, Scala, etc. What is the advantage? Or is my understanding of these "wrappers" wrong? I think of them as a layer over functions in other programming languages. So when those do all the work, what is so special about using Clojure to call them, rather than relying on Python with the huge ecosystem of libraries that is available?
I love Clojure and feel that I'm much more productive in it than in Python. A couple of years ago I tried to do as much of our ML code as possible in Clojure, but gave up after a while. I still do all the data preparation work (easily more than 70% of the work) in Clojure, but the actual model training in Python, and production deployment of the models as golang services. The situation has probably changed a bit in the last couple of years, but we're happy with this setup.
Thanks for the insight, Mathias. Why do you switch to golang for the deployment?
C interop is easier than on the JVM. The Go stuff just implements HTTP services. We picked Go because it's very easy to learn, and high-performance web services are pretty much what it's best at...
Depending on the use case, of course, doing the service in Python should be fine. I don't remember exactly why we gave that up, but I think it was because some of the feature extraction was just too slow in Python.
We do the data preparation and feature extraction in Clojure and the training in Python using SageMaker. Our services are in Clojure as well: they do the real-time feature extraction and call the deployed SageMaker inference endpoints.
Hi, why doesn't Clojure have a more flexible threading macro, for cases where `->` and `=>` would need an `as->` or `as=>`?
I'm thinking of something like this:
(=> [:first :second]
(map name _)
(first _)
(.substring _ 1))
Maybe try asking in #ai ?
Maybe you're looking for this? https://clojuredocs.org/clojure.core/as-%3E
Also realize that when people use Python for these tasks, most of the computational heavy lifting is implemented not in Python, but another language like C or C++, for performance.
And I think the main answer to why people are interested in creating Clojure wrappers for such functionality is that they really like Clojure and would prefer to use it for these tasks, but the existing libraries are so large and extensive that reimplementing all of them would be much more time-consuming than wrapping them.
You're basically describing the `as->` macro, but with a built-in predefined symbol (I think it's called an anaphora?). The advantage of using `as->` instead of something with a hard-coded symbol is that sometimes you might want to use a threading macro inside a threading macro, and then which value should your hard-coded symbol refer to?
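To make the nesting point concrete, here is a small sketch using only `clojure.core/as->` (the data and the names `acc`/`s` are made up for illustration). Each `as->` picks its own binding name, so the inner form can still see the outer value:

```clojure
;; Outer thread's value is named `acc`, inner thread's value is named `s`.
;; A single hard-coded symbol could only ever refer to one of them.
(def result
  (as-> [1 2 3] acc
    (map inc acc)                          ; acc is now (2 3 4)
    (as-> (reduce + acc) s                 ; s starts as 9
      (str "sum of " (vec acc) " is " s))))

(println result) ; sum of [2 3 4] is 9
```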
Also you're probably aware that `_` generally means "ignore this value," so you probably wouldn't pick that as your hard-coded symbol anyway.
I was just wondering if someone had already thought about this, because I think I have seen something similar in JavaScript's pipe operator proposals (or a similar language). The symbol was just an example, too (it could be a keyword, to play it safe).
Someone thought about it and made it a long time ago. `as->` seems to be precisely this?
I know about the foundation in C, so it feels like it comes down to a preference for the "control language" over these structures?
There are some extremely interesting projects happening to make Clojure a stronger language for data science. The stream is most active over on Zulip rather than here: http://clojurians.zulipchat.com. There's specifically a "why clojure for data science" thread in the data science stream that speaks to your question on both a practical and conceptual level. I think overall you're right in that one of the primary appeals for a lot of people is the ergonomics of dealing with data in Clojure compared to more verbose languages.
FWIW Clojure appeals to me for data science for a number of reasons:
* The REPL makes iterative development much easier
* Clojure's approach to sequential collections makes data munging far easier and more robust than in Python or R. If you take advantage of transducers, you get both batch and stream data processing "for free".
* `spec` for enforcing and explaining expectations of data at application boundaries
* The interop story increasingly goes beyond just wrapping JVM libraries: `libpython-clj` / `clojisr` both allow for direct calling of Python and R code within a Clojure process.
* Experimental libraries like `tech.datatype` and `neanderthal` are greatly expanding the capabilities of Clojure to do high-performance numerics
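As a rough illustration of the transducer and spec points, here is a sketch using only `clojure.core` and `clojure.spec.alpha` (which ships with Clojure 1.9+). The `clean-xf` pipeline, the sample data and the `::reading` specs are hypothetical. The same transducer is reused for an eager batch, a lazy sequence, an aggregation, and a loop that feeds items one at a time as a stand-in for a stream (`core.async` channels also accept a transducer, but that needs an extra dependency, so it's left out here):

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical cleaning pipeline, written once as a transducer.
;; nil is removed first because `pos?` throws on nil.
(def clean-xf
  (comp (remove nil?)
        (filter pos?)
        (map #(* 2 %))))

(def raw [3 nil -1 4 nil 5])

;; Batch: eagerly pour everything into a vector.
(def batch-result (into [] clean-xf raw))   ; [6 8 10]

;; Lazy/incremental: items are processed as the sequence is consumed.
(def lazy-result (sequence clean-xf raw))

;; Aggregation without building intermediate collections.
(def total (transduce clean-xf + raw))      ; 24

;; Stream-style: push items one at a time through the transformed
;; reducing function, as a message consumer would.
(def step (clean-xf conj))
(def acc (atom []))
(doseq [x raw]
  (swap! acc step x))
(def streamed (step @acc))                  ; completion arity, [6 8 10]

;; spec: describe the expected shape of incoming data at the boundary.
(s/def ::reading (s/nilable number?))
(s/def ::readings (s/coll-of ::reading :kind vector?))

(println (s/valid? ::readings raw))          ; true
(println (s/explain-str ::readings [1 "x"])) ; explains why "x" fails
```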
Probably worth taking a look over in #data-science here too
Programming languages are wrappers around machine code. A programming language is a "language" like English or French: it's a way to express your thoughts and ideas. A lot of people find the Clojure language more expressive and concise, meaning they feel they can more quickly and simply explain to the computer what to do.

Libraries/frameworks can be seen as new vocabulary for existing languages. Ideas that were complex, difficult to explain and long to define are wrapped behind a shorter word, which takes on a meaning of its own. So now you can quickly refer to something quite complex with a single word, which again lets you express more things faster. And you speak this language to both machine and human: you need to make sure the computer can understand it, but also that future you and others can easily follow your intent.

The other thing most programming languages are is an implementation of a compiler. Different compilers offer different features. Clojure, for example, gives you fast incremental compilation, hot reloading, and the ability to compile to various platforms like the JVM, JS, the CLR, etc.

When it comes to the language, some people just like Clojure more than Python or Scala or R, etc. They find it easier to speak Clojure and express themselves in it. They may also find it easier to read and understand other code written in it. And they like the features the language provides as well, like how you can build more abstract ideas on top of it to grow the language yourself and express even more things in new ways (meta-programming).

When it comes to the compiler, most people love the interactive development experience, with the REPL-integrated editor and hot code reloading, which also works very well with the way the language is spoken. And they love the fact that the code it produces is performant, that it can target both JS and the JVM, that it can be further compiled natively, and that it can use the vocabulary of all of Java, JS and Python that already exists.
Wow @didibus, that is a really nice ELI5-style post about the topic
Can you explain the hot reload a bit more? The REPL feels like a Jupyter notebook in Python, and the hot reload is re-running the notebook?
Ya, there's a ton of utility libraries out there which have a macro like the one you described. People normally call it either `-<>` or `it->`:
(it-> 1
(inc it)
(+ it 3)
(/ 10 it))
or
(-<> 1
(inc <>)
(+ <> 3)
(/ 10 <>))
Tupelo is one that I know which has it-> https://cljdoc.org/d/tupelo/tupelo/20.07.28/api/tupelo.core
And swiss-arrows is one that I know which has -<> https://github.com/rplevy/swiss-arrows
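For a sense of how small such a macro can be, here's a hypothetical `it->` built on top of `clojure.core/as->`. This is only a sketch of the idea, not Tupelo's actual implementation:

```clojure
;; Hypothetical anaphoric threading macro: expands to `as->` with the
;; binding name fixed to `it`. `~'it` deliberately emits the bare symbol
;; `it` (no namespace qualification) so the caller's forms can refer to it.
(defmacro it-> [x & forms]
  `(as-> ~x ~'it ~@forms))

(def answer
  (it-> 1
    (inc it)      ; 2
    (+ it 3)      ; 5
    (/ 10 it)))   ; 2

(println answer) ; 2
```

The fixed name is exactly the trade-off discussed above: convenient, but a nested `it->` shadows the outer `it`.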
That said, as someone who was using `-<>` a lot in the past, I've moved away from it and just use `as->` most of the time now if I need to thread multiple positions. The convenience of `as->` just always being available in `clojure.core` kinda won me over.
(as-> 1 it
(inc it)
(+ it 3)
(/ 10 it))
What's also nice about as-> is you can thread it inside of ->
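A quick sketch of that (the order map and its keys are made up): `->` threads the map in first position, and the one step that needs the value somewhere else uses `as->` with its own name. `(-> x (as-> m ...))` works because `->` inserts `x` as the first argument of `as->`, which is exactly its initial-value slot:

```clojure
(def order
  (-> {:qty 3 :price 5}
      (assoc :currency "EUR")
      ;; here the map is needed in two places, so give it a name
      (as-> m (assoc m :total (* (:qty m) (:price m))))))

(println order) ; {:qty 3, :price 5, :currency "EUR", :total 15}
```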
hot-reload means you can swap part of the code for new code at run-time without needing to restart the application
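A minimal sketch of what that looks like in Clojure, assuming default compiler settings (direct linking is off by default; with it turned on, callers would not see the new definition). The function names are made up:

```clojure
(defn greeting [who] (str "Hello, " who))

;; Stands in for long-running code (e.g. a web handler). It calls
;; `greeting` through its var, so it picks up redefinitions.
(defn handler [req] (greeting (:name req)))

(def before-reload (handler {:name "Ada"}))   ; "Hello, Ada"

;; Re-evaluating a definition (e.g. from your editor) swaps the code in
;; the running process; `handler` itself is not redefined.
(defn greeting [who] (str "Hi there, " who))

(def after-reload (handler {:name "Ada"}))    ; "Hi there, Ada"
```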
Thank you everyone for your replies. @didibus, that `it->` macro is exactly what I was talking about, thank you for sharing
I think I'll give tupelo or swiss-arrows a try
In practice, `as->` is rarely necessary, since more often you are threading either
1) fns that operate on sequences (`map`, `filter`, `first`, etc.), and you use `->>`, or
2) fns that operate on a map as an "object" (stuff like `assoc`), and then you use `->`
(or at least, that's my impression)
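To illustrate that split (the sample data is hypothetical): sequence functions take the collection last, so `->>` fits; map functions take the map first, so `->` fits:

```clojure
;; Sequence functions: collection goes in last position, so use ->>.
(def top-names
  (->> [{:name "ada" :score 9} {:name "bob" :score 4} {:name "cy" :score 7}]
       (filter #(> (:score %) 5))
       (map :name)
       (sort)))

;; Map-as-object functions: map goes in first position, so use ->.
(def profile
  (-> {:name "ada"}
      (assoc :score 9)
      (update :score inc)
      (dissoc :name)))

(println top-names) ; (ada cy)
(println profile)   ; {:score 10}
```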
The REPL isn't exactly like Jupyter, but kinda.
Basically you can change some code and rerun parts of it and see the results immediately
It's a lot like Jupyter in how interactive it is, but it's more general than Jupyter. Think about how, with Jupyter, you interact with it through a browser. Now imagine if you could interact with your Jupyter notebook not just from a browser, but also from your IDE, editor, command line, and wherever else you would want to; in theory you could.
Now, I just want to be honest here. If you're going to try to use Clojure for data science, you need to have pretty good programming skills as well. If you're a great data scientist but your programming is only so-so, and you don't have a full grasp of programming tools and workflows, it's going to be challenging to use Clojure for data science. In its current state it's missing the polish of Python data science; it's like Linux vs macOS. So if you decide to try data science in Clojure, I'd come to it with the attitude that you're doing it as a learning opportunity, and I'm sure you'll learn some things. But if you instead think you'll be immediately more productive and better at data science because of Clojure, that's not the case.
https://dragan.rocks/ is the first place I would think of to read about machine learning with Clojure. The author is the creator of Deep Diamond, a Clojure-based tensor/neural net library, though it's still in alpha: https://github.com/uncomplicate/deep-diamond
The alpha status echoes what didibus is saying. This is all pretty new in Clojure, so you won't find any well-trodden paths
Has anyone played with https://github.com/OpenHFT/Java-Thread-Affinity in Clojure? It seems interesting for high-performance threaded (non-async) web servers: each web server handling thread would get 1 core, with no context switches.
Hello, I can do `clojure.core/+`, but how do I use `if` like that? `if` is not in `clojure.core`. I know it's a special form, but how do I refer to it with a full namespace?
why do you need to refer to it with a full namespace?
I want to make my own `if`, so I want to refer to Clojure's `if` separately
to distinguish them
I just don't know which namespace `if` is in
I'm not sure that's possible:
(let [if (fn [& args]
(prn "overriden"))]
(if true
(prn "nope")))
;; "nope"
How would it prevent context switches? The OS isn't allowed to preempt it? That can't be
True, doesn't look possible. Thank you, phronmophobic
(defn if [] (prn "hello"))
(if)
didn't work either
It's OK, it's not such a big problem, I can pick a similar name
if there is no way
special forms cannot be overridden
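A quick way to check this at the REPL, using only `clojure.core` functions: `if` is a special form handled directly by the compiler, so there is no var for it to resolve, unlike `+`:

```clojure
;; `if` is in the compiler's table of special forms and has no var.
(println (special-symbol? 'if))  ; true
(println (resolve 'if))          ; nil

;; `+` is an ordinary function stored in a var in clojure.core.
(println (special-symbol? '+))   ; false
(println (resolve '+))           ; #'clojure.core/+
```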
Depending on your use case, you might be able to write a macro that swaps `if` with some other implementation. I'm not sure that's a good idea, but maybe for some kind of DSL, it might be reasonable
they're also not vars so they're not resolveable via namespaces
eg.
;; my-dsl replaces 'if symbol with another implementation
(my-dsl
(if true
:foo
:bar))
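One possible way to write a `my-dsl` like that, as a sketch only: `dsl-if`, `dsl-truthy?` and the truthiness rule (treating `0` and `""` as false) are invented for the example. The macro walks its body with `clojure.walk/postwalk-replace` and swaps the literal `if` symbol before the compiler sees it. Caveats: it also rewrites `if` inside quoted data, and it does not touch `if`s produced later by other macros such as `when`.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical truthiness rule for the DSL: nil, false, 0 and ""
;; all count as false. Purely an illustration.
(defn dsl-truthy? [x]
  (not (or (nil? x) (false? x) (= 0 x) (= "" x))))

;; A macro (not a fn) so that, like the real `if`, only one branch
;; is evaluated. It still expands into the real special form.
(defmacro dsl-if
  ([test then] `(dsl-if ~test ~then nil))
  ([test then else]
   `(if (dsl-truthy? ~test) ~then ~else)))

;; Replace every literal `if` symbol in the body with `dsl-if`.
(defmacro my-dsl [& body]
  `(do ~@(walk/postwalk-replace {'if `dsl-if} body)))

(println (if 0 :yes :no))           ; :yes (0 is truthy in Clojure)
(println (my-dsl (if 0 :yes :no)))  ; :no  (0 is false in the DSL)
```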
Thank you, people. I can't say that I got the last one, but picking a similar name is fine
IF
The lib seems to boil down to Linux's `taskset`, which binds a thread to a core. This is not supported on macOS, for example.
It makes sense to me that a core-bound thread never ctx-switches
That would be an interesting feature. I just feel it's just as easy to assume that all it means is that the thread will always be scheduled on the same core, not that it will get exclusive access to that core
Which can still improve potential performance due to improving cache hits
But it wouldn't prevent context switches
A little bit of googling seems to confirm what I'm saying. But I'm not sure if it's possible to set exclusive affinity.
"Affinity" is a bit of an overloaded term. The lib offers `AffinityLock`: https://github.com/OpenHFT/Java-Thread-Affinity/blob/e71df9e51724300a977e3aa99138e0d66782fbd5/affinity/src/main/java/net/openhft/affinity/AffinityLock.java#L33-L34
Hum...
I guess it seems you could, but I think you need to preconfigure the OS as explained here: https://unix.stackexchange.com/a/326585/100711
Since it looks like isolating a CPU requires a reboot
Nice one, I appreciate the scrutiny. Didn't know that one. In any case, if playing with this, I'd definitely be ready to spend some time down various rabbit holes. Surely one could have a nice Docker image with the various necessary bits (one needs a thing called JNA, for example)
JNA should just be a .jar dependency
I also don't think this will work in Docker. Or rather, I don't think you can use Docker to isolate a CPU. But it seems that apart from isolating a few CPUs beforehand, everything else would just be Java dependencies and you'd be good to go. Ya, it does seem interesting; I'd be curious whether it really increases performance or not.
It does seem that you can do something where a Docker instance runs with reserved CPUs though. Which is a bit different, but interesting as well. Also, I wonder if you'd see any benefit if you're not running on bare metal. Cause if your cores are virtual anyways, they're already shared with other instances.
I'm not much of a Docker guy myself, so I can't say much. I just vaguely know that on Linux its isolation story is different from macOS': it can match bare-metal performance, since virtualization isn't mandatory there. Given that, it wouldn't surprise me if you could also reach these isolated cores. Otherwise, yeah, I'd ditch Docker
Similarly, e.g. AWS offers bare-metal instances so one could use those without deviating much from the cloud paradigm (I hope)
I have seen Dragan's page and also thought it was a lot less polished than its big counterparts in Python. But those are made by big teams, not a single person. It took me a while to get an overview.