@blueberry I understand. As far as I can see in the cortex codebase, they use C++ templates to specialize operations for all datatypes at once and expose them to the JVM through JavaCPP. I also think that integer-based tensors are mostly good for indices and their permutations. I have not used them for anything else yet.
In ClojureCL and ClojureCUDA, such functionality is available without writing any C++ whatsoever. You can do all of this in Clojure, and all Java/Clojure primitives are supported.
Nice!