I'd like to write up an announcement about the verification plan and timeline first, just so we have a place to point folks to as word of the changes percolates through the community. I'll try to get that out this weekend.
Sounds good! I won't update my READMEs or clj-new until after that is up.
Thanks!
@tcrawley food for thought (in screenshot, conversation from #tools-deps) https://github.com/borkdude/deps-infer
@tcrawley I think for the purpose of libs like these, it would be super awesome if Clojars had some kind of index of jars plus the list of files in each jar, as EDN or Transit, refreshed every so often (daily, weekly, monthly).
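A jar is a zip archive, so the "list of files in each jar" part of such an index can be read without unpacking anything. A minimal Python sketch, assuming the jar bytes are already downloaded; `list_jar_files` and `build_demo_jar` are hypothetical names, and the demo jar is built in memory only so the example runs offline:

```python
"""Hypothetical sketch: list the files inside a jar (a jar is a zip archive)."""
import io
import zipfile
from typing import List


def list_jar_files(jar_bytes: bytes) -> List[str]:
    """Return the non-directory entry names in a jar, sorted."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return sorted(name for name in jar.namelist() if not name.endswith("/"))


def build_demo_jar() -> bytes:
    """Build a tiny in-memory jar so the example needs no network access."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        jar.writestr("accountant/core.cljs", "(ns accountant.core)\n")
    return buf.getvalue()


if __name__ == "__main__":
    print(list_jar_files(build_demo_jar()))
    # ['META-INF/MANIFEST.MF', 'accountant/core.cljs']
```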
I tried to run it on the repo cached on the server last night, but realized my recollection of how we build the Maven index was wrong: we pull down the POMs, not the jars, for indexing :( However, I think we could:
• pull down the jars once and index those, then store the index in S3
• index new jars as they are deployed, then merge them with the existing index
This should work since existing releases are immutable. We could also store the index as many timestamped files; that would allow clients to cache the index, pulling down new files and merging them. I suspect the full index file will be pretty large.
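Because releases are immutable, merging a new increment into an existing index only ever adds entries, and a client holding timestamped files only needs the ones newer than its last sync. A minimal Python sketch of both steps, assuming each index file maps a namespace to a list of entries and that file names start with a sortable timestamp; `merge_indexes` and `files_to_fetch` are hypothetical helpers, not existing Clojars code:

```python
"""Hypothetical sketch of incremental index merging.

Assumes each index file maps a namespace to a list of entries and that
file names begin with a sortable timestamp (e.g. "20210224T170000.edn").
Released artifacts are immutable, so merging only appends new entries.
"""
from typing import Dict, List

Entry = Dict[str, str]
Index = Dict[str, List[Entry]]


def merge_indexes(base: Index, increment: Index) -> Index:
    """Return a new index with the increment's entries appended to base."""
    merged = {ns: list(entries) for ns, entries in base.items()}
    for ns, entries in increment.items():
        bucket = merged.setdefault(ns, [])
        for entry in entries:
            if entry not in bucket:  # tolerate merging the same file twice
                bucket.append(entry)
    return merged


def files_to_fetch(available: List[str], last_seen: str) -> List[str]:
    """Timestamped names sort chronologically; fetch only the newer ones."""
    return sorted(name for name in available if name > last_seen)
```

Deduplicating on merge makes re-applying an increment harmless, which keeps client-side caching simple if a download is retried.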
yeah, those are good ideas
I like the second idea
then we can just pull only the latest files
Good deal. We should probably open an issue at https://github.com/clojars/clojars-web/issues/new/choose and continue this discussion there
Thanks!
I think it might be better to have one file per namespace actually, since the number of namespaces to check is usually small and downloading the entire index would be wasteful in that case. Just one HTTP request per namespace would be ideal.
If you agree, I can change the code to produce those files
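One file per namespace means a client looking up N namespaces makes N small GETs instead of downloading the whole index. A minimal Python sketch of the split, where the `ns-index/` prefix, the `.json` suffix and the base URL are illustrative assumptions rather than an existing Clojars layout:

```python
"""Hypothetical sketch: split one big namespace index into one file per
namespace, so a client needs a single HTTP GET per namespace it looks up.

The "ns-index/" prefix, ".json" suffix and base URL are assumptions.
"""
import json
from typing import Dict, List
from urllib.parse import quote

Index = Dict[str, List[dict]]


def ns_key(ns: str, prefix: str = "ns-index/") -> str:
    # Percent-encode so unusual namespace characters (e.g. "?") stay URL-safe.
    return f"{prefix}{quote(ns, safe='')}.json"


def partition_by_namespace(index: Index) -> Dict[str, str]:
    """Map each object key to the serialized entries for that namespace."""
    return {ns_key(ns): json.dumps(entries, sort_keys=True)
            for ns, entries in index.items()}


def lookup_url(ns: str, base_url: str = "https://example.invalid/") -> str:
    """URL a client would fetch for one namespace (one request per ns)."""
    return base_url + ns_key(ns)
```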
I think that would be great! I'm focused on adding group validation currently, but we could tackle this afterward. Do you have code already that will generate the index for a single jar?
@tcrawley Yeah, this code is in https://github.com/borkdude/deps-infer We could work on this together if you want. The part I do not control is the "ops" side, but I can write the "script" that produces the index from a dir of jars
A script that processes a sparse Maven repo dir would do the trick. "Sparse" meaning it is in the correct shape (`group-name/artifact-name/0.1.0/artifact-name-0.1.0.jar`) but has no POM files. The repo is in S3, but we sync down all of the jar files nightly in order to generate the Maven-style indexes for tooling, and could generate this index as part of that process.
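In that layout the coordinates can be recovered from each jar's path alone, so no POM is needed. A minimal Python sketch, assuming standard Maven layout (dots in a group id become directories); `coord_from_path` and `walk_repo` are hypothetical names, and anything not named `artifact-version.jar` (classifier jars, timestamped snapshot jars) is deliberately skipped here:

```python
"""Hypothetical sketch: derive Maven coordinates from jar paths in a sparse
repo laid out as group/path/artifact-name/version/artifact-name-version.jar.
"""
import os
from pathlib import PurePosixPath
from typing import Iterator, NamedTuple, Optional, Tuple


class Coord(NamedTuple):
    group_id: str
    artifact: str
    version: str


def coord_from_path(rel_path: str) -> Optional[Coord]:
    parts = PurePosixPath(rel_path).parts
    if len(parts) < 4 or not parts[-1].endswith(".jar"):
        return None
    *group, artifact, version, filename = parts
    # Skip classifier jars (foo-1.0-sources.jar) and timestamped snapshots.
    if filename != f"{artifact}-{version}.jar":
        return None
    # Group directories are joined back with dots, e.g. org/clojure -> org.clojure.
    return Coord(".".join(group), artifact, version)


def walk_repo(root: str) -> Iterator[Tuple[Coord, str]]:
    """Yield (coord, absolute jar path) for every main jar under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            coord = coord_from_path(rel)
            if coord is not None:
                yield coord, full
```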
We could then upload these ns indexes to S3 alongside the feeds/jar lists: https://github.com/clojars/clojars-web/wiki/Data#list-of-jars-and-versions-in-leiningen-syntax
Sounds excellent
@tcrawley Right now I have some code which walks over a dir with .jar files and produces one huge map:
```
{accountant.core
 [{:mvn/version "0.2.5",
   :file "accountant/core.cljs",
   :group-id "venantius",
   :artifact "accountant"}],
 adzerk.boot-cljs
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 adzerk.boot-cljs-repl
 [{:mvn/version "0.4.0",
   :file "adzerk/boot_cljs_repl.clj",
   :group-id "adzerk",
   :artifact "boot-cljs-repl"}],
 adzerk.boot-cljs.impl
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/impl.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 adzerk.boot-cljs.js-deps
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/js_deps.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 adzerk.boot-cljs.middleware
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/middleware.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
```
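For reference, here is a hypothetical Python sketch of how one jar's file list could become entries shaped like the map above. It assumes the usual Clojure file-to-namespace convention (drop the extension, `/` becomes `.`, `_` becomes `-`), which matches the pairs shown (e.g. `adzerk/boot_cljs.clj` and `adzerk.boot-cljs`); it is not the deps-infer code itself, and the string key `"mvn/version"` merely stands in for the EDN keyword `:mvn/version`:

```python
"""Hypothetical sketch: turn one jar's file list into namespace-index entries."""
from typing import Dict, Iterable, List, Optional

CLOJURE_EXTS = (".clj", ".cljs", ".cljc")


def file_to_ns(path: str) -> Optional[str]:
    """Map a source path to its namespace, or None if it isn't Clojure source."""
    if path.startswith("META-INF/"):
        return None  # e.g. Leiningen's copied project.clj is not a namespace
    for ext in CLOJURE_EXTS:
        if path.endswith(ext):
            stem = path[: -len(ext)]
            return stem.replace("/", ".").replace("_", "-")
    return None


def index_jar(group_id: str, artifact: str, version: str,
              files: Iterable[str]) -> Dict[str, List[dict]]:
    """Build {namespace: [entry, ...]} for a single jar."""
    index: Dict[str, List[dict]] = {}
    for path in files:
        ns = file_to_ns(path)
        if ns is None:
            continue
        index.setdefault(ns, []).append({
            "mvn/version": version,
            "file": path,
            "group-id": group_id,
            "artifact": artifact,
        })
    return index
```

Per-jar maps like this can then be combined into the full index (or split per namespace) in a separate step.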
Perhaps it would be better to partition this into multiple files
For my local .m2 dir the file is 130822 lines long.
@tcrawley I have this code here:
https://github.com/borkdude/deps-infer/blob/main/src/deps_infer/clojars.clj
It prints to stdout.
You can run it with `clojure -M -m deps-infer.clojars > /tmp/index.edn`
This file takes 200ms to parse as EDN on my machine, which is still quite OK.
But for all of Clojars it might get a bit bloated.
You can change the location of the dir it scans for .jar files with `--repo`.
Thanks! I'll see if I can find some time today to kick this off on the server to see how long it takes and how large of a file it produces.
I produced both an .edn and a .transit.json file and zipped both; here's how they look on my machine:
```
$ ls -la /tmp/index*
-rw-r--r-- 1 borkdude wheel 4363922 Feb 24 16:07 /tmp/index.edn
-rw-r--r-- 1 borkdude wheel  214482 Feb 24 17:00 /tmp/index.edn.zip
-rw-r--r-- 1 borkdude wheel 3594066 Feb 24 16:59 /tmp/index.transit.json
-rw-r--r-- 1 borkdude wheel  393184 Feb 24 17:01 /tmp/index.transit.zip
```
Funnily enough, the zipped EDN is smaller than the zipped Transit (214482 vs 393184 bytes), even though the unzipped Transit is the smaller of the two.
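Putting the byte counts from the `ls` listing side by side makes the gap concrete. A plausible explanation, untested here, is that Transit's caching of repeated keys already removes much of the repetition that zip would otherwise exploit. A small Python calculation using only those listed sizes:

```python
"""Compression ratios computed from the byte sizes in the ls listing above."""
SIZES = {
    "edn": (4_363_922, 214_482),      # index.edn -> index.edn.zip
    "transit": (3_594_066, 393_184),  # index.transit.json -> index.transit.zip
}


def ratio(raw: int, zipped: int) -> float:
    """Fraction of the original size that remains after zipping."""
    return zipped / raw


for fmt, (raw, zipped) in SIZES.items():
    print(f"{fmt}: {zipped / 1024:.0f} KiB zipped, {ratio(raw, zipped):.1%} of raw")
# edn: 209 KiB zipped, 4.9% of raw
# transit: 384 KiB zipped, 10.9% of raw
```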