clojars

http://clojars.org discussion and “support”, see http://status.clojars.org for status.
xtreak29 2018-03-16T05:45:06.000059Z

@danielcompton Thanks, the wiki was helpful. I used the above rsync command to download only the pom files, around 1GB of data. I am parsing them to write to a database. I see multiple duplicate poms that result in duplicate data. Is there a reason for that? Does clojars keep redundant copies for some reason?

➜  my-wonderful-copy-of-clojars $ diff aleph/aleph/0.1.0-SNAPSHOT/aleph-0.1.0-20100502.112537-10.pom aleph/aleph/0.1.0-SNAPSHOT/aleph-0.1.0-20100502.112537-11.pom
The diff command reports no difference between the two files above.

xtreak29 2018-03-16T05:50:31.000059Z

If there is already a database present with this info it will save me a lot of time.

danielcompton 2018-03-16T06:00:05.000021Z

Those are SNAPSHOT versions; they’re work-in-progress versions that people can publish. The source files have probably changed between them, but not necessarily the POM.

danielcompton 2018-03-16T06:00:16.000114Z

You can probably ignore them, TBH.
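
A minimal sketch of skipping SNAPSHOT poms while walking a local mirror. It assumes the mirror keeps the `group/artifact/version/file.pom` layout seen in the paths above; the function names `is_snapshot_pom` and `release_poms` are hypothetical helpers, not part of any Clojars tooling.

```python
from pathlib import PurePosixPath


def is_snapshot_pom(path: str) -> bool:
    """True if the pom sits in a version directory ending in -SNAPSHOT."""
    return PurePosixPath(path).parent.name.endswith("-SNAPSHOT")


def release_poms(paths):
    """Keep only .pom paths that are not under a SNAPSHOT version directory."""
    return [p for p in paths if p.endswith(".pom") and not is_snapshot_pom(p)]


paths = [
    "aleph/aleph/0.1.0-SNAPSHOT/aleph-0.1.0-20100502.112537-10.pom",
    "aleph/aleph/0.1.0-SNAPSHOT/aleph-0.1.0-20100502.112537-11.pom",
    "speclj/speclj/2.1.3/speclj-2.1.3.pom",
]
print(release_poms(paths))
```

Filtering on the directory name rather than the file name matters here, because timestamped snapshot files such as `aleph-0.1.0-20100502.112537-10.pom` do not contain the word SNAPSHOT themselves.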

xtreak29 2018-03-16T06:10:06.000146Z

Makes sense, thanks. Is there a database already available with this info? Some of the projects have unicode symbols that break the XML parsing. E.g.
file: my-wonderful-copy-of-clojars/speclj/speclj/2.1.3/speclj-2.1.3.pom
content: <comments>Copyright \251 2011 Micah Martin All Rights Reserved.</comments>
Exception:

Unhandled java.lang.IllegalAccessException
   class clojure.lang.Reflector cannot access class
   com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException (in module java.xml) be$
   java.xml does not export com.sun.org.apache.xerces.internal.impl.io to unnamed module @3629b018
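
The `\251` in that pom is octal for the byte 0xA9, which is © in Latin-1 but not a valid UTF-8 sequence on its own, so a strict UTF-8 parser will likely reject the file. One possible workaround, sketched here in Python under the assumption that the non-UTF-8 poms are Latin-1, is to decode with a fallback before parsing. `parse_pom_bytes` is a hypothetical helper name, and the sample document is a cut-down stand-in for the real pom.

```python
import xml.etree.ElementTree as ET


def parse_pom_bytes(raw: bytes) -> ET.Element:
    """Parse pom bytes, falling back to Latin-1 when they are not valid UTF-8."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this decode cannot fail.
        # Assumption: the stray bytes really are Latin-1.
        text = raw.decode("latin-1")
    return ET.fromstring(text)


# Cut-down stand-in for the speclj pom: 0xA9 is the byte shown as \251 above.
raw = (
    b"<project><comments>Copyright \xa9 2011 Micah Martin "
    b"All Rights Reserved.</comments></project>"
)
root = parse_pom_bytes(raw)
print(root.findtext("comments"))
```

Trying UTF-8 first keeps correctly encoded poms untouched; only files that fail strict decoding take the fallback path.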

2018-03-16T12:30:02.000252Z

@xtreak29 be aware that just parsing the pom files will give you the direct dependencies, not the transitive dependencies the direct ones have. In order to get the full dependency tree, you’ll need to use a maven resolver to walk the tree
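
For illustration, a rough sketch of what "just parsing the pom" yields: only the `<dependency>` entries declared in that one file. It assumes the pom declares the standard Maven namespace `http://maven.apache.org/POM/4.0.0` (poms without it would need un-namespaced lookups); `direct_dependencies` is a hypothetical helper and the sample pom is made up.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; assumed to be declared on <project>.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def direct_dependencies(pom_xml: str):
    """Return (groupId, artifactId, version) for each declared dependency."""
    root = ET.fromstring(pom_xml)
    deps = []
    for dep in root.findall("m:dependencies/m:dependency", NS):
        deps.append((
            dep.findtext("m:groupId", namespaces=NS),
            dep.findtext("m:artifactId", namespaces=NS),
            dep.findtext("m:version", namespaces=NS),
        ))
    return deps


# Made-up pom for illustration only.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>com.example</groupId>
  <artifactId>app</artifactId>
  <version>1.0.0</version>
  <dependencies>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure</artifactId>
      <version>1.8.0</version>
    </dependency>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>core.async</artifactId>
      <version>0.2.395</version>
    </dependency>
  </dependencies>
</project>
"""
print(direct_dependencies(SAMPLE_POM))
```

Anything those two dependencies pull in themselves does not appear in the result, which is the gap a resolver has to fill.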

2018-03-16T12:31:59.000214Z

one hacky (but somewhat straightforward) way to do that is with Maven: mvn -f /path/to/pom dependency:tree

xtreak29 2018-03-16T12:32:00.000111Z

Yes, that is a good catch. Also, I can see some libraries using dependencies in their code that are not declared in project.clj.

2018-03-16T12:32:33.000256Z

right, those would be dependencies brought in transitively

2018-03-16T12:32:58.000239Z

with dependency:tree, you’ll have to regex parse the output, but it’s regular at least
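
A sketch of that regex parsing, assuming the usual text layout of `dependency:tree`: each line is prefixed with `[INFO] `, nesting is drawn with three-character segments (`+- `, `\- `, `|  `), and coordinates look like `group:artifact:type:version[:scope]`. The sample output below is made up for illustration, and `parse_tree` is a hypothetical helper.

```python
import re

# Tree-drawing prefix, then colon-separated coordinates with no whitespace.
LINE = re.compile(
    r"^\[INFO\] (?P<indent>[ |+\\-]*)(?P<coords>[^\s:]+(?::[^\s:]+){2,})$"
)
SCOPES = {"compile", "runtime", "test", "provided", "system", "import"}


def parse_tree(output: str):
    """Extract depth, group, artifact, version and scope from each tree line."""
    deps = []
    for line in output.splitlines():
        m = LINE.match(line.rstrip())
        if not m:
            continue  # banner, progress and summary lines
        parts = m.group("coords").split(":")
        scope = parts[-1] if parts[-1] in SCOPES else None
        version = parts[-2] if scope else parts[-1]
        deps.append({
            "depth": len(m.group("indent")) // 3,  # 3 characters per level
            "group": parts[0],
            "artifact": parts[1],
            "version": version,
            "scope": scope,
        })
    return deps


# Made-up output in the assumed layout, not captured from a real run.
SAMPLE = r"""
[INFO] Scanning for projects...
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ app ---
[INFO] com.example:app:jar:1.0.0
[INFO] +- org.clojure:clojure:jar:1.8.0:compile
[INFO] \- com.example:lib:jar:2.0.0:compile
[INFO]    \- com.example:util:jar:0.3.1:compile
[INFO] BUILD SUCCESS
"""
for dep in parse_tree(SAMPLE):
    print(dep)
```

Entries with depth 0 are the project itself, depth 1 are direct dependencies, and anything deeper was brought in transitively.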

xtreak29 2018-03-16T12:34:04.000002Z

Looks like the above command downloads jars

xtreak29 2018-03-16T12:34:41.000341Z

➜  ~ mvn -f my-wonderful-copy-of-clojars/speclj/speclj/2.6.1/speclj-2.6.1.pom dependency:tree
[INFO] Scanning for projects...
Downloading: https://repo.maven.apache.org/maven2/org/codehaus/mojo/build-helper-maven-plugin/1.7/build-helper-maven-plugin-1.7.pom
Downloaded: https://repo.maven.apache.org/maven2/org/codehaus/mojo/build-helper-maven-plugin/1.7/build-helper-maven-plugin-1.7.pom (6 KB at 2.9 KB/sec)

2018-03-16T12:34:42.000059Z

yes, it will resolve all the dependencies in order to process their pom files, unfortunately

2018-03-16T12:34:59.000353Z

it will also download jars it needs to run dependency:tree

xtreak29 2018-03-16T12:36:08.000032Z

Got it. That will be a network-heavy task for me, since doing full dependency detection of clojars on a tiny DigitalOcean droplet will kill it 🙂

xtreak29 2018-03-16T12:37:53.000452Z

Right now I have parsed the poms and inserted them into MongoDB, and I am using a query to see which libraries use dependencies that cannot run on Clojure 1.9, so I can file issues.

> db.clojars.distinct("url", {dependencies: {$elemMatch : {artifactId: "core.async"}}, "version": {$lt: "0.3.442"}}, {url: 1, '_id': false})
[
	"http://github.com/pleasetrythisathome/tao",
	"http://dsteurer.org",
	"https://github.com/cncommerce/beetlejuice",
]
# More output

xtreak29 2018-03-16T12:39:47.000100Z

The problem is that I parse every pom file, and there are cases where an older pom tells me there is an outdated dependency that is already fixed in a newer one. I need a way to pick only the latest pom for each library to avoid false positives.
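
One way this could be approached, sketched under the assumption that paths are relative to the mirror root and follow `group/artifact/version/file.pom`: keep only the pom with the highest version per artifact. `version_key` and `latest_poms` are hypothetical helpers, and the key only compares the leading dotted numbers, so it is not full Maven version ordering (qualifiers such as `-alpha` or `-SNAPSHOT` are ignored).

```python
import re
from pathlib import PurePosixPath


def version_key(version: str):
    """Numeric key from the leading dotted digits, e.g. '0.10.0' -> (0, 10, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(n) for n in m.group(0).split(".")) if m else ()


def latest_poms(paths):
    """Map (group, artifact) to the pom path with the highest version."""
    latest = {}
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) < 4:
            continue  # not group/artifact/version/file
        group, artifact, version = "/".join(parts[:-3]), parts[-3], parts[-2]
        key = (group, artifact)
        if key not in latest or version_key(version) > version_key(latest[key][0]):
            latest[key] = (version, p)
    return {key: path for key, (_, path) in latest.items()}


paths = [
    "speclj/speclj/2.1.3/speclj-2.1.3.pom",
    "speclj/speclj/2.6.1/speclj-2.6.1.pom",
    # Hypothetical artifact, included to show numeric vs string ordering.
    "example/lib/0.9.0/lib-0.9.0.pom",
    "example/lib/0.10.0/lib-0.10.0.pom",
]
for key, path in latest_poms(paths).items():
    print(key, path)
```

Comparing the numbers rather than the raw strings matters: as plain strings, "0.10.0" sorts before "0.9.0", which would pick the wrong pom.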

xtreak29 2018-03-16T12:41:13.000052Z

@tcrawley Do you know of any database that has all the info as structured output? I am assuming that clojars uses a DB backend to construct the pages.

2018-03-16T12:42:06.000534Z

Clojars does store the direct dependencies in the db, but not the transitive ones

xtreak29 2018-03-16T12:42:36.000326Z

Is the db publicly available for download somewhere?

xtreak29 2018-03-16T12:43:47.000506Z

If there is a dump of the db or its tables without any sensitive info, I could use that instead of building my own.

2018-03-16T12:44:46.000331Z

No, there’s currently no way to get that db. We could possibly expose the dependencies via an api call, but that still wouldn’t give you the full tree

2018-03-16T12:45:10.000122Z

and you’d be lacking any custom repos defined in the poms to resolve the transitive tree for those dependencies

xtreak29 2018-03-16T12:46:05.000362Z

Got it. Thanks a lot for your help on this 🙂

2018-03-16T12:46:36.000333Z

My pleasure