https://github.com/dainiusjocas/lucene-grep A Lucene-based grep-like utility compiled with GraalVM native-image. Grab a binary and tell me what you think. Cheers!
That was fast, @dainius.jocas! https://github.com/dainiusjocas/lucene-grep/commit/4e4556b7602e21aa4188f30c788e56e98a74220e
It was not complicated
However, the issue is that with the default Lucene I can get either scoring or highlighting
I have to implement a class that does both
@tvaughan the feedback is welcome
Scoring is more important to me. And normalization of scores too. From my prior experience with Elasticsearch, I remember that scores across indices were not comparable. I'm hoping scores across different files are. Again, thanks for taking the time to create lmgrep, @dainius.jocas
Yeah, with Elasticsearch, scores are not comparable, not only between indices but also between fields within an index
Scoring in lmgrep comes with gotchas. As of now, every line is scored separately. Every line is treated as a document with one field. A temporary index is created with that one document, then the query is run against that temporary index.
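To illustrate the gotcha, here is a minimal sketch of per-line scoring, assuming a toy BM25-style formula and a naive tokenizer. This is not lmgrep's or Lucene's actual implementation; `score_line`, `tokenize`, `K1` and `B` are hypothetical names and values chosen for illustration. In this toy version, because each line is the only document in its index, the length normalization cancels out and the score depends only on how often the query terms occur in the line.

```python
import math
import re

# Common BM25 defaults; assumed here for illustration only.
K1 = 1.2
B = 0.75


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_line(query, line):
    """Score one line as if it were the only document in a temporary index."""
    doc = tokenize(line)
    if not doc:
        return 0.0
    n_docs = 1           # the temporary index holds exactly one document
    avg_len = len(doc)   # so the average length equals this line's length
    score = 0.0
    for term in set(tokenize(query)):
        freq = doc.count(term)
        if freq == 0:
            continue
        doc_freq = 1     # the only document contains the term
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        norm = K1 * (1 - B + B * len(doc) / avg_len)
        score += idf * freq / (freq + norm)
    return score


def score_lines(query, lines):
    """Return (score, line_number, text) for every line, scored independently."""
    return [(score_line(query, text), i + 1, text) for i, text in enumerate(lines)]
```

With one document per index, a short line and a long line that each mention the term once get the same score in this sketch, which is one reason per-line scores need care when compared.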
Interesting. Thanks for sharing these details
I plan to write a blog post on the details in the coming week
wow, awesome :)
thanks!
If I do this in a Clojure repo:
lmgrep "select-keys" .
should this work? it turns up empty
I would expect it to search the dir recursively
the problem is with the . at the end
as of now the file pattern is GLOB
Super cool! Would it be possible to output the "score" associated with each match?
the problem with glob is always: is it recursive or not? this is always different per platform
@borkdude if you specify then it is recursive
this also doesn't return anything for me:
lmgrep "keys" **/*
Oh I see:
lmgrep "keys" "**/*"
I should quote the glob pattern
yes, that works, perfect
@borkdude yeah, put the GLOB in double quotes
@tvaughan as of now it is not supported, but there is a Class in Lucene that does just that, so it is possible
@borkdude for code search I'd suggest specifying the letter tokenizer, because the default analyzer doesn't split text on ., which is a bit unexpected IMO, e.g. lmgrep --tokenizer=letter "select-keys" "**.*"
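As a rough illustration of why a letter tokenizer helps with code search, the sketch below contrasts splitting on whitespace with splitting into runs of letters. It only mimics the idea in plain Python; it is not Lucene's tokenizer code, and `whitespace_tokens` and `letter_tokens` are hypothetical helper names.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only; punctuation stays glued to the tokens."""
    return text.split()


def letter_tokens(text):
    """Keep maximal runs of letters; everything else acts as a separator."""
    return re.findall(r"[^\W\d_]+", text)


line = "(clojure.core/select-keys m ks)"
print(whitespace_tokens(line))  # ['(clojure.core/select-keys', 'm', 'ks)']
print(letter_tokens(line))      # ['clojure', 'core', 'select', 'keys', 'm', 'ks']
```

With letter runs, a query such as select-keys breaks into the same pieces as the code it is matched against.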
yeah. it would be cool if the score was returned as @tvaughan suggests and EDN output would also be nice, so you could sort the results (e.g. pipe the results to babashka and then do some processing)
probably just maps with :file, :line, :column, the line :text (optionally) and :score?
I would just output the maps on the fly, streaming, not wrapped inside a collection
maybe one map on each line
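A minimal sketch of what streaming one map per line could look like, assuming matches arrive as dictionaries with the keys suggested above. The EDN formatter is hand-rolled and only handles strings, numbers and booleans; `edn_value`, `edn_map` and `stream_matches` are hypothetical names, not part of lmgrep.

```python
import json
import sys


def edn_value(value):
    """Format a scalar as EDN (only bool, int, float and str are handled)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # JSON string escaping is assumed to be close enough to EDN here.
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported value: {value!r}")


def edn_map(match):
    """Render a dict as a one-line EDN map with keyword keys."""
    pairs = (f":{key} {edn_value(value)}" for key, value in match.items())
    return "{" + ", ".join(pairs) + "}"


def stream_matches(matches, out=sys.stdout):
    """Write each match as soon as it is available, one map per line."""
    for match in matches:
        out.write(edn_map(match) + "\n")
        out.flush()


stream_matches([{"file": "src/core.clj", "line": 12, "score": 0.5}])
# prints: {:file "src/core.clj", :line 12, :score 0.5}
```

Because nothing is wrapped in an outer collection, a consumer can read and process the output line by line without waiting for the search to finish.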
Got it. So I imagine it will be something like lmgrep --with-score "query" GLOB, i.e. under a flag
Assuming compatibility with grep isn't a concern and results are sorted by score: [SCORE]:[FILE_PATH]:[LINE_NUMBER]:[LINE_WITH_A_COLORED_HIGHLIGHT]. I personally don't have much of a preference. I could awk/cut the output easily enough. As @borkdude suggests, EDN output would be super helpful
@dainius.jocas Maybe you can make this more flexible by allowing a --columns argument with a comma-separated list of options, which also determines the order
or even better, a template:
--template "{{score}}:{{file}},{{line}}:{{column}}:{{text}}"
and you can have {{text}} or {{colored-text}} if you want one of the two
or maybe --no-colors should just be an option
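A small sketch of how such a template could be rendered, assuming placeholders of the form {{name}} and a dictionary of fields per match. `render` is a hypothetical helper, not an existing lmgrep or clj-kondo function; unknown placeholders are left untouched here, which is just one possible design choice.

```python
import re

PLACEHOLDER = re.compile(r"\{\{([\w-]+)\}\}")


def render(template, fields):
    """Replace each {{name}} with fields[name]; leave unknown names as-is."""
    def substitute(found):
        name = found.group(1)
        return str(fields[name]) if name in fields else found.group(0)
    return PLACEHOLDER.sub(substitute, template)


match = {
    "score": 0.5,
    "file": "src/core.clj",
    "line": 12,
    "column": 3,
    "text": "(select-keys m ks)",
}
print(render("{{score}}:{{file}},{{line}}:{{column}}:{{text}}", match))
# prints: 0.5:src/core.clj,12:3:(select-keys m ks)
```

The order of the fields in the output follows the template, so a separate --columns option would not be needed in this sketch.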
Yeah, I was thinking about a template or a pattern as an option, left it out for the first iteration
I support something similar in clj-kondo
https://github.com/clj-kondo/clj-kondo/blob/master/doc/config.md#print-results-with-a-custom-format
Nice! I'll shamelessly copy it as much as possible
It's that time of year again - the survey at https://www.surveymonkey.com/r/clojure2021 is now open! We would love to get feedback from all Clojure/ClojureScript/ClojureCLR users. It takes < 10 minutes and we release all the data. Please share with your colleagues who might not be seeing it in forums like these.
I also think babashka should be in there! Maybe even clojerl?
as mentioned above, added babashka for consideration next year. I don't think anyone is actually using clojerl in anger.
@p-himik I think Chromium can be used independently as a component? David Nolen requested that, can't remember now why
@dnolen Could you please comment on the above? I'm genuinely interested but can't find any information.
I wonder if babashka should be a dialect option.
I will make a note to consider for next year
May I suggest something too, or would you prefer another way of suggesting an addition?
here's fine
Could you add in "Insurance" as a sector/industry for next year?
huuuge area
please add that as an Other response - I look at those every year and anything with high responses I add for the next year
no problemo
Other for that particular question that is
understood
I review all of those from prior year
Insurance was only mentioned 8 times last year in the other responses
if only there was some way you could have been ready for the risk and claimed some kind of compensation (sry, sry)
"Need better tutorials / guides." For me it is rather "Need more tutorials / guides".