announcements

Project/library announcements ONLY - use threaded replies for discussions. Do not cross post here from other channels. Consider #events or #news-and-articles for other announcements.
2021-01-14T11:46:10.002500Z

https://github.com/dainiusjocas/lucene-grep Lucene based grep-like utility compiled with GraalVM native-image. Grab a binary and tell me what you think. Cheers!

💯 6
😍 1
2021-01-15T14:51:55.031600Z

It was not complicated 😄

2021-01-15T14:52:58.031800Z

However, the issue is that with the default Lucene I can get either scoring or highlighting 🤦

2021-01-15T14:53:46.032100Z

I have to implement a class that does both 🙂

2021-01-15T14:54:46.032300Z

@tvaughan the feedback is welcome 😉

tvaughan 2021-01-15T15:00:04.032500Z

🙂 Scoring is more important to me. And normalization of scores too. From my prior experience with Elasticsearch, I remember that scores across indices were not comparable. I'm hoping scores across different files are 🤞 Again, thanks for taking the time to create lmgrep, @dainius.jocas

2021-01-15T15:27:33.032700Z

Yeah, with Elasticsearch, scores are not comparable between indices, or even between fields within an index 🙂

2021-01-15T15:29:22.032900Z

Scoring in lmgrep comes with gotchas. As of now, every line is scored separately. Every line is treated as a document with one field. A temporary index is created containing that one document, and the query is then run against that temporary index.
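To make the per-line model above concrete, here is a toy Python sketch in which each line is scored in isolation, as if it were the only document in a throwaway index. The scoring formula (query-term hits divided by the square root of the line length) and the names `tokenize`, `score_line`, and `grep_with_scores` are assumptions made up for illustration; lmgrep relies on Lucene, whose actual scoring is different.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric runs (a stand-in analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_line(query, line):
    """Score a single line as if it were the only document in a temporary index.

    Toy formula, NOT Lucene's: query-term hits / sqrt(number of tokens in the line).
    """
    tokens = tokenize(line)
    if not tokens:
        return 0.0
    hits = sum(tokens.count(term) for term in tokenize(query))
    return hits / math.sqrt(len(tokens))


def grep_with_scores(query, lines):
    """Return (score, line_number, line) for every matching line, best score first."""
    results = []
    for number, line in enumerate(lines, start=1):
        score = score_line(query, line)
        if score > 0:
            results.append((score, number, line))
    return sorted(results, key=lambda r: r[0], reverse=True)


lines = ["select keys from map", "nothing here", "keys keys"]
results = grep_with_scores("keys", lines)
for score, number, line in results:
    print(f"{score:.3f}:{number}:{line}")
```

In this toy model a score depends only on the line and the query, never on any other line or file.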

tvaughan 2021-01-15T15:36:32.033100Z

Interesting. Thanks for sharing these details

2021-01-16T11:08:57.051Z

I plan to write a blog post on the details in the coming week

borkdude 2021-01-14T11:50:01.002700Z

wow, awesome :)

2021-01-14T11:52:06.002900Z

thanks!

borkdude 2021-01-14T11:52:43.003100Z

If I do this in a Clojure repo:

lmgrep "select-keys" .
should this work? it turns up empty

borkdude 2021-01-14T11:53:46.003300Z

I would expect it to search the dir recursively

2021-01-14T11:54:00.003500Z

the problem is with the . at the end

2021-01-14T11:54:22.003700Z

as of now the file pattern is GLOB

tvaughan 2021-01-14T11:54:37.003900Z

Super cool! Would it be possible to output the "score" associated with each match?

borkdude 2021-01-14T11:54:46.004100Z

the problem with glob is always: is it recursive or not? this is always different per platform

2021-01-14T11:55:26.004300Z

@borkdude if you specify then it is recursive

borkdude 2021-01-14T11:55:37.004500Z

this also doesn't return anything for me:

lmgrep  "keys" **/*

borkdude 2021-01-14T11:56:04.004700Z

Oh I see:

lmgrep  "keys" "**/*"
I should quote the glob pattern

borkdude 2021-01-14T11:56:20.004900Z

yes, that works, perfect

2021-01-14T11:57:00.005100Z

@borkdude yeah, put the GLOB in double quotes 😉
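The quoting matters because an unquoted pattern is expanded by the shell before the program ever sees it, whereas a quoted pattern arrives as a literal string that the program can interpret itself. The sketch below uses Python's `pathlib` to show the difference between a non-recursive `*` and a recursive `**/*` on a small temporary tree. The helper `list_files` and the file names are hypothetical, and `pathlib`'s glob rules are only a stand-in: lmgrep's own matcher may behave differently.

```python
import tempfile
from pathlib import Path


def list_files(root, pattern):
    """Return sorted relative paths of regular files under root that match pattern."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny tree: one top-level file and two nested source files.
    (root / "src" / "app").mkdir(parents=True)
    (root / "README.md").write_text("keys\n")
    (root / "src" / "core.clj").write_text("(select-keys m [:a])\n")
    (root / "src" / "app" / "util.clj").write_text("(keys m)\n")

    top_level = list_files(root, "*")     # only files directly under root
    recursive = list_files(root, "**/*")  # files at any depth

print(top_level)
print(recursive)
```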

2021-01-14T11:58:04.005300Z

@tvaughan as of now it is not supported, but there is a Class in Lucene that does just that, so it is possible

👍 1
2021-01-14T12:00:35.005800Z

@borkdude for code search I'd suggest to specify the letter tokenizer, because the default analyzer doesn't split text on ., which is a bit unexpected IMO, e.g. lmgrep --tokenizer=letter "select-keys" "**.*"
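As a rough illustration of what a letter tokenizer does, the Python sketch below splits text into maximal runs of letters and drops everything else. It assumes that `--tokenizer=letter` behaves like Lucene's LetterTokenizer (which divides text at non-letters); `letter_tokenize` is a hypothetical helper, and Python's `str.isalpha` is only an approximation of Java's `Character.isLetter`.

```python
def letter_tokenize(text):
    """Split text into maximal runs of letters; every non-letter is a separator."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:
            # A non-letter ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(letter_tokenize("(select-keys m [:a.b :c])"))
# → ['select', 'keys', 'm', 'a', 'b', 'c']
```

With this kind of splitting, a query for `select-keys` is matched as the two terms `select` and `keys`, which is why it suits code search.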

borkdude 2021-01-14T12:01:51.006Z

yeah. it would be cool if the score was returned as @tvaughan suggests and EDN output would also be nice, so you could sort the results (e.g. pipe the results to babashka and then do some processing)

2021-01-14T12:06:05.006200Z

@tvaughan, @borkdude, I agree that it would be nice to sort on score, but give me a hint: how would you like the output to look?

borkdude 2021-01-14T12:07:20.006400Z

probably just maps with :file, :line, :column, the line :text (optionally) and :score?

borkdude 2021-01-14T12:07:51.006600Z

I would just output the maps on the fly, streaming, not wrapped inside a collection

borkdude 2021-01-14T12:08:06.006800Z

maybe one map on each line
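A minimal Python sketch of the streaming format suggested above: one EDN map per line, written as soon as each match is available, with no wrapping collection. The keys follow the message above (`:file`, `:line`, `:column`, `:text`, `:score`); the function names and the sample matches are hypothetical, and only strings, booleans, and finite numbers are handled.

```python
import sys


def to_edn_value(value):
    """Render a str, bool, int, or finite float as an EDN literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\t", "\\t")
        )
        return f'"{escaped}"'
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def to_edn_map(match):
    """Render a flat dict as an EDN map with keyword keys, e.g. {:line 12}."""
    pairs = (f":{key} {to_edn_value(value)}" for key, value in match.items())
    return "{" + " ".join(pairs) + "}"


def emit(matches, out=sys.stdout):
    """Write one EDN map per line, flushing after each so consumers can stream."""
    for match in matches:
        out.write(to_edn_map(match) + "\n")
        out.flush()


emit([
    {"file": "src/core.clj", "line": 12, "column": 3,
     "text": "(select-keys m [:a])", "score": 0.5},
    {"file": "src/util.clj", "line": 4, "column": 1,
     "text": "(keys m)", "score": 0.25},
])
```

Because every line is a complete map, a downstream tool can read, filter, or sort the results without waiting for a closing bracket.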

2021-01-14T12:10:07.007Z

Got it. So I imagine it will be something like lmgrep --with-score "query" GLOB , i.e. under a flag

tvaughan 2021-01-14T12:10:10.007200Z

Assuming compatibility with grep isn't a concern and results are sorted by score: [SCORE]:[FILE_PATH]:[LINE_NUMBER]:[LINE_WITH_A_COLORED_HIGHLIGHT] . I personally don't have much of a preference. I could awk/cut the output easily enough. As @borkdude suggests, edn output would be super helpful

👍 1
borkdude 2021-01-14T12:12:55.007800Z

@dainius.jocas Maybe you can make this more flexible by allowing a --columns argument with a comma separated list of options, which also determines the order

borkdude 2021-01-14T12:13:57.008Z

or even better, a template:

--template "{{score}}:{{file}},{{line}}:{{column}}:{{text}}"
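A template like the one above could be rendered with a simple placeholder substitution. This Python sketch is one possible approach, not how lmgrep or clj-kondo implement it; the `render` helper, the placeholder syntax rules, and the sample match are assumptions for illustration.

```python
import re

# Matches {{name}} where name consists of lowercase letters and hyphens.
PLACEHOLDER = re.compile(r"\{\{([a-z-]+)\}\}")


def render(template, match):
    """Replace each {{name}} with str(match[name]); unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(match[m.group(1)]), template)


match = {
    "score": 0.5,
    "file": "src/core.clj",
    "line": 12,
    "column": 3,
    "text": "(select-keys m [:a])",
}
print(render("{{score}}:{{file}},{{line}}:{{column}}:{{text}}", match))
# → 0.5:src/core.clj,12:3:(select-keys m [:a])
```

The template fixes both which fields appear and their order, so it covers the comma-separated columns idea as well.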

borkdude 2021-01-14T12:14:29.008300Z

and you can have {{text}} or {{colored-text}} if you want one of the two

borkdude 2021-01-14T12:14:48.008500Z

or maybe --no-colors should just be an option

2021-01-14T12:15:50.008700Z

Yeah, I was thinking about a template or a pattern as an option 👍 left it out for the first iteration

borkdude 2021-01-14T12:18:47.008900Z

I support something similar in clj-kondo

2021-01-14T12:20:46.009500Z

Nice! I'll shamelessly copy it as much as possible 😄

alexmiller 2021-01-14T17:15:08.013600Z

It's that time of year again - the https://www.surveymonkey.com/r/clojure2021 is now open! We would love to get your feedback from all Clojure/ClojureScript/ClojureCLR users. Takes < 10 minutes and we release all the data. Please share with your colleagues who might not be seeing it in forums like these.

27
📜 1
🎉 18
✅ 11
dgb23 2021-01-15T10:46:01.025800Z

I also think babashka should be in there! Maybe even clojerl?

alexmiller 2021-01-15T13:53:09.029500Z

as mentioned above, added babashka for consideration next year. I don't think anyone is actually using clojerl in anger.

alexmiller 2021-01-15T13:54:10.029700Z

@p-himik I think Chromium can be used independently as a component? David Nolen requested that, can't remember now why

p-himik 2021-01-15T15:47:43.033300Z

@dnolen Could you please comment on the above? I'm genuinely interested but can't find any information.

2021-01-14T17:26:04.014800Z

I wonder if babashka should be a dialect option.

💯 4
alexmiller 2021-01-14T17:34:05.015500Z

I will make a note to consider for next year

👍 9
dharrigan 2021-01-14T17:42:13.015900Z

May I suggest something too, or would you prefer another way of suggesting an addition?

alexmiller 2021-01-14T17:48:02.016400Z

here's fine

dharrigan 2021-01-14T17:48:23.016600Z

Could you add in "Insurance" as a sector/industry for next year?

dharrigan 2021-01-14T17:48:34.016800Z

huuuge area 🙂

☝️ 1
alexmiller 2021-01-14T17:49:25.017Z

please add that as an Other response - I look at those every year and anything with high responses I add for the next year

dharrigan 2021-01-14T17:49:38.017200Z

no problemo

alexmiller 2021-01-14T17:50:01.017500Z

Other for that particular question that is

dharrigan 2021-01-14T17:50:07.017700Z

understood

alexmiller 2021-01-14T17:50:08.017900Z

I review all of those from prior year

alexmiller 2021-01-14T17:50:47.018300Z

Insurance was only mentioned 8 times last year in the other responses

2021-01-14T17:55:44.018500Z

if only there was some way you could have been ready for the risk and claimed some kind of compensation (sry, sry)

7
pez 2021-01-14T18:50:46.020400Z

Need better tutorials / guides. For me it is rather “Need more tutorials / guides”.