Good Morning!
✌️
Good morning :spock-hand:
Good morning
Morning
Morning :)
morning
What algorithm would you choose to compare post addresses? My current project is using Jaro-Winkler, and it’s horrible 😕
I believe every approach that uses a string distance is a bad one for structured addresses. Leave out a ZIP code, and the distance explodes. Have similar words with different meanings, and you get a false match
Normalizing the data is, of course, not possible
equivalence of eszetts (ß) and double s is a personal favourite
similarly for umlauts
Yeah. Or “P.O. Box”, “Post Box”, “P.O”… “Street”, “Str.”, … Just blindly applying a function to a string won’t do any good; you’d need to tokenize and normalize intelligently. Oh, and all of the addresses are global, of course. Germany, China, the U.S.A. and Chile alone don’t have comparable address formats.
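To make the "tokenize and normalize" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `ABBREVIATIONS` table, the function names `fold_chars` and `normalize_address`, and the German-style umlaut transliteration are all hypothetical and not taken from the project. Real global addresses need far more than this.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny abbreviation table (assumption, not project data).
ABBREVIATIONS = {
    "str": "street",
    "ave": "avenue",
}


def fold_chars(text):
    """Lowercase and fold ß and umlauts to ASCII (German convention, an assumption)."""
    text = unicodedata.normalize("NFC", text).lower().replace("ß", "ss")
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue")):
        text = text.replace(src, dst)
    # Strip any remaining diacritics, e.g. "é" becomes "e".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_address(raw):
    """Return a list of normalized tokens for one free-text address line."""
    text = fold_chars(raw)
    # Naively collapse "P.O. Box", "P.O" and "Post Box" into a single token.
    text = re.sub(r"\bp\.?\s*o\.?(?:\s*box)?\b|\bpost\s*box\b", " pobox ", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [ABBREVIATIONS.get(token, token) for token in tokens]
```

With this, "P.O. Box 12" and "Post Box 12" produce the same tokens, as do "Hauptstraße 5" and "Hauptstrasse 5". The token lists could then be compared instead of the raw strings.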
The main issue is that every time the software matches something wrong, somebody creates an issue I have to investigate 🙈
that algorithm sounds like a case for a new SaaS business
:thinking_face:
Bucks and Buckinghamshire is a fun one too
> that algorithm sounds like a case for a new SaaS business
Good luck with the GDPR compliance...
The issue is that the customer cannot understand that “I want to find a business partner with a similar address in my database of 3,000,000 addresses” is not an easy problem to solve.
what’s wrong with GDPR? as far as I understood, it’s only about addresses and not names
@javahippie “similar address” is very broad 🙂 does it need to go by street?
It’s “Company, Street/P.O Box, Zip Code, City, Country”. But can be anything somebody in an office enters. Things that also appear sometimes: District, building, floor, office number…..
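One way to avoid the "leave out a ZIP code and the distance explodes" problem is to compare field by field and skip fields that are missing on either side. The Python sketch below assumes the records have already been split into the fields named above. `FIELDS`, `similarity` and `field_score` are hypothetical names, and `difflib.SequenceMatcher` is used only as a stand-in string metric, not as a recommendation over Jaro-Winkler.

```python
from difflib import SequenceMatcher

# Field names follow the format described in the chat; the keys are an assumption.
FIELDS = ("company", "street", "zip", "city", "country")


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] using the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def field_score(rec_a, rec_b):
    """Average similarity over fields present in BOTH records."""
    scores = []
    for field in FIELDS:
        value_a, value_b = rec_a.get(field), rec_b.get(field)
        if value_a and value_b:  # a missing field is skipped, not penalized
            scores.append(similarity(value_a, value_b))
    return sum(scores) / len(scores) if scores else 0.0
```

Two records that differ only by a missing ZIP code then score as identical, whereas comparing the concatenated strings would not. Splitting free text into fields is the hard part and is not addressed here.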
Never said that to a customer but.. I believe they need an AI 😄
Good luck debugging the AI when the customer reports the next "wrong match". Hm... I use this "good luck" phrase too often. 🙈
The address sent to your SaaS could be anything. Including names, persons and the like.
insert universal greeting
Howdy! Made this script to detect code using some spec pattern: https://gist.github.com/borkdude/a391146ad81a06c28fb97ccdc1f64d44 I'm considering building this out into a library.
Not sure if spec would be the way to go or malli. I guess that's a typical 2020 Clojure problem. As of now, it would be spec, but in the future malli might be more flexible
@borkdude while you're here. I was thinking about something we've probably discussed before: clj-find-usages
Which would be something I could invoke (a bit like clj-kondo) from emacs which would statically analyze my project and find the usages of some symbol.
Basically, my problem is that the find-usages in CIDER is not Working(TM)
I think some plugins already do this. It's possible using clj-kondo's analysis output.
@slipset One example is https://github.com/didibus/anakondo which provides completions, but could in theory also jump to definition. Maybe it can be extended with usages as well
I see it's on their roadmap. Maybe you could help @didibus
But for the spec tool I'm considering, I possibly want to support patterns using fully qualified symbols so alias usage will match on that as well
e.g. (require '[foo :as f]) (f/dude)
and searching with foo/dude
will give you the match for (f/dude)
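The alias matching described here boils down to resolving each symbol to its fully qualified form before comparing. A minimal Python sketch of that idea, not how the actual tool is implemented: `resolve_symbol` and `find_usages` are hypothetical names, and the alias map is assumed to have been extracted from the `require` forms already.

```python
def resolve_symbol(sym, aliases):
    """Expand an aliased symbol like "f/dude" to "foo/dude"."""
    namespace, sep, name = sym.partition("/")
    if not sep:
        return sym  # unqualified symbol, nothing to expand
    return f"{aliases.get(namespace, namespace)}/{name}"


def find_usages(symbols, aliases, target):
    """Return the symbols that resolve to the fully qualified target."""
    return [s for s in symbols if resolve_symbol(s, aliases) == target]


# Alias map as it would result from (require '[foo :as f]).
aliases = {"f": "foo"}
matches = find_usages(["f/dude", "g/dude", "dude"], aliases, "foo/dude")
```

Here `matches` contains only `"f/dude"`, which mirrors searching with `foo/dude` and finding `(f/dude)`.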
For matching addresses -> https://github.com/openvenues/libpostal
there are also higher level language bindings though it seems like the Java lib hasn't been touched for a while
That’s nice! Will suggest this, it should be possible to wire it into the pipeline. Thanks!
Made a thing today: https://github.com/borkdude/grasp
as I saw on the #announcements woah
I think we all know kung fu now
and congrats 🙂
Mailing lists: a cool way to receive updates or kinda lame? Thinking for software projects.
mailing lists are cool
everyone has email