clojure-europe

For people in Europe... or elsewhere... UGT https://indieweb.org/Universal_Greeting_Time
dominicm 2020-12-05T07:27:58.312700Z

Morning.

dominicm 2020-12-05T07:28:57.314300Z

My mind is on access control this morning. I'd love to hear the nitty gritty of your access control approach. Do you round trip the database? Use roles? Permissions? Jwt assertions?

slipset 2020-12-05T08:45:40.316300Z

@dominicm thanks for playing. Because of your interest, I’ll write up a blog post about it. I’ll let you know when it’s out, probably over the weekend.

slipset 2020-12-05T08:46:20.317600Z

It won’t be going into the nitty gritty of the security related questions, so I’ll be happy to answer them here:

slipset 2020-12-05T08:48:10.320600Z

We have users stored in mongo, so we do involve the database. We use friend and its workflows to deal with different authentication protocols like OAuth and SAML.

borkdude 2020-12-05T08:48:18.321Z

fwiw, we use the built-in role based stuff in yada

slipset 2020-12-05T08:51:01.325100Z

A word of caution. Even though the common saying is that you should not roll your own security, I’d say that the quality of some of the offerings in the Clojure ecosystem is not great. saml2.0-clj 1.x is a leaky bucket, and monger-session-store uses read-string to deserialize session data.

slipset 2020-12-05T08:54:50.325800Z

So we used to store all kinds of stuff in the session and stick it in mongo, but now we store only the user-id (IIRC) and stick it in redis instead.

slipset 2020-12-05T08:57:01.328300Z

As for authorization, we have a somewhat crazy mix of roles and permissions on given entities in a system. This means that a pure role based authorization scheme doesn’t quite work for us. I came in 3 years ago, and the authorization system had by that time “evolved” “organically” for four years, so it’s quite messy and because of backwards compatibility it’s not straightforward to change it.

dominicm 2020-12-05T08:59:09.331600Z

I've just been reading owasp recommendations. Doing role checks is considered bad practice. It sounds like checks should be permission based from day 1, with roles having permissions.
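
A minimal sketch of that idea, assuming made-up role and permission names (nothing here comes from OWASP or from any real system): roles only exist as bundles of permissions, and every check in the code asks about a permission.

```python
# Hypothetical role and permission names, for illustration only.
ROLE_PERMISSIONS = {
    "reader": {"flower:read"},
    "editor": {"flower:read", "flower:write"},
    "admin": {"flower:read", "flower:write", "flower:delete"},
}


def permissions_for(roles):
    """Union of the permissions granted by each role a user holds."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def has_permission(roles, permission):
    """Call sites ask about a permission, never about a role name."""
    return permission in permissions_for(roles)
```

With this shape, changing what a role may do means editing one table rather than hunting down role checks scattered through the code.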

dominicm 2020-12-05T08:59:45.333200Z

@borkdude how do you handle "has permission to access this resource" as in "only the owner of the flowers can view the flowers"?

slipset 2020-12-05T09:00:32.334Z

Basically, to understand if a user has access to a VERB /foo/bar/:id you need to know which verb (which we do), and here we do our first role-based check: readers are (in general) not allowed to PUT, POST, or DELETE. Then, you have to check if the user has permissions (either through role, group, or granted permission) on the thing identified by :id to do the requested operation. This obviously involves the database.

slipset 2020-12-05T09:01:59.335800Z

One of the problems we have with our role-based approach is that readers are being blocked from writing in our middleware, basically on a static whitelist basis (some urls are available for readers to write), which means that currently we cannot give a user with a reader role write access by putting her in a group which has write-access.
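
The two-step check and the limitation it creates might be sketched roughly like this. The whitelist entry, the grant table and the principal names are all hypothetical stand-ins, and a dict plays the part of the database.

```python
WRITE_VERBS = {"PUT", "POST", "DELETE"}

# Hypothetical static whitelist of URL prefixes that readers may write to.
READER_WRITE_WHITELIST = ("/api/comments",)

# Hypothetical stand-in for the database: (user or group, entity id) -> operations.
GRANTS = {
    ("group:florists", "flower-123"): {"read", "write"},
}


def role_gate(role, verb, path):
    """Step 1: static, role-based check done in middleware."""
    if role == "reader" and verb in WRITE_VERBS:
        return path.startswith(READER_WRITE_WHITELIST)
    return True


def entity_check(principals, entity_id, operation):
    """Step 2: look up grants for the user and each of her groups."""
    return any(operation in GRANTS.get((p, entity_id), set()) for p in principals)


def authorize(role, principals, verb, path, entity_id):
    operation = "write" if verb in WRITE_VERBS else "read"
    return role_gate(role, verb, path) and entity_check(principals, entity_id, operation)
```

A reader who belongs to `group:florists` passes `entity_check` for a write on `flower-123`, but `authorize` still rejects the PUT because `role_gate` runs first and knows nothing about groups.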

slipset 2020-12-05T09:04:06.336800Z

So, it seems to me (which kind’a follows @dominicm’s owasp advice) that role based access to URIs is a bit too simple of an approach for the real world.

slipset 2020-12-05T09:04:36.337400Z

Shit, I used the word simple. It has all sorts of connotations, should have probably used simplistic.

slipset 2020-12-05T09:05:52.338200Z

As to the flowers, I guess authorization only becomes interesting once you throw business rules into the mix, and when you do so, role based access to urls ain’t gonna cut it.

slipset 2020-12-05T09:07:31.339600Z

And, roles and groups are somewhat the same thing in my head. At least in some ways. The people who have a role form a group, so I guess roles could be implemented in terms of groups, but maybe not the other way around?

slipset 2020-12-05T09:07:51.340100Z

As I’m sure you’ve realized by now, this is not my area of expertise.

dharrigan 2020-12-05T09:08:47.340800Z

When I’ve implemented security, I modelled it upon Spring’s Security offering, which is also reflected in Apache Shiro.

dharrigan 2020-12-05T09:08:56.341Z

I use HMAC authentication and JWT too

slipset 2020-12-05T09:09:57.342200Z

I’ve not looked much into shiro, but as far as I understand, shiro calculates up front the ids of the things a user has access to. In our system that upfront calculation could be expensive.

slipset 2020-12-05T09:10:27.343200Z

and if a user “only” wants to PUT /api/flower/123 why calculate all the other things she has access to?

dharrigan 2020-12-05T09:10:51.343700Z

The way I’ve used it, is I have users, groups, roles in the db. Each user belongs to a role and a role belongs to a group (groups can have group permissions too).

dharrigan 2020-12-05T09:12:51.346100Z

Every API request uses HMAC authentication - the user has to hit /api/authenticate first to get back the initial token. I store that user in a Redis cache (which by then has been enriched by their permissions). The JWT only contains the role and group they are in. When they then hit a protected URI, I do HMAC authentication, then I check their decoded JWT token against their user that I’ve looked up on Redis to discover their authorisation to do stuff. Only then do I let them through.
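
A rough sketch of the HMAC half of that flow, using only the Python standard library. The canonical string that gets signed, the cache layout and the user data are assumptions made for illustration; a dict stands in for Redis, and the JWT step is left out entirely.

```python
import hashlib
import hmac

# Hypothetical in-memory stand-in for the Redis cache of enriched users.
USER_CACHE = {
    "alice": {"secret": b"alice-shared-secret", "permissions": {"flower:read"}},
}


def sign(secret, method, path, body):
    """HMAC-SHA256 over a canonical string; the exact format is an assumption."""
    message = "\n".join([method, path, body]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def authenticate(user_id, method, path, body, signature):
    """Return the cached user if the signature matches, else None."""
    user = USER_CACHE.get(user_id)
    if user is None:
        return None
    expected = sign(user["secret"], method, path, body)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, signature):
        return None
    return user


def authorize(user_id, method, path, body, signature, permission):
    """Authenticate first, then check the permission on the cached user."""
    user = authenticate(user_id, method, path, body, signature)
    return user is not None and permission in user["permissions"]
```

Because the signature covers the method, path and body, a signature captured for one request does not validate a different one.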

dharrigan 2020-12-05T09:13:27.346600Z

All the JWT tokens are signed and checked too.

dharrigan 2020-12-05T09:14:07.346800Z

Security is hard 🙂

slipset 2020-12-05T09:17:05.347900Z

Our problem is that the calculation of a user’s permissions is a somewhat hard problem.

slipset 2020-12-05T09:21:39.349300Z

So, you might not have access to GET /api/flower/:id , but you might have access to it if you access it through GET /api/flower/:id?shop=myshop and you have been granted access to myshop

slipset 2020-12-05T09:23:14.350800Z

Now, you could argue that this should be modeled as access to GET /api/shop/:shop-id/flower/:id and that there should be some hierarchical check somewhere which checked access to shop-id first.
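
Such a hierarchical check could look something like the sketch below, where the two lookup tables are invented stand-ins for whatever the database holds: check the shop first, then check that the flower really belongs to that shop.

```python
# Hypothetical data: which shop each flower belongs to, and who may access which shop.
FLOWER_SHOP = {"flower-1": "myshop", "flower-2": "othershop"}
SHOP_ACCESS = {"anna": {"myshop"}}


def can_get_flower(user, shop_id, flower_id):
    """Check the parent (shop) first, then that the flower really belongs to it."""
    if shop_id not in SHOP_ACCESS.get(user, set()):
        return False
    return FLOWER_SHOP.get(flower_id) == shop_id
```

The second condition matters: without it, access to one shop would leak every flower whose id can be guessed.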

slipset 2020-12-05T09:23:46.351400Z

But anyhoop, the question of if you have access to the flower is not just dependent on its id.

dominicm 2020-12-05T09:32:33.353400Z

Well, urls are meaningless really. /lastflower could be equivalent to /shop/10/flower/100, which could be equivalent to /dfbhudsghjljfdd

slipset 2020-12-05T09:33:48.355500Z

Yeah, but I guess what I’m hinting at is that the effective permission in our system is dependent on a lot of things, so that it’s nearly impossible to calculate it up front.

dharrigan 2020-12-05T09:35:10.356600Z

The second form GET /api/shop/:shop-id/flower/:id would have been more “restful” (admittedly each place seems to implement rest differently 🙂 ) But I feel your pain Erik! 🙂

slipset 2020-12-05T09:36:07.357500Z

I’m with @dominicm here on the meaningfulness of urls :)

dharrigan 2020-12-05T09:36:21.357900Z

Would it be possible to deprecate the first form and transition to the second form?

dominicm 2020-12-05T09:37:05.359Z

@dharrigan all 3 are rest. Urls are meaningless in rest.

dharrigan 2020-12-05T09:37:23.359200Z

I disagree

dharrigan 2020-12-05T09:38:04.360500Z

This one GET /api/shop/:shop-id/flower/:id at least encodes some meaning within it

dominicm 2020-12-05T09:39:02.362500Z

As far as I recall, Roy Fielding didn't mention url hierarchies or meaning when he defined REST. I personally think encoding structure and meaning heavily into urls works against other rest principles.

dharrigan 2020-12-05T09:39:49.365100Z

Empirically, I believe it goes with REST principles; the evidence shows that accessing a resource via an id is widely used.

dominicm 2020-12-05T09:40:47.368200Z

But 1 is explicitly mentioned as an example of REST by Roy (kinda, he used "today's weather in Los Angeles" as well as "weather for Los Angeles on <date>", and later he uses "version of paper presented at conference")

slipset 2020-12-05T09:41:05.368800Z

It’s anyway hard to model REST apis correctly and vast amounts of time can be spent discussing which is better, but the difference in value between two urls representing the same thing is minimal. So I rather approach this from the view that I don’t really care.

😄 1
dharrigan 2020-12-05T09:41:39.369600Z

Agreed

dharrigan 2020-12-05T09:42:07.371Z

I used to care very passionately about whether to encode /v1 or whatever in the URL (I am for putting it within the Accept header)
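
For the Accept-header variant, a small sketch of pulling the version out of a vendor media type. The media type `application/vnd.example.vN+json` is a made-up example, not something any of the systems discussed here is known to use.

```python
import re

# Hypothetical vendor media type, e.g. "application/vnd.example.v2+json".
VERSION_RE = re.compile(r"application/vnd\.example\.v(\d+)\+json")


def api_version(accept_header, default=1):
    """Extract the requested API version from an Accept header value."""
    match = VERSION_RE.search(accept_header or "")
    return int(match.group(1)) if match else default
```

Clients that send a plain `application/json` fall back to the default version, which keeps the URL free of `/v1`-style prefixes.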

dharrigan 2020-12-05T09:42:14.371600Z

I gave up on that battle

borkdude 2020-12-05T09:42:40.372400Z

I tried to use the :copy method with yada, but it didn't support it :/

slipset 2020-12-05T09:42:51.372800Z

Another thing that this discussion brings to the table is that authentication is difficult to get right, but not a hard problem, whereas authorization is hard as the business rules grow.

dharrigan 2020-12-05T09:43:22.373200Z

yes, totally - authentication is pretty easy and straightforward. Authorization is a hairy mess 🙂

dharrigan 2020-12-05T09:43:38.373400Z

There’s always exceptions

slipset 2020-12-05T09:46:56.375200Z

With respect to api-versioning, I read https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/ and then Rich came along and solved the problem by saying always be backwards compatible.

👍 1
borkdude 2020-12-05T09:48:31.376600Z

Even in clojure core circles there's exceptions to this. E.g. clj made or will make some breaking changes as to not support two code paths indefinitely

slipset 2020-12-05T09:50:07.379100Z

Yeah, but isn’t that allowed because either the thing is in alpha, which means anything goes, or it falls under fixation, i.e. a bug, and as such can be squashed? I.e. disregarding Hyrum’s law.

borkdude 2020-12-05T09:50:48.379800Z

The command line thing clj was never marked alpha. Only the underlying tools.deps.alpha lib which is not visible to the user.

dominicm 2020-12-05T09:51:32.380900Z

Sure, but I think it's still viewed as alpha despite the lack of communication/docs on that @borkdude

dharrigan 2020-12-05T09:51:51.381800Z

That’s a great article

borkdude 2020-12-05T09:51:52.381900Z

I'm not sure what you mean: clj is well documented?

dominicm 2020-12-05T09:51:53.382Z

Sometimes it seems like the only way to know whether something is really stable is to ask the author.

dominicm 2020-12-05T09:52:33.383600Z

@borkdude usage is. But the fact it was built on something alpha has always made me think that breaking changes would be forthcoming.

dharrigan 2020-12-05T09:52:36.383900Z

I’m just glad there are only 3 ways to specify a version…I would hate to think if there were 20 ways to specify an API version 🙂

borkdude 2020-12-05T09:52:38.384Z

Alex has said that he considered clj stable despite that tools.deps.alpha was marked alpha.

borkdude 2020-12-05T09:52:43.384200Z

In interviews

dominicm 2020-12-05T09:52:57.384900Z

Oh. But then, breaking changes. Well now I'm doubly confused!

borkdude 2020-12-05T09:52:58.385Z

In the ClojureScript podcast if I remember correctly

borkdude 2020-12-05T09:55:14.386700Z

I think not-breaking should be considered in context: 1) Do you control all of the usage (i.e. internal API): then it's fine I guess. 2) Is your feature still in development, is it only used by early adopters? Maybe it's fine. 3) Is the feature widely established and used by the masses, probably not ok. 4) Does breaking only affect a minority, maybe it's ok.

👍 1
slipset 2020-12-05T09:56:57.389100Z

That’s part of an argument I was planning on including in a talk proposal for the conj, in reply to a tweet by Stuart Halloway who was on Twitter saying it’s never ok to change the behavior of a name.

👍 1
dominicm 2020-12-05T09:58:34.390500Z

@slipset I still want to know more about how he breaks codebases into libraries like he mentioned.

slipset 2020-12-05T10:02:30.391100Z

https://twitter.com/stuarthalloway/status/1234261008560115712

👍 1
dominicm 2020-12-05T10:03:08.391800Z

That tweet was really interesting

dominicm 2020-12-05T10:03:53.392700Z

https://shiro.apache.org/realm.html#permission-based-authorization I think this implies a db round trip is made when you check permission.

borkdude 2020-12-05T10:04:01.393100Z

Personally I don't view such an utterance as absolute but just as something to think about. E.g. I can see where he's going. When your IDE allows advanced refactorings, it can be very easy to create nonsense boilerplate code and make breaking changes.

borkdude 2020-12-05T10:04:40.393800Z

I read one example of a Haskell program that was developed in this way. It turned into a spaghetti monster type signature, but the author was even proud of this.

borkdude 2020-12-05T10:05:47.394400Z

Advanced refactoring can lead to fast coding without thinking, while thinking should always be balanced with coding.

borkdude 2020-12-05T10:06:14.394700Z

At least, this is my interpretation of Stu's tweet ;)

borkdude 2020-12-05T10:07:03.395Z

Personally I don't use a lot of refactoring tools. Maybe sort-ns is the only one. This may also be in part that I'm too lazy to set this all up.

dharrigan 2020-12-05T10:11:20.396200Z

sort-ns is about the only one I use too 🙂

dharrigan 2020-12-05T10:11:49.397100Z

I absolutely love the way, writing in Clojure, very strongly encourages me to write small functions and to think about what I write

dharrigan 2020-12-05T10:11:57.397500Z

so if I have to change something, it’s very localised and small

slipset 2020-12-05T10:11:58.397600Z

I use rename symbol quite a lot.

borkdude 2020-12-05T10:12:25.398600Z

Renaming locals is a local change (by nature) and this is where refactoring tools are quite harmless.

slipset 2020-12-05T10:12:29.398800Z

I think ‘extract function’ should burn in hell

borkdude 2020-12-05T10:13:28.399900Z

I also use quite a lot of rg and just go through the list manually. If I need to find patterns that are more sophisticated than just names, I can use: https://github.com/borkdude/grasp

👍 1
slipset 2020-12-05T10:14:41.402700Z

Renaming symbols inside some boundary which I control should also be ok. Which means that any fn I don’t expose to the outside of our company can be renamed. Which are all of them. The rest endpoints on the other hand....

borkdude 2020-12-05T10:15:07.403300Z

This is one of the things I appreciated when I was in a company that used Java and Ruby. When I sat next to a Ruby guy, he just used vim and rg (or grep). This blew my mind: simple tools can get a long way.

borkdude 2020-12-05T10:15:59.404Z

slipset: yeah, that should also be ok, unless you're writing libraries that are used in multiple projects maybe, etc, context.

borkdude 2020-12-05T10:17:38.404800Z

While on the topic, has anyone tried clojure-lsp in their editor of choice?

slipset 2020-12-05T10:22:04.406600Z

Btw, totally off topic, but it feels safe to discuss stuff here. I dare express stuff that might be stupid. Thanks for creating such an environment.

👍 3
borkdude 2020-12-05T10:22:40.407Z

The feeling is mutual :)

borkdude 2020-12-05T10:27:07.409800Z

While on the topic of safe environment: What if you're in a team and your team lead is leaving. Someone maybe has to step up to take the lead role, or you should hire a new person for this. Personally I just like software development and would like to reduce the number of meetings and manager-like tasks. I feel like going more towards the manager-type things is a bit of a trap that leads away from coding (the thing I currently enjoy) eventually. Some people may aspire to this role (more responsibility, more money possibly, higher on the ladder). What are your thoughts on this?

borkdude 2020-12-05T10:30:49.410900Z

Maybe this should also be viewed in context. Might differ per company, team, background in the domain, etc.

borkdude 2020-12-05T10:33:34.413300Z

We're not actively recruiting yet, but it's likely that we are going to need a 1) team lead (we have a search product indexing biomedical literature) who can align business goals with tech 2) devops type person who likes to work with managed hardware and cloud. 3) UI/UX person (good at UX, but also good at making things look nice). Mostly Clojure, ClojureScript, NLP, AI, triplestores.

slipset 2020-12-05T10:39:53.420200Z

I think this is a situation that any somewhat experienced dev will find herself in. As your experience grows, you voice your opinions and you somewhat become a leader, even if you’re not given the title. As such, I tend to request getting the tech-lead role, not because I want to be the dictator of all things tech, but I want to have the authority to have the final word if there are discussions that never end and a decision needs to be made. How to structure rest-endpoints could be such a discussion. I’m also a person who enjoys product development, likes mentoring people, has an eye for ux (especially the stuff that doesn’t work) and as such would probably make for a great team lead. My problem with the team lead role is that it very often leads to a mix between a project secretary (do all the boring, but important things that the project lead can’t be bothered with, i.e. estimations, keeping track of progress) and kindergarten-auntie, the person that follows up that people do what they’ve promised to do. The latter part of the team-lead role can go away as your team matures and understands its purpose, but neither the project-secretary nor the kindergarten-auntie are things that I do very well, and certainly not in combination with coding and problem solving, which are the things that I really enjoy.

borkdude 2020-12-05T10:41:36.420900Z

There is project-secretary but also the guy who is pulled into every meeting that's on the surface between your team and higher up or sibling teams.

slipset 2020-12-05T10:43:26.424Z

I’ve been listening a lot to the idealcast with @genekim lately, and a lot of the stuff from team-of-teams resonates with me. I could be a teamlead of a team of my choosing which is given clear objectives, but with freedom to decide how to achieve those objectives.

borkdude 2020-12-05T10:43:31.424300Z

Btw, if any of the above roles appeal to you, you're welcome to reach out to me in private.

mpenet 2020-12-05T10:44:17.425300Z

some companies do not necessarily map climbing the supposed career ladder with seniority and salaries

mpenet 2020-12-05T10:45:10.426300Z

I personally hate managing people and the associated tasks (I was "CTO" for a few years), never again. But I also choose companies where dev "seniority" is valued

mpenet 2020-12-05T10:45:26.426700Z

in our field I think it's more and more the norm

mpenet 2020-12-05T10:45:41.427300Z

You don't have to go the manager route necessarily

mpenet 2020-12-05T10:46:19.428600Z

then if it's a necessity because nobody is suited for it or you don't find a good fit it's another matter, especially if you have heavy stakes in the company

slipset 2020-12-05T10:47:06.429700Z

The team-of-team stuff comes from the Navy SEALS, and while they do seem to have their management issues, they’re different from ours. They seem to be given somewhat clear objectives like “capture that guy”, and are then let free to figure out how to do that

mpenet 2020-12-05T10:47:09.430Z

I guess that's how we got to these "staff engineer" titles and the like

mpenet 2020-12-05T10:47:45.431400Z

highly valued engineers that contribute to tech decisions while spending their day to day actually doing engineering tasks

borkdude 2020-12-05T10:48:07.432100Z

I know there are (technically highly skilled) people who like building and empowering teams from scratch or reviving existing teams.

slipset 2020-12-05T10:48:09.432200Z

In terms of software, one of my colleagues used to work on the Dragonfly project at Opera. The objective he and his team were given was “build the best javascript-debugger”

slipset 2020-12-05T10:48:40.432600Z

He had a meeting once a year or so with his stakeholder. That was it.

borkdude 2020-12-05T10:49:14.433600Z

That's perfect. Although I usually tend to seek more feedback from users than once a year. This is also a hard problem in our company: it is still trying to find out who its users are.

slipset 2020-12-05T10:49:48.434700Z

He and his team got feedback from users (the users of the debugger) all the time on the internets. Just not from the stakeholders inside the company.

borkdude 2020-12-05T10:49:54.435Z

Ah right

mpenet 2020-12-05T10:51:14.436900Z

exoscale has a nearly flat hierarchy, small squads (like 5 people per), 1 squad lead per squad, that is sort of a speaker for the whole squad and coordination pt with higher hierarchy (not many levels up, basically they report to vp eng/cto directly). little meeting overhead. It's a pretty good setup

slipset 2020-12-05T10:51:22.437200Z

So, in our company, the difference would be somewhat like “build x in the way the stakeholder has envisioned the thing on a timeline decided by someone”, and “it seems like our users have this problem, please go spend a year or so solving that problem for them”

mpenet 2020-12-05T10:52:08.438Z

about dev vs ops, the lines are getting more blurry as time passes

slipset 2020-12-05T10:52:24.438600Z

and of course, given the second approach, you’d be reporting back at regular intervals as new findings are discovered.

mpenet 2020-12-05T10:53:14.440Z

It's more and more a dev role to know/do ops nowadays. You have to plan accordingly when you dev anyway so you get to know most "ops" tasks. SRE is something else

slipset 2020-12-05T10:53:44.440600Z

The second approach also requires a cross functional team and acknowledging that not all team-members will be 100% utilized all the time.

borkdude 2020-12-05T10:55:34.441500Z

Whatever you want to call it: our software currently runs on bare metal, on servers in a rack that belong to the team lead who left. We have to migrate this to other managed hardware and/or cloud. He will give us the time we need, so it's not a panic operation, but it requires work and expertise.

mpenet 2020-12-05T11:01:12.443500Z

Right, tricky situation

borkdude 2020-12-05T11:02:20.444700Z

Everything is dockerized and currently runs on docker swarm. Porting to another system should not be a problem. It's more about requirements in terms of CPU, GPU and RAM.

borkdude 2020-12-05T11:03:07.445400Z

And choosing what is cost-effective

raymcdermott 2020-12-05T11:39:56.445800Z

morning

raymcdermott 2020-12-05T11:42:53.448600Z

may I add to the earlier team discussion, that I detest the notion that the Navy Seals or any other “elite” military group should be the inspiration for writing software or working in teams. It feels gross and I particularly dislike the salivating over their “learnings” and experience in Iraq 🤮

☝️ 2
slipset 2020-12-05T11:45:35.448800Z

May I ask why?

raymcdermott 2020-12-05T11:46:45.449800Z

mostly cos of the killing but also because their objectives are often illegal

raymcdermott 2020-12-05T11:47:04.450100Z

they are the ultimate rogues

raymcdermott 2020-12-05T11:47:36.450800Z

and I was wondering whether that is why FaceBook etc. is often lauded for its rogue behaviour

raymcdermott 2020-12-05T11:48:23.451700Z

the culture of assassination teams is not something that I feel comfortable about owning

raymcdermott 2020-12-05T11:48:38.452Z

I hope that clarifies 🙂

slipset 2020-12-05T11:49:14.452700Z

It does, but it does raise a whole bunch of questions in my head. And I really appreciate that, because it forces me to think about stuff.

raymcdermott 2020-12-05T11:50:10.454400Z

by rogues I don’t mean the ‘bad boy’ style I mean actual war criminals oppressing and openly killing and torturing civilians

slipset 2020-12-05T11:50:32.455100Z

My first thought is that I think that it could be argued that our whole field is based on needs of the military. And that killing is killing.

raymcdermott 2020-12-05T11:50:39.455400Z

which is not quite FaceBook but they are on a continuum

slipset 2020-12-05T11:51:02.456200Z

The second thought is if it’s wrong to observe highly effective units and learn from their organization.

slipset 2020-12-05T11:53:19.458800Z

Example: From team of teams, they state that the time from sighting to killing went from 72 hours to 40 minutes. This is in one way horrible, but it’s also a fairly clear example of an organization that evolved in a way that could be desirable. Why not learn from what that organization did even though you disagree with the outcome of the organization?

slipset 2020-12-05T11:55:09.460Z

But, I appreciate that there can be different views on this, and also that it’s more interesting thinking about your viewpoint than trying to convince you that mine is correct.

slipset 2020-12-05T11:57:35.461700Z

FWIW, I think the SEAL teams have quite strict rules of engagement.

raymcdermott 2020-12-05T12:00:27.464200Z

For me, the notion that reducing sighting -> killing is a transferrable metric is weird. And I had to turn off the IdealCast when he was interviewing the authors because the sycophancy towards the military was just too much. The war on Iraq was an illegal act which left 100s of 1000s dead and the whole region in turmoil. Thinking about it like that provides better lessons IMHO.

slipset 2020-12-05T12:03:13.465800Z

Did you listen to the episodes with/about Rickover? Are the learnings from how they built the first nuclear subs lessons we should learn from?

slipset 2020-12-05T12:04:08.466400Z

Not really an interesting question, sorry.

slipset 2020-12-05T12:04:56.467400Z

I guess one problem is that post Iraq, we have a bunch of SEALS/Special Ops people who try to make a living as management-consultants.

🤢 1
slipset 2020-12-05T12:07:13.469700Z

I really appreciate your views though, as I haven’t thought about it this way before, but I guess the story telling for simple minds like myself is appealing: “Here’s an example of these elite dudes operating efficiently in a difficult environment” I guess what we need is similar stories, but from other fields.

raymcdermott 2020-12-05T12:49:56.470400Z

I’m getting a headache just thinking about it tbh

slipset 2020-12-05T13:07:14.471100Z

Thinking in general does that to me.

😂 1
jasonbell 2020-12-05T13:44:20.471300Z

Morning

2020-12-05T15:02:34.471600Z

Morning

orestis 2020-12-05T15:48:35.477200Z

Nice discussion. I share @raymcdermott’s feelings about borrowing terms from the military. After a long solo career I’ve found myself leading a team for 1 year now, including making hard decisions like firing, but also hiring and onboarding new members. The number one thing I’m going for is fostering safety and comfort so that people enjoy showing up at meetings and sharing their thoughts, and also offering up their code for review without any hesitation.

orestis 2020-12-05T15:50:07.479400Z

Other than that, coordinating with the “business” and trying to figure out what to build next and how to go about it (a bit of PM work there), indeed doing “housekeeping” work like taking notes and calling meetings... I still code like 75% of the time though.

orestis 2020-12-05T15:52:45.481800Z

Regarding authorization, I think REST and URLs muddy the waters. We’re using graphql which has fine-grained mutations so authorization is easy to define (can user x do action y on resource z). It almost always involves DB access.

orestis 2020-12-05T15:53:50.483600Z

For reading, we usually push authorization down to the query level, as in we try to encode rules in an sql/mongo query. Sometimes you need a query to do that though :)
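
A small sketch of what pushing the rule into the query can look like, using an in-memory SQLite database. The tables, columns and the "owner or explicitly granted" rule are invented for illustration and are not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE flowers (id INTEGER PRIMARY KEY, name TEXT, owner_id TEXT);
    CREATE TABLE grants (flower_id INTEGER, user_id TEXT);
    INSERT INTO flowers VALUES (1, 'rose', 'anna'), (2, 'tulip', 'bob'), (3, 'lily', 'bob');
    INSERT INTO grants VALUES (3, 'anna');
    """
)


def readable_flowers(user_id):
    """Return only the rows the user owns or has been granted access to."""
    rows = conn.execute(
        """
        SELECT f.name FROM flowers f
        WHERE f.owner_id = :user
           OR EXISTS (SELECT 1 FROM grants g
                      WHERE g.flower_id = f.id AND g.user_id = :user)
        ORDER BY f.id
        """,
        {"user": user_id},
    )
    return [name for (name,) in rows]
```

The application never sees rows the user may not read, so there is no separate filtering step to forget.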

borkdude 2020-12-05T15:56:48.486200Z

@orestis I think it totally depends on the company and context. I don't have a strong enough background in the domain and don't know the market well enough in which the product is operating, so I don't feel like the right person to lead a team in this context. In other contexts I might feel comfortable doing so, not sure.

orestis 2020-12-05T16:00:02.487900Z

It’s a tall order to make both technical decisions and know the market or have a background in the domain... in my case I just make sure I talk a lot to people who do and ask a lot of questions to figure out needs etc.

orestis 2020-12-05T16:02:06.491100Z

From time to time we will hold planning meetings and for that I will usually have ready some proposals of things that have both been raised in the past and within technical reach. We are in the process of moving away from a terrible legacy system to a new code base and database so not everything we’d like is possible, but there’s a long term plan going on in the back of my mind to guide us through.

orestis 2020-12-05T16:03:27.493Z

Oh and I usually have to repeat and repeat and repeat myself - in presentations and meetings and conversations, both because people forget but also because it’s common for myself to forget to mention details that have changed over the course of time :)

borkdude 2020-12-05T16:04:09.493400Z

What kind of product are you making, if that's not too secret?

orestis 2020-12-05T16:05:20.494Z

https://nos.co/

orestis 2020-12-05T16:05:55.494900Z

Not hugely challenging from a technical point of view, but definitely challenging in other ways :)

borkdude 2020-12-05T16:07:07.495600Z

That's one of the things. Some parts of our stack are a bit over my head; our previous team lead had all kinds of ideas about AI solutions. I'm not that kind of person (so context).

orestis 2020-12-05T16:09:11.498200Z

Yeah I feel that. I would like to hire some expert for some ML/AI stuff too since I’ve never done it and it seems promising, but hiring for an area you’re clueless in is so difficult.

borkdude 2020-12-05T16:09:51.498700Z

Luckily we already have this expertise. But none of the team is probably desiring the lead role.

borkdude 2020-12-05T16:10:19.499100Z

Anyway, thank you for letting me express my worries and thoughts here ;)

orestis 2020-12-05T16:23:24.499500Z

Anytime, I love this channel 🥰

slipset 2020-12-05T16:24:30.001700Z

@orestis sounds like you should hire an AI/ML consultant for a while to map out the possibilities. Hiring one without really knowing if you’ll need one seems strange.

slipset 2020-12-05T16:24:41.002Z

Or, you could hire me.

slipset 2020-12-05T16:24:49.002400Z

I offer No as a service

orestis 2020-12-05T16:25:04.003100Z

It’s also a personality type. I abhor a vacuum of leadership so unless someone plays that role I will push myself to assume it, at least transitionally :)

orestis 2020-12-05T16:25:11.003400Z

Haha 😂

slipset 2020-12-05T16:25:22.003800Z

The deal is, you ask a question, like “Do we need AI to solve our problem”, I answer “No”

slipset 2020-12-05T16:25:36.004300Z

I’m not very expensive.

borkdude 2020-12-05T16:25:42.004600Z

Hired.

slipset 2020-12-05T16:26:13.005400Z

https://github.com/slipset/no

borkdude 2020-12-05T16:26:40.006200Z

No as in YAML no, or boolean no?

slipset 2020-12-05T16:27:11.007400Z

No as in No, Not no as in Norwegian 🙂

slipset 2020-12-05T16:28:11.009Z

I can also recommend the services of my slightly less serious friends http://hahahaha.no

orestis 2020-12-05T16:29:06.010600Z

Ok so here’s the problem. We have small corpora of documents, like ranging from dozens to perhaps tens of hundreds (each client is completely independent from others). Within those corpora, we need to suggest similar documents, but also take into account user activity (you read this and that so you might like to read that thing too). Also clustering and topic extraction. Off the shelf solutions we’ve tried usually give disappointing results. What field of informatics would help there?

borkdude 2020-12-05T16:30:04.011900Z

@orestis We already have that in our stack

orestis 2020-12-05T16:30:18.012500Z

Each corpus is isolated and business specific so it contains unique jargon. And we can’t spend our time cleaning up text, it has to be completely hands off...

borkdude 2020-12-05T16:30:58.013300Z

Check out https://covid19.doctorevidence.com/. This is fully built on top of medical ontologies paired with NLP and AI stuff

orestis 2020-12-05T16:31:27.014100Z

I’d love some suggestions on your NLP and AI stack :)

borkdude 2020-12-05T16:31:49.014900Z

similarity you can do when you know what features to extract. we use: https://milvus.io/

borkdude 2020-12-05T16:32:02.015500Z

you basically have to build a vector for each document

borkdude 2020-12-05T16:32:23.016300Z

and then this thing will give you suggestions based on vectors you put in it as the prototypical documents
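
A minimal sketch of the "one vector per document, suggest from prototype vectors" idea described above. It does not use the Milvus API; it assumes plain term counts as the features and cosine similarity as the measure, and the names `to_vector`, `cosine` and `suggest` are made up for illustration.

```python
import math
from collections import Counter


def to_vector(text):
    """Turn a document into a sparse term-count vector (a stand-in for real features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def suggest(prototypes, corpus, top_n=3):
    """Rank corpus documents by their best similarity to any prototype document."""
    proto_vecs = [to_vector(p) for p in prototypes]
    scored = []
    for doc_id, text in corpus.items():
        vec = to_vector(text)
        best = max((cosine(vec, p) for p in proto_vecs), default=0.0)
        scored.append((best, doc_id))
    scored.sort(reverse=True)
    # Drop documents that share nothing with any prototype.
    return [doc_id for score, doc_id in scored[:top_n] if score > 0]
```

A dedicated vector store mainly changes how the nearest vectors are found at scale; for a corpus of a few hundred documents a linear scan like this is likely sufficient.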

orestis 2020-12-05T16:32:35.016900Z

The funny thing is that this is not our core thing, just a super super nice to have and impressive sales demo... so I can’t just yet justify spending a ton of time when other things are definitely doable and more important.

borkdude 2020-12-05T16:32:52.017400Z

For NLP we use StanfordNLP. Mind that its license doesn't let you distribute your software, but SaaS should be ok

borkdude 2020-12-05T16:34:56.017900Z

@orestis I guess you are also using elasticsearch right?

orestis 2020-12-05T16:35:54.018500Z

We are currently using Solr but will migrate to ES soon.

borkdude 2020-12-05T16:36:58.020100Z

Do you also use postgres perhaps?

orestis 2020-12-05T16:37:06.020500Z

Gotta run, the AI has woken up and demands feeding and interaction and a nappy change :) thanks for the talk, I might ask again about these things next year!

orestis 2020-12-05T16:37:22.021100Z

We’re running on RDS so no zombodb for us :(

borkdude 2020-12-05T16:37:29.021300Z

ah right.

val_waeselynck 2020-12-05T18:42:47.023200Z

One of the great upsides of small corpora is that you can feel free to use algorithms that don't scale linearly in corpus size, such as certain clustering algorithms, and that you can do everything in memory without unwieldy Big Data infrastructure.
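
To illustrate the point about small corpora, here is a rough in-memory sketch of a clustering pass that compares every pair of documents, so its cost grows quadratically with corpus size. The choice of Jaccard overlap on word sets, single-link merging, and the `threshold` value are all assumptions made for the example, not something recommended in the conversation.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of words two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Single-link clustering: documents whose word sets overlap enough share a cluster."""
    words = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-pairs comparison: fine for hundreds of documents, costly for millions.
    for a, b in combinations(docs, 2):
        if jaccard(words[a], words[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for doc_id in docs:
        clusters.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(members) for members in clusters.values())
```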

borkdude 2020-12-05T18:53:46.024800Z

> Be aware, though, that you are engaging into the domain of ML/IR problems, which are inherently harder than business logic problems. If off-the-shelf tech doesn't work right away, I wouldn't count too much on achieving something impressive in a short time. It's an unfortunate reality that this sort of problem seems both easier to laypersons than ordinary information-processing, and is actually much harder on programmers.

True. I also feel more comfortable with business logic because I can understand it. With AI/ML it's always: yeah, now it's 95% accuracy, tweak this parameter, now it's 94%, use this algorithm, 96%.

borkdude 2020-12-05T18:55:01.025400Z

And then the questions from the business come: can you tweak it so this result ends up higher? Yes, we can, but it has consequences for the other results ;)

val_waeselynck 2020-12-05T19:01:58.026800Z

That, and you never really know when you're done. You typically can't write a test suite for your code that will give a reliable binary answer to the "does it work?" question.

2020-12-05T19:18:23.027500Z

I'm really happy with the community we're building on this channel too

2020-12-05T19:26:28.028500Z

One of these days I'll do a little personal NLP project and build a remembrance agent

val_waeselynck 2020-12-05T20:02:53.029500Z

@otfrom «remembrance agent»? What does that mean?

2020-12-05T20:06:20.029700Z

https://github.com/zzkt/remembrance-agent

orestis 2020-12-05T20:25:38.031700Z

Thanks for the book rec @val_waeselynck , there seems to be an online edition too. I’ve absorbed most of the terms through random walks but I vastly prefer books.

orestis 2020-12-05T20:26:55.033200Z

I hear you on people expecting this to be easy. The bright side is that all of our competitors’ “related content” features are crap too 😅

orestis 2020-12-05T20:29:38.036900Z

Having spent some time thinking about how to solve this, I think what we’ll end up doing is building a “markup tool” where authors of content could annotate and highlight keywords, topics, summaries etc. I believe it’s a necessity to “show your work” in this kind of context so black box algorithms are not going to cut it...

orestis 2020-12-05T20:31:34.039600Z

I’m rambling a bit cause it’s late but what I mean is - the chances of a team of 3 building a cutting edge topic extraction algorithm as a side project are pretty slim, and it seems like our original content is also crap - typos, grammatical errors, jargon etc. So even the best NLP software can’t deal with those unsupervised.

orestis 2020-12-05T20:32:29.041Z

So I’m thinking of a very basic weighted search algorithm that operates on rich data that people provide. No fancy ML perhaps but probably attainable :)
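
One way such a weighted search over author-provided annotations could look, as a sketch only. The field names (`keywords`, `topics`, `summary`) and the weights are hypothetical; the per-field explanation returned with each hit is an attempt at the "show your work" requirement mentioned earlier.

```python
# Hypothetical weights: author-supplied keywords count more than summary text.
FIELD_WEIGHTS = {"keywords": 3.0, "topics": 2.0, "summary": 1.0}


def tokens(value):
    """Lower-cased word set from a string or a list of strings."""
    text = " ".join(value) if isinstance(value, (list, tuple)) else value
    return set(text.lower().split())


def score(doc, query):
    """Return (total score, {field: matched terms}) for one annotated document."""
    query_terms = tokens(query)
    total, explanation = 0.0, {}
    for field, weight in FIELD_WEIGHTS.items():
        hits = query_terms & tokens(doc.get(field, ""))
        if hits:
            total += weight * len(hits)
            explanation[field] = sorted(hits)
    return total, explanation


def search(docs, query):
    """Rank documents by weighted matches, best first, ties broken by id."""
    results = []
    for doc_id, doc in docs.items():
        total, explanation = score(doc, query)
        if total > 0:
            results.append((total, doc_id, explanation))
    results.sort(key=lambda r: (-r[0], r[1]))
    return results
```

Because every score is a sum of visible field matches, each result can be shown to the user together with the reason it was ranked where it was.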

2020-12-05T21:17:36.045300Z

For some values of “similar documents,” you can get quite acceptable results with relatively basic statistically unlikely word match algorithms. Something like: throw out stopwords (very common words), maybe stem the remainder (remove grammatical inflections), and rank other documents by the degree to which their words overlap. Understanding the semantics is surprisingly unnecessary for many (but not all) problem domains. A next level enhancement is extending this with word proximity / ngrams: sequences of n words that appear together. Annotations / markup can be helpful if people actually use them. The main issue I’ve run into there is that few users will actually bother to do the annotating.
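
The steps above (drop stopwords, stem, rank by word overlap, optionally use n-grams) can be sketched roughly like this. The stopword list is a tiny illustrative one, `stem` is a deliberately crude suffix stripper rather than a real stemmer, and there is no rarity weighting or punctuation handling, so treat it as a starting point under those assumptions.

```python
# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def stem(word):
    """Very crude suffix stripping; a real stemmer would do better."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def terms(text, n=1):
    """Set of stemmed n-grams (n=1 gives single words) with stopwords removed."""
    words = [stem(w) for w in text.lower().split() if w not in STOPWORDS]
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def rank_similar(target_id, docs, n=1):
    """Rank every other document by term overlap (Jaccard) with the target."""
    target = terms(docs[target_id], n)
    scored = []
    for doc_id, text in docs.items():
        if doc_id == target_id:
            continue
        other = terms(text, n)
        union = target | other
        overlap = len(target & other) / len(union) if union else 0.0
        scored.append((overlap, doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored
```

Calling `rank_similar` with `n=2` compares word pairs instead of single words, which is the proximity / n-gram enhancement mentioned above.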

2020-12-05T21:19:34.046300Z

Some people bypass this whole process by just dumping everything into Elasticsearch / Lucene. It can get expensive to operate when the document set is very large, but OTOH the time to get something running that’s approximately useful is much smaller than trying to build your own.

borkdude 2020-12-05T21:21:29.047800Z

> The main issue I’ve run into there is that few users will actually bother to do the annotating.

Yes, this is where AI often comes in, for tagging things into buckets

2020-12-05T21:21:31.048Z

Another non-DIY approach is to use Postgres’ built in full text search engine or similar software. Search for the first heading in the current document (or first sentence, paragraph, whatever is useful in your specific document type), and get a ranked list of similar documents back.

borkdude 2020-12-05T21:23:18.049100Z

@paul.legato I see you are in a postgres-as-a-service company. Do you offer solutions with plugins like ZomboDB supported and perhaps also a connected ElasticSearch cluster?

2020-12-05T21:23:41.049500Z

Oh, I haven’t been involved with that company for years now! Can I ask where you saw that, so I can update it? 🙂

2020-12-05T21:24:07.050100Z

The short version is: Google and Amazon built identical hosted Postgres services and started selling them at a massive loss, in a price war with each other.

borkdude 2020-12-05T21:24:10.050300Z

Twitter

2020-12-05T21:24:16.050500Z

ahh, thanks

2020-12-05T21:24:56.050700Z

We did stuff like that, yes

borkdude 2020-12-05T21:25:13.051200Z

Oh I see you haven't updated your profile in a while on Twitter, sorry for not looking better ;)

2020-12-05T21:25:29.051600Z

Heh, no worries. I rarely use Twitter personally