By the way, neither the :uri nor the :path-info should be url-decoded.
:uri is always the raw URI. And (str context path-info) == uri
Just to clear things up. :simple_smile:
@weavejester: in the servlet spec, path-info is always returned decoded from HttpServletRequest.getPathInfo()
so (str context (encode path-info)) == uri
@tcrawley: Right, but one of the nice things about Ring is that we don’t need to make the same mistakes as servlets. Because URL encode/decode is lossy, particularly since Java changes “+” into “ “, it makes more sense to keep path-info raw.
ah, so you're saying immutant should instead calculate the raw path-info when constructing the request map instead of using getPathInfo
? That might work
Yep. I actually had assumed that the servlet spec didn’t decode PathInfo, which is why there isn’t a huge amount of documentation for it.
getContextPath
isn’t decoded, strangely.
So you can just use (ring.util.request/set-context request (.getContextPath servlet-request))
Great, thanks for your input. I'll take a look
@weavejester: i take your point, but since context and path-info are strictly servlet things (not ring), why not follow their spec? someone familiar with getPathInfo isn't going to expect to see that "+" in :path-info
jcrossley3: They’re not strictly servlet things. Other libraries, like Compojure, make use of them.
I also don’t see the point of adopting mistakes made in earlier libraries just for compatibility.
well, i meant that they're not http things, and therefore not ring things
They’re not in the Ring spec, but they are in the Ring library, like :params or :session.
i guess it's only context that is servlet-y. there is a CGI variable PATH_INFO
Sure, but again, why adopt a faulty convention?
still, it does make me wonder why they would explicitly spec it to be decoded
that's a pretty explicit mistake :simple_smile:
I don’t mean that it was accidental; just that it was a mistake in the design.
It looks like the full URI of servlets is: contextPath + servletPath + pathInfo
But servletPath and pathInfo are decoded
and contextPath and the URI isn't
And because URL decodes are lossy, it’s not a direct mapping.
do you know if the http rfc says anything about pathinfo?
It doesn’t.
It’s not something that’s part of the HTTP request.
It’s just something made up originally by CGI scripts.
https://groups.google.com/forum/#!topic/rack-devel/3fRiLmm-pT8 some background, i guess
:path-info might have been a poorly chosen name, considering that it’s slightly different to the PATH_INFO in CGI, and getPathInfo in servlets.
However, at the time I didn’t realise that
I think that's where my confusion came from - since it's not in the ring spec, and named the same as the servlet value, I assumed it was the servlet value
That’s a reasonable assumption.
I’ll make the documentation clearer in the next version of Ring.
I had assumed that :path-info in Ring was the same as the servlet version, but it turns out that’s not the case.
I might use :path
in future and deprecate :path-info
… but that likely won’t happen for a while.
I was surprised that is was decoded by servlets, I only discovered that yesterday
Yeah. My guess is that it’s to follow the CGI definition
Which makes sense for shell scripts
But doesn’t really make sense for modern web applications.
Reading that Rack thread talks about lossy decoding as well
But because Rack follow the CGI spec they have no choice
Fortunately Ring doesn’t need to, so we can break with tradition a little. Ideally it would be named something different, but for now it can just be documented better.
I know you've stated that it shouldn't be part of the ring spec, but if it was part of the spec, it would be a stronger stance than documentation of the implementation's behavior would be
Well, the Ring spec just covers HTTP. However, it might be worth putting up something else for the keys introduced by the library itself, like :context, :path-info, :params, :session and so forth.
@weavejester: not to dissuade you from fixing the web, but do you see no benefit in keeping ring at least a little in-line with wsgi, rack, servlets, etc?
e.g. after introducing :path, you might then decode :path-info rather than deprecate it
jcrossley3: I do see a benefit in being consistent. However (a) it’s too late now, and (b) the problems a decoded :path-info provides outweigh the advantages of compatibility, IMO.
problems with existing apps/middleware you mean?
Yes. I’d prefer not to change the behaviour of an existing key.
I’m careful about backward compatibility in Ring.
i can respect that, but like you say, it'd be good to be clear about where you diverge with CGI, et. al.
jcrossley3: I don't see users getting confused about :path-info being raw in ring, as long as it is clearly documented
Yes, that I’ll agree on :simple_smile:
The only reason there isn’t documentation is because I didn’t realise that it was different
and clearly no one has complained about it until now, and the complaint was "this shouldn't be decoded"
Yeah
@tcrawley: it confused both of us, duh :simple_smile:
Thanks for bringing this to my attention. I’ll make the documentation clearer.
Gotta go for now. Bye!
@weavejester: thanks!
jcrossley3: sure, but we're not users :)
pb
I just caught this conversation but I’m a bit confused what the outcome was
what is actually going to be documented in the ring documentation?
assuming https://clojurians.slack.com/archives/immutant/p1456537643000081
It sounds like nothing is really going to change, and this “bug” such as it is will remain.
It’s certainly not a huge deal to pass in a builder to Undertow to make sure that the url and path info is not decoded, so as to let Ring and Compojure do their thing. But seems like we may need better documentation for both Immutant (in terms of what Undertow is doing) and Ring/Compojure (in terms of where stuff is actually decoded) to help users not flail about when they run into what I did.
Apologies if I’m misunderstanding what the above discussion concluded; I just don’t see where it addresses my particular issue.
@ddellacosta: I’ll add some documentation to the path-info
and set-context
functions, and probably add a page to the wiki as well documenting the keys that the Ring library adds above and beyond the Ring specification. It’s also a possibility that I can add this into the servlet library as well.
I see—so one will still have to toggle decoding off in Undertow. (I realize this is not your particular concern @weavejester, just clarifying)
That’s right.
Please note also that where I first encountered this was in Compojure’s handling of params extracted from :path-info
Compojure doesn’t decode the path-info as far as I’m aware.
Very deliberately so, since otherwise context
wouldn’t work correctly.
whoops
Right, it decodes the parameters once they’re matched.
But not the path-info
hmm, those are extracted from path-info I thought
Right, but it grabs the parameters, and then decodes them.
yes—but if path-info is already decoded, it’s a problem
Sure.
sorry, not sure if I’m simply missing your point here
I’m not sure what the correct behavior is
/should be rather
The correct behaviour is that the :path-info
key should not be decoded.
@ddellacosta: I think we will fix this in Immutant by populating the request map with a :path-info that isn't decoded
extracted from the raw uri
okay, that helps clarify things @tcrawley
and in the meantime I will continue to pass in the builder w/DECODE_URL toggled off to handle this, as a workaround
cool, I'm glad you have a workaround for now. I'd love to get an Immutant release out in the next couple of weeks
(and thanks @weavejester too for jumping in and explaining this)
I'll hijack your existing jira for this fix
No problem. Sorry for the confusion.
sure thing—and if I can contribute anything else (happy to write the patch if you’d like) just let me know.
np @weavejester, it all makes sense now re-reading what you wrote initially
if you want to write a patch, that would be swell - you would have to fix it in two places
definitely, would love to contribute
the former is used when inside WildFly or EAP, the latter when using undertow directly
I see, simply ensure that those are not url-encoded?
(I assume I’ll have to possibly extract that info from somewhere else if .getPathInfo/.getRelativePath returns the decoded value by default)
you won't be able to prevent the decoding of the values returned by getPathInfo
and getRelativePath
- you'll have to ignore those methods and extract the path-info from the uri by stripping off the context, yeah
right, gotcha
okay, think we’re on the same page—will try to get something ASAP
will let you know if I’m going to be slower than by Monday @tcrawley, if that works for you
@ddellacosta: well, this is embarrassing - I did know about path-info being escaped, and how that was the wrong behavior, because I fixed it in Immutant 1: https://github.com/immutant/immutant/tree/1.x/modules/web/src/main/clojure/immutant/web/servlet.clj#L31
haha, whoops!
In fact, I remember it being that way in Immutant 2, but without the comment to explain why
you won’t be the first package maintainer to have forgotten about a fix...
so at some point we said "why aren't we just calling getPathInfo
?" and "fixed" it
@jcrossley3: ^
that looks like the right fix too; was trying to find some utils already available to pull that info out of
(sorry, I suppose you would know that better than I)
no problem!
yeah, I think that's the right way to calculate it
in any case, thanks for the heads up on that, I can simply try to port that over
great, thanks! and no rush
and feel free to ping me if you have any questions
will do, thanks @tcrawley !
I did plumb through the commits for Immutant 2, but can't find anywhere where we had it setting a raw path-info, so I feel less dumb
okay, good to know. And you should definitely not feel dumb about this, easy thing to understand
heh, thanks :)