URI/URL handling Lassi Kortela (07 Apr 2019 10:11 UTC)
Thanks for the great comments, Peter! I for one love working with people
who care about getting things right at this level of detail.

> Just a note. In CHICKEN, we have the uri-generic and uri-common eggs.

In light of your comments, this sounds like a good division.

> The situation is somewhat confusing and weird, but it turns out to be a
> good compromise, because whenever you need the not-fully-decoded path,
> you can access the underlying uri-generic object. As long as you haven't
> manipulated any component, you will get back the original input.

This is good.

> It would be nice if we can come up with cleaner API for this.

In the archive file interface, I do this:

    (archive-entry-path entry)     => safe normalized pathname as list
    (archive-entry-raw-path entry) => raw unsafe pathname as bytevector

I've generally had good experiences with this kind of API. That is, the
procedure with the short and obvious name returns the thing people usually
want, and there's a separate procedure to get the raw/unsafe/complex
version. We could have something like:

    (uri-path "/foo%3Abar/qux/")     => (/ "foo:bar" "qux")
    (uri-raw-path "/foo%3Abar/qux/") => (/ "foo%3Abar" "qux")

By the way, what about paths that contain more than one consecutive slash,
e.g. (uri-path "///")? And relative paths that don't start with a slash?
What happens when a URI path contains a backslash?

> Regardless, I would recommend using the uri-generic parser
> implementation for any reference implementation for a SRFI; it has a
> large test suite and is super compliant with the RFC spec; moreso than
> any other library I've come across in any language. This is one library
> I am extremely proud of being a co-maintainer for.

From your description, it sounds like you did exactly the right thing on
all counts.

> Note that there are several alternative implementations using different
> parser generators inside the "alternatives" directory. The main one
> still uses "matchable" and the implementation is a bit fiddly (but fast
> as hell). There's one in irregex too (which could be easily ported to
> SRFI-115) which comes close, performance-wise, and is a lot easier to
> understand and maintain.

Could we specify a common interface for these implementations (or do they
already have the same interface)? Then they could also share the same test
suite, which would ensure they are interchangeable (except for speed and
compatibility).

The request abstraction could be specified so that it just gets the raw
URL as a string from the HTTP server. Then the application could parse it
before passing it on to the router/dispatcher (or the router/dispatcher
could call the library to parse it). But is it more convenient if the
request object already contains the parsed URL? Do, for example, Apache or
Nginx modules get pre-parsed URLs from those web servers? In that case it
would probably not make sense to parse the URL again ourselves.
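
To make the uri-path / uri-raw-path idea concrete, here is a minimal sketch in portable R7RS Scheme. It is only an illustration under stated assumptions, not the uri-generic API: the two procedure names are the hypothetical ones proposed above, and every helper (string-split-char, remove-empty, hex-digit?, percent-decode) is made up for this example. It also assumes that empty segments are simply dropped, which is just one possible answer to the consecutive-slash question, and that decoded octets form valid UTF-8.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical helper: split STR on the character DELIM, keeping empty pieces.
(define (string-split-char str delim)
  (let loop ((chars (string->list str)) (piece '()) (pieces '()))
    (cond ((null? chars)
           (reverse (cons (list->string (reverse piece)) pieces)))
          ((char=? (car chars) delim)
           (loop (cdr chars) '()
                 (cons (list->string (reverse piece)) pieces)))
          (else
           (loop (cdr chars) (cons (car chars) piece) pieces)))))

;; Hypothetical helper: drop empty strings from a list of strings.
;; ASSUMPTION: empty segments (from "//" or a trailing "/") are discarded.
(define (remove-empty strings)
  (cond ((null? strings) '())
        ((string=? (car strings) "") (remove-empty (cdr strings)))
        (else (cons (car strings) (remove-empty (cdr strings))))))

(define (hex-digit? c)
  (or (char<=? #\0 c #\9)
      (char<=? #\a (char-downcase c) #\f)))

;; Hypothetical helper: decode %XX escapes in one segment.
;; Malformed escapes are left untouched. ASSUMPTION: the decoded
;; octets are valid UTF-8.
(define (percent-decode str)
  (let ((len (string-length str)))
    (let loop ((i 0) (pieces '()))
      (cond ((= i len)
             (utf8->string (apply bytevector-append (reverse pieces))))
            ((and (char=? (string-ref str i) #\%)
                  (< (+ i 2) len)
                  (hex-digit? (string-ref str (+ i 1)))
                  (hex-digit? (string-ref str (+ i 2))))
             (loop (+ i 3)
                   (cons (bytevector
                          (string->number (substring str (+ i 1) (+ i 3)) 16))
                         pieces)))
            (else
             (loop (+ i 1)
                   (cons (string->utf8 (string (string-ref str i)))
                         pieces)))))))

;; Raw path: list of still-encoded segments; the symbol / marks an
;; absolute path.
(define (uri-raw-path str)
  (let ((segments (remove-empty (string-split-char str #\/))))
    (if (and (> (string-length str) 0)
             (char=? (string-ref str 0) #\/))
        (cons '/ segments)
        segments)))

;; Decoded path: same shape, with each segment decoded after splitting.
(define (uri-path str)
  (map (lambda (x) (if (string? x) (percent-decode x) x))
       (uri-raw-path str)))

(uri-path "/foo%3Abar/qux/")      ; => (/ "foo:bar" "qux")
(uri-raw-path "/foo%3Abar/qux/")  ; => (/ "foo%3Abar" "qux")
(uri-path "/a%2Fb/c")             ; => (/ "a/b" "c")
```

Because decoding happens after splitting, an encoded slash (%2F) stays inside its segment instead of creating a new one, which is the main reason to return a list rather than a decoded string. Under the drop-empty-segments assumption, (uri-path "///") yields (/) and a relative path such as "foo/bar" yields ("foo" "bar"); a real specification would have to decide both cases deliberately, along with backslashes, which this sketch treats as ordinary characters.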