URI/URL handling - Simplelists

URI/URL handling Lassi Kortela 07 Apr 2019 10:11 UTC

Thanks for the great comments Peter! I for one love working with people
who care about getting things right at this level of detail.

> Just a note.  In CHICKEN, we have the uri-generic and uri-common eggs.

In light of your comments, this sounds like a good division.

> The situation is somewhat confusing and weird, but it turns out to be a
> good compromise, because whenever you need the not-fully-decoded path,
> you can access the underlying uri-generic object.  As long as you haven't
> manipulated any component, you will get back the original input.

This is good.

> It would be nice if we can come up with cleaner API for this.

In the archive file interface, I do this:

     (archive-entry-path entry)     => safe normalized pathname as list
     (archive-entry-raw-path entry) => raw unsafe pathname as bytevector

I've generally had good experiences with this kind of API. I.e. the procedure
with the short and obvious name returns the thing people usually want,
and there's a separate procedure to get the raw/unsafe/complex version.

We could have something like:

     (uri-path     "/foo%3Abar/qux/") => (/ "foo:bar"   "qux")
     (uri-raw-path "/foo%3Abar/qux/") => (/ "foo%3Abar" "qux")
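
A rough sketch of how that pair of accessors might behave, written
against R7RS-small. The helper names (string-split-on, percent-decode,
hex-value, remove-empty) are made up for illustration, the procedures
take a bare path string rather than a URI object, and none of this is
the uri-generic or uri-common API. It drops empty segments to match the
example above, which is an assumption and not an answer to the edge
cases. It also treats each decoded byte as one character, so it is only
right for ASCII.

```scheme
(import (scheme base) (scheme char))

;; Split STR on the character SEP, keeping empty pieces.
(define (string-split-on str sep)
  (let loop ((chars (string->list str)) (piece '()) (pieces '()))
    (cond ((null? chars)
           (reverse (cons (list->string (reverse piece)) pieces)))
          ((char=? (car chars) sep)
           (loop (cdr chars) '()
                 (cons (list->string (reverse piece)) pieces)))
          (else
           (loop (cdr chars) (cons (car chars) piece) pieces)))))

;; Value of a hexadecimal digit character, or #f if C is not one.
(define (hex-value c)
  (let ((n (char->integer (char-downcase c))))
    (cond ((and (char>=? c #\0) (char<=? c #\9))
           (- n (char->integer #\0)))
          ((and (>= n (char->integer #\a)) (<= n (char->integer #\f)))
           (+ 10 (- n (char->integer #\a))))
          (else #f))))

;; Replace each well-formed %XX with the character whose code is XX.
;; ASSUMPTION: ASCII only; malformed escapes are left untouched.
(define (percent-decode str)
  (let loop ((chars (string->list str)) (out '()))
    (cond ((null? chars) (list->string (reverse out)))
          ((and (char=? (car chars) #\%)
                (pair? (cdr chars))
                (pair? (cddr chars))
                (hex-value (list-ref chars 1))
                (hex-value (list-ref chars 2)))
           (loop (list-tail chars 3)
                 (cons (integer->char
                        (+ (* 16 (hex-value (list-ref chars 1)))
                           (hex-value (list-ref chars 2))))
                       out)))
          (else (loop (cdr chars) (cons (car chars) out))))))

;; Drop empty strings from a list of strings.
(define (remove-empty strings)
  (cond ((null? strings) '())
        ((string=? (car strings) "") (remove-empty (cdr strings)))
        (else (cons (car strings) (remove-empty (cdr strings))))))

;; Raw version: segments are still percent-encoded.
;; An absolute path is marked with the symbol / at the head.
(define (uri-raw-path str)
  (let ((segments (remove-empty (string-split-on str #\/))))
    (if (and (> (string-length str) 0)
             (char=? (string-ref str 0) #\/))
        (cons '/ segments)
        segments)))

;; Convenient version: decode each segment after splitting.
(define (uri-path str)
  (map (lambda (x) (if (string? x) (percent-decode x) x))
       (uri-raw-path str)))
```

Decoding happens only after splitting, so an encoded slash (%2F) inside
a segment is never mistaken for a separator.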

By the way, what about paths that contain more than one consecutive
slash: e.g. (uri-path "///")? And relative paths that don't start with a
slash? What happens when a URI path contains a backslash?

> Regardless, I would recommend using the uri-generic parser
> implementation for any reference implementation for a SRFI; it has a
> large test suite and is super compliant with the RFC spec; moreso than
> any other library I've come across in any language.  This is one library
> I am extremely proud of being a co-maintainer for.

From your description, it sounds like you did exactly the right thing
on all counts.

> Note that there are several alternative implementations using different
> parser generators inside the "alternatives" directory.  The main one
> still uses "matchable" and the implementation is a bit fiddly (but fast
> as hell). There's one in irregex too (which could be easily ported to
> SRFI-115) which comes close, performance-wise, and is a lot easier to
> understand and maintain.

Could we specify a common interface for these implementations (or do
they already have the same interface)? This means they can also share
the same test suite, which ensures they are interchangeable (except for
speed and compatibility).
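
To make the shared-test-suite idea concrete, here is a small sketch in
R7RS-small. The two uri-scheme procedures are toy stand-ins for
alternative parser implementations, and all names here are invented;
they are not taken from uri-generic or its "alternatives" directory.
The point is only that the cases are written once and any procedure
with the agreed interface can be run against them.

```scheme
(import (scheme base))

;; Toy implementation A: scan by index.
;; Returns the text before the first colon, or #f if a / ? or #
;; (or the end of the string) comes first.
(define (uri-scheme/index str)
  (let ((len (string-length str)))
    (let loop ((i 0))
      (cond ((= i len) #f)
            ((char=? (string-ref str i) #\:) (substring str 0 i))
            ((memv (string-ref str i) '(#\/ #\? #\#)) #f)
            (else (loop (+ i 1)))))))

;; Toy implementation B: same contract, walking a character list.
(define (uri-scheme/list str)
  (let loop ((chars (string->list str)) (seen '()))
    (cond ((null? chars) #f)
          ((char=? (car chars) #\:) (list->string (reverse seen)))
          ((memv (car chars) '(#\/ #\? #\#)) #f)
          (else (loop (cdr chars) (cons (car chars) seen))))))

;; Shared cases, written once: (input . expected result).
(define scheme-cases
  '(("http://example.com/"     . "http")
    ("mailto:user@example.com" . "mailto")
    ("/relative/path"          . #f)
    ("foo/bar:baz"             . #f)
    (""                        . #f)))

;; Run every case against IMPL and return the inputs that failed.
(define (failing-cases impl)
  (let loop ((cases scheme-cases) (failed '()))
    (cond ((null? cases) (reverse failed))
          ((equal? (impl (car (car cases))) (cdr (car cases)))
           (loop (cdr cases) failed))
          (else
           (loop (cdr cases) (cons (car (car cases)) failed))))))
```

Calling (failing-cases uri-scheme/index) and (failing-cases
uri-scheme/list) should both give the empty list if the two are
interchangeable on these cases.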

The request abstraction could be specified so that it just gets the raw
URL as a string from the HTTP server. Then the application could parse it
before passing it on to the router/dispatcher (or the r/d could call the
library to parse it). But is it more convenient if the request object
already contains the parsed URL? Do e.g. Apache or Nginx modules get
pre-parsed URLs from those web servers? In that case it would probably
not make sense to parse it again ourselves.
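
One possible middle ground, sketched in R7RS-small: the request object
keeps the raw string and parses it at most once, on first use. The
record layout, the procedure names and toy-parse-url are all
hypothetical, and the toy parser only splits at the first "?"; a real
parser would be passed in instead. A server that already has a parsed
URL could supply a procedure that just returns it.

```scheme
(import (scheme base) (scheme lazy))

;; A request holds the raw URL string and a promise for the parsed form.
(define-record-type request
  (%make-request raw-url parsed-url-promise)
  request?
  (raw-url request-raw-url)
  (parsed-url-promise request-parsed-url-promise))

;; PARSE is any procedure from a raw URL string to a parsed object.
;; It is not called until request-url is first used.
(define (make-request raw-url parse)
  (%make-request raw-url (delay (parse raw-url))))

;; Parsed URL; the promise caches the result after the first call.
(define (request-url req)
  (force (request-parsed-url-promise req)))

;; Counter used only to show how often the toy parser runs.
(define parse-count 0)

;; Toy parser: returns (path . query), with query #f when absent.
(define (toy-parse-url str)
  (set! parse-count (+ parse-count 1))
  (let ((len (string-length str)))
    (let loop ((i 0))
      (cond ((= i len) (cons str #f))
            ((char=? (string-ref str i) #\?)
             (cons (substring str 0 i) (substring str (+ i 1) len)))
            (else (loop (+ i 1)))))))
```

With this shape, a router that only needs (request-raw-url req) never
pays for parsing, and code that calls (request-url req) several times
triggers the parse once.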