Re: URI/URL handling - Simplelists

Re: URI/URL handling Peter Bex 07 Apr 2019 14:20 UTC

On Sun, Apr 07, 2019 at 03:02:53PM +0300, Lassi Kortela wrote:
> > I think this will work.  If you update the path, it will clobber the raw
> > path, presumably?  Or should the code try hard to maintain components
> > that weren't changed?
>
> My intuition would suggest immutable URL objects. Things are much simpler
> when transformations can go one way only (from generic URI to common URI)
> and not the other way around. If it's a two-way street then we have to make
> sure every kind of URL is correctly round-tripped. I wouldn't do all that
> work unless it's specifically needed for some use case.

If you only have constructors, that's really tedious to use.  In CHICKEN,
the uri-common and uri-generic eggs are mutable, but there's also an
immutable copying "update" API which allows you to update an existing URL.
This is very handy in cases where, for example, the user supplies a base
URI for an endpoint and you want to add query params to it.

For example, if you have a base URI of http://example.com/search, you'd
update the query string with options each time you hit the API, giving
something like http://example.com/search?first-name=peter&last-name=bex

So in uri-common that'd be

(define base-uri (uri-reference "http://example.com/search"))

and then, to construct the URI and hit the endpoint:

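;; Each helper returns a new URI built from its argument, leaving the
;; original untouched, so the calls can be chained.
(define (search/firstname uri name)
  (update-uri uri
     query: (alist-update 'first-name name (uri-query uri))))

(define (search/lastname uri name)
  (update-uri uri
     query: (alist-update 'last-name name (uri-query uri))))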

(let* ((s base-uri)
       (s (search/firstname s "peter"))
       (s (search/lastname s "bex")))
  (with-input-from-request s #f read-lines))

The same can be done for the path and such.  Also, given a base URI you
should be able to easily add a username/password to it.

I don't think it's necessary to have a mutable API, but an updating API
would be very nice.

> > Ideally there's a way to override this, because there are some servers
> > out there which don't allow percent-encoded characters everywhere and
> > insist on having the raw characters, even if those are not treated
> > specially.
>
> Great :D We should probably specify a conservative normalization form in
> which differences like this don't matter...

In uri-common, there's a SRFI-39 parameter you can set (fluidly or
globally) which controls this.  This is a bit ugly.  Unfortunately,
when you update the common query object, the underlying query string
in the generic object needs to get set, which means you need to know
the separator character at the time of the update.

Otherwise, it'd make more sense to have uri->string accept the separator
and other options.
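
As a rough sketch of that alternative, the separator could be an ordinary
argument at serialization time.  The name query->string and its separator
argument are made up for illustration and are not part of uri-common;
percent-encoding is left out to keep it short.

```scheme
;; Hypothetical helper, not a uri-common procedure.
;; Assumes keys are symbols and values are already-encoded strings.
(define (query->string alist separator)
  (let loop ((pairs alist) (acc ""))
    (if (null? pairs)
        acc
        (let ((item (string-append (symbol->string (caar pairs))
                                   "="
                                   (cdar pairs))))
          (loop (cdr pairs)
                ;; Only put the separator between items, never in front.
                (if (string=? acc "")
                    item
                    (string-append acc separator item)))))))

(define params '((first-name . "peter") (last-name . "bex")))

(query->string params "&")   ; => "first-name=peter&last-name=bex"
(query->string params ";")   ; => "first-name=peter;last-name=bex"
```

With this shape the parsed query stays separator-agnostic, and the choice
is only made when the string is actually produced.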

> Ugh, good points once again :D I didn't even realize that URIs and query
> strings are specified by different standards organizations.

It's quite weird.  Still, it's all W3C nowadays.

> Do you think we should supply fully parsed query strings into the URL
> dispatcher? Does anyone actually dispatch by query string in practice? I've
> never thought about that. On face value it seems too brittle.

I don't think you need to be able to dispatch on the query _string_,
only on the parsed values.

> Once again we could have the accessor procedure with the friendly and
> obvious name give the conservatively pre-decoded query parameters, and
> special accessors would give the raw query string or some other decoding.

Yeah, that makes the most sense.

> Thanks for the detailed explanation. Turning consecutive slashes into empty
> components ("") makes sense.
>
> I think we should normalize the URLs that go into the router/dispatcher
> because most people will not realize that they should think about these edge
> cases. We can still let people access the non-normalized URL via special
> accessor procedures if they want it.

I don't really see the value in that.  foo//bar is not the same path
as foo/bar.  A web server could issue a 301 redirect to "fix" such
paths, which I think is a better level at which to fix it.
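
To make the difference concrete, here is a minimal sketch of a splitter
that keeps empty components.  split-path is a hypothetical name, and this
is not how uri-generic does it internally; it only shows that the two
paths come out as different lists.

```scheme
;; Hypothetical helper: split a path on "/" and keep empty components.
(define (split-path path)
  ;; Walk from the end of the string so components are consed up in order.
  (let loop ((i (string-length path))
             (end (string-length path))
             (acc '()))
    (cond ((= i 0)
           (cons (substring path 0 end) acc))
          ((char=? (string-ref path (- i 1)) #\/)
           (loop (- i 1) (- i 1) (cons (substring path i end) acc)))
          (else
           (loop (- i 1) end acc)))))

(split-path "foo//bar")   ; => ("foo" "" "bar")
(split-path "foo/bar")    ; => ("foo" "bar")
```

A dispatcher that matches on the component list will then simply not
match foo//bar against a foo/bar route, and the server is free to
redirect instead.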

> By the same token, if the URL dispatcher captures URL components into
> variables, the obvious way to write the URL specifications should be a
> conservative one -- for example in this URL:
>
>     ("document" int "comment" int "edit")
>
> The `int` parser should not permit negative numbers or leading zeros because
> many people will not realize they should consider the issue.

That makes sense.
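
A strict parser along those lines could look roughly like this.  The name
parse-strict-int is hypothetical, and returning #f on rejection is just
one possible convention.

```scheme
;; Hypothetical helper: accept only canonical non-negative decimal
;; integers.  Returns the number, or #f if the component is rejected.
(define (parse-strict-int component)
  (let ((len (string-length component)))
    (and (> len 0)
         ;; A leading zero is only allowed for the single component "0".
         (or (= len 1)
             (not (char=? (string-ref component 0) #\0)))
         ;; Every character must be an ASCII digit; this also rejects
         ;; signs, so negative numbers never get through.
         (let loop ((i 0))
           (or (= i len)
               (let ((c (string-ref component i)))
                 (and (char<=? #\0 c)
                      (char<=? c #\9)
                      (loop (+ i 1))))))
         (string->number component))))

(parse-strict-int "42")    ; => 42
(parse-strict-int "0")     ; => 0
(parse-strict-int "-1")    ; => #f
(parse-strict-int "007")   ; => #f
```

Since every accepted number has exactly one spelling, two different URLs
can't end up capturing the same value.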

> If these URL-component-into-variable parsers are strict and conservative,
> that will also help catch errors due to (lack of) URL normalization. E.g. if
> the "string" parser rejects blank strings (as it probably should -- if a
> website's URL layout uses blank strings in URLs, then someone is doing
> something too fancy with URLs) then it doesn't matter if the URL parser
> keeps empty components.
>
> In all parts of the API the boring, safe and ordinary way to do things
> should be the obvious way to do it IMHO :)

Agreed, but it should always be possible to recover such "losses" for
cases where you want more precision.

> > [about uri-generic alternatives and tests]
>
> That's excellent. There is a _lot_ of good work done in the Scheme
> community. People just keep quiet about it :)

Yeah, we should be a bit more vocal.

> > [re-parsing being expensive and inconsistent]
>
> These are very good points.
>
> I compiled a wiki page with some links to what other languages are doing: <https://github.com/schemeweb/wiki/wiki/Request-abstraction-in-other-languages>.

Cool, I'll have a look at that.

Cheers,
Peter