Re: Normalization of percent-encoding
Peter Bex 16 Jun 2026 10:19 UTC
Sorry, this went to the wrong list (that's what I get for copy/pasting
the alias).
On Tue, Jun 16, 2026 at 11:53:01AM +0200, Peter Bex wrote:
> Hi there,
>
> I saw the URI SRFI this morning and as the author of the CHICKEN
> uri-common and co-author of the uri-generic library I would like to at
> least point out that percent-encoding is a bit of a pitfall, especially
> when programmatically constructing or updating URIs, particularly when
> they are based on user input.
>
> For this reason, the uri-generic library deconstructs the path into a
> list of segments. Any percent-encoded characters that cannot occur
> literally in a path segment are automatically decoded. For example, the
> slash is encoded as %2F, but in the decoded path-segments-as-list form
> the individual segments contain the decoded slash. So a relative ref
> like "foo%2Fbar/qux" reads as '("foo/bar" "qux") when decoded.
>
> Percent-encoded Unicode characters are also decoded, using the supplied
> encoding (UTF-8 by default).
>
> Note that all the "reserved" characters have a different meaning
> depending on whether they are percent-encoded or not. See section 2.2
> of RFC 3986.
>
> For this reason, we wrap the uri-generic library with uri-common to
> make this easier for the user when dealing with "common" schemes like
> HTTP. This library fully decodes percent-encoded characters in the path
> and query string (which is handled as an alist). This is a lossy process
> because, as pointed out above, reserved characters can have a different
> meaning depending on whether they are encoded. However, in the vast
> majority of cases the programmer does not care and just wants to stuff a
> value into a path segment. It's nice not having to deal with the
> low-level nitty-gritty of percent-encoding.
>
> Cheers,
> Peter
>
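
To make the segment-list idea in the quoted message concrete, here is a
minimal sketch in Python rather than Scheme. It is not the uri-generic or
uri-common API: split_path and join_path are hypothetical helper names,
and the only thing assumed is the standard urllib.parse module.

```python
from urllib.parse import quote, unquote


def split_path(path):
    """Split on literal slashes first, then percent-decode each segment.

    Hypothetical helper, not part of any library mentioned above.
    """
    return [unquote(segment) for segment in path.split("/")]


def join_path(segments):
    """Percent-encode each segment (including any slash in it), then join."""
    return "/".join(quote(segment, safe="") for segment in segments)


segments = split_path("foo%2Fbar/qux")
print(segments)             # ['foo/bar', 'qux']
print(join_path(segments))  # foo%2Fbar/qux

# Decoding the whole string before splitting loses the distinction
# between an encoded slash and a separator slash:
print(unquote("foo%2Fbar/qux").split("/"))  # ['foo', 'bar', 'qux']
```

Splitting before decoding keeps an encoded slash inside its segment, so
the list form can be re-encoded without ambiguity. Decoding the whole
string first merges the two cases, which is the kind of information loss
the message describes for reserved characters.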