Normalization of percent-encoding

Normalization of percent-encoding Peter Bex 16 Jun 2026 09:53 UTC

Hi there,

I saw the URI SRFI this morning.  As the author of the CHICKEN
uri-common library and co-author of the uri-generic library, I would
like to at least point out that percent-encoding is a bit of a pitfall,
especially when programmatically constructing or updating URIs, or
building them from user input.

For this reason, the uri-generic library deconstructs the path into a
list of segments.  Percent-encoded characters that cannot occur
literally in a path segment are decoded automatically.  For example, a
slash inside a segment is encoded as %2F, but in the decoded
path-segments-as-list form the segment contains the literal slash.  So
a relative reference like "foo%2Fbar/qux" reads as '("foo/bar" "qux")
when decoded.
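
To make the order of operations concrete, here is a minimal sketch in
Python rather than Scheme.  It is not the uri-generic API; the helper
names decode_path_segments and encode_path_segments are hypothetical,
and only the standard urllib.parse module is assumed.

```python
from urllib.parse import quote, unquote


def decode_path_segments(raw_path):
    """Split on the literal '/' first, then percent-decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/")]


def encode_path_segments(segments):
    """Percent-encode each segment (including any '/') and join with '/'."""
    return "/".join(quote(segment, safe="") for segment in segments)


segments = decode_path_segments("foo%2Fbar/qux")  # ['foo/bar', 'qux']
rebuilt = encode_path_segments(segments)          # 'foo%2Fbar/qux'

# Decoding before splitting loses the segment boundary.
naive = unquote("foo%2Fbar/qux").split("/")       # ['foo', 'bar', 'qux']
```

Splitting before decoding is what keeps the encoded slash inside its
segment, so the list form can be turned back into the original string.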

Unicode characters are decoded as well, using the supplied encoding
(UTF-8 by default).
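
As an illustration of why the encoding has to be supplied, the same
character can be percent-encoded as different byte sequences.  This
sketch again uses Python's urllib.parse, not the CHICKEN libraries.

```python
from urllib.parse import unquote

# "é" is two bytes in UTF-8 but a single byte in Latin-1.
from_utf8 = unquote("caf%C3%A9")                      # UTF-8 is the default
from_latin1 = unquote("caf%E9", encoding="latin-1")

# Decoding Latin-1 bytes as UTF-8 does not recover the character.
mismatch = unquote("caf%E9")
```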

Note that all the "reserved" characters have a different meaning
depending on whether they are percent-encoded or not.  See section 2.2
of RFC 3986.
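
A small sketch of that difference, using the "&" delimiter in a query
string.  The "&"-separated key=value layout is the common web form
convention, assumed here for illustration, and the parser is Python's
urllib.parse.parse_qsl.

```python
from urllib.parse import parse_qsl

# An encoded "&" is data: it stays inside the value of "a".
encoded = parse_qsl("a=1%262&b=3")        # [('a', '1&2'), ('b', '3')]

# A literal "&" is a delimiter: it ends the value of "a".
literal = parse_qsl("a=1&2&b=3", keep_blank_values=True)
```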

For this reason, we wrap the uri-generic library with uri-common to
make this easier for the user when dealing with "common" schemes like
HTTP.  uri-common fully decodes percent-encoded characters in the path
and the query string (which is handled as an alist).  This is a lossy
process: as pointed out above, reserved characters potentially have a
different meaning depending on whether they are encoded or decoded.
However, in the vast majority of cases the programmer does not care and
just wants to stuff a value into a path segment.  It's nice not having
to deal with the low-level nitty-gritty of percent-encoding.
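
Both halves of that trade-off can be sketched in Python.  This is not
uri-common itself; a list of pairs stands in for the alist, and only
urllib.parse is assumed.

```python
from urllib.parse import parse_qsl, unquote, urlencode

# Full decoding is lossy: two different raw paths collapse to one string.
two_segments = unquote("a%2Fb/c")   # raw path has segments "a/b" and "c"
three_segments = unquote("a/b/c")   # raw path has segments "a", "b", "c"
collapsed = two_segments == three_segments

# The convenience: values go in as plain strings, and the encoding is
# applied when the query string is built.
params = [("q", "a&b=c"), ("lang", "en")]
query = urlencode(params)
round_trip = parse_qsl(query)
```

The caller never writes a percent sign, and the reserved characters in
the value survive the round trip.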

Cheers,
Peter