What libraries we need
Lassi Kortela
(07 Apr 2019 08:55 UTC)
|
Re: What libraries we need Peter Bex (07 Apr 2019 09:31 UTC)
On Sun, Apr 07, 2019 at 11:55:47AM +0300, Lassi Kortela wrote:
> https://github.com/schemeweb/wiki/wiki/What-libraries-we-need
>
> - Do you find anything missing?

Just a note. In CHICKEN, we have the uri-generic and uri-common eggs. The latter builds on top of the former. The reason for that is that the URI spec (the RFC, not the poor excuse for a spec the W3C is working on currently) differentiates between reserved characters and regular ones. The reserved ones (can) have a special meaning. For example, the query string decoding of ampersands, semicolons and equals characters is something that's handled by the HTML spec, not the URI spec.

So, strictly speaking, according to the URI spec, foo?bar%3Dqux is possibly different from foo?bar=qux, but it doesn't have to be. Also, the path /foo:bar/qux/ is possibly different from /foo%3Abar/qux/.

Of course, in most cases it's most convenient for the user to do full decoding whenever possible. The vast majority of users don't care about the difference and want to treat both /foo:bar/qux/ and /foo%3Abar/qux/ as '(/ "foo:bar" "qux" ""). But if you are writing, say, a web proxy in Scheme, it will be up to the upstream server how it handles these paths.

In CHICKEN we handle this by having the uri-generic egg parse as much as it can without losing information. So /foo%2Fbar/qux/ is decoded to '(/ "foo/bar" "qux" ""), but in /foo%3Abar/qux/ the encoded chars are left alone and the path is decoded to '(/ "foo%3Abar" "qux" ""). This also means that a literal percent sign needs to stay encoded as %25.

Of course this is super-inconvenient, so in uri-common we decode fully at the expense of losing information. Most users will use uri-common in their web code, because you rarely care about these encoded characters.

The situation is somewhat confusing and weird, but it turns out to be a good compromise, because whenever you need the not-fully-decoded path, you can access the underlying uri-generic object.
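To make the distinction concrete, here is a rough sketch in Python rather than Scheme. It is not the uri-generic or uri-common implementation and does not reproduce their exact rules: the function names (decode_unreserved, split_path, parse_query) are made up for illustration, and the conservative decoder only follows the RFC 3986 reserved/unreserved split, decoding escapes of unreserved characters and leaving everything else alone.

```python
import re
from urllib.parse import unquote

# RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(segment):
    """Decode only escapes of unreserved characters (hypothetical helper).

    Escapes such as %3A (":") or %25 ("%") stay encoded, so the
    original spelling of the URI can still be recovered.
    """
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0)
    return _ESCAPE.sub(repl, segment)


def split_path(path, decode):
    """Split on literal "/" first, then decode each segment."""
    return [decode(segment) for segment in path.split("/")]


def parse_query(query):
    """HTML-form style query parsing: split on "&" and "=", then decode.

    Only "&" is treated as a field separator here; ";" is ignored
    to keep the sketch short.
    """
    pairs = []
    for field in query.split("&"):
        key, _, value = field.partition("=")
        pairs.append((unquote(key), unquote(value)))
    return pairs


# Conservative decoding keeps the two paths distinct ...
conservative_a = split_path("/foo%3Abar/qux/", decode_unreserved)
conservative_b = split_path("/foo:bar/qux/", decode_unreserved)
# ... while full decoding collapses them into the same segments.
full_a = split_path("/foo%3Abar/qux/", unquote)
full_b = split_path("/foo:bar/qux/", unquote)

# An encoded "=" is data, a literal "=" is a delimiter.
encoded_equals = parse_query("bar%3Dqux")
literal_equals = parse_query("bar=qux")
```

The important detail in both helpers is the order: split on the literal delimiters first, decode afterwards. Decoding first would turn the encoded "=" or ":" into something indistinguishable from the delimiter, which is exactly the information loss described above.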
As long as you haven't manipulated any component, you will get back the original input. It would be nice if we could come up with a cleaner API for this.

Regardless, I would recommend using the uri-generic parser implementation for any reference implementation for a SRFI; it has a large test suite and is super compliant with the RFC spec, more so than any other library I've come across in any language. This is one library I am extremely proud of being a co-maintainer for.

You can find the implementation in the CHICKEN subversion repo at [1]. You can also browse it online at [2]. Note that there are several alternative implementations using different parser generators inside the "alternatives" directory. The main one still uses "matchable" and the implementation is a bit fiddly (but fast as hell). There's one in irregex too (which could be easily ported to SRFI-115) which comes close, performance-wise, and is a lot easier to understand and maintain.

[1] https://code.call-cc.org/svn/chicken-eggs/release/5/uri-generic/trunk
[2] https://bugs.call-cc.org/browser/project/release/5/uri-generic/trunk

Cheers,
Peter