Re: Please publish johnwcowan/srfi-233 as draft #2


Please publish johnwcowan/srfi-233 as draft #2 John Cowan (26 Sep 2022 17:50 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 Arthur A. Gleckler (26 Sep 2022 17:53 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 John Cowan (26 Sep 2022 18:58 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 Arthur A. Gleckler (26 Sep 2022 20:09 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 Marc Nieper-Wißkirchen (28 Sep 2022 11:22 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 John Cowan (28 Sep 2022 13:18 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 Marc Nieper-Wißkirchen (30 Sep 2022 07:28 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 John Cowan (30 Sep 2022 14:46 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 Marc Nieper-Wißkirchen (30 Sep 2022 15:16 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 John Cowan (30 Sep 2022 20:35 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 Marc Nieper-Wißkirchen (30 Sep 2022 20:45 UTC)

Re: Please publish johnwcowan/srfi-233 as draft #2 Marc Nieper-Wißkirchen (30 Sep 2022 07:28 UTC)

Thanks for the prompt response.

On Wed, 28 Sep 2022 at 15:18, John Cowan <xxxxxx@ccil.org> wrote:

[...]

>> - The generator really should raise a well-defined exception (which would be derived from &error in the case of R6RS) if the INI file is not in the specified format.  Only in this way can one be sure that the provided textual data is interpreted correctly.  Silent extensions by implementations must not be permitted (at least not without setting a special flag).

>
> There is no use in pretending that INI format is precisely specified like XML or JSON.  This library attempts to extract as much information as possible from an INI file no matter how ill-formed it is.  I have added a version of this sentence to the Rationale.

I don't think that this is a good idea. In fact, I think it is much
more dangerous than the issue below about symbol GC, because the latter
can be fixed by implementers. A SRFI 233 implementation may try to
extract as much information as possible, but it is unclear whether the
information the implementation thinks is there is the same information
the creator of the INI file intended to put into it if the two don't
speak the same language (that is, don't agree on the format). In other
words, whatever is returned by the SRFI 233 generator either comes from
a correctly formatted INI file (in the sense of SRFI 233) or is, at
worst, arbitrary. If a user cannot detect the difference, they
potentially have to consider all information returned by the SRFI 233
generator to be arbitrary.

Thus, should SRFI 233 continue to turn a blind eye to ill-formed INI
files (ill-formed in the sense of SRFI 233) and not raise a
well-defined error, I will have to side with those on the mailing
list who said that making a SRFI from the implementation and API
proposed by you and Arvydas is not a good idea and that it should
instead become some library on GitXXX.
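Strict parsing of this kind could look roughly like the following minimal sketch. It is written in Python rather than Scheme purely for illustration, and both the accepted grammar (blank lines, `;` or `#` comments, `[section]` headers, `key = value` pairs) and the names `IniSyntaxError` and `parse_ini_strict` are assumptions, not SRFI 233's actual grammar or API. The point is only that every line is either recognized or rejected with a dedicated, catchable error, so the caller never receives guessed data.

```python
import re
from typing import Iterable, Iterator, Tuple


class IniSyntaxError(ValueError):
    """Hypothetical dedicated error for ill-formed input (playing the role
    of a condition type derived from &error in R6RS terms)."""

    def __init__(self, line_number: int, line: str) -> None:
        super().__init__(f"line {line_number}: not valid INI syntax: {line!r}")
        self.line_number = line_number
        self.line = line


# Assumed grammar, for illustration only (not SRFI 233's).
SECTION_RE = re.compile(r"^\[([^\[\]]+)\]$")
PAIR_RE = re.compile(r"^([^=;#\[\]]+?)\s*=\s*(.*)$")


def parse_ini_strict(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (section, key, value) triples; raise IniSyntaxError on any
    line that the assumed grammar does not recognize, instead of guessing."""
    section = ""  # pairs before the first header belong to an unnamed section
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # blank lines and comments are part of the grammar
        match = SECTION_RE.match(line)
        if match:
            section = match.group(1).strip()
            continue
        match = PAIR_RE.match(line)
        if match:
            yield section, match.group(1).strip(), match.group(2)
            continue
        raise IniSyntaxError(number, raw)  # reject rather than guess


text = """; comment
[server]
host = example.org
port=8080
"""
print(list(parse_ini_strict(text.splitlines())))

try:
    list(parse_ini_strict(["[server]", "host example.org"]))
except IniSyntaxError as err:
    print("rejected:", err)
```

A lenient parser would instead have to decide what `host example.org` means; the strict version leaves that decision to the caller, who can catch the error and, if desired, opt into a more permissive mode explicitly.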

>> - The keys and section names really should be represented by symbols, not strings.  Symbols are interned, can be compared quickly and have a fast hash function.  All this is quite advantageous in the case of INI files.
>
>
> Generating symbols from untrusted input allows a DoS attack on Schemes with naive garbage collectors (i.e. most of them), allowing memory to fill up with unreclaimable symbols.  In any case, typically a program reads a small number of INI files, often just one, with no duplicate section or key names, so the advantages of using symbols are effectively nullified.

Then these Schemes should be fixed ASAP or not be used in critical
areas. Or they should not implement SRFI 233. Rather than letting
obvious deficiencies of some existing Schemes guide the design of an
API, we should treat them as motivation for the authors of these
Schemes to fix their GCs. It will also help Scheme's reputation if
implementations become better and more robust.

A user of SRFI 233 will want to process the returned keys somehow. The
simplest way would be a `case` construct. This does not work with
strings (`case` compares with `eqv?`), so the user would probably call
`string->symbol` themselves. Alternatively, the key-value pairs have to
be stored somewhere. Again, a symbol-keyed hash table is more efficient
than a string-keyed one. In any case, I think that symbols are by far a
better abstraction than strings.

[...]

>> - Should the accumulator be allowed to cache results so that the writing may be deferred until it receives an eof object?  Should the generator be allowed to read ahead?
>
>
> Since there are no guarantees that Scheme ports do or don't buffer already, I see no point in mentioning this.

No, but the same thread (or even other threads) can interleave
reads from or writes to the port. This can even be useful, for example
to skip certain lines (known to be ill-formed) while reading or to
write extra information while writing.
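The interleaving concern can be illustrated with a small sketch, again in Python rather than Scheme and for illustration only. Both generators and the helper `read_skipping_bad_line` are hypothetical. A caller who skips a known-bad line by reading directly from the shared port gets the intended result only if the generator has not read ahead of what it has returned so far.

```python
import io
from typing import Callable, Iterator, List, TextIO


def lines_no_readahead(port: TextIO) -> Iterator[str]:
    """Read exactly one line per step, so after each yield the port is
    positioned just past the line that was handed to the caller."""
    while True:
        line = port.readline()
        if line == "":
            return  # end of input
        yield line.rstrip("\n")


def lines_with_readahead(port: TextIO, chunk: int = 2) -> Iterator[str]:
    """Hypothetical read-ahead variant: read `chunk` lines before yielding
    any of them, so the port position runs ahead of what the caller saw."""
    while True:
        batch = [port.readline() for _ in range(chunk)]
        batch = [line for line in batch if line != ""]
        if not batch:
            return  # end of input
        for line in batch:
            yield line.rstrip("\n")


def read_skipping_bad_line(
    make_gen: Callable[[TextIO], Iterator[str]], text: str
) -> List[str]:
    """Consume lines from a shared port; after seeing "a", the caller
    skips the following (known ill-formed) line by reading it directly."""
    port = io.StringIO(text)
    seen = []
    for line in make_gen(port):
        seen.append(line)
        if line == "a":
            port.readline()  # interleaved read on the same port
    return seen


sample = "a\nBAD\nb\nc\n"
print(read_skipping_bad_line(lines_no_readahead, sample))
print(read_skipping_bad_line(lines_with_readahead, sample))
```

On the sample input, the first call returns `['a', 'b', 'c']`, while the read-ahead variant returns `['a', 'BAD', 'c']`: the caller's skip consumed `b` rather than the bad line, because the generator had already read past it. An accumulator that defers its writes until it receives an eof object causes the mirror-image problem, since extra text written directly to the port would appear before the buffered entries.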

[...]

Marc