Re: On the implications of adding file-open to SRFI-170 Lassi Kortela 28 Nov 2019 14:56 UTC

Excellent remarks, Harold.

>>     This means any port using a file
>>     descriptor opened with it must not have output buffering, so that
>>     a single port output procedure call results in only one call to
>>     write(2), otherwise parts of strings could get interleaved with
>>     others.
>>
>> I don't think that should be enforced:  I often open files for append
>> even when I expect to be the only writer, in which case not buffering
>> is very inefficient.
>
> Indeed, I was reviewing our previous discussion about the case of
> multiple writers.
>
> Should we mandate an optional no-buffer argument to fdes->*output-port?
> Adding an option to change the buffer size to something non-zero and
> other than the default doesn't strike me as necessary for this SRFI, or
> perhaps necessary in general, and we want to limit what we demand of
> SRFI-170 implementors.  In that case, we might name the argument
> something like atomic-writes, taking a boolean, to make our intent clear.
>
> Another option would be making the argument buffer-size, and not
> requiring implementations to support any value other than 0.  Code using
> this would be most portable if an unsupported positive argument didn't
> raise an error; this is a performance option rather than a raw
> functionality option.

Hmm. How would parts of strings get interleaved -- are there
multi-threaded Scheme implementations that don't protect their port
objects with mutexes?

If the output port is buffered but also mutexed internally, isn't

(write a)  ; in thread 1
(write b)  ; in thread 2
(write c)  ; in thread 3

guaranteed to write a, b and c (in some order) so that none of their
contents are intermingled?

If there's no locking then the strings can indeed get mixed up, but
wouldn't that be better fixed by adding a mutex instead of turning off
buffering?
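
To make the comparison concrete, here is a rough sketch of a buffered
output port guarded by a mutex. It is written in Python purely for
illustration, and the class name LockedPort and its methods are
hypothetical, not part of SRFI 170 or any Scheme implementation. Each
write appends its whole datum to the buffer while holding the lock, so
buffering stays on and concurrent writers still cannot intermingle
their data.

```python
import io
import threading


class LockedPort:
    """Hypothetical buffered output port whose buffer is guarded by a mutex."""

    def __init__(self, raw, buffer_size=4096):
        self._raw = raw                  # underlying binary sink
        self._buf = bytearray()          # pending, unwritten output
        self._size = buffer_size
        self._lock = threading.Lock()

    def write(self, data):
        # The whole datum enters the buffer under the lock, so two
        # threads can never interleave parts of their data.
        with self._lock:
            self._buf += data
            if len(self._buf) >= self._size:
                self._flush_locked()

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        # Caller must already hold the lock.
        self._raw.write(bytes(self._buf))
        self._buf.clear()


def demo():
    """Three threads each write 100 records; return everything written."""
    sink = io.BytesIO()
    port = LockedPort(sink, buffer_size=64)
    records = [b"a" * 50 + b"\n", b"b" * 50 + b"\n", b"c" * 50 + b"\n"]

    def writer(record):
        for _ in range(100):
            port.write(record)

    threads = [threading.Thread(target=writer, args=(r,)) for r in records]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    port.flush()
    return sink.getvalue()
```

This only covers threads inside one process. It says nothing about
several processes appending to the same file, which is the case the
unbuffered suggestion was aimed at.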

If an explicit buffer size argument is supported, it should probably be
a minimum value so the implementation is permitted to use a bigger
buffer internally.

Standard error (stderr) is traditionally unbuffered to ensure error
messages get written out if the program crashes unexpectedly, so an
unbuffered output mode is definitely useful. But it probably should not
be the default.

It's also worth noting that textual ports often use line buffering (as
opposed to a fixed-size buffer; though there's probably a buffer with a
fixed maximum size under the hood to avoid buffering infinitely long lines).
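
As an illustration of the three modes mentioned above, the following
Python sketch (Python only because it exposes all three through one
standard call; the function name sizes_after_write is made up for this
example) writes a few bytes in each mode and checks how much has
reached the file before any explicit flush.

```python
import os
import tempfile


def sizes_after_write():
    """Return on-disk sizes observed right after writing, before any flush."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "unbuffered")
        with open(path, "wb", buffering=0) as f:      # unbuffered (binary only)
            f.write(b"hello")
            results["unbuffered"] = os.path.getsize(path)

        path = os.path.join(d, "block")
        with open(path, "wb", buffering=4096) as f:   # fixed-size buffer
            f.write(b"hello")
            results["block"] = os.path.getsize(path)

        path = os.path.join(d, "line")
        # buffering=1 selects line buffering for text files; newline="\n"
        # disables newline translation so byte counts match on every OS.
        with open(path, "w", buffering=1, newline="\n") as f:
            f.write("partial")
            results["line_before_newline"] = os.path.getsize(path)
            f.write(" line\n")
            results["line_after_newline"] = os.path.getsize(path)
    return results
```

On CPython, only the unbuffered write and the newline-terminated write
reach the file before a flush; the other two sizes are still zero at
the point they are measured.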

>>     compared to using the normal Scheme port procedures, unless you
>>     add set-port-position! AKA POSIX lseek to this or another SRFI.
>>
>> However, if the textual conversion is specified in fdes->textual-port,
>> then you do need plain open.
>>
>>       (At the same time, you'd naturally also add port-position to
>>     find out the current position (just another call to lseek with
>>     special arguments), and perhaps can-set-port-position?.)
>>
>> I certainly do expect to have the R6RS port-position functions, but I
>> don't think they belong in SRFI 170.

Port position on a textual port is notoriously hazardous. It's not clear
whether it should be the byte position or the character position. And
how do you implement an efficient and reliable set-port-position! using
character positions? What if it uses byte positions and the user jumps
into the middle of a multi-byte character?
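
A small Python sketch of the hazard, assuming UTF-8 and using an
in-memory binary port (the helper read_from_byte_offset is
hypothetical): the character count and byte count of the same text
differ, and a byte offset that lands inside a multi-byte character
leaves the decoder with garbage.

```python
import io

text = "h\u00e9llo"            # 'h', e-acute, 'l', 'l', 'o'
data = text.encode("utf-8")    # the e-acute occupies two bytes


def read_from_byte_offset(offset):
    """Seek a binary port to a byte offset and decode the rest as UTF-8."""
    port = io.BytesIO(data)
    port.seek(offset)
    return port.read().decode("utf-8")


char_count = len(text)   # positions counted in characters
byte_count = len(data)   # positions counted in bytes

ok = read_from_byte_offset(3)   # byte 3 is a character boundary

try:
    read_from_byte_offset(2)    # byte 2 is the second half of the e-acute
    landed_mid_character = False
except UnicodeDecodeError:
    landed_mid_character = True
```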

Is there a reasonable way to only support port position for binary ports
without making the API seem confusing to users?

>>     That could be ignored, but not open/read+write, which doesn't make
>>     any sense without set-port-position!  FilesAdvancedCowan specifies
>>     that switching between reading and writing first requires calling
>>     set-port-position!, which I assume is to invoke its buffer
>>     manipulating functionality.

>>     Of course open/read+write opens an even bigger can of worms,
>>     because you'll have to implement bi-directional ports.
>>     FilesAdvancedCowan lays out the minimum required behavior from the
>>     API, but obviously a noticeable amount of new work will be
>>     required to support them when starting from a Scheme
>>     implementation that sticks to the standards and only does
>>     unidirectional ports.

How about this requirement for implementations:

- In a multi-threaded Scheme, the port object returned by file-open or
fdes->*port must be thread-safe so only one thread can read or write at
a time.

- For an output-capable port, any call to read or (set-)port-position
first causes the output buffer to be flushed. This implies that read and
(set-)port-position can raise any error that flush-output-port can raise.
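
The two requirements above could be sketched roughly as follows. This
is Python for illustration only; ReadWritePort and its method names
are hypothetical stand-ins for a bidirectional port, not a proposed
API. A real implementation would also have to discard or reposition
any input buffer, which this sketch does not have.

```python
import io
import threading


class ReadWritePort:
    """Hypothetical bidirectional port over a seekable binary stream."""

    def __init__(self, raw):
        self._raw = raw
        self._out = bytearray()          # pending, unwritten output
        self._lock = threading.RLock()   # only one thread in at a time

    def write(self, data):
        with self._lock:
            self._out += data

    def flush(self):
        with self._lock:
            if self._out:
                self._raw.write(bytes(self._out))
                self._out.clear()

    def read(self, n=-1):
        with self._lock:
            self.flush()                 # may raise whatever flush raises
            return self._raw.read(n)

    def set_position(self, pos):
        with self._lock:
            self.flush()                 # pending output goes out first
            return self._raw.seek(pos)

    def position(self):
        with self._lock:
            self.flush()
            return self._raw.tell()


def demo():
    port = ReadWritePort(io.BytesIO())
    port.write(b"hello world")      # sits in the output buffer
    at_end = port.position()        # flushes, then reports the position
    port.set_position(6)
    return at_end, port.read()      # reads back the flushed data
```

The re-entrant lock is a design choice of the sketch: it lets read and
set_position call flush while already holding the lock.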

>>     And then there are complexities with textual ports, which I will
>>     leave for the future depending on how much of the above extra
>>     functionality we decide is proper to demand from implementors of
>>     SRFI-170.  Which is my closing question, where should we draw the
>>     line?
>>
>> I think we need to be able to convert a fd to a textual input or
>> output port, as we have now.

Agreed.

>> The question is whether we need to have
>> the detailed control, or just leave it to the Scheme implementation
>> (which would presumably choose it based on things like $LANG and the
>> OS name).

If this goes into SRFI 170, character encoding and newline
considerations would complicate the SRFI a lot.

If we had keyword arguments, 170 could allow implementations to add
arbitrary keyword arguments to the open-file and fdes->textual-port
procedures...