index/cursor merging [was: 120 days]

SRFI 130: 120 days Arthur A. Gleckler (01 Apr 2016 19:25 UTC)

Re: SRFI 130: 120 days John Cowan (01 Apr 2016 20:40 UTC)

Re: SRFI 130: 120 days Alex Shinn (02 Apr 2016 14:32 UTC)

Re: SRFI 130: 120 days John Cowan (02 Apr 2016 16:29 UTC)

index/cursor merging [was: 120 days] Per Bothner (03 Apr 2016 19:16 UTC)

Re: index/cursor merging [was: 120 days] John Cowan (03 Apr 2016 19:40 UTC)

Re: index/cursor merging [was: 120 days] Alex Shinn (04 Apr 2016 01:18 UTC)

Re: index/cursor merging [was: 120 days] Per Bothner (04 Apr 2016 02:56 UTC)

Re: index/cursor merging [was: 120 days] Alex Shinn (04 Apr 2016 05:39 UTC)

index/cursor merging [was: 120 days] Per Bothner 03 Apr 2016 19:15 UTC

On 04/02/2016 09:29 AM, John Cowan wrote:
> Cursors need not be heap allocated if they are negative fixnums, which
> is explicitly called out as legitimate.  0 is both a cursor and an index
> under this scheme, but that's all right, because it means the same thing
> in both cases.

I feel very uncomfortable with this approach.  It seems very error-prone.

There is a minor performance hit for testing index vs cursor, but it
probably only matters for string-ref/cursor.

It feels inconsistent: Some procedures have "cursor" in the name, while some don't.
Why string-ref/cursor rather than just string-ref, but string-pad rather
than string-pad/cursor?

It precludes an extension where negative indexes count from the end, as in Python.

You [John] have said before that the Scheme way is to not overload procedures
to support different types, even when the types are different implementations
of the same concept (such as sequences), but rather to have different procedures.

In Kawa the string-ref procedure is the "legacy" O(N) version; if you
instead use the "procedural" syntax, it uses "sequence indexing", which is O(1):
http://www.gnu.org/software/kawa/Strings.html#Strings-as-sequences
The idea is: (STR N) returns the *whole* (Unicode) character at offset N.
If offset N points to the second half of a surrogate pair, the result is a special
pseudo-character #\ignorable-char.

Kawa doesn't go so far as to have string-ref possibly return #\ignorable-char,
but maybe that is worth considering for SRFI 130.  It wouldn't be 100% compatible
with R7RS, but it would be a rare program where it mattered.

In this model an "index" can count either characters or some other value
that increases monotonically with characters:

(string-length (string CH)) can return 1 or a small integer
(string-ref (string CH) 0) is CH
for I >= 1 and I < (string-length (string CH)) we have:
   (string-ref (string CH) I) is #\ignorable-char
(string #\ignorable-char) returns the empty string ""
i.e. (string-length (string #\ignorable-char)) is 0
(string-length (string-append STR1 STR2))
   == (+ (string-length STR1) (string-length STR2))
(string-ref STR N) is O(1) - it returns a char at N or #\ignorable-char
(string-length (make-string N CH)) is not necessarily N,
but is (* N (string-length (string CH)))

(apply string (map (lambda (I) (string-ref S I)) (iota (string-length S))))
is always equal? to S, at least as long as S contains no partial characters.
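
For illustration only, here is a minimal sketch of this model in R7RS Scheme.
It is not Kawa's implementation.  It assumes indexes count UTF-16 code units
and that the implementation's characters cover all of Unicode.  The names
model-string, model-string-length, model-string-ref, index-list and round-trip
are hypothetical, and a symbol stands in for #\ignorable-char:

```scheme
;; Hypothetical model: a "string" is a vector of UTF-16 code units,
;; and an index counts code units rather than characters.
(import (scheme base))

;; Stand-in for #\ignorable-char; a symbol keeps the sketch portable.
(define ignorable 'ignorable-char)

;; Code units of one character: 1 for the BMP, 2 (a surrogate pair) otherwise.
(define (char->units ch)
  (let ((cp (char->integer ch)))
    (if (< cp #x10000)
        (list cp)
        (let ((v (- cp #x10000)))
          (list (+ #xD800 (quotient v #x400))
                (+ #xDC00 (remainder v #x400)))))))

;; (string CH ...) in the model; the ignorable marker contributes nothing.
(define (model-string . chars)
  (list->vector
   (apply append
          (map (lambda (c) (if (eq? c ignorable) '() (char->units c)))
               chars))))

(define (model-string-length s) (vector-length s))

(define (high-surrogate? u) (<= #xD800 u #xDBFF))
(define (low-surrogate? u) (<= #xDC00 u #xDFFF))

;; O(1): inspects at most two code units.  Assumes S is well formed,
;; i.e. it contains no partial characters.
(define (model-string-ref s i)
  (let ((u (vector-ref s i)))
    (cond ((low-surrogate? u) ignorable)
          ((high-surrogate? u)
           (integer->char
            (+ #x10000
               (* (- u #xD800) #x400)
               (- (vector-ref s (+ i 1)) #xDC00))))
          (else (integer->char u)))))

;; The list (0 1 ... N-1), so the sketch does not depend on SRFI 1's iota.
(define (index-list n)
  (let loop ((i (- n 1)) (acc '()))
    (if (< i 0) acc (loop (- i 1) (cons i acc)))))

;; Rebuild S from the result of model-string-ref at every index.
(define (round-trip s)
  (apply model-string
         (map (lambda (i) (model-string-ref s i))
              (index-list (model-string-length s)))))

(define s (model-string #\a (integer->char #x1F600) #\b))
(model-string-length s)   ; => 4
(model-string-ref s 2)    ; => ignorable-char
(equal? (round-trip s) s) ; => #t
```

Here the character U+1F600 occupies indexes 1 and 2, so index 2 yields the
ignorable marker, and dropping that marker when rebuilding restores S.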

(string-set! S I C) is deprecated, but equivalent to SRFI 118's
(string-replace! S I (+ I (string-length (string (string-ref S I)))) (string C))

As I said: Kawa doesn't go quite this far, but it might be worth considering:
It's efficient and simpler than having string-cursors as a separate type.
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/