Re: Strings/chars - Simplelists

Re: Strings/chars Tom Lord 24 Dec 2003 05:11 UTC

    > From: Alex Shinn <xxxxxx@synthcode.com>

    > Shiro's proposal is well thought out, handles encoding simply,
    > and is based on real working practice in Gauche.

    > The main complication is that Scheme strings don't necessarily
    > have anything to do with C strings.  Shared substrings, in fact,
    > are not C strings as already acknowledged by the API and strings
    > as lists or Boehm cords aren't even consecutive memory
    > references.  Handle these issues and the only thing left for
    > Unicode is to specify the default encoding (and an advanced SRFI
    > could specify fetching w/ alternate encodings for efficiency).

My own thinking in this area isn't fully cooked yet but let me make a
few general observations.

* portable FFI vs. native FFI

  It's worth keeping clear the difference between an FFI for writing
  code portable across multiple implementations vs. an FFI exposing
  the full glory of a particular implementation.

  In a portable FFI, we can tolerate moderate inefficiencies, loss of
  generality, and all kinds of sins -- just so long as the result
  really is portable and really is enough to write useful code
  in a large number of cases.

  In terms of strings, I like the idea of ALLOCATE_COPY_OF rather than
  EXTRACT:  function(s) that give you copies of strings or parts of
  strings, in whatever encoding you like (from a small set), but
  which don't share state with the actual Scheme string and do have
  to be explicitly freed.

  That's at least enough to be able to, for example, get the name of a
  file you're supposed to open.
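
  A minimal sketch of the copy-out idea in C, under stated assumptions:
  `toy_string` is a hypothetical stand-in for an implementation's string
  object (here just an array of code points), and
  `toy_allocate_copy_of_utf8` is an invented name, not part of SRFI-50
  or of any real implementation.  The point is only that the caller
  receives a private buffer in a chosen encoding and owns it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object: a read-only array of Unicode code points.
   A real Scheme may store strings very differently (shared substrings,
   cords, ...), which is exactly why C code gets a copy instead. */
typedef struct {
    const uint32_t *cp;
    size_t len;
} toy_string;

/* Encode one code point as UTF-8 and return the number of bytes written
   (1 to 4).  Assumes c is a valid scalar value (<= 0x10FFFF, not a
   surrogate); no validation is done in this sketch. */
size_t utf8_encode(uint32_t c, unsigned char *out)
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (c >> 18));
    out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
}

/* Return a freshly malloc'd, NUL-terminated UTF-8 copy of the code points
   s[start, end).  The copy shares no state with s, and the caller must
   free() it.  Returns NULL if allocation fails.  A code point of 0 inside
   the range would make the result look truncated to C string functions. */
char *toy_allocate_copy_of_utf8(const toy_string *s, size_t start, size_t end)
{
    assert(start <= end && end <= s->len);

    /* Worst case is 4 bytes per code point, plus the terminator. */
    unsigned char *buf = malloc(4 * (end - start) + 1);
    if (buf == NULL)
        return NULL;

    size_t n = 0;
    for (size_t i = start; i < end; i++)
        n += utf8_encode(s->cp[i], buf + n);
    buf[n] = '\0';
    return (char *)buf;
}
```

  A caller that needs a file name would request the whole string, pass
  the result to something like fopen, and then free() the buffer.  The
  copy costs an allocation per call, which is the kind of moderate
  inefficiency a portable FFI can tolerate.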

* indexes are a total nightmare

  Let's suppose a C function wants to hand Scheme the return value of
  mb_strlen.   Or that Scheme wants to hand C a "string index".

  Total train wreck.
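
  To make the mismatch concrete, here is a small C sketch.  It assumes
  the C side holds valid UTF-8 and that the Scheme side counts code
  points; both are assumptions, since implementations differ on what a
  string index means.  The function names are invented for
  illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated string assumed to be valid UTF-8.
   Every byte that is not a continuation byte (10xxxxxx) starts one. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Byte offset at which code point number `index` (0-based) starts.
   Returns strlen(s) when index is at or past the end of the string.
   Note that this is a linear scan, not a constant-time lookup. */
size_t utf8_byte_offset(const char *s, size_t index)
{
    size_t off = 0;
    while (s[off] != '\0') {
        if (((unsigned char)s[off] & 0xC0) != 0x80) {
            if (index == 0)
                return off;
            index--;
        }
        off++;
    }
    return off;
}
```

  For the UTF-8 bytes of "naïve", strlen reports 6 while
  utf8_code_points reports 5, and code point index 3 lives at byte
  offset 4.  A number passed across the FFI boundary is therefore
  meaningless unless both sides agree on which of these units it
  counts.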

* the real problem is C and C libraries

  The standard C facilities for large character sets are fairly lame.
  The de facto standard practice of using UTF-8 for everything is
  limiting.   There are also no standard libraries for things such
  as ropes, edit buffers, and so forth.

  It's beyond the scope of SRFI-50 but I think that in the longer
  term, as we build these next generation Schemes with good Unicode
  support, an interesting possibility is to aim for a run-time system
  that doubles as a next-generation C library for Unicode text
  manipulation.

-t