Re: strings draft Shiro Kawai (24 Jan 2004 03:27 UTC)
Thanks for the detailed reply.  Now I'm getting the point.

 * An implementation is free to have non-Unicode-compatible
   chars/strings, as long as it meets the minimum requirements,
   which are not much more than current R5RS with some
   clarification (case mapping issues aside).

 * _If_ an implementation also has a Unicode-compatible subset of
   chars/strings, that subset should follow the codepoint index.
   The index handling of the rest of the chars/strings is up to
   the implementation.

Did I get it right?

So, when the EUCJP Scheme reads the string "\U+30AB.\U+309A.", it
can produce a string that consists of the single character
EUCJP #xA5F7.  That is outside the scope of your document, so the
implementation is free to behave like this:

  (define x "\U+30AB.\U+309A.")
  (string-length x)  => 1
  (string-ref x 0)   => <character EUCJP #xA5F7>
  (let ((y (string-copy x)))
    (string-set! y 0 #\a)
    y)               => "a"

If so, I have no problem adopting the "codepoint index" proposal.

[About O(1) property]

>From: Tom Lord <xxxxxx@emf.net>
Subject: Re: strings draft
Date: Fri, 23 Jan 2004 16:45:16 -0800 (PST)

> > No.  String search, regexp match, or precalculated prefix/suffix
> > database, all can return some sort of reference that directly
> > points into the string, so that the subsequent use of such
> > reference wouldn't need to count characters.
> > (The implementation that shares substrings and uses copy-on-write
> > for string mutation, those basic operations even can efficiently
> > return substring directly.)
>
> Well, I don't think it's that simple.
>
> It would be hard to implement those "string reference objects" to
> preserve the O(1) property in the face of STRING-SET! given a flat,
> variable-width, string representation.
>
> And if you have a tree representation or something like what I
> described for Pika -- then you don't need those "string reference
> objects" after all.  They might be nice for independent reasons -- but
> you won't need them to get O(1) string-ops.
I'm not sure we're talking about the same issue.  I probably mixed
up two issues.

 * For STRING-REF and SUBSTRING, a string pointer object allows O(1)
   access to known locations of a string in a variable-width
   character string representation.  And we hardly lose anything in
   an "array of characters" implementation, since such an
   implementation can just use an integer index as the string
   pointer object---it doesn't need to be a disjoint object at all.

   What we lose is the ability to extract a character or substring
   by index without prior knowledge of the target string.  My claim
   is that this is not a common case (maybe only when you're parsing
   fixed-column syntax?).  But I might be missing something, and I'd
   appreciate a concrete example.

 * For STRING-SET!, an implementation that does copy-on-write of the
   whole string can't have the O(1) property, regardless of whether
   it uses an "array of characters", variable-width characters, a
   rope, or some other tree representation (you can get close,
   though, if you use a tree and share only the leaves, for example).

   And I argue that wanting to replace exactly one character at a
   specific location in a string is not a common case---it is rather
   a special case of generic string replacement, such as srfi-13's
   string-xcopy!.  There may be applications that use such
   "one character replacement" heavily, but I don't think they are
   common enough that O(1) STRING-SET! should be a "strong
   recommendation".  Again, I may be missing something, though.

You mentioned that you came to the O(1) recommendation through your
experience.  If it's not too much trouble, I'd like to hear about the
concrete experience that made you think so.

> > It's OK to have STRING-REF as well---after all, we have LIST-REF
> > and nobody complains its O(N) complexity.
>
> In some sense, I think that the strong recommendation for O(1)
> string-ops is already present in the spec.  Were it not, why wouldn't
> the string syntax be a fancy way to write lists and STRING? and LIST?
> not disjoint?
The same argument could be made for vectors: why isn't the string
syntax a fancy way to write vectors, with STRING? and VECTOR? not
disjoint?

I don't know what the RRRS authors had in mind when they decided to
have a disjoint string type.  Some old discussion, such as:

  http://www.swiss.ai.mit.edu/ftpdir/scheme-mail/HTML/rrrs-1985/msg00002.html

suggests that they viewed a string as an array of characters.  But at
least such a view isn't explicit in R5RS, and I consider that
fortunate.

[About character-set independence]

> > What I felt ambiguous is the degree of "character-set independence"
> > you're aiming at.  If we'd like to have a character-set independent
> > language spec, we need to be much more careful to separate
> > Unicode-specific issues and character-set independent issues.
>
> Hey, I'm partisan but fair, I think.
>
> My recommendations suggest _requirements_ for the portable character
> set.  Those aren't Unicode specific.  My recommendations suggest
> _requirements_for_implementations_providing_optional_features_: and
> some of those are indeed Unicode specific.

As long as it is clear that portable Scheme code can't rely on those
features, I'm fine with it.

> > > How would you remove that restriction in a way that supports writing
> > > portable FFI-using code?
>
> > What I'm picking there is the word "must".
> > scm_extract_string8 can put answer in eucjp packed format into
> > t_uchar* array if the implementation supports that, so I don't
> > see why this restriction is needed.
>
> I would not object to an addition to the portable FFI which is
>
>     scm_extract_string_opaque
>     scm_enter_string_opaque
>
> that returns/accepts the data from a string, plus its length, but says
> nothing about how the data is encoded.  It's purpose would be to
> extract that data in the "most convenient form" for a given
> implementation.  Would that do?

I don't object to scm_{extract|enter}_string_opaque, but I still fail
to see why scm_{extract|enter}_string8 shouldn't handle both.
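The "string pointer object" idea from the O(1) discussion earlier in
this message can be illustrated with a minimal C sketch.  It assumes
a flat UTF-8 buffer, which is only one possible variable-width
representation; the names str_cursor, utf8_width, cursor_ref and
cursor_find are hypothetical and do not come from any proposal in
this thread.  A search returns a cursor holding a byte offset, so a
later reference through that cursor needs no character counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "string pointer": a byte offset into a UTF-8 buffer. */
typedef struct {
    const unsigned char *base;  /* start of the UTF-8 data */
    size_t len;                 /* length of the data in bytes */
    size_t off;                 /* byte offset of the current character */
} str_cursor;

/* Width in bytes of the sequence whose lead byte is b.
   Assumes well-formed UTF-8; no validation is attempted. */
static size_t utf8_width(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Decode the code point under the cursor.  The cost depends only on
   the width of that one character, not on its position. */
static unsigned long cursor_ref(const str_cursor *c) {
    const unsigned char *p;
    size_t w, i;
    unsigned long cp;
    assert(c->off < c->len);
    p = c->base + c->off;
    w = utf8_width(p[0]);
    if (w == 1) return p[0];
    cp = p[0] & (0xFFu >> (w + 1));       /* payload bits of the lead byte */
    for (i = 1; i < w; i++)
        cp = (cp << 6) | (p[i] & 0x3Fu);  /* 6 payload bits per trail byte */
    return cp;
}

/* Linear search for a code point.  On success stores a cursor in *out
   and returns 1; returns 0 if the code point does not occur. */
static int cursor_find(const unsigned char *s, size_t len,
                       unsigned long target, str_cursor *out) {
    str_cursor c;
    c.base = s;
    c.len = len;
    c.off = 0;
    while (c.off < c.len) {
        if (cursor_ref(&c) == target) { *out = c; return 1; }
        c.off += utf8_width(s[c.off]);
    }
    return 0;
}
```

In an "array of characters" implementation the same interface could
be backed by a plain integer index.  Note that this sketch does not
address mutation: a STRING-SET! that changes a character's width
would invalidate later offsets, which is the difficulty Tom raises.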
> > Of course using such encoding wouldn't be portable.  But so
> > as iso8859_1 implementation is asked to convert the string
> > into iso8859_2.
>
> I don't see why it wouldn't be portable.  I was thinking it would be
> helpful to have a "libscheme-ffi-helpers.a" with the necessary tables.

Because iso8859-2 doesn't have INVERTED EXCLAMATION MARK (iso8859-1
#xA1), for example.  The implementation can return an error, and
that's fine; but then, why not allow eucjp as well?

Alternatively, you could specify: "when an iso8859-1 implementation is
asked to extract a string in iso8859-2, it may map iso8859-1
characters that have no counterpart in iso8859-2 to the iso8859-2
characters with the same codepoint".  Although it's a hack, it's what
CCS/CES-unaware software does all the time.  But then again, there's
no reason an iso8859-1 implementation can't extract a string as
jisx0201 under a similar rule.

> Neither the 0..256 mapping nor the O(1) access time are _required_
> in the proposed Scheme changes.
[...]
> Requiring the 0..256 mapping in the FFI means just that `char' can
> always be converted to CHAR? and back again.  Is that really so
> onerous?

A C 'char' doesn't carry encoding information; it is merely an
integer with a limited range.  If we want Scheme characters to be
defined more strictly, the programmer should be more conscious of
the distinction between octets and characters.

It wasn't a requirement in the proposal, but "explicitly and strongly
encouraging" it will in fact encourage the bad practice of treating
an octet and a character as the same thing.  I'm afraid that it
encourages people to write code that uses strings as buffers for
octet streams, for example.

--shiro
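The two extraction policies discussed in this message (signal an
error for a character with no counterpart, or fall back to the same
codepoint) can be sketched in C as follows.  This is only an
illustration under stated assumptions: recode_policy, recode_octets
and init_partial_l1_to_l2 are hypothetical names, not part of any
proposed FFI, and the table is deliberately partial (ASCII plus three
characters that iso8859-1 and iso8859-2 share at the same codepoint).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policies for octets with no counterpart in the target. */
enum recode_policy { RECODE_STRICT, RECODE_SAME_CODEPOINT };

/* Fill a deliberately partial iso8859-1 -> iso8859-2 table.
   -1 means "no counterpart recorded in this sketch"; a real helper
   library would supply a complete table. */
static void init_partial_l1_to_l2(int table[256]) {
    int i;
    for (i = 0; i < 256; i++)
        table[i] = (i < 0x80) ? i : -1;  /* the ASCII range is shared */
    table[0xA0] = 0xA0;  /* NO-BREAK SPACE */
    table[0xA7] = 0xA7;  /* SECTION SIGN */
    table[0xE9] = 0xE9;  /* LATIN SMALL LETTER E WITH ACUTE */
    /* 0xA1 (INVERTED EXCLAMATION MARK) stays -1: iso8859-2 lacks it. */
}

/* Recode n octets from src into dst (dst must hold at least n octets).
   Returns the number of octets written, or -1 under RECODE_STRICT
   when an octet has no counterpart (dst may be partially written). */
static long recode_octets(const unsigned char *src, size_t n,
                          unsigned char *dst, const int table[256],
                          enum recode_policy policy) {
    size_t i;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        int mapped = table[src[i]];
        if (mapped < 0) {
            if (policy == RECODE_STRICT) return -1;
            mapped = src[i];  /* the "same codepoint" hack */
        }
        dst[i] = (unsigned char)mapped;
    }
    return (long)n;
}
```

Under RECODE_SAME_CODEPOINT the octet #xA1 passes through unchanged
and is read by an iso8859-2 consumer as a different character, which
is exactly the octet/character confusion the message warns about.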