strings draft
Tom Lord
(22 Jan 2004 04:58 UTC)
|
Re: strings draft
Shiro Kawai
(22 Jan 2004 09:46 UTC)
|
Re: strings draft
Tom Lord
(22 Jan 2004 17:32 UTC)
|
Re: strings draft
Shiro Kawai
(23 Jan 2004 05:03 UTC)
|
Re: strings draft
Tom Lord
(24 Jan 2004 00:31 UTC)
|
Re: strings draft
Matthew Dempsky
(24 Jan 2004 03:00 UTC)
|
Re: strings draft
Shiro Kawai
(24 Jan 2004 03:27 UTC)
|
Re: strings draft
Tom Lord
(24 Jan 2004 04:18 UTC)
|
Re: strings draft
Shiro Kawai
(24 Jan 2004 04:49 UTC)
|
Re: strings draft
Tom Lord
(24 Jan 2004 18:47 UTC)
|
Re: strings draft
Shiro Kawai
(24 Jan 2004 22:16 UTC)
|
Octet vs Char (Re: strings draft)
Shiro Kawai
(26 Jan 2004 09:58 UTC)
|
Strings, one last detail.
bear
(30 Jan 2004 21:12 UTC)
|
Re: Strings, one last detail.
Shiro Kawai
(30 Jan 2004 21:43 UTC)
|
Re: Strings, one last detail.
Tom Lord
(31 Jan 2004 00:13 UTC)
|
Re: Strings, one last detail.
bear
(31 Jan 2004 20:26 UTC)
|
Re: Strings, one last detail.
Tom Lord
(31 Jan 2004 20:42 UTC)
|
Re: Strings, one last detail.
bear
(01 Feb 2004 02:29 UTC)
|
Re: Strings, one last detail.
Tom Lord
(01 Feb 2004 02:44 UTC)
|
Re: Strings, one last detail.
bear
(01 Feb 2004 07:53 UTC)
|
Re: Octet vs Char (Re: strings draft)
bear
(26 Jan 2004 19:04 UTC)
|
Re: Octet vs Char (Re: strings draft)
Matthew Dempsky
(26 Jan 2004 20:12 UTC)
|
Re: Octet vs Char (Re: strings draft)
Matthew Dempsky
(26 Jan 2004 20:40 UTC)
|
Re: Octet vs Char
Shiro Kawai
(26 Jan 2004 23:39 UTC)
|
Re: Octet vs Char (Re: strings draft)
Ken Dickey
(27 Jan 2004 04:33 UTC)
|
Re: Octet vs Char
Shiro Kawai
(27 Jan 2004 05:12 UTC)
|
Re: Octet vs Char
Tom Lord
(27 Jan 2004 05:23 UTC)
|
Re: Octet vs Char
bear
(27 Jan 2004 08:35 UTC)
|
Re: Octet vs Char (Re: strings draft) bear (27 Jan 2004 08:33 UTC)
|
Re: Octet vs Char (Re: strings draft)
Ken Dickey
(27 Jan 2004 15:43 UTC)
|
Re: Octet vs Char (Re: strings draft)
bear
(27 Jan 2004 19:06 UTC)
|
Re: strings draft
bear
(22 Jan 2004 19:05 UTC)
|
Re: strings draft
Tom Lord
(23 Jan 2004 01:53 UTC)
|
READ-OCTET (Re: strings draft)
Shiro Kawai
(23 Jan 2004 06:01 UTC)
|
Re: strings draft
bear
(23 Jan 2004 07:04 UTC)
|
Re: strings draft
bear
(23 Jan 2004 07:20 UTC)
|
Re: strings draft
Tom Lord
(24 Jan 2004 00:02 UTC)
|
Re: strings draft
Alex Shinn
(26 Jan 2004 01:59 UTC)
|
Re: strings draft
Tom Lord
(26 Jan 2004 02:22 UTC)
|
Re: strings draft
bear
(26 Jan 2004 02:35 UTC)
|
Re: strings draft
Tom Lord
(26 Jan 2004 02:48 UTC)
|
Re: strings draft
Alex Shinn
(26 Jan 2004 03:00 UTC)
|
Re: strings draft
Tom Lord
(26 Jan 2004 03:14 UTC)
|
Re: strings draft
Shiro Kawai
(26 Jan 2004 04:57 UTC)
|
Re: strings draft
Alex Shinn
(26 Jan 2004 04:58 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 18:48 UTC)
|
Re: strings draft
bear
(24 Jan 2004 02:21 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 02:10 UTC)
|
Re: strings draft
Tom Lord
(23 Jan 2004 02:29 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 02:44 UTC)
|
Re: strings draft
Tom Lord
(23 Jan 2004 02:53 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 03:04 UTC)
|
Re: strings draft
Tom Lord
(23 Jan 2004 03:16 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 03:42 UTC)
|
Re: strings draft
Alex Shinn
(23 Jan 2004 02:35 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 02:42 UTC)
|
Re: strings draft
Tom Lord
(23 Jan 2004 02:49 UTC)
|
Re: strings draft
Alex Shinn
(23 Jan 2004 02:58 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 03:13 UTC)
|
Re: strings draft
Alex Shinn
(23 Jan 2004 03:19 UTC)
|
Re: strings draft
Bradd W. Szonye
(23 Jan 2004 19:31 UTC)
|
Re: strings draft
Alex Shinn
(26 Jan 2004 02:22 UTC)
|
Re: strings draft
Bradd W. Szonye
(06 Feb 2004 23:30 UTC)
|
Re: strings draft
Bradd W. Szonye
(06 Feb 2004 23:33 UTC)
|
Re: strings draft
Alex Shinn
(09 Feb 2004 01:45 UTC)
|
specifying source encoding (Re: strings draft)
Shiro Kawai
(09 Feb 2004 02:51 UTC)
|
Re: strings draft
Bradd W. Szonye
(09 Feb 2004 03:39 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 03:12 UTC)
|
Re: strings draft
Alex Shinn
(23 Jan 2004 03:28 UTC)
|
Re: strings draft
tb@xxxxxx
(23 Jan 2004 03:44 UTC)
|
Parsing Scheme [was Re: strings draft]
Ken Dickey
(23 Jan 2004 17:02 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
bear
(23 Jan 2004 17:56 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
tb@xxxxxx
(23 Jan 2004 18:50 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Per Bothner
(23 Jan 2004 18:56 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Tom Lord
(23 Jan 2004 20:26 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Per Bothner
(23 Jan 2004 20:57 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Tom Lord
(23 Jan 2004 21:44 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Ken Dickey
(23 Jan 2004 21:47 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Tom Lord
(23 Jan 2004 23:22 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Ken Dickey
(25 Jan 2004 01:03 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Tom Lord
(25 Jan 2004 03:01 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Tom Lord
(23 Jan 2004 20:07 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
tb@xxxxxx
(23 Jan 2004 21:22 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Tom Lord
(23 Jan 2004 22:38 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
tb@xxxxxx
(24 Jan 2004 06:48 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Tom Lord
(24 Jan 2004 18:41 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
tb@xxxxxx
(24 Jan 2004 19:34 UTC)
|
Re: Parsing Scheme [was Re: strings draft]
Tom Lord
(24 Jan 2004 21:48 UTC)
|
Re: strings draft
Matthew Dempsky
(25 Jan 2004 06:59 UTC)
|
Re: strings draft
Tom Lord
(25 Jan 2004 07:16 UTC)
|
Re: strings draft
Matthew Dempsky
(26 Jan 2004 23:52 UTC)
|
Re: strings draft
Tom Lord
(27 Jan 2004 00:30 UTC)
|
On Mon, 26 Jan 2004, Ken Dickey wrote: >Well color me dumb, but I don't see why getting O(1) is such a big deal. > >Let's say an implementation reads UTF-8 chars. A uni-string could be >represented as a tuple. The required part is a byte-vector of characters >that fit into 8 bits. If characters have 16 bit representation, use a >second, 16-bit vector and a map of which chars are the long ones. If 16 bit, >index into the 2nd array, else into the first. The 2nd array could be >shortened by using the byte-vector to hold indexes into the 16-bit array in >the mapped/unused bytevector slots. In the "worst case" (storage wise), we >only need the 16-bit array. The mapping could be a bit-vector or whatever >makes sense in a particular implementation. > >It is true that there are more code paths in this scheme (sic), but each kind >of representation "knows that" and this still makes refs & sets ~ O(1). > >What am I missing here? O(1) reference or character setting comes at the expense of O(n) insertions, deletions, and non-identical-sized replacements. EG, if I change "the" to "a" at the beginning of a long string, and I've represented it as a vector to get O(1) reference time, the rest of the string has to be copied to move it two character spaces in memory. This is no big deal, on the same order as a function call overhead, for strings of 250 characters or less. But it starts to be a very big deal when the string is the size of a large novel, around 2 million characters. Likewise, when the application decides it wants to make a copy of a string and change just the copy, the naive implementation using strings made of byte vectors copies the whole thing to a new vector allocated for a new variable, then copies it again within that vector to move the bulk of the new string 2 spaces. A more sophisticated implementation still copies the whole thing at least once. Implementing ropes allows the system instead to copy just the root node of the string's Btree (constant-size unit) in order to copy a string, and then copy tree nodes down to the substring where "the" will be changed to "a", make a copy of just that substring, and shift the bulk of just that copy 2 characters. If the strings are large, this saves you a bunch of CPU and a bunch of memory, since the bulk of the structure - all the tree nodes except a single root-to-leaf path and all the substrings but the one changed - is still shared between the two strings. Below, I'm comparing ropes to the simplest type of vector string, in which all characters are equal width. You've offered a technique to get similar performance with strings having unequal- width characters (such as UTF-8 encoding) and I think it'd work, but here are still compelling reasons in at least some applications to choose ropes. Here's how the operations break down (use a constant-width font): ========================================================== "Rope" string | vector string indexed reference; | O(log n) | O(1) ------------------------------------------------- indexed mutation | O(log n) | O(1) ------------------------------------------------- string-copy | O(1), | O(n), Omega(1) | omega(n) ------------------------------------------------- unequal-size mutation | O(log n) | O(n) Omega(log n) *1 | Omega(1) ------------------------------------------------- *1; Unequal-size mutation uses Omega(log n) memory only if the data is shared with another string (ie, it uses part of the memory spaces that has already been saved in a string-copy operation). If the data is unshared, as with vector strings, it's Omega(1). =========================================================== As you can see, vector strings win on indexed reference and indexed mutation, but rope strings win everywhere else. It is possible to use more memory doing an unequal-size mutation, but that's only in the case that the vector string would already have used even more memory in a string-copy operation. Order(1) indexed reference and indexed single-character mutation is nice, but the time and space taken by the other operations starts becoming untenable when the size of the strings gets sufficiently large. If you need to know which is best for your application, you may want to instrument your code and use a profiler to find out just how much time you're spending copying strings. Bear