Re: strings draft Shiro Kawai (26 Jan 2004 04:57 UTC)
From: Tom Lord <xxxxxx@emf.net>
Subject: Re: strings draft
Date: Sun, 25 Jan 2004 19:27:58 -0800 (PST)

> I'm not aware enough about the details of Shiro's.  All I'm (sort of)
> aware of is that he's dealing with a EUCJP -- which sounds very
> challenging if you want to wind up with an implementation suitable for
> intensive string processing.  (Unicode is similarly challenging.)

To be precise, what I'm working on is a CES-independent multibyte
string representation, which currently covers utf-8, EUC-JP, and
Shift_JIS.  EUC-CN, EUC-TW and EUC-KR should be easy to support.
I exclude stateful encodings, like ISO2022 with stateful escape
sequences.  (EUC is a subset of ISO2022, but it uses only the
single-shift escape, so it is effectively stateless.)

Tom, if you have a specific instance that discourages mb strings,
I'm curious to hear it, either off-list or on-list.

The following is a discussion of why I think mb strings are feasible.
Those who aren't interested can skip it.

* * *

"Intensive string processing" varies by application domain.
The domain I'm looking at has these properties:

 * very frequent use of regexps.
 * strings are hardly ever mutated.
 * lots of data passing to and from external programs/libraries.

A regexp engine can be implemented on multibyte strings almost as
efficiently as on "uniform character array" strings, by compiling
the regexp into an octet-stream NFA/DFA.  Currently the only penalty
in my implementation arises when you use a character range that
covers a large character set.  It can be optimized, I think.

Regexps are heavily used to extract part of a string.  Returning
the substring directly is very efficient if you share the string
body.  Using string indices can actually be less efficient, even
if you use uniform character array strings.

Multibyte representation doesn't necessarily impose a penalty on
working with large corpora; e.g. a suffix array can be constructed
and used efficiently with byte indices (actually, with any kind of
string reference).

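As a rough illustration of the suffix-array point, here is a minimal
sketch in Python.  It is not code from the implementation described in
this message.  It assumes UTF-8, where a character boundary can be
recognized from a single byte; EUC-JP or Shift_JIS would need a
different boundary test.  The names char_starts, build_suffix_array
and find_all are hypothetical.

```python
# Sketch only: a suffix array over a UTF-8 byte string, indexed by byte offset.
# Assumption: the input is valid UTF-8. All function names are hypothetical.

def char_starts(data: bytes) -> list:
    """Byte offsets where a character begins (UTF-8 continuation bytes are 10xxxxxx)."""
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]

def build_suffix_array(data: bytes) -> list:
    """Sort the character-aligned suffixes by their raw bytes (naive construction)."""
    return sorted(char_starts(data), key=lambda i: data[i:])

def find_all(data: bytes, sa: list, needle: bytes) -> list:
    """Return the byte offsets of every occurrence of needle, in ascending order."""
    n = len(needle)
    lo, hi = 0, len(sa)
    while lo < hi:  # binary search for the first suffix whose prefix is >= needle
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + n] < needle:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(sa) and data[sa[lo]:sa[lo] + n] == needle:
        hits.append(sa[lo])  # matching suffixes are contiguous in the array
        lo += 1
    return sorted(hits)

data = "日本語の本".encode("utf-8")
sa = build_suffix_array(data)
offsets = find_all(data, sa, "本".encode("utf-8"))
```

The array never stores character indices.  Every entry is a byte
offset that can be used to slice the string body directly, so no
conversion from character index to byte position is needed at
lookup time.
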
Most external libraries and programs nowadays require strings
to be passed in some sort of multibyte format.  If you can use
the same multibyte format internally, sending and receiving data
has little overhead.  It may not help when you're writing a
program that will be used in a wide variety of environments, but
it is an advantage if you're writing in-house tools and know
which encoding the environment uses.

Of course I don't insist that multibyte strings are generally
superior.  Actually, I'm not 100% sure that multibyte strings
don't have serious problems.  But I was curious, so I started
implementing them, and I haven't seen a serious problem yet,
though there are some unresolved issues (like how to treat
illegal byte sequences).

--shiro