My choice is to raise an error, and I made Gauche's json library do so. However, I can think of a case for treating it as a non-Unicode code point: when you have to deal with existing JSON files that contain such strings. For example, if you read such a JSON file, add one field to it and write it out, using the replacement character is out of the question, since it loses information.
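To make the read-modify-write case concrete, here is a small sketch that uses Python's standard `json` module as a stand-in (it is not Gauche's library, which, as described above, raises an error). Python happens to keep a lone `\ud800` escape as a surrogate code point inside a `str` and writes it back unchanged. Mapping it to U+FFFD instead loses the original value. The sample document and the `LONE_SURROGATE` helper pattern are illustrative assumptions.

```python
import json
import re

# A syntactically valid JSON text whose string holds an unpaired high
# surrogate escape (\ud800) followed by "x".
original = '{"name": "\\ud800x"}'

# Python's json.loads keeps the lone surrogate as code point U+D800 in the str,
# i.e. it treats it as a (non-Unicode-scalar) code point rather than failing.
doc = json.loads(original)
doc["id"] = 1  # the "add one field" step

# With the default ensure_ascii=True the lone surrogate is written back as the
# same \ud800 escape, so the round trip preserves the original information.
round_tripped = json.dumps(doc)
print(round_tripped)

# The "replacement character" alternative. json.loads already combines valid
# surrogate pairs into one code point, so any surrogate left in a decoded str
# is unpaired; replace each one with U+FFFD.
LONE_SURROGATE = re.compile("[\ud800-\udfff]")
lossy = {k: LONE_SURROGATE.sub("\ufffd", v) if isinstance(v, str) else v
         for k, v in doc.items()}

# Now the output says \ufffd and the original \ud800 cannot be recovered.
print(json.dumps(lossy))
```

The first line printed still carries `\ud800`, while the second carries `\ufffd`, which is exactly the information loss that rules out replacement for this use case.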

I don't know how much such 'sloppy' JSON is out in the wild. I once worked on email processing, and there were so many broken emails that libraries that rejected them weren't usable. I guess it depends on how bad the actual situation is. We could say "it is an error to have unpaired surrogates" and leave the interpretation of "error" to each implementation.
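One way an implementation could leave the meaning of "error" up to the caller is a policy switch in its string decoder. The following is a minimal sketch, assuming a hypothetical `decode_json_string` function that receives the text between the quotes of a JSON string literal, and a hypothetical `on_lone_surrogate` parameter. It first pairs valid high/low surrogate escapes, then applies one of three policies to whatever is left unpaired: raise, replace with U+FFFD, or preserve the code point.

```python
class JSONSurrogateError(ValueError):
    """Raised when a \\uXXXX escape yields an unpaired surrogate (hypothetical)."""

SIMPLE_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b',
                  'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}
POLICIES = ("error", "replace", "preserve")

def _hex4(body, i):
    """Parse the four hex digits of a \\u escape starting at body[i]."""
    digits = body[i:i + 4]
    if len(digits) != 4 or any(c not in "0123456789abcdefABCDEF" for c in digits):
        raise ValueError(f"bad \\u escape at offset {i}")
    return int(digits, 16)

def decode_json_string(body, on_lone_surrogate="error"):
    """Decode the contents of a JSON string literal (without the quotes).

    on_lone_surrogate selects what happens to an unpaired surrogate:
      "error"    - raise JSONSurrogateError (strict)
      "replace"  - substitute U+FFFD (forgiving but lossy)
      "preserve" - keep the surrogate code point as-is (lossless)
    """
    if on_lone_surrogate not in POLICIES:
        raise ValueError(f"unknown policy {on_lone_surrogate!r}")
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c != '\\':
            out.append(c)
            i += 1
            continue
        esc = body[i + 1] if i + 1 < len(body) else ''
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
            continue
        if esc != 'u':
            raise ValueError(f"bad escape at offset {i}")
        cp = _hex4(body, i + 2)
        i += 6
        # A high surrogate immediately followed by a low surrogate escape
        # combines into one supplementary code point.
        if 0xD800 <= cp <= 0xDBFF and body[i:i + 2] == '\\u':
            low = _hex4(body, i + 2)
            if 0xDC00 <= low <= 0xDFFF:
                out.append(chr(0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)))
                i += 6
                continue
        if 0xD800 <= cp <= 0xDFFF:
            # Reaching here means the surrogate is unpaired.
            if on_lone_surrogate == "error":
                raise JSONSurrogateError(f"unpaired surrogate U+{cp:04X}")
            if on_lone_surrogate == "replace":
                out.append('\ufffd')
                continue
            # "preserve" falls through and keeps the code point.
        out.append(chr(cp))
    return ''.join(out)
```

For example, `decode_json_string(r'a\ud800b', "replace")` returns `'a\ufffdb'`, while the default policy raises `JSONSurrogateError`. The "preserve" policy only works where the host string type can hold surrogate code points (Python's `str` can); an implementation whose characters must be Unicode scalar values is roughly left with raising or replacing.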

On Sat, Jan 25, 2020 at 4:25 PM John Cowan <xxxxxx@ccil.org> wrote:

On Sat, Jan 25, 2020 at 8:00 PM Shiro Kawai <xxxxxx@gmail.com> wrote:

It may include such a surrogate as a character whose code point is the unpaired surrogate,

I think it's a bad idea to even suggest that. Isolated surrogates have no defined meaning and could only be treated as non-Unicode characters.

or may raise a JSON error.

I believe it's right to raise an error just as if the JSON syntax were invalid, even though isolated surrogates are not actually invalid syntax. In a forgiving mode they could be replaced by U+FFFD.

John Cowan http://vrici.lojban.org/~cowan xxxxxx@ccil.org
One time I called in to the central system and started working on a big
thick 'sed' and 'awk' heavy duty data bashing script. One of the geologists
came by, looked over my shoulder and said 'Oh, that happens to me too.
Try hanging up and phoning in again.' --Beverly Erlebacher