Re: Binary S-expressions

Re: Binary S-expressions Alaric Snell-Pym 17 Sep 2019 15:52 UTC

On 17/09/2019 14:48, Lassi Kortela wrote:
> This is getting off-topic for a persistence list, but for lack of a
> better forum...

Oh, I think it's relevant, because this is all about the encoding
between Scheme types and database values...

Ideally, our drivers will do whatever is correct with the database at
hand to ensure that values of all types are correctly handled.

To support them in that, we need to define the types that are supported.

Something like an SRFI-19 TAI time object represents a single point in
time, which might be converted to a nice TIMESTAMP WITH TIME ZONE (of
UTC?) for storage in a database; but if we read *back* a TIMESTAMP WITH
TIME ZONE we should represent it as an object that contains the point in
time as well as the timezone, because although 2019-09-17T15:32:36+00:00
and 2019-09-17T16:32:36+01:00 represent the same point in time, they are
still different timestamp objects.
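
A minimal sketch of that distinction, in Python for brevity rather than
Scheme. The `ZonedTimestamp` type and its `from_iso` helper are
hypothetical names invented for illustration, not part of any driver or
SRFI; the idea is only that the read-back value keeps the offset next to
the instant:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ZonedTimestamp:
    """Hypothetical read-back type: an instant plus the offset it came with."""
    instant: datetime   # the point in time, normalised to UTC
    offset: timedelta   # the UTC offset the database value carried

    @classmethod
    def from_iso(cls, text: str) -> "ZonedTimestamp":
        # Assumes the text carries an explicit UTC offset.
        parsed = datetime.fromisoformat(text)
        return cls(parsed.astimezone(timezone.utc), parsed.utcoffset())


a = ZonedTimestamp.from_iso("2019-09-17T15:32:36+00:00")
b = ZonedTimestamp.from_iso("2019-09-17T16:32:36+01:00")

same_instant = a.instant == b.instant   # True: same point in time
same_object = a == b                    # False: the offsets differ
```

Comparing only the instants treats the two values as equal, while
comparing the whole objects does not, which is the behaviour I would
want when round-tripping such a column.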

As for strings... hopefully the driver will negotiate with the database
to find out what encoding is expected, but IIRC with MySQL this can vary
between columns in the same database, so I'm not sure there's a nice
way to automatically use the correct encoding in all cases:

https://dev.mysql.com/doc/refman/5.7/en/charset-column.html

The best we can do there is probably to have a "default encoding" set as
part of the options when making a connection, maybe with the option to
override it per-column when binding strings into queries through some
MySQL-specific mechanism. If R7RS strings are Unicode, we need the DB
driver to map from R7RS strings into whatever wire format there is!

PostgreSQL seems to have a proper string type with functions to encode
strings into byte arrays using a given encoding and decode them back;
the actual storage encoding for strings is a database-wide property, and
it lets you set the client encoding on a connection and will translate
for you, so it should be simple to correctly encode/decode strings at
the libpq level:

https://tapoueh.org/blog/2018/04/postgresql-data-types-text-encoding/
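
To make the "default encoding with a per-column override" idea a bit
more concrete, here is a rough sketch in Python. Everything in it
(`ConnectionOptions`, `column_encodings`, `encode_param`, the column
name `legacy_name`) is a made-up name for illustration; it is not the
API of any existing driver, and a real one would get the encodings
from the server where it can:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConnectionOptions:
    """Hypothetical connection options holding the encoding policy."""
    default_encoding: str = "utf-8"
    # Hypothetical per-column overrides, keyed by column name.
    column_encodings: Dict[str, str] = field(default_factory=dict)

    def encoding_for(self, column: Optional[str] = None) -> str:
        # Fall back to the connection default when no override exists.
        return self.column_encodings.get(column, self.default_encoding)


def encode_param(options: ConnectionOptions, value: str,
                 column: Optional[str] = None) -> bytes:
    """Turn a Unicode string into the bytes to put on the wire."""
    return value.encode(options.encoding_for(column))


opts = ConnectionOptions(
    default_encoding="utf-8",
    column_encodings={"legacy_name": "latin-1"},
)

utf8_bytes = encode_param(opts, "café")
latin1_bytes = encode_param(opts, "café", column="legacy_name")
```

The same string ends up as different bytes depending on the column it
is bound to, which is exactly the case the driver has to get right.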

However, other users of the same database will often fail to send
correctly encoded text, not realising that it's a problem and just
assuming that whatever string of bytes their app has will be
interpreted the same way by the database. (A friend of mine works at a
large bank and has just recently had to write a tool that goes through
tables heuristically guessing the encoding of strings between latin-1,
various Windows codepages, and utf-8 *as it varies between rows*, due
to the carelessness of previous users over the years, in order to
re-encode them correctly...)

For normal users, we should magically do the right thing to make their
application correctly send strings to and from the database; but we
should also cater to "advanced" users having to suffer that kind of
work, if we must, perhaps by having some mechanism to request that
strings be returned from the DB as bytevectors for them to do their
own decoding on...
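
For the "advanced" case, once the driver hands back raw bytes, the
per-row guessing could look roughly like the Python sketch below. The
function name `guess_decode` and the candidate list are my own
assumptions, and this is a deliberately crude heuristic (try strict
decodings in order, fall back to latin-1, which accepts any byte), not
the tool my friend wrote:

```python
from typing import Sequence, Tuple


def guess_decode(raw: bytes,
                 candidates: Sequence[str] = ("utf-8", "cp1252")
                 ) -> Tuple[str, str]:
    """Return (text, encoding used) for one raw value from the database."""
    for encoding in candidates:
        try:
            # Strict decoding: any invalid byte sequence raises.
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so it never fails.
    return raw.decode("latin-1"), "latin-1"


# Three rows of the same column, each written by a different client.
rows = [b"caf\xc3\xa9", b"caf\xe9", b"\x81"]
decoded = [guess_decode(row) for row in rows]
```

Trying utf-8 first matters, since text in the single-byte encodings
rarely happens to be valid utf-8, whereas nearly any byte string
decodes "successfully" as latin-1.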

Ta,

--
Alaric Snell-Pym   (M7KIT)
http://www.snell-pym.org.uk/alaric/