RDBMS data types and NULL representation (Peter Bex, 11 Sep 2019 08:08 UTC)
Hi all,

It's great to see such a flurry of interest in using Scheme for practical
things!  Regarding persistence, I know most about SQL-based RDBMSes, not
so much about other stuff, so that's my main interest for joining this
list.

In the CHICKEN ecosystem, we have a few DB eggs that people enjoy using.
The one that comes up most is sql-de-lite[1], which is indeed a delight
to work with, very elegant and high-level.  It offers a rather
"functional" approach with fold and map and such over rows.  I've modeled
the postgresql[2] egg on it.  I think these APIs are very Schemely and
not as "object-oriented" as what you see in some other libraries.
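
To give a flavour of that style, here's a rough sketch (simplified and
from memory; the file, table and column names are made up, so check the
egg's documentation for the exact signatures):

  (import sql-de-lite)

  (define db (open-database "example.db"))

  ;; Map over the result rows, collecting the first column of each row
  (query (map-rows car)
         (sql db "SELECT name FROM users WHERE age > ?")
         30)

  ;; Fold over the rows to compute an aggregate on the Scheme side
  (query (fold-rows (lambda (row sum) (+ (car row) sum)) 0)
         (sql db "SELECT age FROM users"))

  (close-database db)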

Given my experience, I would like to nail down a few hairy issues first.
The simple things need to be handled too, but that's probably more of a
bikeshed-prone discussion I'm not so interested in right now.  I can chime
in on those as well if needed.

At least with Postgres, the type system is extensible, so you can create
new custom SQL datatypes, which can be mapped to Scheme types.  One
tricky thing is data types like arrays and lists.  I've chosen to map
vectors to arrays and lists to tuples (which are compound objects).
It's unclear, though, what Postgres JSON or HStore values should map
to, for example.  And in MySQL/MariaDB I think you have only JSON but
no true arrays, which might make vectors a bit awkward as a general
representation for arrays; one would probably prefer to use vectors
for JSON arrays (and alists for JSON objects?).
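
Concretely, the intent is roughly the following (a sketch; the
connection parameters are made up and the results are from memory):

  (import postgresql)

  (define conn (connect '((dbname . "example"))))

  ;; A Postgres array comes back as a Scheme vector:
  (value-at (query conn "SELECT ARRAY[1,2,3]"))
  ;; intended result: #(1 2 3)

  ;; In the other direction, a Scheme vector passed as a query
  ;; parameter is sent as an array, and a list as a tuple (composite).

  ;; JSON is the open question: vectors for JSON arrays and alists for
  ;; JSON objects would be one possibility.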

For now, mapping arrays to vectors seems to be a decent choice, but
there are some issues with Postgres array bounds.  For one, arrays are
one-based by default, but they can also have custom bounds that aren't
one-based.  For example, '[-1:2]={1,2,3,4}'::int[] constructs an array
that has a lower bound of -1 and an upper bound of 2.  Currently I
simply ignore that and map it to a vector, so the bound info won't
survive a round-trip to the database (it has no read-write invariance,
you could say).
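
Continuing the sketch above, the round-trip problem looks roughly like
this:

  ;; A custom lower bound on the way in...
  (value-at (query conn "SELECT '[-1:2]={1,2,3,4}'::int[]"))
  ;; ...comes back as a plain #(1 2 3 4); the -1 lower bound is lost,
  ;; so writing that vector back produces a one-based array again.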

Another important thing we should nail down early is what to do with
SQL NULL values.  In CHICKEN, we have a separate sql-null[3] egg that deals
with these values, which is used by some (but, annoyingly, not all) eggs
as a common denominator.  Some interfaces use #f to indicate NULL, but
that causes trouble if the database in question also supports a boolean
type.
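
For example, if #f stood for NULL, the two queries below would give
indistinguishable results; with the postgresql egg they don't, because
NULL comes back as a distinct sql-null object (again a rough sketch):

  ;; NULL in a boolean context vs. an actual FALSE:
  (value-at (query conn "SELECT NULL::boolean"))  ; an sql-null object
  (value-at (query conn "SELECT FALSE"))          ; #f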

The sql-null egg provides a few "ternary logic" operators which handle
NULL as the unknown truth value (more or less like SQL does), but I
consider these niceties.  The important thing is having a common data
type that all SQL database libraries can use.  In the egg's API, it is
undefined whether sql-null is eq? to itself.  I think it's best if it
isn't (and in the egg it isn't), to avoid people accidentally comparing
NULLs directly (in SQL, too, NULL is unequal to itself).
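
In code, that boils down to something like this (a sketch; see the
egg's documentation for the full API):

  (import sql-null)

  (define x (sql-null))     ; construct an SQL NULL value
  (sql-null? x)             ; => #t -- the reliable way to test for NULL
  (eq? x (sql-null))        ; unspecified, so don't compare NULLs directly

  ;; Ternary logic, treating NULL as "unknown":
  (sql-and #t (sql-null))   ; => an sql-null ("unknown")
  (sql-or  #t (sql-null))   ; => #t
  (sql-not (sql-null))      ; => an sql-null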

[1] https://eggs.call-cc.org/5/sql-de-lite
[2] https://eggs.call-cc.org/5/postgresql
[3] https://eggs.call-cc.org/5/sql-null

Cheers,
Peter