Re: What should Hello World be like?

Show/hide message thread

Please draft us a Scheme API Lassi Kortela (17 Sep 2019 18:54 UTC)

Re: Please draft us a Scheme API John Cowan (17 Sep 2019 19:08 UTC)

What should Hello World be like? Lassi Kortela (18 Sep 2019 08:41 UTC)

Re: What should Hello World be like? Lassi Kortela (18 Sep 2019 08:54 UTC)

Re: What should Hello World be like? John Cowan (19 Sep 2019 01:14 UTC)

Re: What should Hello World be like? hga@xxxxxx (19 Sep 2019 02:31 UTC)

Re: What should Hello World be like? Lassi Kortela (19 Sep 2019 15:35 UTC)

Re: What should Hello World be like? Peter Bex (19 Sep 2019 18:02 UTC)

Re: What should Hello World be like? hga@xxxxxx (19 Sep 2019 22:50 UTC)

Re: What should Hello World be like? John Cowan (19 Sep 2019 19:46 UTC)

Re: What should Hello World be like? hga@xxxxxx (19 Sep 2019 20:50 UTC)

Re: What should Hello World be like? Lassi Kortela (19 Sep 2019 21:14 UTC)

Re: What should Hello World be like? Peter Bex (20 Sep 2019 07:49 UTC)

Re: What should Hello World be like? Alaric Snell-Pym (20 Sep 2019 10:55 UTC)

Re: What should Hello World be like? John Cowan (20 Sep 2019 13:25 UTC)

Re: What should Hello World be like? John Cowan (20 Sep 2019 14:04 UTC)

Re: What should Hello World be like? Lassi Kortela (23 Sep 2019 10:54 UTC)

Connection strings, and representing them as lists Lassi Kortela (23 Sep 2019 10:57 UTC)

John's Simple SQL API hga@xxxxxx (19 Sep 2019 11:38 UTC)

Re: John's Simple SQL API John Cowan (19 Sep 2019 20:28 UTC)

Re: Please draft us a Scheme API hga@xxxxxx (17 Sep 2019 19:11 UTC)

Re: Please draft us a Scheme API John Cowan (17 Sep 2019 19:12 UTC)

Re: Please draft us a Scheme API Lassi Kortela (17 Sep 2019 19:32 UTC)

Re: Please draft us a Scheme API John Cowan (17 Sep 2019 19:35 UTC)

Re: Please draft us a Scheme API Lassi Kortela (17 Sep 2019 19:27 UTC)

Re: What should Hello World be like? Lassi Kortela 19 Sep 2019 15:35 UTC

>> I've updated it now; it's at <https://bitbucket.org/cowan/r7rs-wg1-infra/src/default/SimpleSqlCowan.md>. Note that while it's no longer SQLite-specific, it is still SQL-specific; it believes in rows, columns, column types, and blobs-as-pseudo-ports.

Thanks once again for a nicely drafted spec. Looks very good! I like the
general approach, and also most of the details.

"sql-" is a nice prefix for the procedures. It's good to distinguish SQL
databases from document and graph databases (GraphQL still coming soon).

>> I think that column-fold is the Wrong Thing: it should fold all rows using specified columns.

Haven't thought about this in enough detail. I'll defer to others.

>> (open-sql-database filename mode) -> db
>
>> mode is symbol: read, write, create

You probably left this unedited by accident. I'd make it
(open-sql-database ...) where ... is a plist of arbitrary key-value
pairs. For example, I just implemented Postgres connect using its C
library, and it accepts tons of options like that.

The simplicity of a single "connection string" is enticing, but the real
world is not that simple. Postgres has some legacy support to parse such
a connection string, but I haven't dared look into what kinds of
shenanigans go into it :)

For a "connection string" I'd go with a DATABASE_URL environment
variable as endorsed by 12factor. Our DB API doesn't have to worry about
that; we can give key-value pairs to the DB and the DATABASE_URL library
decomposes URLs into those pairs. I wrote such a parser for Racket:
<https://github.com/lassik/racket-database-url/blob/master/database-url.rkt>.

The database URL syntax is an unruly wild west, but other kinds of
connection strings are probably worse...

> I've been thinking a more general open should return a "connection", since that's what it generally is, and you may find it useful to open more than one connection to a non-embedded database.

Racket has some kind of connection pool:
<https://docs.racket-lang.org/db/connect.html?q=pool#%28tech._connection._pool%29>.

>> (sql-interrupt db timespec) -> undefined
>
>> Attempt to interrupt requests from any thread.
>
> I don't grok what this does.

Possibly stops any pending sql-exec and causes it to raise a special
exception to its caller that says it was interrupted?

>> (sql-timeout db timespec) -> undefined
>
>> Set timeout in seconds and nanoseconds for each database request. Signal an error if request does not complete.

Is there value in altering the timeout while a connection is open? It
would probably be more portable if the timeout is given once as an
option to "open-sql-database". But if it's common to keep changing the
timeout it may make sense to have this as well.

>> (sql-commit db) -> unspecified
>
> Rollback is definitely needed, and if you make commit the default, you can reduce the API surface by remove this manual commit.

I agree that having an explicit commit procedure can be confusing. It
makes you worry about whether there is ever any auto-commit, or whether
you need to commit everything manually.

Are there situations where you want neither auto-commit nor
transactions? I know nothing about this topic.

>> (sql-in-transaction? db) -> boolean
>
>> Is the database currently running a transaction?

This is probably nice to have for library code.

>> Statements and result sets
>
>> (sql-statement db code bindings) -> statement
>
>> It is an error to mutate either code or bindings; this enables caching of either compiled but unbound statements, fully bound statements, or both.
>
>
> I've been thinking that while there should be a sane default, you ought to be able to provide a perhaps per-database type (SQLite, PostgreSQL, etc.) hint, suggestion, or order to prepare the supplied statement. And for not preparing, I'm thinking a db/connection config item should be a procedure that's used to ward against code injection.
>
> I assume the specifics of code and bindings are still up in the air based on the in progress discussions?
>
>> (sql-exec db statement) -> implementation-dependent
>
>> Use this procedure when statement does not return a result set.
>
>> (sql-result-set db statement thunk) -> whatever thunk returns
>
>> Executes statement and calls thunk, passing a result-set object.

I would have sql-exec be the only procedure that executes SQL. This is a
common point of confusion in DB APIs in other languages: there are
separate "execute" and "query" procedures, which are basically
equivalent, yet not really, so you're left scratching your head. Those
kinds of details don't stick to memory because they are illogical.

IMHO it's easier to understand that a query is run though sql-exec like
other statements. But the other statements return zero rows, whereas a
query returns zero or more rows.

In many (but not all!) databases, a SQL INSERT or UPDATE returns the
count of rows affected, without actually returning the contents of those
rows. We need to think about how to bake this into the API. There are
probably situations where people know what database they are using and
want to avoid an extra SELECT to find out the row count.

One thing that I think is mandatory for basic programming sanity is
special procedures for when you expect a query to return:

* Either no rows or one row, never more than one.
* Always exactly one row.

Not sure whether or not these special cases should be baked in at such a
low level, but for application programming they are great.

> And here you ought to be able to provide a hint, suggestion or order to get the full result set, or play the cursor game underneath the API.
>
> Hmmm, thunk can do what you want, or return the result-set object. I suppose that's more featureful without really imposing on the user.
>
>> (sql-read result-set) -> list
>
>> Returns a list of Scheme objects representing the next available row of the result-set.

Is there a particular reason we are representing a row as a list instead
of a vector? Vectors have the advantage that it's fast to get a given
column by index, and all rows from a given query ought to have the same
number of columns.

>> NULL is represented by the symbol nil.

Is there a particular reason to use nil instead of null? null is native
in SQL, and we are also planning to use it  for JSON so it'd be nice to
be able to standardize on it.

>> (sql-read-all result-set) -> list of lists
>
>> Returns all available rows in this result-set.

This is nice to have, and the name is good.

>> (sql-for-each proc result-set) -> unspecified
>
>> Applies each result of result-set to proc for its side effects.

Ditto.

>> (sql-map->list proc result-set) -> list
>
>> Applies each result of result-set to proc and returns a list of the results.

Why not.

>> (sql-fold proc knil result-set) -> any
>
>> Call proc on each result of result-set and the current state (initially knil). Order of arguments?

Great.

>> Blobs
>
>> ISSUE: Should this be gotten rid of? It's not strictly necessary, but handling really large blobs without it will be messy. Ideally it should give us ports, but adding ports to a Scheme implementation is hard.

I would definitely vote to have strong support for blobs. Also in favor
of read/write interfaces, but unsure about the details. Archive SRFI has
to solve an almost identical problem.

> Lots of databases *don't* use integer codes, or at least this is true of [...] SQLite

SQLite has integer error codes, but your overall point stands.

It would be good to have a particular symbol naming each DB engine:
oracle, postgresql, mysql, sqlite, etc. And that symbol would be tossed
around in a lot of places, so library code could find out which DB it is
using.

It might be good to make the error an alist:

((engine sqlite)
  (native-error-symbol SQLITE_BUSY)
  (native-error-code-integer 5)
  (native-error-message "database is locked")
  ...any other fields...)

Obviously pick better names :)

The DB driver could return as much data as it knows with each error.
Callers can ignore the data they don't understand.

>> (sql-tables db) -> list of symbols
>
>> The symbols correspond to database tables (including views) accessible to the current user.

Interesting. I wonder if there's a portable way to implement this.

>> (sql-columns db table) -> list of symbols
>
>> The symbols represent column names, and appear in ordinal position.

Also interesting and useful. Getting the columns from a result set is
probably much more standardized.

>> (sql-column-type db table column) -> string

Nice.

All in all, in very good shape already IMO.