From: John Cowan <xxxxxx@ccil.org>
Date: Wednesday, September 18, 2019 8:14 PM

On Wed, Sep 18, 2019 at 4:41 AM Lassi Kortela <xxxxxx@lassi.io> wrote:

> I'll modify my existing pre-SRFI in this direction, taking up a good many
> of the points already raised, or as many as I can remember.  :-)

Is your pre-SRFI already readable somewhere? I didn't find it in your
wg1 repo.

I've updated it now; it's at <https://bitbucket.org/cowan/r7rs-wg1-infra/src/default/SimpleSqlCowan.md>.  Note that while it's no longer SQLite-specific, it is still SQL-specific; it believes in rows, columns, column types, and blobs-as-pseudo-ports.

I think that column-fold is the Wrong Thing: it should fold all rows using specified columns. I"ve just added a preliminary entry to the above page.

Here's my initial comments:


Opaque object types

Database, statement, result-set

Databases

(open-sql-database filename mode) -> db

mode is symbol: read, write, create


This is still SQLite specific.  I've been thinking a more general open should return a "connection", since that's what it generally is, and you may find it useful to open more than one connection to a non-embedded database.  For that matter, you can do the same with SQLite, can't you?  Won't it do the right thing, while only allowing one writer or write at a time?

Thus filename should be something like connect-string, magic-string, whatever, it's content is by convention and for the lowest level of the stack to understand.  mode should be optional and something like the SRFI-167 Ordered Key Value Store (OKVS) config, there an association list, which for SQLite would have one item, 'mode, being 'read, 'write, or 'create.

(sql-database? obj) -> boolean


Are the following uses of timespecs intervals, instead of a point in time since an epoch?  The latter is the only thing they're currently defined to be in SRFI-174.

(sql-interrupt db timespec) -> undefined

Attempt to interrupt requests from any thread.


I don't grok what this does.

(sql-timeout db timespec) -> undefined

Set timeout in seconds and nanoseconds for each database request. Signal an error if request does not complete.


The error needs to be specified so it can be caught and distinguished from other likely more fatal errors, like you over-the-wire connection being lost.  Whereas problems like "the query from hell" are recoverable in some sense.

(sql-in-transaction db thunk) -> what thunk returns

When proc returns, roll back unless sql-commit has been called.


Why not assume success and commit on the completion of the thunk, and require an explicit rollback?  That's what the PostgreSQL egg does, and what I'm planning on doing for my DBI.  See this modification of an example supplied by Peter Bex:

(with-transaction db
  (lambda ()
    (query db "INSERT INTO foo (bar, qux) VALUES (1, 2)")

    (with-transaction
      (lambda ()
        (query db "INSERT INTO foo (bar, qux) VALUES (3, 4)")
        ;; Oops, decide to roll this back
        (rollback)))

    (query db "INSERT INTO foo (bar, qux) VALUES (5, 6)")))

The original had it returning #f where I placed the rollback, but Lassi pointed out that can be quite confusing to do and debug in practice.

(sql-commit db) -> unspecified


Per the above, I don't think this is needed.  Rollback is definitely needed, and if you make commit the default, you can reduce the API surface by remove this manual commit.

(sql-rollback db) -> unspecified


Per the above, a call to the rollback procedure gets you to the containing with-transaction, which already has db.

In another message you noted that while SQLite doesn't have nested transactions, it does have savepoints, which are sufficient for the job.  Are savepoints in general preferred to nested transactions?

(sql-in-transaction? db) -> boolean

Is the database currently running a transaction?

Statements and result sets

(sql-statement db code bindings) -> statement

It is an error to mutate either code or bindings; this enables caching of either compiled but unbound statements, fully bound statements, or both.


I've been thinking that while there should be a sane default, you ought to be able to provide a perhaps per-database type (SQLite, PostgreSQL, etc.) hint, suggestion, or order to prepare the supplied statement.  And for not preparing, I'm thinking a db/connection config item should be a procedure that's used to ward against code injection.

I assume the specifics of code and bindings are still up in the air based on the in progress discussions?

(sql-statement? obj) -> boolean

(sql-exec db statement) -> implementation-dependent

Use this procedure when statement does not return a result set.

(sql-result-set db statement thunk) -> whatever thunk returns

Executes statement and calls thunk, passing a result-set object.


And here you ought to be able to provide a hint, suggestion or order to get the full result set, or play the cursor game underneath the API.

Hmmm, thunk can do what you want, or return the result-set object.  I suppose that's more featureful without really imposing on the user.

(sql-read result-set) -> list

Returns a list of Scheme objects representing the next available row of the result-set. NULL is represented by the symbol nil.


As in 'nil ??

(sql-read-all result-set) -> list of lists

Returns all available rows in this result-set.

(sql-for-each proc result-set) -> unspecified

Applies each result of result-set to proc for its side effects.

(sql-map->list proc result-set) -> list

Applies each result of result-set to proc and returns a list of the results.

(sql-fold proc knil result-set) -> any

Call proc on each result of result-set and the current state (initially knil). Order of arguments?

(sql-column-fold proc-list column-list knil-list result-set) -> any

Folds specified columns simultaneously, where column-list specifies the names of the columns of interest, proc-list is the folding procs corresponding to the chosen columns, and knil-list is the initial values of the current states of each fold.

Blobs

ISSUE: Should this be gotten rid of? It's not strictly necessary, but handling really large blobs without it will be messy. Ideally it should give us ports, but adding ports to a Scheme implementation is hard.


Will defer on this for now, given the above question.

Exceptions

(make-sql-exception code message) -> sql-exception


Note that in some databases like MySQL you'll be putting together several strings to make up a single message.

(sql-exception? obj) -> boolean

(sql-exception-code sql-exception) -> exact integer


Lots of databases don't use integer codes, or at least this is true of Oracle and PostgreSQL vs. MySQL, SQL Server, and SQLite based on what I've recently looked up.  I think Oracle's ORA-[exact integer] makes sense, it gives me a good idea of where there's a problem in my stack vs. an anonymous bare number, and I'm wondering if it might make sense, perhaps as an option, to prepend a similar prefix for other databases.  Stripping the leading "ORA-" isn't a general solution, PostgreSQL's error codes are fully alphanumeric, such as "0LP01".

(sql-exception-message sql-exception) -> string

Meta

This is not standardized over databases, so provided here. There isn't much, but what there is is generally useful.


One thought I've had is that to the extent you can't introspect this information, the API can provide a facility for keeping a (hopefully in-sync) copy of what's missing.

(sql-tables db) -> list of symbols

The symbols correspond to database tables (including views) accessible to the current user.

(sql-columns db table) -> list of symbols

The symbols represent column names, and appear in ordinal position.

(sql-column-type db table column) -> string

Returns the declared type of the specified table and column. The result is a string whose possible values depend on the database.


Worth emphasizing that "declared type" is narrowly construed for SQLite: https://www.sqlite.org/datatype3.html which as you've been emphasizing is great for Scheme.

- Harold