From: Amirouche Boubekki <xxxxxx@gmail.com>
Date: Tuesday, July 02, 2019 8:25 AM
It is very difficult to do validation at this level. OKVS manipulates bytevectors; to validate keys and values it would be necessary to unpack the bytevector, whose encoding is not known at this level. I acknowledge that validation is a missing piece in the whole framework (SRFI-167 + SRFI-168). I am thinking about ways to improve the situation, but that will not be part of any current SRFI.
A simple approach would be to associate a simple Scheme function with each open database that is called before inserting any key-value pair. It is passed the key and value as its arguments, and returns #t if the combination is valid and #f if it is not. Of course the behavior of this function would depend on how the store was used. That would be straightforward to implement within the library.
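A minimal sketch of such a per-database hook, assuming SRFI-167's okvs-set!; the record type and wrapper names (<hooked-okvs>, hooked-okvs-set!) are hypothetical illustrations, not part of either SRFI:

```scheme
;; Hypothetical sketch: wrap an okvs handle together with a validation
;; predicate supplied at open time.  None of these names are defined by
;; SRFI-167; they only illustrate the idea described above.
(define-record-type <hooked-okvs>
  (make-hooked-okvs okvs valid?)
  hooked-okvs?
  (okvs hooked-okvs-ref)
  (valid? hooked-okvs-valid?))

;; Called in place of okvs-set!.  The predicate sees the raw key and value
;; bytevectors and returns #t to accept the pair, #f to reject it.
(define (hooked-okvs-set! hooked key value)
  (if ((hooked-okvs-valid? hooked) key value)
      (okvs-set! (hooked-okvs-ref hooked) key value)
      (error "validation failed" key value)))
```

A database wrapped this way pays one predicate call per write; the predicate itself is free to unpack the bytevectors if it knows the encoding in use.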
That's the sort of simple thing I was thinking of, and I think it would be in the scope of either or both SRFIs.
My thought is that validation should be specified in a higher-level abstraction, but that one should be able to hook it into the OKVS.
I suspect we're all agreed on that point. An OKVS has basically no knowledge whatsoever to allow it to do RDBMS style declarative validation.
If the constraint is to be persistent, it would need to be stored in the database under some well-known key, which would mess up the complete freedom of the user to use any key desired. If that were overcome, a fully general constraint would have to be stored as a lambda expression and eval'd in some minimal library (which would be worth defining anyway, probably).
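A hedged sketch of that persistence idea: store the constraint as a written lambda expression under a reserved key and eval it in a minimal environment at open time. The reserved key, the helper names, and the choice of (scheme base) as the sandbox environment are all assumptions for illustration; eval and environment come from (scheme eval).

```scheme
;; Hypothetical sketch: persist a constraint as a lambda expression under a
;; well-known key, then eval it in a restricted environment when the
;; database is opened.  The key and helper names are invented.
(define %constraint-key (pack 'okvs 'meta 'constraint))

(define (datum->bytevector datum)
  ;; serialize a Scheme datum to a UTF-8 bytevector
  (let ((port (open-output-string)))
    (write datum port)
    (string->utf8 (get-output-string port))))

(define (store-constraint! okvs expr)
  ;; expr is a datum such as '(lambda (key value) ...)
  (okvs-set! okvs %constraint-key (datum->bytevector expr)))

(define (load-constraint okvs)
  (let ((bytes (okvs-ref okvs %constraint-key)))
    (and bytes
         ;; eval inside a minimal library to limit what stored code can do
         (eval (read (open-input-string (utf8->string bytes)))
               (environment '(scheme base))))))
```

Note the reserved key is exactly the "mess up the complete freedom of the user" problem mentioned above: any scheme for a well-known key steals part of the keyspace.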
That's a capital idea, although evaluating random external code invites mischief. Arguably it is beyond the scope of these SRFIs, especially since it would be nice to store multiple constraints, but it would be extremely useful, and would help bridge the validation capability gap between simple OKVSes and the better RDBMSes.
RDBMSes are row stores and provide shallow, type-level validation (and sometimes foreign-key constraints). That is, when the user writes an OKVS key-value pair, all the data that makes that kind of validation relevant is available:
(okvs-set! db (pack 'news 'row 1) (pack "SRFI-167: OKVS" "Proposal to support Ordered Key-Value Stores in Scheme implementations" 2019))
(okvs-set! db (pack 'news-tag 'row 1) (pack 1 "scheme")) ;; news foreign-key and tag
Whereas with the nstore, it would be coded as:
(nstore-add! db '(news-1 title "SRFI-167: OKVS"))
(nstore-add! db '(news-1 body "Proposal to support Ordered Key-Value Stores in Scheme implementations"))
(nstore-add! db '(news-1 year 2019))
(nstore-add! db '(news-tag-1 news-id news-1))
(nstore-add! db '(news-tag-1 tag "scheme"))
In that case, it would be possible to validate the type of an individual nstore-add!, but not the row equivalent as a whole, unless there were a transaction hook that tracked which tuples were touched. Mind that the example relies on "auto-commit", so in that case whole-"document" validation will not be possible.
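The transaction-hook idea above can be sketched as follows; the tracking wrapper, the commit!/rollback! thunks, and validate-and-commit! are all invented names, since neither SRFI defines such hooks today:

```scheme
;; Hypothetical sketch: track which tuples a transaction touches so that
;; a whole-"document" validation can run before commit.  Every name here
;; is illustrative only.
(define touched '())

(define (tracking-nstore-add! nstore tuple)
  ;; remember the tuple, then delegate to the real nstore-add!
  (set! touched (cons tuple touched))
  (nstore-add! nstore tuple))

(define (validate-and-commit! commit! rollback! valid-document?)
  ;; valid-document? sees every touched tuple, so it can check the whole
  ;; logical document rather than one tuple at a time.
  (let ((tuples touched))
    (set! touched '())
    (if (valid-document? tuples)
        (commit!)
        (begin (rollback!)
               (error "document validation failed" tuples)))))
```

With the five nstore-add! calls above wrapped in one transaction, valid-document? would receive all five tuples and could check that news-1 has a title, body, and year together.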
I believe that upfront validation (schema-on-write) is to database programming what static typing
is to programming languages....
Standardized RDBMS row insert and update validation, and delete actions, are declarative and indeed akin to static typing. As you note below, they are by no means a full validation framework and are very subject to garbage in/garbage out (GIGO). But they are still quite useful, like static typing, and without imposing the day-in, day-out tedium of ceremony in your code.
If users make calls at the level of okvs-set! and nstore-add!, or add procedures that do so, we still want hooks available at the OKVS level. But if they make calls at the level of add-blog-post! from SRFI-168, and have the self-discipline never to mutate at the nstore or okvs level....
However, we're by necessity proposing full-fledged stored procedures: whether they're persistently stored in library startup code or in the database, and whether limited to validation or fully general, they can do arbitrary things when called. If they are generalized to take arbitrary arguments, and the author is not careful about input sanitization, there's potential for Little Bobby Tables mischief:
https://www.xkcd.com/327/
The idea of required library startup code could negate the need to add them to the SRFIs, or to do more than the simplest of hooks in SRFI-167, installed at library startup, during which the library could explicitly retrieve validation code from the database.
I've talked myself out of the need for anything more than mutation hooks, with no persistence defined in SRFI-167; just "best practices" example code in it and/or SRFI-168, starting with a stored-procedure example with suitable warnings.
While there are valid simplicity and expected self-discipline arguments for just suggesting them at the library level, I think they fail when you contemplate Data Base Administrator (DBA) "fix-up" actions at the REPL (although the DBA will of course be able to invoke the necessary "don't fire trigger" option). And mutation hooks can be sidestepped by using okvs-debug, which, if it performs mutations, could pile up a great deal of transaction state, since the map operation is wrapped in a single transaction.
And you're right that hooks on okvs-transaction-start or -commit are worth thinking about, for the greater context they'll have. I don't have much of an idea of how to do them, and it would make the most sense for them also to implicitly invoke the mutation hooks. In all cases a failure requires a rollback, although is that implicit?
Or should it be? What happens if you get blasted to the debug level of your REPL? Having that transaction open could be useful, although onerous if other people are working on the database. It should be rolled back if you abandon the debug context.
... Correspondingly, schema-on-read is similar to dynamic typing.
In this regard my take is that "gradual" or "optional" typing could be a good thing to support, or at the very least not to make impossible.
This is an interesting argument to include a hook on okvs-read. Once you've done it for the mutators it should be trivial to add in the SRFI and implementations.
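A read hook could mirror the mutation-hook sketch; hooked-okvs-ref below is an invented wrapper, assuming only SRFI-167's okvs-ref:

```scheme
;; Hypothetical sketch: a schema-on-read hook.  The wrapper name is
;; illustrative; SRFI-167 defines no such hook.
(define (hooked-okvs-ref okvs key on-read)
  (let ((value (okvs-ref okvs key)))
    ;; on-read can validate, log, or coerce the raw value; passing
    ;; (lambda (key value) value) makes the hook a no-op.
    (on-read key value)))
```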
Another thought: PostgreSQL's upfront schema validation is in no way a full validation framework, in the sense that one must still assume garbage in, garbage out. Also, PostgreSQL validation is in no way good enough for end-user reporting, because the error that is raised does not carry enough information, even if you parse (!) the error string. That means validation must still happen in user code.
Absolutely, they're a limited, last-ditch means to maintain basic data integrity, without requiring user validation at all. As you note further below:
Yet another thought: validation in the sense of an okvs-set! predicate assumes that if the key-value pair passes the test, the whole database will still make sense; that is a false promise it makes to the user.
AFAIK (and I am far from being a specialist on the subject, and my knowledge of it might be out of date): whole-schema up-front validation does NOT exist. It always assumes that if the data is locally sound, it will be globally sound. Which, like I said previously, is still garbage in, garbage out on some occasions.
If you're that serious, you use an RDBMS with stored procedures hooked on triggers, or require all mutations to be performed using curated stored procedures (and maybe all actions, to avoid, for example, the "query from hell" that locks everyone else out of the database for hours while it runs). For the former, see the PostgreSQL implementation of the standard SQL CREATE TRIGGER statement:
https://www.postgresql.org/docs/11/sql-createtrigger.html
Still, I agree that some validation facility would be a good thing to have.
See the above; it could be a note and example in SRFI-168. But I think it's best to keep it as simple as possible, modulo what that means for transaction-level triggers as we think through them.
- Harold