See below: these are really general-purpose triggers, well suited to validation but also to other actions, including ones that maintain data integrity without being "validation" per se.
Date: Thursday, July 04, 2019 3:28 PM
Date: Tuesday, July 02, 2019 8:40 AM
[...]
My last ideas:
- validating individual key operations as they happen in okvs-set!, maybe okvs-delete!
Definitely both, and potentially beyond validation, for example to implement cascading deletes if the schema demands them.
- validating whole transactions
I haven't yet figured out anything we can do besides following the SQL RDBMS model.
Note that this is implemented by full-fledged stored procedures; they'll need to be handed the transaction handle because they can, and may need to, do arbitrary database operations. This has implementation implications for the 'no-transaction concept: it could turn what appears to be a single operation into multiple ones that all need to be wrapped in the same transaction.
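A minimal sketch of that point, assuming an in-memory store and hypothetical Python names (Store, Transaction, in_transaction, cascade_user_posts) that are not SRFI-167 procedures: the delete trigger is handed the running transaction handle, so the cascading deletes it performs commit, or fail, together with the delete that fired it.

```python
# Hypothetical sketch (not the SRFI-167 API): an in-memory store whose delete
# triggers are handed the running transaction, so a cascading delete and the
# delete that caused it commit or fail together.


class Transaction:
    """Works on a private copy of the data; nothing is visible until commit."""

    def __init__(self, store):
        self.store = store
        self.pending = dict(store.data)

    def set(self, key, value):
        self.pending[key] = value

    def delete(self, key):
        self.pending.pop(key, None)
        # Fire every delete trigger with *this* transaction handle.
        for trigger in self.store.delete_triggers:
            trigger(self, key)

    def keys_with_prefix(self, prefix):
        return sorted(k for k in self.pending if k.startswith(prefix))


class Store:
    def __init__(self):
        self.data = {}
        self.delete_triggers = []

    def in_transaction(self, proc):
        """Run proc(txn); commit only if proc returns normally."""
        txn = Transaction(self)
        result = proc(txn)
        self.data = txn.pending  # commit: swap in the new state
        return result


def cascade_user_posts(txn, key):
    """Deleting user/NAME also deletes every post/NAME/... key."""
    if key.startswith("user/"):
        name = key[len("user/"):]
        for post_key in txn.keys_with_prefix("post/" + name + "/"):
            txn.delete(post_key)


def load(txn):
    txn.set("user/alice", "Alice")
    txn.set("post/alice/1", "hello")
    txn.set("post/bob/1", "hi")


store = Store()
store.delete_triggers.append(cascade_user_posts)
store.in_transaction(load)
store.in_transaction(lambda txn: txn.delete("user/alice"))
print(sorted(store.data))  # ['post/bob/1']
```

If a trigger raises part way through the cascade, in_transaction never reaches its commit, so neither the user nor the posts disappear; that guarantee is what gets lost if a trigger's extra operations run outside the transaction.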
Well, making a no-transaction operation into a multi-operation transactional operation would miss the point of no-transaction.
'no-transaction implies not atomic.
That's a good point, but it will cause unpleasant surprises, very obnoxious to fix, if 'no-transaction operations call triggers that mutate the database, like cascading deletes, and this gets interrupted. And 'no-transaction is intended to be something of a lie: for databases that support transactions, the single procedure using it as its transaction handle must execute that operation in a transaction. The idea is to give the library code the information it needs to execute a single operation with the highest efficiency and safety. And I'm pretty sure we don't want 'no-transaction to skip triggers.
So maybe we want to change it to 'single-transaction, with the caveats that there will be none if the OKVS doesn't support them, and multiple if it calls triggers.
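One way to picture a 'single-transaction handle, as a sketch only: SINGLE_TRANSACTION, Backend, Txn, okvs_set, and the two example triggers are hypothetical Python stand-ins, not SRFI-167 definitions. The caller passes a sentinel instead of a transaction; if the backend supports transactions, the library quietly opens one so trigger side effects commit together with the operation, and if it doesn't, writes go straight through and an interrupted cascade leaves partial state behind.

```python
# Hypothetical sketch: a 'single-transaction handle. The caller passes a
# sentinel instead of a real transaction and the library decides how to run
# the operation, including any operations its triggers perform.

SINGLE_TRANSACTION = object()  # stand-in for the Scheme symbol 'single-transaction


class Backend:
    def __init__(self, supports_transactions):
        self.supports_transactions = supports_transactions
        self.data = {}
        self.set_triggers = []  # each called as trigger(backend, handle, key, value)


class Txn:
    """Buffers writes; nothing reaches the backend until commit()."""

    def __init__(self, backend):
        self.backend = backend
        self.writes = {}

    def commit(self):
        self.backend.data.update(self.writes)


def okvs_set(backend, handle, key, value):
    if handle is SINGLE_TRANSACTION and backend.supports_transactions:
        # One apparent operation, but run in a real transaction so that
        # trigger side effects commit, or are lost, together with it.
        txn = Txn(backend)
        okvs_set(backend, txn, key, value)
        txn.commit()
        return
    if isinstance(handle, Txn):
        handle.writes[key] = value
    else:
        backend.data[key] = value  # no transaction support: direct write
    for trigger in backend.set_triggers:
        trigger(backend, handle, key, value)


def audit(backend, handle, key, value):
    """Example trigger: record every write under audit/."""
    if not key.startswith("audit/"):
        okvs_set(backend, handle, "audit/" + key, "set")


def interrupted(backend, handle, key, value):
    """Example trigger that fails part way through the cascade."""
    if key.startswith("audit/"):
        raise RuntimeError("interrupted")


for supports in (True, False):
    backend = Backend(supports)
    backend.set_triggers += [audit, interrupted]
    try:
        okvs_set(backend, SINGLE_TRANSACTION, "k", 1)
    except RuntimeError:
        pass
    print(supports, backend.data)
# prints "True {}" then "False {'k': 1, 'audit/k': 'set'}"
```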
For maintenance purposes, we need a way to skip trigger procedures while still allowing transactions. We should take the SQL approach of letting you disable and enable triggers, but keep them disabled for no longer than the lifetime of the running process, since we're successfully avoiding any requirement to persist this state in durable storage.
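A small sketch of process-local enable/disable, assuming hypothetical names (disable_trigger, enable_trigger, triggers_disabled, fire): the disabled set lives only in memory, so it can never outlast the process and needs no durable storage. The with-block helper is an extra convenience of this sketch, scoping the disable even more tightly than the process lifetime.

```python
# Hypothetical sketch: trigger enable/disable state that lives only in this
# process's memory, so it can never outlast the process and needs no durable
# storage. Names like disable_trigger are illustrative, not SRFI-167 API.
import contextlib

_disabled = set()  # names of triggers disabled in this process


def disable_trigger(name):
    _disabled.add(name)


def enable_trigger(name):
    _disabled.discard(name)


@contextlib.contextmanager
def triggers_disabled(*names):
    """Disable the named triggers only for the duration of a with-block."""
    newly = [n for n in names if n not in _disabled]
    _disabled.update(newly)
    try:
        yield
    finally:
        _disabled.difference_update(newly)


def fire(triggers, *args):
    """Call each enabled trigger (triggers: name -> procedure), in name order."""
    for name in sorted(triggers):
        if name not in _disabled:
            triggers[name](*args)


calls = []
triggers = {"cascade": lambda k: calls.append(("cascade", k)),
            "validate": lambda k: calls.append(("validate", k))}

with triggers_disabled("validate"):  # e.g. during a bulk repair
    fire(triggers, "user/1")
fire(triggers, "user/2")
print(calls)
```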
A bit of searching revealed that WiredTiger doesn't seem to have nested transactions, but Oracle's Sleepycat Berkeley DB does, so the concept is not alien to KVSes. Do we want to make any provisions for them?
I don't think we should.
In terms of the API prior to the 'no-transaction update, this could be very simple: hand okvs-transaction-begin and okvs-in-transaction the existing outer transaction handle instead of the okvs database handle.
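A sketch of what threading the outer handle could look like, assuming hypothetical Python stand-ins (transaction_begin, transaction_commit, transaction_rollback, in_transaction) rather than the actual SRFI-167 procedures: a transaction's parent is either the database or another transaction, reads fall through to the parent, and committing a nested transaction folds its writes into the outer one instead of the database.

```python
# Hypothetical sketch: nested transactions by passing an outer transaction
# handle where a database handle would normally go. transaction_begin and
# in_transaction are Python stand-ins for okvs-transaction-begin and
# okvs-in-transaction, not their actual SRFI-167 definitions.

_DELETED = object()  # tombstone for keys deleted inside a transaction


class Database:
    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)


class Transaction:
    def __init__(self, parent):
        self.parent = parent  # a Database or an outer Transaction
        self.writes = {}

    def get(self, key):
        if key in self.writes:
            value = self.writes[key]
            return None if value is _DELETED else value
        return self.parent.get(key)  # fall through to the outer level

    def set(self, key, value):
        self.writes[key] = value

    def delete(self, key):
        self.writes[key] = _DELETED


def transaction_begin(handle):
    return Transaction(handle)


def transaction_commit(txn):
    parent = txn.parent
    if isinstance(parent, Transaction):
        parent.writes.update(txn.writes)  # nested: fold into the outer level
        return
    for key, value in txn.writes.items():
        if value is _DELETED:
            parent.data.pop(key, None)
        else:
            parent.data[key] = value


def transaction_rollback(txn):
    txn.writes.clear()


def in_transaction(handle, proc):
    """Run proc(txn) in a transaction nested under handle."""
    txn = transaction_begin(handle)
    try:
        result = proc(txn)
    except BaseException:
        transaction_rollback(txn)
        raise
    transaction_commit(txn)
    return result


db = Database()
outer = transaction_begin(db)
outer.set("a", 1)
try:
    in_transaction(outer, lambda t: (t.set("b", 2), 1 / 0))  # inner fails
except ZeroDivisionError:
    pass
in_transaction(outer, lambda t: t.set("c", 3))  # inner succeeds
transaction_commit(outer)
print(db.data)  # {'a': 1, 'c': 3}
```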
If not, the SRFI should state it's an error to begin a transaction inside another.
I don't think we should specify that; better to leave the door open to implementing it.
Similarly, the sqlite3 lsm extension has a notion of nested transactions [0]
which would require the use of some kind of transaction-state to implement.
I'm thinking that the interface of OKVSes is so simple that it should be very easy to replace one with another. RDBMSes are infamous for not conforming to standards, even for the simplest concepts like data types, where SQLite3 isn't even conceptually similar to the basic SQL standard that DB2 UDB, Oracle, and PostgreSQL implement, each of course with its own omissions and enhancements. So it's expected that a fair amount of work will be required to replace an RDBMS back end.
How else do OKVSes differ in ways that'll be visible to a SRFI-167 user?
It's not in Scheme's style to outlaw something like nested transactions. Using the current API for nested transactions, so that the transaction handles at the user level are unrelated and threading them as I suggested above is not required, is much better for development and maintenance.
Also, it should not make it impossible to use multiple abstractions in the same okvs.
Isn't that implemented by convention through prefixes?
Yes.
That allows two approaches for deciding which triggers to call: either the single trigger whose prefix matches the longest part of the key, or all triggers whose prefixes match the key, called from the longest match to the shortest. In the latter case, a trigger with no prefix, if one exists, would be called for every okvs-set!, okvs-delete!, or okvs-range-remove! mutation (for okvs-range-remove!, once for each key-value pair deleted).
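Both selection policies can be sketched in a few lines, assuming triggers are registered in a dict keyed by bytevector prefix, with b"" as the "no prefix" trigger; longest_match, all_matches, and range_remove are hypothetical helper names. The range removal fires the selected triggers once per key-value pair deleted.

```python
# Hypothetical sketch of the two selection policies, with triggers registered
# by key prefix; the empty prefix b"" matches every key.


def matching_prefixes(triggers, key):
    """Registered prefixes the key starts with, longest first."""
    return sorted((p for p in triggers if key.startswith(p)), key=len, reverse=True)


def longest_match(triggers, key):
    """Policy 1: only the trigger whose prefix matches the most of the key."""
    prefixes = matching_prefixes(triggers, key)
    return [triggers[prefixes[0]]] if prefixes else []


def all_matches(triggers, key):
    """Policy 2: every matching trigger, from longest to shortest prefix."""
    return [triggers[p] for p in matching_prefixes(triggers, key)]


def range_remove(data, start, end, triggers, select=all_matches):
    """Delete keys in [start, end), firing the selected triggers per pair."""
    for key in sorted(k for k in data if start <= k < end):
        value = data.pop(key)
        for trigger in select(triggers, key):
            trigger(key, value)


log = []
triggers = {
    b"": lambda k, v: log.append(("any", k)),
    b"user/": lambda k, v: log.append(("user", k)),
    b"user/admin/": lambda k, v: log.append(("admin", k)),
}
data = {b"user/admin/1": 1, b"user/bob": 2, b"zzz": 3}
range_remove(data, b"user/", b"user0", triggers)
print(log)
```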
Does a hierarchy of prefixes make sense? What about the simpler case of a single prefix-specific trigger, plus an optional trigger that fires for all keys?
Following the SQL model, if we allow multiple triggers per operation, we could have before and after triggers, and multiple triggers for a prefix, which fire in alphabetical order of trigger name. If only one is allowed, an obvious, simple implementation is to hand the stored procedure a procedure that takes a transaction as an argument and performs the raw mutation, which the stored procedure would call at the desired point.
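A sketch of both shapes, assuming a plain dict stands in for the transaction and that raw_set, set_with_triggers, and set_with_single_trigger are hypothetical names: in the multi-trigger shape the library runs named before triggers, the raw mutation, then named after triggers, each phase in alphabetical order; in the single-trigger shape the trigger receives a mutate procedure and decides when, or whether, to call it.

```python
# Hypothetical sketch of the two shapes: (a) several named before/after
# triggers fired in alphabetical order around the raw mutation, and (b) a
# single trigger handed a procedure that performs the raw mutation.


def raw_set(txn, key, value):
    txn[key] = value  # a plain dict stands in for the transaction here


def set_with_triggers(txn, key, value, before, after):
    """(a) before triggers, raw mutation, after triggers; each by name."""
    for name in sorted(before):
        before[name](txn, key, value)
    raw_set(txn, key, value)
    for name in sorted(after):
        after[name](txn, key, value)


def set_with_single_trigger(txn, key, value, trigger):
    """(b) the trigger decides when, or whether, the mutation happens."""
    def mutate(t):
        raw_set(t, key, value)
    trigger(txn, key, value, mutate)


log = []
txn = {}
set_with_triggers(
    txn, "user/1", 10,
    before={"b_validate": lambda t, k, v: log.append("b_validate"),
            "a_audit": lambda t, k, v: log.append("a_audit")},
    after={"index": lambda t, k, v: log.append("index")},
)
print(log)  # ['a_audit', 'b_validate', 'index']


def reject_negative(t, key, value, mutate):
    if value < 0:
        raise ValueError("negative value for " + key)
    mutate(t)
    t["count/" + key] = t.get("count/" + key, 0) + 1  # work after the mutation


set_with_single_trigger(txn, "user/2", 5, reject_negative)
print(sorted(txn))  # ['count/user/2', 'user/1', 'user/2']
```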
We need to make this non-local, implicit behavior discoverable, with something like (okvs-trigger-print 'set! prefix) that prints the procedure or procedures that get called; if there are several, in order, along with when and where the mutation is done.
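A rough Python analogue of that discovery helper, assuming a hypothetical registry layout (operation, then prefix, then "before"/"after" dicts of named procedures); trigger_plan and trigger_print are illustrative names, not a proposed SRFI-167 signature. It lists the procedures in firing order and marks where the raw mutation happens.

```python
# Hypothetical sketch in the spirit of (okvs-trigger-print 'set! prefix):
# report, in firing order, which procedures run for an operation on a prefix
# and where the raw mutation happens. The registry layout is an assumption.


def trigger_plan(registry, operation, prefix):
    entry = registry.get(operation, {}).get(prefix, {})
    plan = ["before: " + name for name in sorted(entry.get("before", {}))]
    plan.append("<raw " + operation + ">")
    plan += ["after: " + name for name in sorted(entry.get("after", {}))]
    return plan


def trigger_print(registry, operation, prefix):
    for line in trigger_plan(registry, operation, prefix):
        print(line)


def noop(*args):
    pass


registry = {
    "set!": {
        b"user/": {
            "before": {"validate": noop, "audit": noop},
            "after": {"index": noop},
        }
    }
}
trigger_print(registry, "set!", b"user/")
# before: audit
# before: validate
# <raw set!>
# after: index
```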
FWIW, my plan was not to create the notion of stored procedures. At least, we should not make them obligatory to use, since sandboxed eval cannot be guaranteed.
Procedures stored somewhere are the only option we have for validation etc., since OKVSes know essentially nothing about the data they store, which is why I'm using the name trigger, in alignment with SQL. We can suggest security measures like running the procedures in a SRFI-172 restricted sandbox, locking down the database if you're storing them in one, not blindly accepting pull requests if they live in library code, etc., but realistically, how much more dangerous is this than letting anyone import and call okvs-delete! etc.?
If we want to lock all this down, we'd need to add something like the SQL security system, which defines identities that have to be logged into and roles that they can have: for example, which prefixes a role is allowed to mess with, and which operations it may perform on each prefix. And since that needs to be both dynamic and persistent, it would have to be stored in a database.
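To make the shape of such a system concrete, here is a sketch assuming hypothetical ROLE_GRANTS and IDENTITY_ROLES tables and an allowed check; as noted above, a real version would keep these tables persistently in a database, and plain dicts are used here only for illustration. A role grants a set of operations on a key prefix, and an identity may perform an operation on a key if any of its roles grants it on a matching prefix.

```python
# Hypothetical sketch of SQL-style access control for an OKVS: identities
# hold roles, and each role is granted operations on key prefixes. In a real
# system these tables would themselves be persisted in a database; plain
# dicts are used here only for illustration.

ROLE_GRANTS = {
    "reader": {b"": {"get"}},  # read anything
    "user-admin": {b"user/": {"get", "set!", "delete!"}},
}

IDENTITY_ROLES = {
    "alice": {"reader"},
    "bob": {"reader", "user-admin"},
}


def allowed(identity, operation, key):
    """True if any role of identity grants operation on a prefix of key."""
    for role in IDENTITY_ROLES.get(identity, set()):
        for prefix, operations in ROLE_GRANTS.get(role, {}).items():
            if key.startswith(prefix) and operation in operations:
                return True
    return False


def guarded_delete(identity, data, key):
    """Delete key only if identity is allowed to; otherwise raise."""
    if not allowed(identity, "delete!", key):
        raise PermissionError(identity + " may not delete! " + repr(key))
    data.pop(key, None)


data = {b"user/1": 1, b"post/1": 2}
guarded_delete("bob", data, b"user/1")
print(data)  # {b'post/1': 2}
```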
In the use case I was considering, the okvs was fully controlled by a single POSIX process, possibly exposing REST endpoints, possibly a REPL of some sort to execute queries.
Users will find all sorts of other ways of using these SRFIs... And we should absolutely allow more than one process to manipulate an OKVS concurrently. WiredTiger seems to go to a great deal of effort to support these sorts of use cases:
http://source.wiredtiger.com/3.2.0/architecture.html
And that now includes MongoDB replica set multi-node updating.
- Harold