s-expression database
Linas Vepstas
(27 Oct 2021 18:18 UTC)
|
Re: s-expression database
Marc Nieper-Wißkirchen
(08 Nov 2021 17:34 UTC)
|
Re: s-expression database Linas Vepstas (08 Nov 2021 19:41 UTC)
|
On Mon, Nov 8, 2021 at 11:34 AM Marc Nieper-Wißkirchen <xxxxxx@gmail.com> wrote: > > Do you know the rewrite system Maude? How does your system compare with it? I hadn't heard of Maude but I did find this: http://maude.cs.uiuc.edu/maude1/manual/maude-manual-html/maude-manual_3.html I'm skimming this, but from what I can tell, with Maude, you have to already know what is to be re-written, and also know the rewrite rules. In my case this is not yet known. Be aware that *every compiler* has a rewrite engine inside of it. gcc, for example, converts C code into abstract syntax trees, and then applies thousands of rewrites that perform optimization, register assignment, constant term reduction, etc. Those trees are then discarded, once assembly code or bytecode has been generated. (In the past, I've worked on several compilers, deep in the guts of them.) (btw, the gcc internals are rather lisp-like, and the lisp community used to give them shit about this, because it wasn't pure clean lisp. But of course, they were missing the point: compilers are not about lisp, but about rewriting.) In "my system" (its 20-25 years old, and predates me by a decade), yes, there is a rewrite subsystem. At the lowest level, one has a jumble of terms, and a jumble of rewrite rules (aka "a database") You then fish out some of the terms (using a database query) and some of the rewrite rules (using a query) and apply what you found -- apply the rules to the terms. This can be done recursively, so it's a defacto forward-chainer. Some people have written a backward-chainer on top of this. There are vague plans for layering prolog on top of it (the main stumbling block is "why bother?") In short, rewriting is easy. Unification is fairly easy. Query is just plain hard. Many, but not all terms stored in the database are evaluatable/executable. For example (+ 2 3) stored as a binary tree, (an abstract syntax tree), with 2,3 as leaves and + as the root of the tree -- this is obviously evaluatable. Equivalently, you can specify a rewrite rule that rewrites it into 5. My blurbling about tags and tagged procedures in srfi-229 is because we tag everything, by default. Originally these tags were "truth values": true/false, or maybe a single floating point number representing the probability of the term. Or two floating point numbers, representing the probability and the confidence. Or three, representing the probability, confidence and importance. Or... well eventually, the light-bulb switched on and the tags became places to store arbitrary extra data that lives *outside* of the searchable database of terms. (The tags are not searchable; the actual tag implementation very loosely resembles that of srfi-229) So, what I have, de facto, is a database for s-expressions, a way to query them, a way to rewrite them in arbitrary ways, a way to execute/evaluate those which are executable/evaluatable, and a way to tag them with arbitrary non-searchable data. You mentioned 3-lisp elsewhere. One of the explicit goals of "my system" is that its reflexive - not only is the data s-expressions, but the queries are s-expressions as well (and thus storable in the database) Of course, terms like "eval" and "apply" amd "define" and "lambda" are s-expressions too, and are storable (and searchable, and can be rewritten... that's why I blurble about "macros". You can write a query to find some subset of all lambdas, and rewrite them into ffamdas .. and even evaluate the result, if you are lucky enough to have an evaluatable definition for ffamdas in the database.) Reflectivity means that you can not only analyze what a program does, but even post the query: "find all queries to which 42 is the answer". This drives ordinary database people nuts, but is the bread and butter of chatbots: this is how chatbots figure out what to talk about next. The chatbot query is "find everything I can say about 42, rank them by priority, and blurt out the one at the top of the list." Anyway, the above is a sampling of the fun stuff you can do with a database of s-expressions. The hard part is what to do, if you do not yet know what to rewrite or evaluate. That is a topic that is far outside of the bounds of this discussion (it's the focus of my current research. Making progress, but it's all uphill.) --linas > > Cheers, > > Marc > > Am Mi., 27. Okt. 2021 um 20:17 Uhr schrieb Linas Vepstas <xxxxxx@gmail.com>: >> >> I read through srfi-213 with a twinkle in my eye, and want to get Marc's attention, and perhaps influence his thoughts. >> >> From where I sit, I find that srfi-213 is a baby-step towards a flat-file s-expression database. I like that, because I have actually built a rather sophisticated s-expression database. I dislike the flat-file part: flat files are a terrible way of recording data. (I also dislike that what I've built is not really scheme-compatible, and thus this email.) >> >> I should unpack the above paragraph, so as to be more understandable, and less outlandish. Start by thinking of "identifiers" as single "points" in some abstract "space". In this setting, srfi-213 and `define-property` attach "data" to those "points". >> >> I think this is wonderfully useful... conditioned on one thing, though. To have a "true database", one needs persistence: I would like to be able to say: "hey, all of those identifiers that have properties on them, please save them somewhere, and next time I start my program up, please restore those properties". Yes, of course, one could write a library routine to do this, but first, a bit more to understand and contemplate. >> >> Properties become fantastically useful, if one has some method to search for specific identifiers. Suppose that, in one's collection of s-expressions, one has the expression `(some (foo thing) (bar (stuff mumble jumble)))` A proper query language then allows or searches such as "find all expressions that have `jumble` and `foo` in them, and return them to me". With these expressions now in hand, the typical user will then examine some specific property on them. >> >> I have built exactly such a system, but it is not really scheme-compatible (the guts are in c++, the bindings are guile... and python). I'm writing this email because I'd like to see a generic, scheme-compatible, portable version thereof. >> >> The second half of srfi-213 talks about transformers, and I should too. So, when a user performs a search: "find all expressions that have `jumble` and `foo` in them, and return them to me", another common operation is to then say "for-each expression replace `jumble` by `(+ thing 42)` and replace `foo` by `(list 'a 'b 'c)`" >> >> I hope that it is obvious that what I just wrote resembles the situation of "for-each expression apply macro jumble-foo". That is, after performing the search, and getting the results back, one selectively transforms those results by applying specific "rewrite rules" to them. >> >> I hope that, in writing this, it has now become clear to the reader what an "s-expression database" could be. It resembles current-day scheme, but has some highly unconventional aspects: >> >> * Instead of having "code" stored in "text files" that is "run" by the scheme system, one has s-expressions stored in an abstract "space", which might be invoked and run by the scheme system. >> >> * Instead of having "macros" applied to everything that comes in through the door, one has syntax-transformers (rewrite rules) that are selectively applied to specific sets of s-expressions. The result of those rewrites/transformations are then jammed back into the abstract space of s-expressions. >> >> * Upon program termination, instead of wiping out everything that has ever been done, the space of s-expressions is persisted, so that it can be re-opened and re-used next time the program is started. Or by any other program, for that matter. >> >> Again, I have created exactly such a system, see the two URL's below. My goal is not to advertise this system, but rather to encourage the scheme community to recreate it, better and stronger, and fully compatible with scheme. I think this email provides enough information for how to do this. >> >> But if you want to explore some more: my system is called the "atomspace". You can read the docs, but perhaps the best way to wrap your mind around it is to run the examples. They are here: https://github.com/opencog/atomspace/tree/master/examples >> >> We made one fundamental naming flaw: we used the word Values with a capital V when a better name would have been Properties. Alas. >> >> Marc: this email is aimed squarely at you. I think you have the general understanding and the general abilities to wrap your mind around this idea. Or at least to ponder it for a little while (which is really what I ask: just ponder it for a while...) See what you can do with it. >> >> For me, this task has taken on some particular urgency; there's a group of people who have started to work on an atomspace-next-generation. I don't believe they will succeed, because, well, many reasons. If there's a next-generation version, I would rather see it done in a generic scheme setting. >> >> --linas >> >> -- >> Patrick: Are they laughing at us? >> Sponge Bob: No, Patrick, they are laughing next to us. >> >> -- Patrick: Are they laughing at us? Sponge Bob: No, Patrick, they are laughing next to us.