Re: SQL combinator DSL

Show/hide message thread

SQL combinator DSL Lassi Kortela (18 Sep 2019 11:08 UTC)

Re: SQL combinator DSL Alaric Snell-Pym (18 Sep 2019 12:58 UTC)

Re: SQL combinator DSL John Cowan (18 Sep 2019 22:31 UTC)

Re: SQL combinator DSL Alaric Snell-Pym 18 Sep 2019 12:58 UTC

On 18/09/2019 12:08, Lassi Kortela wrote:
> Can we cautiously establish general agreement that a domain-specific
> langauge of safe SQL combinators is a requirement for basic sanity
> working with databases in Scheme?
>
> I.e. non-DSL use of SQL could be regarded as something similar to unsafe
> or optimized primitives in a high-level language implementation: crucial
> to have, but usually best to avoid and hide behind safe wrappers.
>
> If we can have this agreement, we could start exploring a DSL with the
> people we have on board now.
>
> I don't mind it has engine-specific extensions as well. E.g. if Postgres
> has extensions that people use all the time for real work, why not add
> them in. I imagine any extensions are also safer and easier to use with
> a good DSL than without.

Throwing a few ideas out, we could define the DSL in terms of
constructor procedures (that return some opaque types), thereby allowing
a postgres lib to provide their own constructors that can be intermingled.

One question to answer first is how closely to stick to SQL's structure.

For instance, we could have a "select" form that supports all the things
select can do - GROUP BY, HAVING, WHERE, and projection (picking out
specific columns / computed expressions), that has a simple translation
to the matching SELECT statement. This is how SSQL works, for instance.

Or we can have nodes for joins, selection, grouping, and projection -
more towards the "relational algebra" end - and do more work to fold
them down to SQL.

To give a feel for the implications, a query like:

(select
  (group
   (select
      (join a b '(= a.foo b.foo))
      '(= a.bar 6))
   '(a.x)
   '((y-sum . (sum b.y))))
 '(= y-sum 10))

...could become a single SELECT:

SELECT a.x, SUM(b.y) AS y_sum
  FROM a INNER JOIN b ON a.foo = b.foo
  WHERE a.bar = 6
  GROUP BY a.x
  HAVING y_sum = 10;

...but a naive implementation could create a SELECT for every node, and
rely on the DBMS to optimise it down:

SELECT * FROM
 (SELECT a.x, SUM(b.y) AS sum_y FROM
  (SELECT * FROM
   (SELECT * FROM a INNER JOIN b ON a.foo = b.foo)
   WHERE a.bar = 6)
 GROUP BY a.x)
WHERE sum_y = 10;

Obviously, things like UNION / INTERSECT / EXCEPT [ALL|DISTINCT] need to
generate subqueries for the subexpressions, as the SQL syntax demands
that, along with common table expressions (we could call them "let" in
Scheme rather than WITH..AS if we're trying to lean more towards a
relational algebra!)

>
> Thoughts?
>

--
Alaric Snell-Pym   (M7KIT)
http://www.snell-pym.org.uk/alaric/