Statistics primitives ("Elusive Eight") Lassi Kortela (14 Mar 2024 12:54 UTC)
Re: Statistics primitives ("Elusive Eight") Pierpaolo Bernardi (14 Mar 2024 20:26 UTC)
Re: Statistics primitives ("Elusive Eight") Lassi Kortela (16 Mar 2024 14:37 UTC)
Re: Statistics primitives ("Elusive Eight") Lassi Kortela (17 Mar 2024 16:53 UTC)
Re: Statistics primitives ("Elusive Eight") Jens Axel Søgaard (17 Mar 2024 17:14 UTC)
Re: Useful srfi's [was Re: Statistics primitives ("Elusive Eight")] Lassi Kortela (16 Mar 2024 11:41 UTC)
Re: Useful srfi's [was Re: Statistics primitives ("Elusive Eight")] Arthur A. Gleckler (16 Mar 2024 16:02 UTC)
Re: Statistics primitives ("Elusive Eight") Alex Shinn (18 Mar 2024 02:14 UTC)

Re: Useful srfi's [was Re: Statistics primitives ("Elusive Eight")] Lassi Kortela 16 Mar 2024 11:41 UTC

> The core argument in that paper is reasonable.
Thanks for reviewing it.
> I really, really wish Scheme had support for arbitrary-precision
> floating point. So, GNU MP, plus a math library of stuff not in GNU
> MP (basics like sine, cosine, exp, log).
Racket may have something close to that, or if not, it's the place to
start for fast progress.
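
If memory serves, Racket's math/bigfloat library wraps MPFR and covers
much of this already. A quick sketch (Racket-specific, from memory):

  #lang racket
  (require math/bigfloat)  ; MPFR-backed arbitrary-precision floats
  (bf-precision 256)       ; work with 256 bits of precision
  (bfsin (bf "1.5"))       ; sine at that precision
  (bflog (bfexp (bf 1)))   ; log(exp(1)); should come back as ~1
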
> The meta question is: who uses Scheme? Why? For what purpose?

The fundamental point of Lisp (and hence Scheme) is that it doesn't have
an application domain. It's a chameleon. That's why it's valuable.

That's why there is so much focus on fundamentals around SRFI. Whether
that work is successful is an orthogonal question.

> The statisticians seem to all use R. They are not programmers; they
> seem to be medical doctors and biologists and bio-stats data
> tinkerers. "power users".
>
> Those who wish to pretend they know how to program, those with a
> Visual Basic skill set, gravitate to Python. From what I've heard,
> SciPy is a foundational cornerstone and attracts those who work at
> the intersection of science data and programming.
>
> Where could Scheme have a real impact? Machine learning, LLMs, GPU
> programming are the modern thing these days.
>
> One hot topic in hardware is "hypervectors". These are bit-strings
> with, say, a million bits. Sometimes they might be strings of floats.
> Or both. Depends on the app. Nothing in Scheme gives an efficient or
> performant way of working with strings of a million bits or floats,
> and doing basic ops on them: intersection, truncation, overlaps,
> masking, adding, XOR-ing. No support for hardware chipsets. No
> libraries. A decent hypervector library would attract a lot more
> interest than incomplete-gamma or incomplete-beta. The hypervector
> guys actually have venture-capital funding. They might be willing to,
> you know, throw money at the problem.
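
For what it's worth, the raw bit operations are expressible in portable
Scheme today with bytevectors; what's missing is a tuned library and
hardware backends. A minimal, unoptimized sketch (the names are mine,
not from any SRFI):

  ;; A million-bit hypervector is a 125000-byte bytevector.
  ;; Needs (scheme base) plus SRFI 151 for bitwise-xor.
  (define (hv-xor a b)
    (let* ((n (bytevector-length a))
           (out (make-bytevector n 0)))
      (do ((i 0 (+ i 1)))
          ((= i n) out)
        (bytevector-u8-set! out i
                            (bitwise-xor (bytevector-u8-ref a i)
                                         (bytevector-u8-ref b i))))))

A serious library would swap this inner loop for SIMD or GPU kernels;
the portable version is only the fallback.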

These topics should be brought up with the research groups working with
Scheme.

- Racket and Gambit each have some people with a mathematical bent.

- Long-time schemers Jeffrey Siskind and Barak Pearlmutter recently
worked on this: https://github.com/functional-autodiff

- Gambit is from U Montreal, which was/is home to Yoshua Bengio's
machine learning group. AFAIK the groups have never interacted, but
perhaps they could.

> Symbolic AI was an old Lisp forte, but has been utterly neglected:
> Scheme does not offer anything comparable to what one can do with
> Prolog, Datalog, or answer-set programming (ASP). ASP is a kind of
> "Prolog-on-its-side", a constraint-satisfaction solver that has a
> completely different performance profile from conventional Prolog. It
> uses a SAT solver under the covers. There are no SAT solver SRFIs. No
> ASP SRFIs. The Potsdam ASP solver was the best, a decade ago when I
> last used it.
>
> OpenCyc was written in a pseudo-Lisp syntax. It was deeply flawed, but
> still. It's on a fast path to oblivion, but the need for general
> symbolic infrastructure remains. The geeks have gravitated to theorem
> provers, like Agda or Coq. Nothing in Scheme would allow you to
> bootstrap an Agda-inspired system. Agda and Coq kind of suck, because
> they're static. They're built on Haskell and OCaml respectively;
> that's the Achilles' heel. Scheme is dynamic: having a type library in
> Scheme would instantly overcome a serious and fundamental impediment
> in Agda and Coq. Build a freakin' SRFI for types! That gets you a
> ticket into the big-boys' game.
>
> One commonality between Datalog, Agda, and OpenCyc was the ability to
> store Lisp-like fragments, trees, in a database/dataset/dataspace, and
> search for the fragments of those trees that you want. Imagine having
> a million lines of Scheme code, dumped as one big giant blob, in a
> searchable dataset. Each "line of code" would be 50 chars to thousands
> of chars in length. For example: "(genome (gene-name BRCA)
> (upregulator cytosine) (reactome FOOBAR))", stuff like that. Not
> tables, but irregularly structured data. Millions of these. You don't
> hold them in a file; you put them in a searchable database and provide
> algos to munch on this data.
>
> Here's an example of 40-year-old cyc data:
>
> (#$genls #$Copper #$Mineral)
> (#$genls #$Mineral #$InorganicMaterial)
> (#$isa #$Copper #$CommodityProduct)
> (#$arg1Isa #$MoviesDirectedByFn #$Director-Movie)
> (#$isa #$MoviesDirectedByFn #$CollectionDenotingFunction)
> (#$MoviesDirectedByFn #$OrsonWelles)
>
> Millions of these. Datalog allows these to be stored and searched,
> processed and rewritten, solved, extrapolated, inferred. The #$genls
> and #$isa are type constructors; #$MoviesDirectedByFn is an "arrow
> type": the type of a function. OCaml, Haskell, etc. use "->", a
> literal arrow, for that constructor. This is where CycL was actually
> better and more innovative, because it was Lisp-inspired.
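
At the same time, a toy version of "store s-expression facts, search
for tree fragments" is small in Scheme: a fact list plus a pattern
matcher. A rough sketch, nowhere near a production store (variables
are symbols starting with "?"; needs (scheme base) and SRFI 1 for
filter):

  ;; Match a pattern against one fact, returning an alist of
  ;; bindings, or #f on mismatch. No consistency check when the
  ;; same variable occurs twice; a real matcher would unify.
  (define (match pat fact bindings)
    (cond ((and (symbol? pat)
                (char=? #\? (string-ref (symbol->string pat) 0)))
           (cons (cons pat fact) bindings))
          ((and (pair? pat) (pair? fact))
           (let ((b (match (car pat) (car fact) bindings)))
             (and b (match (cdr pat) (cdr fact) b))))
          ((equal? pat fact) bindings)
          (else #f)))

  (define facts
    '((genls Copper Mineral)
      (genls Mineral InorganicMaterial)
      (isa Copper CommodityProduct)))

  ;; Everything declared a direct kind of Mineral:
  (filter values
          (map (lambda (f) (match '(genls ?x Mineral) f '())) facts))
  ;; => (((?x . Copper)))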

I'm not aware of any active schemer with symbolic AI expertise. The
above applications require specialized training and a lot of patience.
It's not appropriate for that work to start from SRFIs, and it may not
be wise for it to end as SRFIs either, given how complex the problem is.

Prolog, Datalog, SAT, etc. have been implemented several times in
Scheme, but you are looking for advanced implementations, not
entry-level ones.
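
Of the entry-level ones, miniKanren is probably the best known: a
relational, Prolog-like language embedded in Scheme, with many ports.
The flavor, assuming a miniKanren implementation is loaded:

  ;; A two-place relation given by enumeration.
  (define (directedo director movie)
    (conde
      ((== director 'OrsonWelles) (== movie 'CitizenKane))
      ((== director 'OrsonWelles) (== movie 'TouchOfEvil))))

  ;; All movies m such that (directedo 'OrsonWelles m) holds:
  (run* (m) (directedo 'OrsonWelles m))
  ;; => (CitizenKane TouchOfEvil)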

> What does Scheme have that's comparable? Freakin' "hygienic macros".
> Could have been foundational and brilliant; instead it's hopelessly
> brain-damaged and self-sabotaged. WTF are you guys doing? How am I
> supposed to run a query, get back an answer like (#$SpecifiedMovieType
> #$Thriller), and apply a macro to that? Am I supposed to wrap that
> with a "define-syntax", then pipe the string into a file descriptor,
> open the file, somehow force "syntax-case" to run, pipe the result of
> that to another file, open the file, and then finally obtain the list
> of all Orson Welles movies? How, exactly, is a normal programmer
> supposed to apply define-syntax and syntax-case to dynamic search
> results? WTF are you guys doing?
>
> I've got a guy who has a fruit-fly genomics database, about 50 GBytes
> of data compressed, about a terabyte uncompressed. He spent about
> half a million US dollars in salary for a bunch of Russian programmers
> to re-invent the Scheme macro language, but make it run-time instead
> of compile-time. What he got back was something crazy and slow. They
> use parens, just like Scheme, but use equals signs to be both a lambda
> and also a define-syntax, depending on the context. It's... I want to
> call it "crazy and incoherent", but there's a method in their madness.
> They allow expressions like "(= (+ x 2) y)", so, like a backwards
> "(lambda (x) (+ x 2))", which they can unify with (+ 3 2) using a
> syntax-case-like thing, to deduce that x is three, returning (= x 3)
> as the result. So = is both a lambda and also a syntax-case. I forget
> the details; I can forward the docs if you're interested. They're
> public.
>
> I get emails once a year, begging me to fix it for them. So now I beg
> of you: fix the freaking Scheme macro system. *Please* let me apply
> syntax-case to arbitrary search results from this fruit-fly genomics
> database. Fixing Scheme macros so they're not totally brain-dead would
> be an excellent place to start to regain interest from the AI and
> machine learning world.
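
The run-time unification step you describe is not really a macro
problem; it's a small interpreter. The core of it, as a rough sketch
of the idea rather than their system:

  ;; Unify a pattern containing one variable against a ground term
  ;; by walking both trees in parallel.
  (define (unify pat term var)
    (cond ((eq? pat var) (list (cons var term)))
          ((and (pair? pat) (pair? term))
           (let ((l (unify (car pat) (car term) var)))
             (and l (let ((r (unify (cdr pat) (cdr term) var)))
                      (and r (append l r))))))
          ((equal? pat term) '())
          (else #f)))

  (unify '(+ x 2) '(+ 3 2) 'x)  ; => ((x . 3))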

You can expand macros at run time with eval, but that's not exactly what
you want.
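
For example, a form assembled at run time from query results can be
handed to eval, and any macros inside it expand as part of evaluation
(R7RS: import (scheme eval) and (scheme repl)):

  (define x 5)
  ;; Pretend this list came back from a database query:
  (define result '(and (> x 1) (< x 10)))
  ;; `and` is a macro; eval expands it while evaluating:
  (eval result (interaction-environment))  ; => #t

The limitation is that eval sees only the environment you give it, not
the lexical scope of the call site, which is usually where these
schemes break down.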

It's true that the macro and library expander intertwines several
concerns that should perhaps be decoupled. This is something that RnRS
could take up. But right now RnRS is in a bad place politically, and
the priority is to "walk before running"; "first, do no harm"; etc.

> I had another guy who was doing computer security on a $5M US Govt
> contract, hiring youngsters to machine-learn computer viruses. The
> system was built on top of BAP/BIL: the "Binary Analysis Platform"
> and its "Binary Intermediate Language".
>
> BIL was interesting because it looked almost exactly like Scheme, or
> maybe CycL without the annoying #$ prefixes. They stored
> disassembled, decompiled snippets of viruses in a database, and then
> would try to figure out if fragments of those "threat vectors" were
> in some code. The API to the database was something called "OGRE".
> Here's an example of OGRE. Each s-expression has the form
>
> (<attribute-name> <v1> <v2> ... <vM>)
>
> So:
> (declare student (name str) (gpa float))
> (student (name Joe) (gpa 3.5))
>
> with
>
> declaration ::= (declare <attribute-name> <field> <field> ...)
> field       ::= (<field-name> <field-type>)
> field-type  ::= int | str | bool | float
>
> Here:
> http://binaryanalysisplatform.github.io/bap/api/odoc/ogre/Ogre/Query/index.html
>
> The Achilles' heel in OGRE was that it resembled SQL too closely for
> comfort. But, what the heck -- that still makes it a step up: there's
> no SRFI that allows this kind of syntax for data. I would personally
> argue against the specific OGRE system, but it gives a hint of what
> Scheme SRFIs could do if the Scheme macro system were brought into
> the 21st century.
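
As a data point, the declare/insert half of that is a few lines of
Scheme on top of a matcher like the one sketched above. Hypothetical
names, not OGRE's actual API:

  ;; Declarations and records kept in plain lists.
  (define schemas '())  ; entries: (name (field-name field-type) ...)
  (define records '())

  (define (declare! name . fields)
    (set! schemas (cons (cons name fields) schemas)))

  ;; Accept a record only if its attribute is declared and the
  ;; field count matches the declaration.
  (define (insert! record)
    (let ((schema (assq (car record) schemas)))
      (unless (and schema (= (length (cdr record)) (length (cdr schema))))
        (error "record does not match a declaration" record))
      (set! records (cons record records))))

  (declare! 'student '(name str) '(gpa float))
  (insert! '(student (name Joe) (gpa 3.5)))
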
This seems too broad and open-ended for a SRFI. I recommend talking to
one of the research groups listed above if interested.