
Re: We need a Pandoc implementation in Scheme Amirouche Boubekki 21 Jun 2021 12:19 UTC
Hello Lassi, hello all!

On Sat, 12 Jun 2021 at 09:24, Lassi Kortela <xxxxxx@lassi.io> wrote:
>
> A number of goals are converging on the general requirement that we need
> a modular Pandoc (https://en.wikipedia.org/wiki/Pandoc) clone written in
> portable Scheme.

Why is it required or necessary?

> I'll work on it piecemeal here and there; help is welcome. It should
> probably work by using SXML (http://okmij.org/ftp/Scheme/SXML.html) as
> the internal representation, and convert Markdown/Texinfo/etc. to SXML
> and back.

About SXML tools: it seems clear to me that XPath, and hence sxpath, is
dead or dying. Most newcomers rely on CSS selectors to query HTML (and
standalone XML is, as far as I know, nearly non-existent nowadays).

(I came to this subject from a rather distant topic: making match-lambda
fast. My idea is, or was, to specialize a set of match patterns so that
shared pattern prefixes are tested only once, using a kind of prefix
compression. It is also related to generic methods: a generic method is
like a match-lambda, except that each clause is defined in its own
module. Prefix compression could likewise be applied at compile time,
because most generics vary near the beginning of the argument list;
that makes me think a general decision tree would be overkill. Also,
arguments are passed as a list, and a general decision tree would
require converting that list to a vector to tap into the middle
efficiently. My plan is, or was, to construct a parser combinator from
the match spec, specialize and optimize it, and then produce a lambda
of nested ifs that dispatches to the correct underlying procedure. I
think this is possible with Chez, using define-property, which attaches
an expression to an identifier, together with good use of implicit
phasing.)
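
Roughly, the idea is something like this (a made-up sketch using
(ice-9 match); the specialized version would be generated by a macro
rather than written by hand):

    (use-modules (ice-9 match))   ; any matcher with quote patterns works

    ;; Naive: the matcher re-checks the shared ('op ...) prefix for
    ;; every clause.
    (define handle
      (match-lambda
        (('op 'add a b) (+ a b))
        (('op 'mul a b) (* a b))))

    ;; Prefix-compressed by hand: the shared prefix is tested once,
    ;; then nested tests pick the clause.
    (define (handle* expr)
      (if (and (pair? expr) (eq? (car expr) 'op) (pair? (cdr expr)))
          (let ((tag (cadr expr)) (args (cddr expr)))
            (cond ((eq? tag 'add) (apply + args))
                  ((eq? tag 'mul) (apply * args))
                  (else (error "no matching clause" expr))))
          (error "no matching clause" expr)))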

That is why I started working on a parser combinator library called
paco. I have attached to this mail a proof of concept that streams a
parse result, and will possibly stream alternative parse results when
the grammar is ambiguous (not coded yet). That is a broad feature set
with many pitfalls; I already removed the ability to parse
left-recursive grammars, and handling ambiguous grammars may not be
useful in a Pandoc clone anyway.
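
To give an idea of the style (this is not paco's actual interface, just
a minimal sketch of the parser combinator approach): a parser takes a
list of characters and returns either #f on failure, or a pair of the
parsed value and the remaining input.

    ;; Not paco's API; a minimal illustration only.
    ;; A parser: (list of chars) -> #f | (value . remaining-chars)

    (define (lit ch)                       ; match one literal character
      (lambda (input)
        (and (pair? input)
             (char=? (car input) ch)
             (cons ch (cdr input)))))

    (define (alt p q)                      ; try p, otherwise q
      (lambda (input)
        (or (p input) (q input))))

    (define (seq p q)                      ; p then q, pair up both values
      (lambda (input)
        (let ((r1 (p input)))
          (and r1
               (let ((r2 (q (cdr r1))))
                 (and r2
                      (cons (cons (car r1) (car r2)) (cdr r2))))))))

    ;; Example: parse "ab" or "ac".
    (define ab-or-ac
      (alt (seq (lit #\a) (lit #\b))
           (seq (lit #\a) (lit #\c))))

    (ab-or-ac (string->list "ac"))   ; => ((#\a . #\c))

Streaming alternative results for ambiguous grammars would mean
returning a lazy list of such pairs instead of only the first one.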

There are many test files in pandoc's GitHub repository:
https://github.com/jgm/pandoc/blob/master/test/
But I think they miss the point of *unit* testing. I would rather
approach the problem in a way similar to SRFI-180 (JSON), with small,
piecewise tests.
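
Concretely, the kind of piecewise tests I have in mind would look
something like this (SRFI 64 syntax; markdown->sxml and the expected
SXML shapes are hypothetical):

    (import (srfi 64))   ; in Guile: (use-modules (srfi srfi-64))

    (test-begin "commonmark-inline")

    ;; Each test exercises one small feature in isolation,
    ;; rather than a whole document fixture.
    (test-equal "emphasis"
      '(em "hello")
      (markdown->sxml "*hello*"))

    (test-equal "inline code"
      '(code "f(x)")
      (markdown->sxml "`f(x)`"))

    (test-end "commonmark-inline")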

Another topic of interest while working on parsers is fuzzing. The de
facto standard seems to be
https://en.wikipedia.org/wiki/American_fuzzy_lop_(fuzzer)
Another source of possibly interesting test cases is
https://github.com/google/oss-fuzz

I think my next goal is to rewrite the SRFI-180 and HTTP 1.1 parsers;
then I will try to parse some markup language for a task at work that
needs a parser. Putting together a test suite as described above
(piecewise tests) would be a great help for things like HTML, MicroXML,
and CommonMark, and would also improve the test suite of the HTTP 1.1
parser.

Help and feedback are welcome!

--
Amirouche ~ https://hyper.dev