I'm reluctant to include it in the POSIX SRFI.  Gauche used to have a glob(3) interface, but at some point (a long time ago) I reimplemented it in Scheme.  As far as I remember, the reasoning was as follows:

- To provide the same functionality on Windows, I have to implement and maintain the logic anyway.
- If I have to write the logic anyway, writing it in Scheme makes it a lot easier to maintain and extend (e.g. to support "**").
- A glob-like matcher is sometimes useful outside of the traditional filesystem.  Suppose you want to implement a tool like gsutil (the Google Cloud Storage access utility), which supports glob-like wildcards to filter and pick objects stored in the cloud.  You use almost identical logic for the expander, but you need to access some kind of database instead of filesystem directories.  Having the glob logic in Scheme makes it easy to hook up an alternative subsystem under the glob expander.  (To be precise, some glob(3) implementations, such as glibc and the BSDs, do provide an alternative subsystem via GLOB_ALTDIRFUNC, which is an extension rather than part of POSIX, but using that from Scheme would require a lot of C-Scheme hopping.)
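
To make the last point concrete, here is a rough sketch of a glob expander whose directory access is a pluggable procedure.  It is written in Python purely for illustration and is not Gauche's implementation; the names glob_expand, list_children and make_dict_backend are made up for this example, and the nested dict stands in for whatever store (cloud bucket, database) sits under the expander.

```python
import fnmatch


def glob_expand(pattern, list_children, sep="/"):
    """Expand `pattern` one path component at a time.

    `list_children(path)` is the only way this function touches storage.
    It must return the child names of `path` ("" means the root) and an
    empty sequence for anything that is not a container.
    """
    candidates = [""]  # paths matched so far, starting at the root
    for component in pattern.split(sep):
        matched = []
        for parent in candidates:
            for name in sorted(list_children(parent)):
                # fnmatchcase handles *, ? and [...] within one component.
                # Unlike a shell glob, * here also matches a leading dot.
                if fnmatch.fnmatchcase(name, component):
                    matched.append(parent + sep + name if parent else name)
        candidates = matched
    return candidates


def make_dict_backend(tree, sep="/"):
    """Hypothetical non-filesystem backend: nested dicts are containers,
    any other value (here None) is a leaf object."""
    def list_children(path):
        node = tree
        for name in (path.split(sep) if path else []):
            if not isinstance(node, dict) or name not in node:
                return []
            node = node[name]
        return list(node) if isinstance(node, dict) else []
    return list_children


# Example store; in a gsutil-like tool this would be a bucket listing call.
bucket = {
    "logs": {"2019-12-10.txt": None, "2019-12-11.txt": None, "readme.md": None},
    "data": {"a.csv": None},
}
matches = glob_expand("logs/2019-*.txt", make_dict_backend(bucket))
```

Swapping make_dict_backend for a procedure that lists a real directory gives an ordinary filesystem glob; the matching logic does not change.  This version lists every container it visits, even for literal components, so it makes no attempt to minimize the number of listings.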

If we ever have an extended glob SRFI, POSIX glob would be just a redundant subset; extended glob can't be implemented "on top of" POSIX glob.  POSIX glob is certainly useful on its own, but not so much as a building block for further functionality; it's a dead end on the feature path.


On Wed, Dec 11, 2019 at 9:37 AM Lassi Kortela <xxxxxx@lassi.io> wrote:
>     - As mentioned, it's significantly higher-level than the other stuff in
>     this SRFI.
>
> In what sense?  It's glob(3) and much like other things at that level.

In the sense that it adds a lot of semantics on top of syscalls. While
something like read-directory is an abstraction, it's a fairly obvious
mapping of getdirentries(). Glob isn't an obvious mapping of any
syscall. If there was a glob in kernels, it would be a different matter.

libc is both a blessing and a curse. It has a lot of goodies but few of
them are implemented optimally for garbage-collected languages, and many
are not even optimal for C. I hope it will be widely deprecated at some
point. That's why I think the syscall API is a better target to aim for.
Of course, many concessions need to be made.

>     - It's easy to implement in portable Scheme on top of a directory
>     walker.
>
> The implementation on top of open/read/closedir is certainly not a
> trivial effort; especially if you want to minimize the number of
> directories you open.  There's no reason not to use the libc
> implementation since it is there.

IMHO if we're going all the way up to this level of abstraction, the
libc glob syntax is no longer the best one to use. We might as well
implement something like bash or zsh globs, and there is no universally
available C library for that.

> It's not available in Win32, that's true; but lots of Posix things
> aren't.  Here's a self-described minimal implementation:
> https://github.com/oetiker/rrdtool-1.x/blob/master/win32/win32-glob.c

That is a neat glob implementation. I don't quite understand the
parsing; it hardly seems to do any preprocessing to the pattern string.

Again thinking of the syscall surface, WinAPI DLLs expose almost-direct
equivalents for most of the essential syscalls / syscall combinations.

> Posix 2008 and later specifies *, ?, and [...] only, and that's what we
> should provide too.

I've several times found the Posix globs wanting for real work. Simple
jobs are not too bad to do by manually filtering and merging directory
listings; for complex jobs, Posix globs are not feature-rich enough.
Hence based on my experience I'd advocate for something more complex.

>     - It'd be nice to use S-expression regexps instead of using string
>     regexps and worrying about escaping. Probably would be nice to have the
>     traditional string regexps as well.
>
> The whole point of this function is to trade off performance, certainly
> in scsh, for convenience.  It does what it does.

The main point of S-expression regexps is correctness and composability.
Performance ought to be slightly poorer than with strings unless macros
are used. But again, disk I/O and syscalls probably take more time.

>     errors should probably be on by default.
>
> Makes sense.  Change the argument name to carry-on? then.

https://sd.keepcalms.com/i/keep-calm-and-ignore-warnings.png

>     We also need to support musl libc and the like. Do those have glob()?
>
> Anything that supports Posix 2008 has glob().  In particular both musl
> and newlib have it.

That's good to know.