Email list hosting service & mailing list manager

peer-to-peer Amirouche Boubekki (05 Oct 2019 12:24 UTC)
We need a pre-SRFI list hga@xxxxxx (05 Oct 2019 12:41 UTC)
Re: We need a pre-SRFI list Arthur A. Gleckler (05 Oct 2019 19:14 UTC)
Re: We need a pre-SRFI list hga@xxxxxx (05 Oct 2019 20:20 UTC)
Re: We need a pre-SRFI list Duy Nguyen (06 Oct 2019 01:47 UTC)
Re: We need a pre-SRFI list elf (06 Oct 2019 01:51 UTC)
Re: We need a pre-SRFI list hga@xxxxxx (06 Oct 2019 02:18 UTC)
Re: We need a pre-SRFI list elf (06 Oct 2019 02:33 UTC)
Re: We need a pre-SRFI list Arthur A. Gleckler (06 Oct 2019 04:57 UTC)
Re: We need a pre-SRFI list hga@xxxxxx (06 Oct 2019 11:42 UTC)
Re: We need a pre-SRFI list Amirouche Boubekki (06 Oct 2019 06:09 UTC)
Re: We need a pre-SRFI list Arthur A. Gleckler (06 Oct 2019 17:30 UTC)
Planning how to organize Scheme discussion Lassi Kortela (06 Oct 2019 17:48 UTC)
Re: Planning how to organize Scheme discussion hga@xxxxxx (06 Oct 2019 19:41 UTC)
Re: We need a pre-SRFI list Arthur A. Gleckler (06 Oct 2019 18:30 UTC)
Re: We need a pre-SRFI list Lassi Kortela (06 Oct 2019 19:31 UTC)
Re: We need a pre-SRFI list Amirouche Boubekki (06 Oct 2019 19:48 UTC)
Re: We need a pre-SRFI list Amirouche Boubekki (06 Oct 2019 19:56 UTC)
Re: We need a pre-SRFI list elf (06 Oct 2019 01:53 UTC)
Re: We need a pre-SRFI list Vladimir Nikishkin (06 Oct 2019 03:06 UTC)
Re: We need a pre-SRFI list Duy Nguyen (06 Oct 2019 04:13 UTC)
Matrix libraries Lassi Kortela (06 Oct 2019 14:51 UTC)
Re: Matrix libraries John Cowan (06 Oct 2019 17:55 UTC)
Who's working on what? Lassi Kortela (06 Oct 2019 19:39 UTC)
Re: Who's working on what? Amirouche Boubekki (06 Oct 2019 20:19 UTC)
Re: Who's working on what? Amirouche Boubekki (06 Oct 2019 20:26 UTC)
Re: Who's working on what? John Cowan (06 Oct 2019 20:40 UTC)
Re: peer-to-peer Amirouche Boubekki (05 Oct 2019 14:43 UTC)
Re: peer-to-peer Arthur A. Gleckler (06 Oct 2019 05:14 UTC)
Peer-to-peer, sockets and binary s-expressions Lassi Kortela (06 Oct 2019 12:41 UTC)
Re: Peer-to-peer, sockets and binary s-expressions Amirouche Boubekki (06 Oct 2019 13:46 UTC)
Re: Peer-to-peer, sockets and binary s-expressions John Cowan (06 Oct 2019 20:35 UTC)
Re: Peer-to-peer, sockets and binary s-expressions Vladimir Nikishkin (07 Oct 2019 02:42 UTC)
WebSockets Lassi Kortela (06 Oct 2019 12:47 UTC)
Re: WebSockets Per Bothner (06 Oct 2019 14:40 UTC)
Re: WebSockets Amirouche Boubekki (06 Oct 2019 19:53 UTC)

Re: peer-to-peer Amirouche Boubekki 05 Oct 2019 14:43 UTC

Le sam. 5 oct. 2019 à 14:24, Amirouche Boubekki
<xxxxxx@gmail.com> a écrit :
>
> I know many pieces are missing in the SRFI and R7RS standards. I
> think, it would be nice to have in SRFI.

I meant to write that many other SRFIs should or could be written
before this one. Here is the one I can think of:

- UDP or TCP or Web sockets
- Binary scheme expressions
- Kbucket datastructure

The (python) prototype rely on UDP with packets that must not be
bigger than 1Mb because I read that it is not reliable to send big
messages over UDP. The reason to use UDP is that it is easy to use.

>
> A markdown document is available at [0].
>
> # peer-to-peer
>
> ## Status
>
> Early Draft
>
> ## Abstract
>
> This library describe the primitives for a peer-to-peer network that
> can be used to implement networked applications that do not require a
> central server to work. It offers the building blocks to create
> decentralized applications.
>
> ## Rationale
>
> It is very schemey to operate toward a common goal in a distributed
> fashion. This library brings the distributed social characterics of
> Scheme to Scheme programs.
>
> Many Scheme implementations provides network support. This library
> will allow them to exercise those capabilities while allowing them to
> create something bigger than what they could achieve alone. It offers
> a unique opportunity for Scheme implementations to cooperate toward a
> common goal while all will keep their defining characteristics.
>
> It is also an opportunity to bring fresh air to the Internet. This
> library offers new possibilities to create networked applications in a
> decentralized way that could spur creativity.
>
> ## Specification
>
> All communication between peers is secured using [X25519 key
> exchange](https://en.wikipedia.org/wiki/Curve25519). Moreover the
> specifications prescribe the use of Ed25519 signing when using
> namespaces.

Those primitives are available in R6RS Scheme Industria.

> The network protocol is based on serialized scheme expressions
> compatible with `write` prefixed with the size of the serialized
> expression in network byte order (TODO: big or little?). Because the
> network can not be trusted the procedure used to parse messages from
> the network must be safe.

This could put to good use binary s-exp.

>
> The maximum size of serialized scheme expression transmitted over the
> network know as message is defined to be 1 megabyte.
>

This is an arbitrary value. That limitation requires lots of bookeeping to make
sure that network procedures do not try to reply with a big message...

> The protocol identifies peers with bytevectors of 32 bytes and use XOR
> metric to find nearest peers similar to what kademlia paper describe
> and similar to the "mainline dht".

> The protocol doesn't dictate a particular datastructure for the routing table.

This is where the KBucket datastructure comes in. See [1] and [2] for example
implementation.

[1] https://github.com/jech/dht
[2] https://github.com/bmuller/kademlia/blob/master/kademlia/routing.py

This is done in the prototype with the following function:

def nearest(k, peers, uid):
    """Return K nearest to to UID peers in PEERS according to XOR"""
    return nsmallest(k, peers, key=functools.partial(operator.xor, uid))

> Peer identifiers are refered as `uid` pronounced "weed".
>
> A peer initiating a request must wait for a reply at most 5 seconds.

This is a regular protection. BUT It must be taken into account when
computing the nearest peers as shown above because if one does not
rely on kbucket the complexity becomes a limiting factor. Using
CPython, it start to become problematic when the routing table has 10M
peers:

In [17]: peers = list(range(10_000_000))

In [18]: random.shuffle(peers)

In [19]: %timeit nearest(10, peers, 1)
1 loop, best of 5: 2.45 s per loop

> The protocol is not compatible with existing distributed hash tables.
>
> The default replication factor is 5. The maximum replication factor
> that must be respected globaly is 20. TODO: explain what it is.

kademlia paper describes two constants k and alpha. Those are merged
into "replication factor". I may need some advice about that.

>
> This library defines the following disjoint type: `peer`
>
> ### Public procedures
>
> #### Initialization
>
> ##### `(peer uid recv send replication) → peer`
>
> Returns a peer object identified as `UID` bytevector. `UID` must be a
> randomly chosen bytevector of 32 bytes. `RECV` is thunk that allows to
> retrieve incoming payload as a scheme object. `SEND` takes ip string,
> port number and scheme object payload as argument and allow to send to
> the given address the given payload. It is an error if the payload
> serialize to a bytevector bigger than 1Mb. `REPLICATION` must be at
> least 5 and at most 20.
>
> A peer must remember its own `UID` for future use.
>
> ##### `(peer-connect peer ip port) → boolean`
>
> This will make sure that `PEER` knows about the peer at `IP` and
> `PORT`. The peer must send a `iam` network procedure. Return `#t` on
> success. Otherwise `#f`.
>
> ##### `(peer-bootstrap peer) → unspecified`
>
> `PEER` will initiate a bootstrap algorithm that aims at filling the
> routing table based on known peers with other peers in its surrounding
> and peers from the rest of the 256 bits space. It must rely on `peers`
> network procedure described in the next section to discover new peers.
>

I did not understand the kademlia paper algorithm to populate the routing table.

> #### Distributed-Hash-Table procedures

This is the "usual" datastructure described in kademlia paper.

> ##### `(peer-set peer value) → bytevector`
>
> Set `VALUE` bytevector in the network at the nearest peers to the
> SHA-256 of `VALUE`. Returns the SHA-256 of `VALUE` as bytevector.
>
> ##### `(peer-ref peer key) → bytevector`
>
> Returns the value associated with `KEY` bytevector of 32 bytes in the
> network. The returned value must have `key` as sha256.

TODO: add peer-ref-at procedure.

Small explanation about the `foo-at` procedures.

In a DHT network, key-value pairs are distributed in the network of
peers according to the distance between KEY and the peers' UID. The
nearest peers to KEY have the responsability to store the key-value
association. That means that a given pair, is stored at the initiatior
of the (peer-set peer value) request and at the peers that are near
(hash value) key and prolly many other peers that are far from the
"nearest peer" based on replication factor. In some occasion, and in
order to make the network more resilient to failures (network
partition, churn etc...), the foo-at procedure allow to bypass the
"routing table" and request a value at a specific peer. For some
reason, the user of this library, might know that a specific peer has
a given value, because for instance, that peer has "favorited" it.
Hence the foo-at procedures.

> #### Bag procedures

Maybe bag is not a good name because it is R7RS thing. It associates a
KEY to a set of bytes. So, maybe multiset is a better name...

> ##### `(peer-bag-publish peer key value) → unspecified`
>
> Adds `VALUE` bytevector of 32 bytes to the bag in the network
> associated with `KEY` bytevector of 32 bytes. Bag values can be freely
> added by any peers.
>
> ##### `(peer-bag-popular peer key) → vector`
>
> Return most popular values associated with `KEY` bytevector of 32
> bytes.
>
> ##### `(peer-bag-popular-at peer uid key) → vector`
>
> Return most popular values associated with `KEY` bytevector of 32
> bytes at the peer known as `UID`.
>
> ##### `(peer-bag-recent peer key)`
>
> Return most recent values associated with `KEY` bytevector of 32
> bytes.
>
> ##### `(peer-bag-recent-at peer uid key)`
>
> Return most recent values associated with `KEY` bytevector of 32 bytes
> at the peer known as `UID`.
>
> ##### `(peer-bag-sample peer key)`
>
> Return random values associated with `KEY` bytevector of 32 bytes.
>
> ##### `(peer-bag-sample-at peer uid key)`
>
> Return random values associated with `KEY` bytevector of 32 bytes
> at the peer known as `UID`.
>
> #### Namespace procedures

Given a private-key and public key, the following procedures allow the
DHT to store identities in way that makes it not possible for
malicious peers to change the value.

> ##### `(peer-namespace-set peer private-key public-key key value)`
>
> Set in the network `VALUE` bytevector of 32 bytes associated with
> `PUBLIC-KEY` and `KEY` bytevector of 32 bytes.  `PRIVATE-KEY` must be
> used to compute the signature of `KEY` and `VALUE` as described later
> in `namespace-set` network procedure.
>
> ##### `(peer-namespace-ref peer public-key key signature)`
>
> Retrieve from the network the value associated with `PUBLIC-KEY` and
> `KEY`. `KEY` and the returned value must be validated using
> `PUBLIC-KEY` and `SIGNATURE`.
>
> ##### `(peer-namespace-ref-at peer uid public-key key signature)`
>
> TODO
>
> ### Network procedures
>
> The following procedures are encoded as Scheme vectors. The procedure
> name is encoded as symbol inside the vector. Both request and replies
> must not exceed 1 megabyte when encoded. If a message payload exceed 1
> megabyte, the connection must be closed.
>
> #### `iam uid`
>
> A peer receiving this message must reply with a vector with the
> following values:
>
> - `welcome` symbol
> - IP as a string (TBD)
> - port as an integer
> - `uid` as an integer
>
> This allows peers to discover their external IPs. The receiving peer
> may or may not add the initiating peer to its routing table and
> initiate a `iam` network procedure.
>
> #### `peers key`
>
> `KEY` must be a 256 bits number. A peer receiving this message must
> reply with a vector starting with `peer` followed by at most 20 (ip,
> port) pairs that are the peers nearest to `KEY` known to the receiving
> peer.
>
> #### `set value`
>
> The peer receiving this message must first check that it is part of
> the nearest peers to the associated sha256 key of `VALUE` bytevector
> based on the global replication factor. If the receiving peer knows
> about peers that are more near the key, it must reply `#f` otherwise
> it must store the `VALUE` bytevector and reply `#t`.
>
> #### `ref key`
>
> The peer receiving this message has two options for the reply:
>
> - If it has the value associated with `KEY` it can return a vector
> with two values the first being `value` symbol and the second the
> bytevector associated with `KEY`.
>
> - If it doesn't know about `KEY` it must return nearest known peers as
> described in the above `peers` network procedure.
>
> #### `bag-add key value`
>
> The peer receiving this message must associate `KEY` to `VALUE` except
> if it knows about peers that are more near `KEY` than itself and the
> receiving peer is not part of the nearest peers. Similarly to `set`.
>
> #### `namespace-set public-key key value signature`
>
> TODO
>
> #### `namespace-ref public-key key`
>
> TODO
>
> ## Implementation
>
> Inspired from [qadom](https://github.com/amirouche/qadom) and gnunet.
>
> ## Acknowledgements
>
> Credits goes to gnunet for keeping the lights on. This work is also
> based on kademlia paper by X, Y, Z.
>
> [0] https://github.com/scheme-live/peer-to-peer#peer-to-peer