Re: Adding another Scheme.org server + better compartmentalization Amirouche (28 Nov 2023 18:18 UTC)


On Friday, November 24th, 2023 at 20:45, Lassi Kortela <xxxxxx@lassi.io> wrote:

>
>
> The main server running most Scheme.org subdomains has been at full
> capacity for a while.
>
> For example, if you've witnessed Gitea crashing, it's because the
> Postgres database runs out of RAM (and swap) at intervals. This does not
> cause data corruption, but does cause service disruption.
>
> A lot of the resource usage is due to botnets hitting the site, which is
> probably unavoidable on today's internet. This also causes log bloat.
> It's common for one log file to take nearly a gigabyte of disk. We have
> some log rotation, but apparently not enough. To avoid disruption, logs
> should probably be stored on a dedicated file system.

A dedicated file system? Like what? You could stop logging 404 errors, which
should save some space, or redirect 404s to a dedicated log that is rotated and
deleted more often, so that you still keep some record of them.
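
To make the "dedicated 404 log" idea concrete, here is a minimal sketch that splits access-log lines into 404 entries and everything else. It assumes nginx's default "combined" log format; the function name split_404 and the regular expression are illustrative, not part of the Scheme.org setup.

```python
import re

# Assumption: nginx "combined" format, where the status code follows the
# closing quote of the request, e.g. ... "GET /x HTTP/1.1" 404 153 "-" "agent"
STATUS_RE = re.compile(r'" (\d{3}) ')


def split_404(lines):
    """Return (kept, not_found); lines whose status is 404 go to not_found."""
    kept, not_found = [], []
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group(1) == "404":
            not_found.append(line)
        else:
            kept.append(line)
    return kept, not_found
```

The two lists could then be written to separate files, each with its own rotation schedule.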

Or you can use fail2ban to monitor the nginx logs and ban disruptive IPs.

hint: https://www.digitalocean.com/community/tutorials/how-to-protect-an-nginx-server-with-fail2ban-on-ubuntu-20-04#step-2-configuring-fail2ban-to-monitor-nginx-logs
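
As a rough illustration of what such monitoring looks for (this is not fail2ban's own code), the sketch below counts 404 responses per client address and lists the addresses at or above a threshold. It assumes the nginx "combined" log format. The parameter max_retry is hypothetical, named after fail2ban's maxretry setting, and unlike fail2ban's findtime the sketch ignores the time window entirely.

```python
import re
from collections import Counter

# Assumption: nginx "combined" format; group 1 is the client address,
# group 2 is the HTTP status code that follows the quoted request.
LINE_RE = re.compile(r'^(\S+) .*?" (\d{3}) ')


def ban_candidates(lines, max_retry=3):
    """Return sorted client addresses with at least max_retry 404 responses."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) == "404":
            hits[match.group(1)] += 1
    return sorted(ip for ip, count in hits.items() if count >= max_retry)
```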

> Even if the above is taken care of, the main server still isn't beefy
> enough to host everything. I'd like to take this opportunity to advance
> the "microkernelization" of Scheme.org (as explained in the original
> announcement) by keeping the front page and some administrivia on the
> current server, and moving the community subdomains to a new server.

Scaling that way is very painful, depending on how the community grows.
My favorite approach, given the team, is:

- two containers running on different hosts for Gitea, and a third host for the database;
- the static assets should be served from their own hosts.

> This would also make server configuration and git repos easier to figure
> out, as the main server would correspond to the github organization
> https://github.com/schemeorg and the community servers would correspond
> to https://github.com/schemeorg-community. The current layout is quite
> confusing.

FWIW, I should have shown up earlier, but sub.scheme.org subdomains should be
dedicated to community websites, as with https://js.org/ and small-web.org/.
Everything that the core scheme.org community maintains should, unlike

  https://community.scheme.org/

live under a path such as scheme.org/path/to/perfect/community

The historical reason is SEO: search engines treat subdomains as different orgs,
possibly with a worse reputation than the main domain. Circa 2000 that was the
case with Blogger and WordPress blogs, and it hurts indexing. Also, more recently,
a couple of websites avoid subdomains completely; most GAFAM do that.
Another case is sr.ht, which uses the tilde, e.g. https://git.sr.ht/~rabbits/

That is an interesting case because they do expose subdomains for "technical" reasons: microservices.
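
The subdomain-to-path idea above can be sketched as a URL rewrite, for instance to generate redirects. This is only an illustration: the function name subdomain_to_path and the choice of using the subdomain label as the first path segment are assumptions, and any port in the original URL is dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def subdomain_to_path(url, base="scheme.org"):
    """Map https://sub.<base>/x to https://<base>/sub/x (hypothetical scheme)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    suffix = "." + base
    if not host.endswith(suffix):
        return url  # not a subdomain of base: leave the URL untouched
    sub = host[: -len(suffix)]
    tail = parts.path if parts.path.startswith("/") else "/" + parts.path
    return urlunsplit((parts.scheme, base, "/" + sub + tail, parts.query, parts.fragment))
```

For example, subdomain_to_path("https://community.scheme.org/") gives "https://scheme.org/community/".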

> Scheme.org is fundamentally DNS-based, git-sourced, and automated using
> Ansible, so it is quite easy to experiment with different servers and
> move subdomains between them with little disruption.

It is unclear what the current setup is.

>
> Let me know if there are any suggestions or objections. If I don't hear
> anything by the end of the weekend, I'll go ahead with the plan. I
> emphasize that this is just a server issue and should not cause
> user-visible changes.
>
> As before, ssh and git access is available to people who'd like to work
> on the site.