This page describes the configuration of a cluster of computers dedicated to scientific computing, with focus on parallel computation. The computers are hosted by Faculdade de Ciências da Universidade de Lisboa; they have been acquired by Centro de Matemática, Aplicações Fundamentais e Investigação Operacional (FCT project UID/MAT/04561/2020) and by Grupo de Física Matemática.
This document describes the steps I followed for configuring the cluster.
A large part of it is only relevant for system administrators;
parts which are relevant for users of the cluster are highlighted.
The cluster is made of six machines.
One of them, the alpha_node, is reachable from outside;
the other five nodes are connected to a local network
and communicate with the outside world through the alpha_node (which acts as a gateway).
All nodes run archlinux.
There was a previous installation using Ubuntu.
installing archlinux

For installing archlinux it was enough to follow the instructions.
For booting I chose, in the BIOS, efi boot.
I formatted the hard disk with a gpt label and installed GRUB
on a partition of 500MiB
(looks too big at first sight, but actually GRUB occupies 220MiB).
After this, booting was smooth.

The only original part was that I decided to keep the pacman cache
on the USB stick (where the live system boots from).
To do this, when burning the archlinux image (using rufus)
I chose a high value for the persistent partition size;
this left a lot of free space on the stick;
on that free space I created (using fdisk) a second (ext4) partition.
In the live system, I mounted this second partition on /mnt/cache.
Then I specified in /etc/pacman.conf the cache directory /mnt/cache/pacman/pkg/
and added the -c option to the pacstrap command in order to use
the local cache rather than the cache on the installed system.
This way, when installing archlinux on the first machine the cache gets populated
and when installing on subsequent machines the package files are already in the cache.
This saves time and bandwidth.

As arguments to pacstrap, include all packages needed for a minimally
functional system, like nano, grub, efibootmgr, intel-ucode, sudo, openssh.
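A rough sketch of the above; the device name and the package list are only indicative, adapt them to your setup :
mount /dev/sdX3 /mnt/cache    # sdX3 is a placeholder for the second partition on the USB stick
# line to add (or un-comment and edit) in /etc/pacman.conf of the live system :
#     CacheDir = /mnt/cache/pacman/pkg/
pacstrap -c /mnt base linux linux-firmware nano grub efibootmgr intel-ucode sudo openssh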
configuring the network

Configuring the network was not trivial.
I chose to use systemd-networkd and systemd-resolved as network managers;
they are included in the systemd package so they are installed by default.
Each machine has several ethernet ports; on the alpha_node I used two of them;
eno1 links to a plug in the wall and to the outside world,
eno2 links to a switch and provides connectivity to the local network of the other nodes.

I started by installing archlinux on the alpha_node.
On the live archlinux system, we need access to the internet :
ip address add ...
/etc/resolv.conf
nameserver 1.1.1.1 # or some other server
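For example (a.b.c.d and a.b.c.1 are placeholders for an address and a gateway valid on the outside network) :
ip address add a.b.c.d/24 dev eno1    # placeholder address, not the real one
ip link set eno1 up
ip route add default via a.b.c.1      # placeholder gateway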
Then, in the installed system (after arch-chroot), create
/etc/systemd/network/20-wired-outside.network
[Match]
Name=eno1
[Link]
RequiredForOnline=routable
[Network]
DHCP=yes
IPv4Forward=yes
[DHCPv4]
ClientIdentifier=mac
If the DHCP service on your network does not exist or does not work properly,
you can use a fixed IP number instead :
/etc/systemd/network/20-wired-outside.network
[Match]
Name=eno1
[Link]
RequiredForOnline=routable
[Network]
Address=
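The address has to be filled in by hand; a complete static configuration might look roughly like this (the values below are placeholders, not the real ones) :
[Network]
Address=a.b.c.d/24
Gateway=a.b.c.1
DNS=1.1.1.1
IPv4Forward=yes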
For eno2, I created
/etc/systemd/network/20-wired-local.network
[Match]
Name=eno2
[Network]
Address=192.168.1.1/24
IPv4Forward=yes
If you want root to be able to login through ssh, add
/etc/ssh/sshd_config
PermitRootLogin yes
Note the name of the file : sshd_config, not ssh_config !
Beware, this opens the door to cyberattacks; remove this line as soon as you
implement a different way to login remotely as administrator;
until then, choose a strong password for root.
That different way to login remotely could be creating a regular user
and adding it to the wheel group,
then using sudo to gain super-user privileges.
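For instance (jsmith is just an example username) :
useradd -m -G wheel jsmith    # jsmith is a placeholder; -G wheel adds the user to the wheel group
passwd jsmith
EDITOR=nano visudo            # un-comment the line  %wheel ALL=(ALL:ALL) ALL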
In order for the alpha_node to work properly as a gateway,
we must enable NAT forwarding.
Create the following two files :
/etc/sysctl.d/30-ipforward.conf
net.ipv4.ip_forward=1
/etc/systemd/system/nat.service
[Unit]
Description=NAT configuration for gateway
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/bin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
ExecStart=/usr/bin/iptables -A FORWARD -i eno2 -o eno1 -j ACCEPT
ExecStart=/usr/bin/iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Then, enable the services (they will be effectively started after reboot) :
systemctl enable systemd-networkd
systemctl enable systemd-resolved
systemctl enable nat
systemctl enable sshd
Replace enable by enable --now if you want the services to start immediately
(this does not work from inside the chroot).
Commands like systemctl status and journalctl -xe
are useful for checking that the services run correctly.
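For instance, for the network service enabled above :
systemctl status systemd-networkd
journalctl -xeu systemd-networkd
networkctl list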
On each of the other nodes, the configuration is far simpler.
I only used eno2.
On the live archlinux system, we need access to the internet
(using the alpha_node as gateway) :
ip address add 192.168.1.x ...
/etc/resolv.conf
nameserver 1.1.1.1 # or some other server
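For instance, assuming the node takes the address 192.168.1.x (x being the host number chosen for that node) :
ip address add 192.168.1.x/24 dev eno2    # x is a placeholder, different on each node
ip link set eno2 up
ip route add default via 192.168.1.1      # the alpha_node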
Then, in the installed system (after arch-chroot), create
/etc/systemd/network/20-wired-local.network
[Match]
Name=eno2
[Network]
Address=192.168.1.x/24
If you want root to be able to login through ssh,
follow the steps described above for the alpha_node.
Then, enable the services (they will be effectively started after reboot) :
systemctl enable systemd-networkd
systemctl enable systemd-resolved
systemctl enable sshd
Replace enable by enable --now if you want the services to start immediately
(this does not work from inside the chroot).
NTP (network time protocol)

Some applications (make for example) rely on the timestamp of
files in order to work correctly.
Thus, it is important that the machines composing the cluster have their clocks
synchronized (otherwise, sharing files through NFS could stir up trouble).
NTP is a good solution for that.
Actually, NTP does more than we need.
It synchronizes the clock of your machine with a universal time provided by timeservers
scattered across the world, and it does it with a very high precision.
What matters for us is synchronization between our machines only.
Anyway, NTP does the job.
Although the live archlinux comes with systemd-timesyncd enabled,
on the installed archlinux we must choose and install a service providing NTP.

On the alpha_node I installed ntp;
it provides a client which gets the time from an exterior server
and also a server, used by the other nodes :
pacman -Syu ntp
systemctl enable --now ntpd
We must also ensure the ntpd service waits for the systemd-networkd
service at boot. Create
/etc/systemd/system/ntpd.service.d/wait-for-network.conf
[Unit]
After=network-online.target
Wants=network-online.target
The command systemctl status ntpd may show the message
kernel reports TIME_ERROR: 0x41: Clock Unsynchronized;
looks like this message can be safely ignored.
Check with ntpq -p.
On the other nodes I installed systemd-timesyncd which only provides a client.
It gets the time from the alpha_node :
/etc/systemd/timesyncd.conf
NTP=192.168.1.1
systemctl enable --now systemd-timesyncd
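To check on a node that the time is indeed obtained from the alpha_node, one can use for instance (not part of the original setup, just a convenient check) :
timedatectl timesync-status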
systemd-timesyncd handles network failures gracefully, so the next step
is optional : create
/etc/systemd/system/systemd-timesyncd.service.d/wait-for-network.conf
[Unit]
After=network-online.target
Wants=network-online.target
Somewhat misleadingly, the command timedatectl status
answers NTP service: active if we use systemd-timesyncd
but answers NTP service: inactive if we use ntp.
disk partitions

Each machine has an 895GiB disk.
I have reserved 200GiB for the operating system and 15GiB for a swap partition.
We intend to use /nfs-home on one machine as home directory;
it will be available to other machines through NFS.
I have chosen the alpha_node to keep /nfs-home.
So, on the alpha_node I mounted /nfs-home on a 300GiB partition;
/sci-data is mounted on a 380GiB partition.
On each of the other nodes, the /nfs-home directory from the alpha_node
is visible through NFS and /sci-data is mounted on a local 680GiB partition.
See section "intended disk usage" below.
NFS (network file system)

Installing and configuring NFS was not that difficult.
On the server side (on the alpha_node) :
pacman -Syu nfs-utils
mkdir /nfs-home
then edit /etc/exports.
I listed all the other nodes in one line :
/etc/exports
/nfs-home ...
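For instance, the line could look roughly like this (the addresses and options below are assumptions, not the original line) :
# assumed addresses of the five nodes
/nfs-home 192.168.1.2(rw,sync) 192.168.1.3(rw,sync) 192.168.1.4(rw,sync) 192.168.1.5(rw,sync) 192.168.1.6(rw,sync)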
Then
systemctl enable --now nfsv4-server
New users should get their home directory under /nfs-home,
e.g. by invoking useradd with -b or -d (or edit /etc/default/useradd).
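For instance (jsmith is just an example username) :
useradd -m -d /nfs-home/jsmith jsmith    # jsmith is a placeholder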
On the client side,
pacman -Syu nfs-utils
mount 192.168.1.1:/nfs-home /nfs-home
Any previous local content of /nfs-home becomes invisible until umount
(as happens with any mount operation).
To mount /nfs-home through NFS automatically at boot time,
you should edit /etc/fstab.
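The corresponding /etc/fstab line could look roughly like this (the options are a matter of taste) :
# the alpha_node exports /nfs-home at 192.168.1.1
192.168.1.1:/nfs-home  /nfs-home  nfs  defaults,_netdev  0  0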
I did not implement kerberos authentication.

One has to be careful about the user IDs.
If you have two users on different nodes with the same username but different IDs,
they will be unable to access their home directory through NFS.
See paragraph "user accounts".
intended disk usage

Since the other nodes will see /nfs-home through NFS,
disk access will be rather slow on this directory.
Thus, users are encouraged to keep large files on local storage, under /sci-data.
A folder /sci-data/<username> exists for that purpose on all machines.
Configuration and preferences files should be kept in /nfs-home/<username> of course;
this is useful for defining your preferences throughout the cluster.

pacman's cache

In order to save bandwidth and installation time, I want to share pacman's cache
among the computers composing the cluster.
During the initial installation of archlinux
I kept the cache on the USB
stick, as explained in section "installing archlinux
" above.
For normal updates, I share pacman's cache through NFS.
However, this is not a trivial process because the package files should be owned by
root and this is not compatible with NFS' philosophy
(I decided against specifying the no_root_squash option).
I use a "temporary" cache directory /nfs-home/cache, owned by a regular user.
Each update operation is initiated on the alpha_node
and uses the usual cache directory /var/cache/pacman/pkg.
Before calling pacman -Syu we list all files in /var/cache/pacman/pkg;
after the update has finished on the alpha_node we list again all files there
and copy the new ones (not previously present) to /nfs-home/cache.
Old versions of package files in /var/cache/pacman/pkg are deleted
using the command paccache -rk1.
On each of the other nodes, we copy all files
from /nfs-home/cache to /var/cache/pacman/pkg,
ensuring that the copied files are owned by root.
We then perform the update; in theory, there is no need to download any package file.
After updating the node we delete all files
in /var/cache/pacman/pkg (which thus stays empty most of the time).
After all the nodes have been updated,
the folder /nfs-home/cache is also emptied.
Of course all the above is not performed by hand; rather, it is done through a
python script.
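On the alpha_node, the manual equivalent of what the script automates would be roughly the following sketch (not the actual script) :
ls /var/cache/pacman/pkg > /tmp/pkg-before
pacman -Syu
# copy the package files which were not present before the update to the shared cache
# (presumably as the regular user owning /nfs-home/cache)
comm -13 /tmp/pkg-before <(ls /var/cache/pacman/pkg) | while read f
do cp "/var/cache/pacman/pkg/$f" /nfs-home/cache/
done
paccache -rk1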
moving and copying files with different owner

While working on the script for moving around package files
(see section "pacman's cache" above), I have noticed something peculiar
about the commands mv and cp.
Suppose there is a file file-1.txt belonging to user-A.
Suppose another user, user-B, goes to that directory.
Suppose user-B has read and write permissions on the directory and on file-1.txt.
If user-B issues the command mv file-1.txt file-2.txt,
then file-2.txt will belong to user-A.
This is true independently of whether a file file-2.txt
exists (previously to the mv operation) or not.
If user-B issues the command cp file-1.txt file-2.txt,
the owner of file-2.txt depends on what existed before the cp operation.
If a file file-2.txt existed previously and belonged to user-A,
then it will belong to the same user-A after the cp operation,
although with a new content.
If no file named file-2.txt existed previously, it will be created and will
belong to user-B.
I don't know if this is intended behaviour of the command cp or some sort of bug.
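A quick way to see the difference (commands issued by user-B; file-3.txt is just a new name, used to trigger the "no previous file" case) :
ls -l file-1.txt              # owned by user-A
mv file-1.txt file-2.txt
ls -l file-2.txt              # still owned by user-A (mv merely renames the directory entry)
cp file-2.txt file-3.txt
ls -l file-3.txt              # a newly created file, owned by user-B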
Below is a graphic representation of the execution time for several identical (single-threaded)
processes launched simultaneously on one machine only (the processes were monitored with top).
We see a linear growth for more than 40 processes (serial behaviour).
We also see a drop in performance at 20 processes.
Recall that each machine has 20 cores, 40 threads.

user accounts

User accounts, and in particular the user (and group) IDs, must be kept consistent
across the nodes.
So I wrote a script
for propagating user accounts and passwords across the nodes of the cluster.
It is written in python3 and uses fabric.
To change your password on all nodes at once, use the command
cluster password
You can still use the passwd command if you want different passwords on
different nodes, but why would you want that ?
Edit /etc/default/useradd if you want a default home directory different from
/home (in our case, /nfs-home).
The same script can be used by the system administrator to add new users :
cluster add user
cluster delete user
Here, user is the literal word user, not the username we want to add or delete;
the script is interactive and will ask for information at the prompt.
The script creates the directory /nfs-home/<username> only on the alpha_node;
on the other nodes the user will see their home directory through NFS.
In contrast, the folder /sci-data/<username> is created on all machines.
If you are careful not to add/delete user accounts through other means than the above commands,
the user (and group) IDs will be the same across nodes (this is important since
/nfs-home is seen through NFS).
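Just to illustrate the mechanism, a minimal sketch in python3 with fabric (this is not the actual script; the addresses, username and numeric ID are assumptions) :
from fabric import Connection

nodes = ["192.168.1.2", "192.168.1.3", "192.168.1.4", "192.168.1.5", "192.168.1.6"]
for host in nodes:
    # assumes root (or an equivalent account) can ssh into the nodes
    conn = Connection(host, user="root")
    # jsmith and 1105 are placeholders; the same username and the same numeric ID on every node;
    # -M : do not create the home directory, it lives on the alpha_node and is seen through NFS
    conn.run("useradd -M -u 1105 -d /nfs-home/jsmith jsmith")
    conn.run("mkdir -p /sci-data/jsmith && chown jsmith: /sci-data/jsmith")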
There are no quotas.
The linux kernel is updated frequently, thus requiring frequent reboots,
which is rather annoying.
Cristian Barbarosie, 2025.05