Alioth has become an important part of the Debian infrastructure in
recent years; it has been used by more and more people and teams inside Debian, as well as by some upstream projects. This growth in usage wasn’t closely followed by an increase in resources, however, and the server hosting Alioth was getting more and more overloaded as time passed. It was time for Something to be Done. My name is Roland Mas, and I’m your host for this report of the Something that got Done.
The admins of Alioth (that would be Stephen Gran, Tollef Fog Heen and
yours truly) got bold and submitted a proposal for a sprint to the
Debian project leader. Zack, being the cunning DPL that he is, promptly
agreed to it, and there was no way we could renege on our proposal. We
tried to invite others to join us, but basically nobody fell for it, and
so the three of us got together in sunny Cambridge, England from the
20th of May to the 22nd. We were provided with our basic requirements
(meeting room, power, networking, whiteboard and coffee) by Collabora,
which we would like to thank for their hospitality.
We started by getting our hands on vasks.debian.org, setting it up
with Squeeze, and copying most of the data over from the old Alioth
(which was hosted in a Xen domU on wagner.d.o). Actually, we started by
stopping all services on old-alioth, in order to free some I/O bandwidth
for the data transfer; even so, that took quite some time.
Old-alioth was just across the Ethernet switch, but its disk setup was
very sub-optimal. Once the data transfer was started and we had removed
every bottleneck we could have an influence on, we got down to setting
up the FusionForge instance. (A version based on the 5.1 upstream
branch, with a few Alioth-specific patches.) The database and web
interface were mostly operational by the time we called it a day.
We left the data transfer to its own devices, and adjourned to the
nearest curry house. There was pub time afterwards, although you’ll
have to get the details from Tollef and/or Stephen, because your editor
decided he needed sleep more than alcohol.
Saturday morning. The data transfer was done, so we got down to
serious sysadminning and bikeshedding about names and sizes for LVM
volumes, hostnames, where each hostname would point, what the URLs would
look like, and so on. Once that was decided, I, being the official
FusionForge guy, focused on fixing the problems we encountered on that
front (on vasks.d.o) while my honoured colleagues wondered how to do a
remote reinstallation of wagner.d.o without a remote console. I wasn’t
the most attentive of spectators, being out-quirking PHP quirks at that
time, but from what I got it was akin to sawing the branch you’re
sitting on while it was suspended (with bits of string) from the branch
above it, which you’re then going to saw off too. And there are spikes
on the snake-infested ground. Anyway, after some deep magic, Qemu,
three levels of Grub chainloading, deconstruction of running RAID arrays
hosting root filesystems and all, wagner.d.o was running Squeeze too, no
longer virtualized, and it survived reboot. So we started setting up
the parts of Alioth that would run on wagner.d.o; namely, the read-only
anonymous access to SCM repositories, the repository browsers, and the
project websites. We also got a visit from Bazaar developer Jelmer
Vernooij, whom we failed to task with interesting jobs so he ended up
working on backports of Bazaar-related packages for us; these will help
keep Alioth responsive, so many thanks to him.
Then it was Saturday evening, and even I couldn’t weasel out of the
traditional British occupation for the night. Beer and pub food were
had and enjoyed, for which we were joined by local Debian release
manager Neil Williams. We discussed politics, cat-herding, and “if you’re
lucky enough to look under 21 you will have to prove you’re over 18”.
Sunday morning happened, as it usually does, and the three of us
reconvened; we thought we’d firmly attach the last few remaining
dangling ends, but there were more of them than we thought. NFS mounts,
DNS hacks, HTTP proxying, SMTP configuration, bug-fixing all over the
place, and so on. We gradually opened up the services again, fixed
stuff as we got notified of it on IRC, sent status emails, yada yada.
Lunch^WFinal debriefing was had in a pub with Colin Watson and family,
and interspersed with discussions of some aspects of WWII-era history
and a behavioral study of the rail replacement bus and its predators in
the wild. And then it was time to go back to our respective homes and
The following is a rough description of the new setup. vasks and
wagner both run Squeeze directly (no virtualization); both have roughly
the same amount of disk space (around 500 GB after RAIDing); vasks has
slightly faster CPUs (four 3 GHz cores compared to four 2.2 GHz ones),
wagner has a bit more RAM (16 GiB instead of 6). The load is therefore
split so that the “real-time” tasks (SCM access for developers,
FusionForge web interface, database) run on vasks, while the “lower
priority” tasks (SCM repository browsers, projects’ websites, email,
local cronjobs and whatever random stuff runs on Alioth) are on wagner.
The projects’ “Sources” tab in the FusionForge web interface should give
correct URLs for the repositories of various kinds, for read/write
developer access and for read-only anonymous access. After a week of
running, it seems the benefits are apparent, and the load average is
down to very reasonable levels on both hosts.
There are still some things to fix or amend: it would be nice to
preserve the old URLs as much as possible, some synchronization of homes
across the servers would be desirable, we didn’t necessarily reinstall
all the packages that used to be there on old-alioth, and so on.
While we’re on this subject: we’d like to take this opportunity to
remind our users not to consider Alioth as a generic and infinitely
elastic hosting service. Please be considerate about what you run there,
and about the amount of disk space you use. The disks are large enough
that we don’t meet the limits quite yet, but there’s some data that is
clearly outdated and very probably useless. ISO images of multiple
daily CDs, however small, aren’t necessarily bad, but keeping them for
years can’t be right. Ditto for years-old tarballs of the SCM
repositories, and for 2004-era package repositories, and so on. We’re
considering setting up a way to mark some files as expirable and
automatically remove them after a while, but in the meantime you might
want to have a look at your data and clean up what’s obsolete.
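For those who would like a starting point for that clean-up, here is a
minimal sketch of how one might list old files. It is not something the
Alioth admins provide: the function name find_stale_files and the
age threshold are made up for illustration, and it only reports files,
it never deletes anything.

```python
import os
import stat
import time


def find_stale_files(root, max_age_days, now=None):
    """Return sorted (path, size_in_bytes) pairs for regular files under
    root whose modification time is more than max_age_days in the past.

    Hypothetical helper for illustration only; it does not delete anything.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                stale.append((path, st.st_size))
    return sorted(stale)
```

Calling find_stale_files on a project directory with max_age_days=365
would give the files untouched for over a year, along with their sizes,
which should be enough to decide by hand what can go.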
A few final words: we would like to apologize for the poor
communication and the lack of reminders that the sprint was going to
happen; we’d also like to thank the DPL and Collabora for respectively
triggering and hosting this sprint. More apologies for the continued
inconvenience as we keep on fixing glitches; and more thanks for bearing
with us in the meantime and reporting problems. We even received
patches fixing some of the problems, for which we’re very grateful.
Please drop by and say hi on #alioth on IRC. Only don’t be *too*
helpful, otherwise we might be tempted to add you to the team.
Thank you for reading so far. This report was sent to you on behalf
of the Alioth admin team,
Stephen Gran,
Tollef Fog Heen,
and Roland Mas.