However, eventually your software has to leave your machine: it has to run on a colleague's machine,
or be deployed in its production environment.
That can mean a completely different flavour of OS, with a different set of libraries
and supporting tools.
It can be difficult to test, on your own development system, whether you have
accounted for all those variations.
Your environment may contain things you're not even aware of that make a difference.
Your users may also be less technically inclined to deal with dependencies.
You may wish to reduce this friction.
---
# Problem (for users)
Suppose you want to run some piece of software.
First off, you would really like some sort of turn-key solution.
Of course there is none; there's only the source code.
The build instructions call for libraries that went out of date five years ago,
on top of a similarly old OS distribution.
And no, the original developer is most certainly no longer available.
You also don't trust this software fully not to mess up your OS.
Or, you want to run it on a remote server for which you don't
even have the privileges to comfortably install all the dependencies.
---
# Problem (for researchers)
Suppose you have a piece of scientific software you used
to obtain some result.
Then someone halfway across the globe tries to reproduce it
and can't get it to run, or worse, gets different
results for the same inputs. What is to blame?
Or, even simpler: your group tries to use your software a couple
of years after you left, and nobody can get it to work.
For a reproducible way to do science with the help of software,
packaging just the source code might not be enough;
the environment should also be predictable.
---
# Problem (for server administrators)
Suppose you have a hundred users, each requesting certain software.
Some of it needs to be carefully built from scratch,
as there are no prebuilt packages.
Some of the software works with mutually-incompatible library versions.
Possibly even known-insecure ones.
Any such software change has to fit into
a scheduled maintenance window, but users want it yesterday.
And finally, _you most certainly don't_ trust any of this software
not to mess up your OS. From experience.
---
# What would be a solution?
* **A turnkey solution**
A recipe that can build a working instance of your software, reliably and quickly.
* **BYOE: Bring Your Own Environment**
A way to capture the prerequisites and environment together with the software.
* **Mitigate security risks**
Provide a measure of isolation between the pieces of software running on a system.
No security is perfect, but some is better than none.
---
class: center, middle
# The Solution(s)
---
# Solution: Virtual Machines?
A virtual machine is an isolated instance of a .highlight[whole other "guest" OS] running
under your "host" OS.
A .highlight[hypervisor] is responsible for handling the situations where this isolation
causes issues for the guest.
From the point of view of the guest, it runs under its own, dedicated hardware.
Hence, it's called .highlight[hardware-level virtualization].
Most<sup>*</sup> guest/host OS combinations are possible: you can run Windows on Linux,
Linux on Windows, etc.
------
\* MacOS being a complicated case due to licensing.
---
# Virtual Machines: the good parts
* **The BYOE principle is fully realized**
Whatever your environment is, you can package it fully, OS and everything.
* **Security risks are truly minimized**
A very narrow and secured bridge between the guest and the host means little
opportunity for a bad actor to break out of isolation.
* **Easy to precisely measure out resources**
The contained application, together with its OS, has restricted access to
hardware: you measure out its disk, memory and allotted CPU.
---
# Virtual Machines: the not so good parts
* **Operational overhead**
For every piece of software, the full underlying OS has to be run,
and corresponding resources allocated.
* **Setup overhead**
Starting and stopping a virtual machine is not very fast,
and/or requires saving its state.
Changing the allocated resources can be hard too.
* **Hardware availability**
The isolation between the host and the guest can hinder access to
specialized hardware on the host system.
---
# Solution: Containers (on Linux)?
If your software expects Linux, there's
a more direct and lightweight way to reach similar goals.
Recent kernel advances make it possible to isolate processes from the rest
of the system, presenting them with their own view of it.
You can package entire other Linux distributions, and, with the exception
of the shared host kernel, the whole environment can be different for the process.
From the point of view of the application, it's running on the same hardware
as the host, hence containers are sometimes called
.highlight[operating system level virtualization].
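The kernel facilities behind this are namespaces: every Linux process already
runs inside a set of them, visible under `/proc`. A quick way to peek at them,
with no container runtime involved (a minimal sketch; the `unshare` invocation
assumes unprivileged user namespaces are enabled on the host):

```shell
# Each process's namespaces appear as symlinks under /proc/<pid>/ns/.
# A container runtime creates fresh namespaces for the contained process.
ls -l /proc/self/ns/
# Entries typically include: mnt, pid, net, uts, ipc, user, cgroup

# Where unprivileged user namespaces are allowed, you can enter one
# without any container tooling, using util-linux's unshare:
#   unshare --user --map-root-user id -u    # uid 0, but only inside
```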
---
# Containers: the good parts
* **Lower operational overhead**
You don't need to run a whole second OS to run an application.
* **Lower startup overhead**
Setup and teardown of a container is much less costly.
* **More hardware flexibility**
You don't have to dedicate a set portion of memory to your VM
well in advance, or contain your files in a fixed-size filesystem.
Also, the level of isolation is up to you. You may present devices
on the system directly to containers if needed.
---
# Containers: the not so good parts
* **Kernel compatibility**
The kernel is shared between the host and the container,
so there may be some incompatibilities.
Plus, container support is (relatively) new, so it needs a recent kernel
on the host.
* **Security concerns**
The isolation is thinner than in the VM case, and the kernel of the host OS
is directly exposed.
* **Linux on Linux**
Containers are inherently a Linux technology. You need a Linux host
(or a Linux VM) to run containers, and only Linux software can run.
---
# History of containers
The idea of running an application in a different environment is not new to UNIX-like systems.
Perhaps the first effort in that direction was the `chroot` command and concept (1982):
presenting applications with a different view of the filesystem (a different root directory `/`).
This minimal isolation was improved upon in FreeBSD with `jail` (2000),
separating other resources (processes, users)
and restricting how applications can interact with each other and the kernel.
Linux developed facilities for isolating and controlling access to some processes
with namespaces (2002) and cgroups (2007).
Those facilities led to the creation of containerization solutions, notably LXC (2008),
Docker (2013) and Singularity (2016).
---
class: center, middle
# Docker vs Singularity
## Why did another technology emerge?
---
# Docker
* Docker came about in 2013 and has since been on a meteoric rise
to become the gold standard for containerization technology.
* A huge number of tools has been built around Docker to build, run,
orchestrate and integrate Docker containers.
* Many cloud service providers can directly integrate Docker containers.
Docker claims a 26× improvement in resource efficiency at cloud scale.
* Docker encourages splitting software into microservice chunks
that can be portably used as needed.
---
# Docker concerns
* Docker uses a fairly complicated model of images/volumes/metadata,
orchestrating swarms of containers to work together,
and it is not always very transparent about how those are stored.
* Also, isolation features require superuser privileges;
Docker has a persistent daemon running with those privileges
and many container operations require root as well.
--
Both of those issues make Docker undesirable in settings
where you don't wholly own the computing resource, such as HPC environments.
Out of those concerns, and out of the scientific community, came Singularity.
---
# Singularity
Singularity was created in 2016 as an HPC-friendly alternative to Docker.
It is still in rapid development.
--
* It's usually straightforward to convert a Docker container to a Singularity image.
This gives users access to a vast library of containers.
--
* Singularity uses a monolithic, image-file-based approach
instead of dynamically overlaid layers.
You build a single file on one system and simply copy it over or archive it.
This addresses the "complex storage" issue with Docker.
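As an illustration, a Singularity definition file can bootstrap directly from
a Docker image (a minimal sketch; the base image and package are placeholders,
not taken from this talk):

```
Bootstrap: docker
From: ubuntu:20.04

%post
    # Runs once, at build time, inside the image
    apt-get update && apt-get install -y python3

%runscript
    # What "singularity run" executes
    exec python3 "$@"
```

`singularity build` turns this into a single `.sif` file you can copy anywhere;
`singularity pull docker://ubuntu:20.04` converts an existing Docker image directly.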
---
# Singularity and root privileges
The privilege problem was a concern from the ground up,
to make Singularity acceptable on academic clusters.
--
* It is addressed by a `setuid`-enabled binary that performs
container startup and drops privileges as soon as possible.
--
* Privilege elevation inside a container is impossible:
the `setuid` mechanism is disabled inside the container,
so to be root inside, you have to be root outside.
--
* Users don't need explicit root access to operate containers
(at least after the initial build).
---
# Singularity and HPC
Thanks to the above improvements over Docker, HPC cluster operators
are much more welcoming to the idea of Singularity support.
As a result of a joint Pipeline Interoperability project between Swiss Science IT groups,
the UniBE Linux cluster UBELIX started to support Singularity.
Once your software is packaged in Singularity, it should work across
all Science IT platforms supporting the technology.
---
# Singularity niche
When is Singularity useful over Docker?
* The major use case was and still is .highlight[shared systems]:
systems where unprivileged users need the ability to run containers.
However, an admin still needs to install Singularity for it
to function.
* Singularity is also useful as a general alternative to Docker.
If you have admin privileges on the host, Singularity can do more
than in unprivileged mode.
It doesn't have the same level of ecosystem around it,
but it is currently gaining features such as an OCI runtime
interface, native Kubernetes integration and its own cloud services.
---
# Singularity "sales pitch"
Quoting from Singularity Admin documentation:
> _Untrusted users (those who don’t have root access and aren’t getting it) can run untrusted containers (those that have not been vetted by admins) safely._
This won over quite a few academic users; for a sampling: