Okay, sorry. Hi. So, yeah, so I'm Stefano. I will talk a bit about containerization with
a specific focus on scientific reproducibility. Sorry, I didn't include the graphics. I will
just start with a quick reminder of what containers are. This is a definition from CIO.com. So it's
pretty dumbed down, but I think it's very effective. So containers are a solution for
the problem of how to get software to run reliably when moved from one computing environment
to another. And this is also known as the dependency hell problem, which we all know too well,
but just as a recap, the dependency hell problem shows up in two different takes, I
would say. The first one is: Mike wants to use a new piece of software. He cannot find a precompiled
version. Mike then asks around or Googles for help, and he gets some basic instructions like "compile
it". At this point, Mike starts downloading the whole development environment. And soon
he realizes he needs to upgrade or even downgrade his main operating system. He starts doing
that, but during the process something goes wrong, and then Mike spends the entire afternoon
trying to fix his own primary operating system. And in the end, the software he really
wanted to install turns out not to do what he hoped it would do. And this is one take
of the dependency hell problem. The other take of the dependency hell problem is more
serious in my opinion, which is: Mike wants to use this new software. Mike does find a
precompiled version. He then downloads and installs it. Mike runs the software, and at
this point he gets an answer, which is 43. But after a year, Mike runs the same
software again and, with the same data, he gets the result 42, which is of course the right
answer. Now, the point is that at this point, Mike takes a very deep dive into the problem
and finds out that the software, as he was using it at the time,
was making a wrong API call to a library, because the library had changed. So he basically worked for a year
with wrong software, or rather with software giving wrong results. And this is something
that actually happened and that generated quite a lot of rumour for chemical potentials.
So the dependency hell problem, like any problem, has a solution spectrum. So we range from
properly stating the requirements of the software to defining virtual environments where we
have those requirements automatically set up. We can also use statically linked binaries,
which to some extent are a very good solution, but in some cases you cannot use them, like with Python
or complex programs, for example. On the other side of the spectrum, we have virtual machines
with hardware emulation. In this case, you can reach full reproducibility because you
don't even have problems with big-endian versus little-endian or CPU instruction sets. Then
we have a lighter version of the virtual machines, which are the ones we all know that work with
a hypervisor. And then we have this containerization thing in the middle, which is where we focus.
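
As an illustration of the lightest end of this spectrum, properly stating requirements, here is a minimal Python sketch that compares a set of pinned package versions against what is actually installed. It is only a sketch under stated assumptions: the function name `check_requirements` and the example pins are hypothetical and not from the talk, and exact version pins are used for simplicity.

```python
from importlib import metadata


def check_requirements(pins, get_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    `pins` maps package names to exact version strings (hypothetical example
    data). `get_version` is injectable so the check can be tested without
    touching the real environment.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins, chosen only for illustration.
    pins = {"numpy": "1.19.0", "scipy": "1.5.1"}
    for name, (wanted, found) in check_requirements(pins).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

A check like this only detects drift in the Python packages; it says nothing about system libraries or the operating system, which is where the heavier solutions in the spectrum come in.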
Now, a container engine is something slightly different than a virtual machine. You can
think about it as a virtual machine to a first approximation, but it's different because
there is no hardware abstraction layer between the container engine and the host operating
system. So they share the kernel. So everything is like a virtual machine, except that they
use the host's kernel. They can also share dependencies. So here, for example, the dependencies for
those two containers are the same because of the incremental file system, the nature
of the container file system itself, which I will touch on a bit in the next slide.
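
The idea of two containers sharing the same dependencies through an incremental file system can be pictured with a toy model. The Python sketch below is not how Docker or Singularity actually store images; it is a simplified, hypothetical content-addressed store (the names `LayerStore`, `add_layer` and `build_image` are invented here) that shows why identical layers only need to be kept once.

```python
import hashlib


class LayerStore:
    """Toy content-addressed store: identical layers are kept only once."""

    def __init__(self):
        self.layers = {}  # digest -> layer content

    def add_layer(self, content: bytes) -> str:
        # The digest of the content is the layer's identity, so adding the
        # same content twice does not create a second copy.
        digest = hashlib.sha256(content).hexdigest()
        self.layers.setdefault(digest, content)
        return digest

    def build_image(self, layer_contents):
        # An image is modelled as an ordered list of layer digests.
        return [self.add_layer(content) for content in layer_contents]


store = LayerStore()
base = b"base OS files"          # hypothetical layer contents
deps = b"shared libraries"
image_a = store.build_image([base, deps, b"application A"])
image_b = store.build_image([base, deps, b"application B"])

shared = set(image_a) & set(image_b)
print(len(store.layers), "layers stored,", len(shared), "shared")
# prints: 4 layers stored, 2 shared
```

Six layers were requested across the two images, but only four are stored, because the base and dependency layers are identical and therefore shared.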
So this is a very important difference. And actually, there are different solutions
for containerization. One is Singularity, which you might have heard of and which was mentioned
in the presentation before. And the other one is Docker. That is probably the most common
one. And they are at the opposite sides of this sub-spectrum because Singularity is
closer to a process, while Docker is way closer to a virtual machine. And this is an
important trait. So let's see the differences between Singularity and Docker, which again
are container engines. So Singularity is a scientific computing program, let's say.
It was born in the scientific computing and high-performance computing space, while Docker is the IT
industry standard, so from Amazon to Google and going through DeepMind. In
Singularity, running containers are seen as processes, while in Docker running containers
Accessible via
Open access
Duration
00:24:59 min
Recording date
2020-07-27
Uploaded on
2020-07-28 00:46:23
Language
en-US
Speaker
Stefano Alberto Russo, INAF
Content
The role of software containerization and example use of Docker containers.
The Workshop
The Workshop on Open-Source Software Lifecycles (WOSSL) was held in the context of the European Science Cluster of Astronomy & Particle Physics ESFRI infrastructures (ESCAPE), bringing together people, data and services to contribute to the European Open Science Cloud. The workshop was held online from 23rd to 28th July 2020, organized at FAU.
Copyright: CC-BY 4.0