I am delighted to be here with you today
to talk about the relevance of software source code in research and open science.
But let's start by looking at software source code:
this is a very special kind of knowledge,
unique in the
history of mankind, because it is at the same time human-readable
and technical enough to be executable on a machine.
This observation was already present, for example, in the introduction of a book by Harold Abelson,
Structure and Interpretation of Computer Programs, that I was using as a student at the university many, many years ago,
and he said that programs must be written for people to read, and only incidentally for machines to execute.
Well, of course, you can think of this as just a professor telling the students: if you turn in a program
I cannot understand, you will get a bad grade. But it is much deeper than that.
A program is something that is able to evolve, that may contain bugs, may need to be adapted,
so this means that it is not enough to write it once, you need to read it again,
understand what is going on, to actually make modifications to it.
When this observation was made, there was not much software source code available on the planet,
but 30 years later, after the free software and open source movements, we have tons of software source code available.
It can go as far back as the Apollo 11 source code, or to more recent incredible pieces of source code
written to implement functions used in games.
I do not have time to go through that case here, I will let you just look at the source code by yourself.
That is an incredible amount of knowledge which is available to everybody today.
And as Len Shustek, who is the board director of the Computer History Museum, wrote some years ago,
source code opens a view into the mind of the designer.
That's the reason why it is so important.
But software source code, if you think about it seriously, is really special; it is not just data.
Software evolves over time, you have projects that can last decades,
and the history of the development of the software project is fundamental to understand what is going on.
Software is complex, I mean, you have software projects which have millions of lines of code,
but even if you just use one line of code, for example, the picture you see here in the slide is
just a diagram of the dependencies that you pull in when you just type import matplotlib in a Python interpreter.
So even if you write a single line of code, to run that line you may depend on tons of external dependencies.
And this means you depend also on sophisticated developer communities, which brings me to my next point.
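The import matplotlib observation can be checked with a few lines of Python. What follows is only a minimal sketch, not the tool used to draw the diagram on the slide: the helper name modules_loaded_by is hypothetical, and it counts the Python modules that a fresh interpreter loads, which is a rough proxy for the dependency graph rather than a list of packages.

```python
import subprocess
import sys

# Code run in a child interpreter: record which modules are loaded
# before and after importing the module named on the command line.
_PROBE = (
    "import importlib, sys\n"
    "before = set(sys.modules)\n"
    "importlib.import_module(sys.argv[1])\n"
    "print('\\n'.join(sorted(set(sys.modules) - before)))\n"
)


def modules_loaded_by(module_name: str) -> list[str]:
    """Return the modules a fresh interpreter loads when importing module_name.

    Hypothetical helper for illustration. A child process is used so that
    modules already imported in the current process do not hide anything.
    """
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


# The standard-library module "json" is used here so the sketch runs anywhere.
loaded = modules_loaded_by("json")
print(f"import json loaded {len(loaded)} modules")
```

Calling modules_loaded_by("matplotlib") instead, assuming matplotlib is installed, gives a feeling for how much code sits behind that single line; the exact count depends on the installed versions.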
Even if you are only interested in research software,
you need to understand that research software is just a tiny part of the software world.
Actually, even to run a software program which is developed by researchers,
you need to rely on layers upon layers of different software components
which have been developed by industry and communities that actually drive the standards
and provide the essential support for what we do.
And let's go even further. I insist, software is not just a piece of data, it's not just a sequence of data,
it's really special. When we talk about software, for example, just think for a moment.
We have so many different ways of talking about software because it depends on what we are talking about, really.
There are issues about version and granularity.
For example, I may write in a report or in a newspaper article that Inria created the OCaml programming language and the scikit-learn toolkit.
Or you can find in an article something that says that two-dimensional support for Voronoi diagrams
has been introduced in the Computational Geometry Algorithms Library (CGAL), starting from version 3.1.0.
Here we are talking about releases.
But then sometimes, when you want to actually reproduce a particular experiment,
a release is not a precisely enough defined thing: you want to pinpoint the exact state of the project.
You may write down something like "this result was produced using this precise commit, identified by this hash".
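Why a hash pinpoints an exact state can be sketched in Python. This assumes the project is versioned with Git using its default SHA-1 object format, which the talk does not state; the function name git_object_id is hypothetical. Git names every object by hashing a short header followed by the content, so the identifier changes as soon as a single byte changes.

```python
import hashlib


def git_object_id(object_type: str, payload: bytes) -> str:
    """Compute the identifier Git (SHA-1 object format) gives to an object.

    Hypothetical helper for illustration: Git hashes the header
    "<type> <size in bytes>" plus a NUL byte, followed by the raw content.
    """
    header = f"{object_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# Identifier of a file ("blob") containing the text "hello world" and a newline.
print(git_object_id("blob", b"hello world\n"))
# → 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# Changing one character gives a completely different identifier.
print(git_object_id("blob", b"hello world!\n"))
```

A commit is identified in the same way, with the type "commit" and a payload that names the directory tree, the parent commits, the author and the message. This is why quoting the commit hash fixes the whole state of the project and its history, where a release name does not.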
Accessible via
Open access
Duration
00:26:20 min
Recording date
2020-07-23
Uploaded on
2020-07-23 17:46:21
Language
en-US
Speaker
Roberto di Cosmo, director of Software Heritage
Content
The role of software source code in research and open science.
The Workshop
The Workshop on Open-Source Software Lifecycles (WOSSL) was held in the context of the European Science Cluster of Astronomy & Particle Physics ESFRI infrastructures (ESCAPE), bringing together people, data and services to contribute to the European Open Science Cloud. The workshop was held online from 23rd-28th July 2020, organized at FAU.
Copyright: CC-BY 4.0, Roberto di Cosmo