
2023 Author: Bryan Walter | [email protected]. Last modified: 2023-05-21 22:24
Modern science is unthinkable without machine processing of results, but do scientists always understand the programs they use, and can those programs be trusted? The question is especially acute in astrophysics, where research results are not easy to verify in practice. The authors of a paper accepted for publication by an American astrophysical journal decided to find out how things actually stand in this area. At the request of N + 1, Dmitry Zigfridovich Vibe, Doctor of Physical and Mathematical Sciences and Head of the Department of Physics and Evolution of Stars at the Institute of Astronomy of the Russian Academy of Sciences, discusses the paper and the conclusions that follow from it.

The Astrophysical Journal Supplement Series has accepted an interesting article titled "Schrödinger's Code: A Preliminary Investigation into the Availability of Scientific Source Codes and the Survival of References in Astrophysics." Its authors decided to check to what extent the source codes of programs used in astronomical research are available for external verification.
The arguments in favor of the need for such a study are as follows. Astronomy of recent decades (like other branches of science) increasingly relies on the use of a wide variety of programs, but the source codes of these programs rarely become available to the scientific community, which sometimes makes it difficult to verify the results presented in the articles. The tradition of not exposing the source code to the public has developed for a long time, but in the old days there were objective reasons for this: it was not very convenient to distribute programs in the form of printouts, magnetic tapes and stacks of punched cards.
Now the situation has changed, and there are many more opportunities for publishing source code, but so far the practice has not become established in astronomy (or in science generally). Meanwhile, we make heavy use of programs written by other people and trust that they give adequate results, often without being able to check what is inside them. Calls are therefore increasingly heard to bring programs into the general circulation of scientific knowledge, for example by subjecting them to peer review.
The problem is multifaceted, and the authors set out to answer one rather narrow question: what proportion of the programs that are used and clearly identified in scientific articles can be verified by other researchers? Even this turned out to be a difficult task, since astronomical articles often not only fail to name the program used to solve a particular problem, but do not even say that the problem was solved computationally (although the complexity of the problem makes this obvious). In many cases it is considered sufficient to cite not the software itself but another paper in which the software is named ("We used the same method as in Ivanov et al. 2013").
The authors initially reviewed articles published in 2015 in three leading journals: Monthly Notices of the Royal Astronomical Society (MNRAS), Astrophysical Journal (ApJ), and Astronomy & Astrophysics (A&A). They then decided that their sample of articles from MNRAS and ApJ might not be representative, and in the end limited themselves to A&A alone.
In total, 1805 articles were published in this journal in 2015. From this list the authors excluded letters and errata, and then selected every tenth paper for analysis, resulting in a sample of 166 articles. These articles were searched, both automatically and manually, for mentions of programs used in the work.
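The "every tenth paper" selection is a form of systematic sampling. A minimal sketch follows, assuming the filtered articles sit in an ordered list and that selection starts with the first item; the function name `systematic_sample` and the placeholder identifiers are hypothetical and are not taken from the paper. Under those assumptions, a sample of 166 implies that between 1651 and 1660 articles remained after letters and errata were excluded.

```python
def systematic_sample(items, step=10, start=0):
    """Return every `step`-th item, beginning at index `start`."""
    return items[start::step]


# Placeholder identifiers only; the real article list is not reproduced here.
# 1660 is an illustrative count that is consistent with a sample of 166,
# not a figure stated in the article.
remaining_articles = [f"article-{i:04d}" for i in range(1660)]

sample = systematic_sample(remaining_articles)
print(len(sample))
```

A different starting offset would shift the range of consistent totals slightly, so the inferred range is only indicative.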
The search used 21 keywords, and in the 166 articles the authors found 715 mentions of, or hints at, the use of program code. From this number they excluded mentions for which the software used could not be identified: the information is given in another article; the use of software is stated but no specific names are given; or the article obviously relied on some programs but does not say so directly.
After this filtering, 418 mentions remained. The authors divided them into 10 categories: A - source code available for immediate download; B - binaries available for immediate download; C - source code available only to contributors; N - code not available for download (the indicated site no longer exists, no download site can be found, the download site exists but the download URL does not work, and so on); W - the code runs as an online service without access to the source; GS (soft gate) - the code becomes available for immediate download automatically after certain information is provided (registration); GH (hard gate) - gaining access to the code requires time-consuming steps (contacting the author, registration without an immediate result, attending training on the program, and so on); P1 - the source code can be purchased (in the analyzed sample this is almost exclusively Numerical Recipes); P2 - commercial software sold as executable files is required (for example, IDL); O - other.
The breakdown of the 418 mentions was as follows: A - 262, B - 26, C - 4, N - 70, W - 21, GS - 5, GH - 16, P1 - 6, P2 - 6, O - 2. These 418 mentions correspond to 285 unique programs, distributed as follows: A - 162, B - 14, C - 4, N - 63, W - 16, GS - 2, GH - 10, P1 - 6, P2 - 6, O - 2. The most popular program is IRAF, which belongs to category A and was used in 31 articles from the sample. It is followed by SExtractor (10 mentions, category A), HIPE (7 mentions, category B) and GILDAS (5 mentions, category A). All of these are programs for processing observational data.
Overall, the share of programs with readily available source code was quite high: about 60 percent (categories A, GS and P1). The remaining 40 percent are either available only as black boxes (binaries or web services, categories B and W), or are available to a limited circle of researchers, or are not available at all. The authors emphasize that the availability of source code does not guarantee that it is usable, since they did not check whether the license permits general use of the code, whether it compiles, whether the version used in a particular study is available for download, and so on.
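The "about 60 percent" figure can be reproduced from the category counts quoted in the text. The sketch below is only a check of that arithmetic; the variable and function names are illustrative and are not taken from the paper. It suggests that the figure corresponds to the counts of unique programs, while the same categories make up a somewhat larger share of all mentions.

```python
# Category counts copied from the text of this article.
mention_counts = {"A": 262, "B": 26, "C": 4, "N": 70, "W": 21,
                  "GS": 5, "GH": 16, "P1": 6, "P2": 6, "O": 2}
unique_counts = {"A": 162, "B": 14, "C": 4, "N": 63, "W": 16,
                 "GS": 2, "GH": 10, "P1": 6, "P2": 6, "O": 2}

SOURCE_AVAILABLE = ("A", "GS", "P1")  # source code readily obtainable
BLACK_BOX = ("B", "W")                # binaries or web services only


def share(counts, categories):
    """Fraction of all entries that fall into the given categories."""
    total = sum(counts.values())
    return sum(counts[c] for c in categories) / total


unique_share = share(unique_counts, SOURCE_AVAILABLE)
mention_share = share(mention_counts, SOURCE_AVAILABLE)
black_box_share = share(unique_counts, BLACK_BOX)

print(f"unique programs with source: {unique_share:.1%}")   # 59.6%
print(f"mentions with source:        {mention_share:.1%}")  # 65.3%
print(f"unique black-box programs:   {black_box_share:.1%}")  # 10.5%
```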
Along the way, the authors obtained two more statistical results. First, they extracted the hyperlinks from all A&A articles of 2015 and checked whether they still work. It turned out that after two years (the study was carried out in August-October 2017), about 10 percent of roughly two and a half thousand links no longer worked. Second, to find out whether some of the codes were available, they had to e-mail the codes' authors. They received 19 replies to 46 messages. Twelve replies expressed an intention to make the code publicly available sooner or later. In only three cases did the authors of the programs provide download links, and in one more case the code was submitted to the ASCL (Astrophysics Source Code Library). In other words, only 4 of the 46 attempts produced a positive result.
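A link-survival check of this kind can be sketched with the Python standard library. This is not the procedure used by the authors of the paper, which the text does not describe in detail; the function names, the use of HEAD requests, and the rule that any 2xx or 3xx response counts as "alive" are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def link_is_alive(url, timeout=10.0):
    """Return True if the URL answers a HEAD request with a 2xx/3xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Malformed URLs, DNS failures, timeouts and HTTP errors all count as dead.
        return False


def dead_fraction(urls, is_alive=link_is_alive):
    """Fraction of URLs for which the `is_alive` check fails."""
    urls = list(urls)
    if not urls:
        return 0.0
    dead = sum(1 for url in urls if not is_alive(url))
    return dead / len(urls)
```

Some servers reject HEAD requests while still serving the page, so a real survey would probably need a fallback to GET and a manual review of the apparent failures.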
I have not arrived at a settled view on the issue raised in the article. On the one hand, I once happened to find a bug in the published source code of the Zeus2d program, and it was good that this was possible. On the other hand, the number of programs in use keeps growing, and there is no time to check them. In other words, it makes no difference to me whether the source code of a given program is available online, since I will not check it anyway, relying without good reason on the assumption that all the necessary verification has been done by authors unknown to me.
Am I ready to publish the source code of the programs I have written myself? Yes and no. If it is simply a matter of publishing programs with no further commitment, why not? Whoever needs the code can figure it out. But the idea of reviewing code seems rather unrealistic to me. Modern programs, for example those for modeling gas-dynamic processes, contain millions of lines, and it is simply impossible to review them. All that remains is to rely on good old intuition and to hope that a major error in the code will somehow give itself away through an unphysical result of the calculation.