Written by Mihalis Kritikos.
Distributed computing has accelerated Covid‑19 research in molecular dynamics as it allows people to voluntarily make their computers available to scientists for effective virtual screening of chemical compounds. As computing initiatives grow to meet the increasing demand for massive computational power, this analysis examines the current applications and the surrounding legal and policy questions.
Computer simulations, with their capacity to process enormous amounts of data in a very short time, can help scientists map the behaviour of a virus's proteins and reveal their three-dimensional structures. The vast processing power needed to simulate the folding of viral proteins can be obtained in two ways: by using the world's fastest supercomputers and/or by seizing the opportunities that grid/distributed computing offers.
Distributed computing can efficiently process massive data sets and run multifaceted simulations of protein dynamics to understand how proteins fold, which could serve as a basis for developing effective therapies. It integrates the resources of many computers into a single unit that effectively acts as a 'virtual supercomputer'. As a form of citizen science (citizens participating in the scientific process via crowd-sourcing without any particular cognitive involvement), distributed computing, which enables concurrent computation, has been invaluable during the pandemic. It works by splitting large computing tasks into smaller pieces, allocating them to different users on the grid and allowing anyone to contribute computing power to a common cause. In the context of Covid‑19, distributed computing has allowed millions of volunteers worldwide to lend their processing power to scientists who need borderless access to distributed computing infrastructures and massive amounts of computational power to run complex molecular dynamics simulations.
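To make the task-splitting idea concrete, the following is a minimal Python sketch, assuming that a virtual screening job is simply a list of compound identifiers. The names `make_work_units`, `run_grid` and `toy_score` are hypothetical, the volunteer hosts are only simulated locally, and the scoring function is a placeholder for a real simulation or docking calculation; it does not describe how Folding@home or any other project is actually implemented.

```python
from itertools import cycle
from typing import Callable, Dict, List, Sequence, Tuple


def make_work_units(compounds: Sequence[str], unit_size: int) -> List[List[str]]:
    """Split one large screening job into small, independent work units."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    return [list(compounds[i:i + unit_size]) for i in range(0, len(compounds), unit_size)]


def toy_score(compound: str) -> int:
    """Hypothetical placeholder for a real simulation or docking score (0-99)."""
    return sum(ord(ch) for ch in compound) % 100


def run_grid(
    units: List[List[str]],
    volunteers: Sequence[str],
    score: Callable[[str], int] = toy_score,
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Hand work units to volunteer hosts in turn and merge their results.

    Every unit is computed locally here; in a real volunteer grid each host
    would download its unit, compute it and upload the result to a server.
    """
    if not volunteers:
        raise ValueError("at least one volunteer host is required")
    results: Dict[str, int] = {}
    units_per_host: Dict[str, int] = {host: 0 for host in volunteers}
    # Round-robin assignment: unit 0 -> host 0, unit 1 -> host 1, and so on.
    for unit, host in zip(units, cycle(volunteers)):
        units_per_host[host] += 1
        for compound in unit:
            results[compound] = score(compound)
    return results, units_per_host


if __name__ == "__main__":
    compounds = [f"compound-{i}" for i in range(10)]
    units = make_work_units(compounds, unit_size=3)
    results, load = run_grid(units, ["laptop-A", "phone-B"])
    best = sorted(results, key=results.get, reverse=True)[:3]
    print("units per host:", load)
    print("top-scoring compounds:", best)
```

In this sketch each work unit is independent, so hosts never need to communicate with one another; a central server only hands out units and merges the results, which is what allows this kind of job to be spread across very large numbers of volunteer devices.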
Potential impacts and developments
The scientific community has mobilised a series of voluntary computing initiatives, such as Folding@home and Rosetta@home, that simulate protein dynamics to help design drugs to fight Covid‑19. Folding@home is by far the most powerful crowd-sourced supercomputer in the world, involving more than 1 000 000 computers, up from 30 000 devices before the pandemic. It currently delivers approximately 2.4 exaflops of computational power, more raw computing power than the world's 500 largest traditional supercomputers combined. Folding@home, which also makes use of CERN's computing resources, has used this combined power to virtually screen 800 potential drug compounds.
Rosetta@home is another massive network of volunteer computers that allows citizen scientists to lend their computers' CPUs and RAM to protein analysis tasks. Rosetta@home currently comprises 4 406 203 hosts across 151 countries, which together provide an estimated 473 petaflops of volunteer computing power, another successful example of donating computing power to Covid‑19 research. Also worth mentioning is the OpenPandemics-COVID-19 project, which is coordinated by the World Community Grid (an IBM social impact initiative) and is currently screening hundreds of millions of molecules. This grid, which involves 7 047 819 devices, has accelerated research on Covid‑19 and aims to develop an open-source toolkit that could serve as a basis for seeking treatments in future pandemics. Other distributed initiatives include DreamLab, a specialised application developed by Vodafone that pools the processing power of smartphones to analyse complex coronavirus-related data while the phones are charging; the Worldwide LHC Computing Grid at CERN; the Berkeley Open Infrastructure for Network Computing; SiDock@home; COVID Moonshot; and the Open Science Grid.
As part of its Open Science policy, the European Union has been supporting citizen science, that is, the active and voluntary participation of the public in research. In the context of Covid‑19, this support has taken many forms, and the promotion of distributed computing is one of the most prominent. Europe-based distributed computing projects include, among others, the work carried out by the Slovenia-based COVID.si, the Italian Institute for Nuclear Physics and the HADDOCK service on the WeNMR platform, supported by the European Grid Infrastructure, which is the largest distributed computing infrastructure for research, bringing together hundreds of data centres worldwide.
The distributed nature of the underlying computing network, in which data are processed using computing resources located anywhere, raises multiple policy challenges: restricting access to the distributed computing system, safeguarding network security and privacy, and ensuring the smooth integration of computing, software and storage resources. More concretely, heterogeneity can hamper the operation of these networks, as diverse hardware devices running different operating systems must communicate with one another to serve a particular purpose. Network security poses a fundamental challenge in terms of possible data leakage, software piracy, integrity infringement and denial of service. The integration of different distributed components and of possibly malicious devices also creates privacy challenges, while the spread of processing across multiple servers makes troubleshooting and diagnostics more difficult.
These challenges can be tackled efficiently if policy-makers aim to make distributed computing an essential part of high-performance computing frameworks and an attractive and trustworthy option for individuals who could donate spare computing power for a social cause. To achieve this, a series of requirements need to be met. First, an EU-wide registry of scientific projects that require computing power for massive and immediate processing could be created. Second, incentives that could bring thousands or even millions of individual users under the same computing 'roof' would need to be developed. These incentives could take the form of tailoring applications and services to users' specific requirements, and of promoting the benefits of open science and citizen science through user training, the strengthening of distributed digital infrastructures and the development of open-source software. Additionally, open-source components that enable access to, and processing of, data collected or stored in different platforms and formats could be developed further. The European Commission's new open source software strategy 2020-2023 explicitly refers to the need to share and reuse software solutions, knowledge and expertise for the benefit of society.
The European Commission could set up funding schemes to develop an open and flexible distributed computing architecture based on common standards for the smooth integration of hardware and software components, on user-friendly and comprehensive repositories, and on standards-based common interfaces that could address privacy and cybersecurity challenges. Fostering the development of data and computing e-infrastructures at EU level, and ensuring their accessibility and interoperability independently of the data-driven technologies used, should be a priority. Connecting all European households and populated areas, which could further enhance the effectiveness of distributed computing initiatives, is an essential part of the Commission's vision for Europe's digital transformation by 2030, as recently set out in its Digital Compass. The success of distributed projects to tackle Covid‑19 shows that the EU needs to invest further in efficient, sound and user-friendly crowd-sourced distributed computing projects, and to examine in depth the factors that could increase volunteer involvement in these bottom-up efforts. This should be part of a broader strategy to enhance lay participation in science by improving digital accessibility and removing computational obstacles for communities and user groups facing serious technical and financial difficulties. The ultimate aim should be the gradual creation of a trustworthy, federated European data and distributed computing infrastructure, based on optimised access to IT equipment and services, on a clear definition of the goals and objectives of each distributed computing project, and on recognition that distributed systems are cost-effective compared with expensive supercomputers.
Read the complete 'at a glance' on 'What if we could fight coronavirus by pooling computing power?' in the Think Tank pages of the European Parliament.