Arkivum to pilot commercial solution for petabyte-scale digital preservation in the final stage of EU’s ARCHIVER project

READING, BERKSHIRE, UK: 18 months into the EU’s €4.8m ARCHIVER project, UK-based Arkivum – a leading international specialist in digital archiving and digital preservation – has been selected to take its petabyte-scale solution through to the final, pilot stage, in preparation for commercialisation on the European Open Science Cloud (EOSC) and elsewhere.

Arkivum has successfully completed both the design and prototyping phases of ARCHIVER. The project was launched in June 2020 by a multinational scientific buyer group led by CERN, operator of the Large Hadron Collider near Geneva. ARCHIVER, which comes to an end in June 2022, receives European Commission funding from the European Union’s Horizon 2020 research and innovation programme (grant agreement No 824516).

The aim of ARCHIVER (Archiving and Preservation for Research Environments) is to achieve radically improved archiving and digital preservation for petabyte-scale, data-intensive research. Supporting the IT requirements of European scientists, ARCHIVER will provide end-to-end archival and preservation services for the vast and ever-growing datasets generated by world-leading research institutions. Reflecting the move toward large-scale collaborative research supported by cloud infrastructure, and leveraging best practice and economies of scale, it addresses such issues as extreme data-scaling, network connectivity, service interoperability and business models.

In addition to CERN, the members of the ARCHIVER buyer group are DESY (the Deutsches Elektronen-Synchrotron, based in Hamburg and Berlin), EMBL-EBI (European Bioinformatics Institute, based in Cambridge), and PIC (Port d’Informació Científica, situated near Barcelona).

The prime aim of the project is to produce digital preservation solutions for the EOSC, and as such it has now been opened up to a diverse early adopter group that includes research institutions, national research networks, associations and inter-governmental organisations across Europe and as far afield as Australia.

“The third and final phase of the ARCHIVER project will address the pressing need for long-term sustainable digital preservation and access solutions for scientific data,” says Matthew Addis, Chief Technology Officer and Co-Founder of Arkivum. “Among our key areas of focus are: achieving economic sustainability with a solution that is cost-effective at scale when working with very large datasets; ensuring environmental sustainability by minimising the carbon footprint of the solution; and, of course, applying good practice in the digital preservation and archiving of research data, with the aim of guaranteeing long-term, sustainable access for the scientific community to these hugely valuable resources.

“Arkivum’s ARCHIVER solution will now be applied to a broader range of use cases. It is designed to be flexible – cloud-provider agnostic, but also suited to on-premises deployment – and responsive to the archiving needs of a wide range of organisations. Scalability and economies of scale have been a prime concern throughout the project, and smaller operations, maybe working with terabytes rather than petabytes, will also be able to benefit by becoming tenants of the Arkivum SaaS solution.

“Across the board, we are constantly mindful of the ever-growing need for long-term digital preservation services, of the ideology of open data, and of the imperative of ensuring data is FAIR, keeping digital assets Findable, Accessible, Interoperable and Reusable.”

Speaking for CERN, Jean-Yves Le Meur, the institution’s Project Leader for Digital Memory, says: “During the previous stage (Prototype), the testing activities focused essentially on functional requirements. The Pilot stage will now provide the opportunity to align and validate the Arkivum SaaS platform and its supported APIs with the needs of the CERN Digital Memory project. Specifically, the next steps will require an understanding of how some of the large information systems used at CERN could potentially interact with Arkivum to progressively preserve the CERN digital heritage into a compliant archive (following schemes such as ISO 14721 and CoreTrustSeal).”

For all phases of the ARCHIVER project, Arkivum has selected Google Cloud, a provider of highly scalable and cost-efficient IaaS and a world leader in cloud environmental sustainability. Over the course of the project, the environmental impact of long-term archiving and of ensuring access to huge datasets has risen up the agenda, becoming, in Matthew Addis’s words, “a headline issue for the pilot phase”. Google Cloud runs the cleanest cloud in the industry: it is already carbon-neutral and aims to run on carbon-free energy, 24/7, at all of its data centres by 2030. Google Cloud’s impact is far greater when it shares technology, methods and funding to enable organisations around the world to transition to more carbon-free and sustainable systems.