1 Introduction

Sustainability of data is often only an afterthought compared to the money and effort invested in digital solutions in research, politics, and industry. In the past decades, database technology has continually progressed and conferences on DB technology have flourished. By contrast, there has been very little discussion on how to secure efficient and effective access to database content over longer periods. Archives, libraries, and other memory institutions have gathered database content in very basic formats like CSV, with descriptive metadata in various formats.

In April 2020, this situation motivated Kai Naumann of the Landesarchiv BW to launch a challenge on how to preserve 125 databases for an imaginary customer so that they can be used in as many ways as possible in 2080. In the following 60 years, a) no costs should be incurred apart from the secure storage of the data and b) the database content must not be publicly accessible (for data protection reasons). By "database", the imaginary client understood the graphical user interface, the business logic and the contents of the database management system (DBMS). The archived form of the databases should contain records or protocols about the archiving process. The usage scenarios ranged from queries by non-specialists to further usage in the DBMS of the year 2080. Participants were allowed to use existing tools, processes and standards for the task.

Naumann presented the findings of the challenge at WeMissiPRES 2020 [WeMissiPRES 2020. https://www.dpconline.org/events/wemissipres, video stream of 22 September 2020 at https://youtu.be/F3GjxD-iUpo?t=2450 (07.12.2021).] and also at the Web Archiving and Digital Libraries Workshop during the JCDL 2020. [Naumann, K. “125 Databases for the year 2080. A technology challenge and how it can be met.” ACM/IEEE Joint Conference on Digital Libraries 2020, Web Archiving and Digital Libraries Workshop, https://vtechworks.lib.vt.edu/handle/10919/99569 (07.12.2021).]
It turned out that there were no technically coherent solutions for this scenario, let alone business models. But among the entries were some standardisation efforts and lots of good ideas on how to proceed. This became the basis of the Databases for 2080 Workshop. It was launched by the Landesarchiv with a call for participation [https://www.landesarchiv-bw.de/de/aktuelles/nachrichten/71969 (07.12.2021).] in January 2021 and took place virtually on 5–6 October 2021. The first session was dedicated to options and strategies, the second to standardisation and application, followed by open discussion sessions. Participants came from industrial contexts, memory institutions, IT engineering and universities, mainly in Europe but also in the US and Australia.

For the sake of brevity, this report will only mention some presentations, but invites all interested readers to consult the conference homepage [https://www.landesarchiv-bw.de/de/aktuelles/termine/72973 (07.12.2021).] for further reading and watching. A proceedings ebook on the conference will be published in 2021, containing short reports on all presentations, short speaker biographies, presentation slides and in some cases even presentation manuscripts. The presentations were recorded at the workshop and the videos will be available at the Landesarchiv’s website.

2 How data projects in the Humanities can survive longer

The keynote presentation was given by Brigitte Mathiak from GESIS (Leibniz Institute for the Social Sciences, Cologne). Mathiak and her colleagues asked Humanities scholars about the survival rate of their research, and gained the insight that maintenance of websites mostly stopped at the end of research projects. The average lifespan of digital scholarly editions was 8.5 years, in contrast to books, which can survive for centuries without needing further attention. Mathiak outlined several projects that deal with this problem.
Some try to sustain the digital scholarly editions as living systems, reduce them to a smaller application with only standard access methods, or move the old systems to newer platforms. These methods can be used retroactively on legacy projects, are relatively quick to learn and apply, and reduce maintenance costs significantly, but a re-design or shutdown eventually becomes necessary in order to reduce security risks. Others try to steer scholars away from stand-alone custom software towards modular, standards-based software like GAMS [GAMS (Geisteswissenschaftliches Asset Management System) has been developed at the University of Graz (Austria). https://gams.uni-graz.at/ (07.12.2021).] to curate their digital assets, which will then be maintained through the central system. According to Mathiak, a multi-layered approach is a good idea, such as the King’s Lab approach at King’s College London. [Neuefeind C., B. Mathiak et al. “Sustainability for Digital Humanities Systems.” DH2020 conference abstracts (2020). https://dh2020.adho.org/wp-content/uploads/2020/07/565_SustainabilityStrategiesforDigitalHumanitiesSystems.html (07.12.2021).]

Fig. 1: Table in blue space (Source: Landesarchiv BW)

3 Reading DNA means dissolving it

Raja Appuswamy is a database expert who has joined the OligoArchive project at EURECOM in Southern France as Assistant Professor, exploring DNA as the next generation data carrier. Its storage density is several orders of magnitude higher than that of magnetic media, and it is very durable: it can survive for a long time and in harsh conditions. As a proof of concept, OligoArchive has stored databases of the National Archives of Denmark on DNA. While writing data to DNA is very costly, devices that read DNA code into computing equipment are affordable. For DNA writing, every data stream is encoded many times in order to minimize possible losses and to enable a reading in which the copies are collated into a “consensus” of the original data stream.
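The idea of collating many redundant copies into a consensus can be illustrated with a minimal sketch. It assumes that all reads are already aligned and of equal length and that errors are substitutions only; it is not the OligoArchive pipeline, and the function name `consensus` and the sample reads are hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Return the per-position majority vote over aligned, equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch assumes aligned reads of equal length")
    # zip(*reads) yields one column of bases per position;
    # the most frequent base in each column becomes the consensus base.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Four noisy copies of the same (made-up) strand; each error occurs in one copy only.
reads = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]
result = consensus(reads)
```

With enough copies, an error that occurs in only a minority of reads at a given position is outvoted. Practical systems would also have to cope with insertions and deletions, which this sketch deliberately ignores.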
A rather baffling insight was that reading DNA data carriers always involves liquids, making reading them a once-only event. Appuswamy pointed out that DNA storage depends heavily on the stored data being self-explanatory. Decoding methods have to be safely transmitted to future users because the data carrier itself does not convey any identifiable design principles. Could this fact give leverage to standardisation efforts for database and other content?

4 Emulation: a prolonged sunset phase for database systems

Klaus Rechert is a postdoctoral researcher at the University of Freiburg (Germany) and also part of the company Stabilize located in Delaware, USA. While all other presentations at the workshop relied on the migration approach for the longevity of database content, he presented a business case for preserving the whole database performance, calling it a software-driven approach. In the EMiL and EaaSI projects, database system stacks have been emulated, offering full access to a service even though hardware and software are obsolete. Security threats are ruled out by starting the emulated services in sandboxes. Rechert conceded that there are several non-technical problems around this strategy, especially intellectual property rights that last for decades even though the market relevance of outdated software subsides after ten to twenty years. On the other hand, emulation seems to be a large market for computing centres and companies who need access for a time span of 10 to 30 years. Discussions revealed a reluctance to apply emulation to database content for the very long term.

5 The 15-terabyte screwdriver

When the Diesel emission scandal shook the German car industry in 2015, BMW stayed relatively calm. This was partly due to its facilitator for database archiving, the CSP company located outside Munich. Car manufacturing has become a big data business, as Florian Hartl of CSP explained by showing a professional electric screwdriver used in car assembly lines.
Throughout its lifespan of 10 years, this device has fed 15 TB of database content about how, when, by whom and on which car it was used into the company’s data warehouses. Hartl predicted that its next generation, with more sensor equipment, will create 15 TB per year, ten times the current average rate. CSP operates CHRONOS, a service that extracts rarely used data from productive DB systems, archives and digitally signs them, and leaves links to the retired data on the live systems. When suspicions about exhaust manipulation devices in German engines were raised, it proved easy for BMW to refute them. Since then, many other car companies, as well as financial and other companies, have signed up with CSP.

Fig. 2: Screwdriver (Source: CSP GmbH)

6 An emerging market for database preservation software?

Six institutions of various backgrounds (open source and proprietary alike) presented software solutions for migrating obsolete database content into standard formats and providing access to it. While proprietary archival formats are rarer, the SIARD format, first conceived in 2007, is about to become a standard. SIARD is SQL:2008-conforming XML data packed into ZIP64 containers. Workshop discussions on how this format should evolve in the future were lively and should continue in the next months. Issues raised included SQLite encapsulation, stronger and clearer definitions in the specification, and large objects embedded in a database.

Others think that simply using common formats, like discipline-specific XML, JSON, SQLite or the most common SQL dump formats, is sufficient for ensuring long-term access to database content. Standards of this kind include OGC formats for geospatial information, [Open Geospatial Consortium, https://www.ogc.org/docs/is (07.12.2021).] national government standards for administrative records, or the DDI standard [Data Documentation Initiative (DDI), https://ddialliance.org/ (07.12.2021).]
for social and economic statistics.

The advocates of SIARD standardisation, the Digital Information LifeCycle Interoperability Standards (DILCIS) Board, [https://dilcis.eu/ (07.12.2021).] but also other organisations involved, like the Digital Preservation Coalition (DPC), [https://www.dpconline.org/ (07.12.2021).] the Nationale Forschungsdateninfrastruktur (NFDI), [https://www.nfdi.de/ (07.12.2021).] or the Open Preservation Foundation (OPF), [https://openpreservation.org/ (07.12.2021).] seemed keen to keep discussions on the workshop subject alive. It is thus very probable that there will be follow-up workshops.
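The general principle behind the SIARD description in section 6, table content serialised as XML and packed into a ZIP64 container, can be illustrated with a short sketch using only the Python standard library. It does not produce a SIARD-conformant package: the directory layout, the element names and the functions `table_to_xml` and `archive_database` are simplified placeholders chosen for this example, and no schema metadata is written.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
import zipfile


def table_to_xml(conn, table):
    """Serialise one table as XML bytes: one <row> per record, one <cN> per column."""
    root = ET.Element("table", name=table)
    for record in conn.execute(f'SELECT * FROM "{table}"'):
        row = ET.SubElement(root, "row")
        for index, value in enumerate(record, start=1):
            cell = ET.SubElement(row, f"c{index}")
            cell.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="utf-8")


def archive_database(conn, target):
    """Write every table of an SQLite database into a ZIP container, one XML file each."""
    names = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
    ]
    # allowZip64=True lets the container grow beyond the classic 4 GiB ZIP limit.
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for name in names:
            # Hypothetical layout, not the one defined by the SIARD specification.
            zf.writestr(f"content/{name}.xml", table_to_xml(conn, name))
    return names


# Usage with a tiny in-memory database (made-up sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER, model TEXT)")
conn.executemany("INSERT INTO cars VALUES (?, ?)", [(1, "A"), (2, "B")])
buffer = io.BytesIO()
archived = archive_database(conn, buffer)
```

A real preservation package would additionally need to record the schema, data types and provenance of the export, which is what a dedicated format specification is for.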
ABI Technik – de Gruyter
Published: Feb 1, 2022