When the bits hit the fan: Electronic records and archives

Remarks prepared for a session entitled "Acquisition of Electronic Records of Individuals" at the annual meeting of the Association of Canadian Archivists, Ottawa, 26 May 1994.

Terry Abraham
Head, Special Collections and Archives
University of Idaho Library
Moscow, Idaho 83844-3125

Dinosaur joke: Two dinosaurs are opening fortune cookies. "Funny," one says, "mine doesn't say anything either."

Unlike the dinosaurs, archivists and manuscript curators at all institutions are aware that their world is changing, that there is a tidal wave of documentary materials headed their way. Materials unlike all others they have yet dealt with. Materials that exist only as the slightly ordered fragments of magnetism on an iron substrate. Materials that can only be read or viewed through special lenses, lenses that distort even as they display. These are records in electronic format and we have been preparing for their arrival as assiduously as the 1952 school boards ignored the impending onslaught of baby boom first graders.

While there are those who have worked with magnetic media archives for years, most of us are so awash in huge backlogs of traditional paper records and documentation that we are only just now sticking our heads up and looking around and noticing the oncoming tsunami.

Fortunately, some people are thinking and theorizing about electronic records, watching for patterns and erecting signposts. Fortunately, as Terry Cook pointed out in Archivaria, it will "be easier...to join the electronic records race the second time around than it was initially."[1]

The principal difference between electronic records and conventional records is that the medium is now completely independent of the message. A letter is no longer a piece of paper that may have been folded, placed into an envelope, and sent through the mail. We can now peel the words off the piece of paper, send them across the ether, and reconstitute them on a screen or spray them onto another piece of paper. Some electronic records exist only as "virtual documents," never existing as real documents. A report is generated to the screen once upon request; by the time of the next request the information has changed. One example that we might be familiar with is the library circulation system that reports on the status of different books in the system. The information is dynamic and constantly changing.

Another example might be an e-mail system that maintained transmission and receipt logs behind the scenes. In this case, and many others, as one archivist noted, "the systems used to create these records are themselves part of the record."[2]

Nonetheless, many of our concerns about the document or group of documents are the same as they have ever been. What is the provenance of the record, how do we provide access to it, how do we describe it, how do we preserve and store it, how do we present it to the user?

We are in a transition between the old and the new. ARCHIVES listserv subscribers last year read the complaints of a faculty member whose decade's worth of 8" DisplayWrite diskettes were refused by a university archives because it was unable to read or properly store them. Here is an example for those of you who missed this era. [Hold up 8" diskette.]

Last year the Legislature of the State of Idaho passed a law allowing taxidermists and fur buyers to keep records on media other than paper.[3] Within recent memory, the contents of White House hard disks were being argued in the courts. When we look at all the data that is accessible on the Internet we wonder who, if anyone, should have the responsibility for preserving it. Although at first a joke, it is now accepted that all newsgroup postings, even those to USENET groups such as ALT.STARTREK.GET.A.LIFE, are potentially being archived and may come back to haunt the intemperate poster.[4] The LISTSERV software, such as that used by the Archives & Archivists List, does have automatic archiving and searching capabilities, and this message base is being used as a resource by students.

Last February, we again had a wide-ranging and interminable discussion of these changes on the ARCHIVES listserv. The original inquiry was whether "to accept e-mail and WordPerfect files created on the campus network by a retiring faculty member."[5] In sorting out the responses to this question we see that they fall into three camps.

First was the question of whether such electronic records were archival records. The answer to this appraisal question was a function of the institutional mission. It was quickly suggested that if the disks contained information that fit one's collecting responsibilities, then they should be acquired and retained.

The second class of responses dealt with physical maintenance of these records. At one end we had the simple (and to some, too simple) admonition to print them all out to paper because the technology (hardware and software) is changing so rapidly. At the other end were discussions of the permanence of optical disks and magnetic impulses. In addition, it was noted that electronic records are also dependent on the software used to create and view them; over time that software may change or become unavailable. Therefore, data migration would involve the move to both a newer hardware platform and newer software. For instance, can you still convert DisplayWrite data on eight-inch disks to MS Word data on 3 1/2-inch disks without losing any of the formatting?

The third class of response was to urge archivists to grasp this opportunity to shape the technology to achieve archival ends. "Archivists," said one, "need to be involved in the decisions that are being made in many companies and schools regarding choices for new computer technologies and computer systems."[6] Some college and university archivists mentioned their service on campus task forces that have been set up to design new system-wide automation resources.

Throughout this discussion there was a disturbing subtext. Archivists, it was often assumed, are solely responsible for the retention and preservation of records. The archival profession, as it has necessarily increased its level of professionalism, has begun to assert that it not only knows best about what is or is not archival, but that it makes these decisions in a societal vacuum. In reality, archivists and manuscript curators operate in a field where everything that they do is dependent upon others.[7] Archivists may help determine records schedules, for instance, but it is the individual department's clerical staff who may, or may not, remember to follow the schedule when filing, sorting, or disposing of records. The Maine State Archives recently reported such a case. An agency director called the archives asking for the use of a shredder for some confidential records. In anticipation of the shredding, and to save the state money, she had had her staff remove all the manila folders. The state archives now has the job of trying to reconstitute the files from boxes full of loose papers. There was, we were assured, a retention schedule in place, but it was inconveniently forgotten.[8]

In addition, the determination of what to save has been formulated in an environment independent of the needs of users; in fact, we bemoan the lack of user studies in our discipline but make little attempt to initiate such studies. Can we demonstrate that those records we designate as archival are the minimum necessary to answer all research questions? According to one eminent historian, quoted in a recent article in the New Yorker, archivists are the wrong people to determine if something is historically important.[9]

On the supply side of the equation, we should recognize that our support funds are dependent on what have been termed "resource allocators," that is, those who hand out the money. This may be a dean or director, a finance officer, or a remote vice-president. A study of their attitudes concluded that archivists were considered to be doing important work, but not so important as to take a great deal of their time, attention, or budget. In much the same way, the discussion about electronic records in archives misses an essential point: we cannot solve this problem alone. Even service on a campus computing task force is not going to solve all one's archival problems with electronic records.

Rob Spindler of Arizona State made this point when he wrote on the LISTSERV: "the software dependent environments that are most likely to be made upgradable are probably those in which our society's largest information investments are made. In other words, business and government have made substantial investments in certain technologies -- I would argue that those investments are most likely to be protected by insuring that this information can be upgraded to newer systems."[10]

Our concerns about preservation of electronic records, data migration, and software and hardware capability will all be solved ... if the larger society agrees with us that these issues and the records are important.

In the meantime, I think we do have some slight grounds for optimism. I think there are solutions to the problems of electronic records and I expect the technology to provide us with those solutions. I base this on three principles: first, the Second Law of Thermodynamics suggests that order is but a small part of a larger world of disorder. I have written at some length about this before, but in summary, systems tend towards disorder. Thus, it takes an expenditure of energy to maintain a semblance of order under this pressure.[11] Arrayed against the downward spiral of informational disorder is a relatively small band of archivists and concerned citizens anxious for the record to be retained and willing to make the necessary energy expenditure.

Second, like governments, society gets the archives it deserves. There will always be a relatively low level of support for archival functions. They are just not that important to most people. For example, just one casino chain in the United States rakes in over nine billion dollars every year. I would estimate that the total expenditure on archives in the States is less than two percent of that.[12] Casino gambling is obviously much more important to society than are archives. Matching expectations with resources will continue to be extremely difficult. Thus archivists are not going to be able to solve the problems of electronic records by throwing more archivists at them. We must concentrate on working smarter.

Third, the newly emerging field of study labeled complexity theory offers the liberating idea that self-organization is inevitable in large systems.[13] Feedback systems, first developed to control steam engines in the early days of the Industrial Revolution, are recognized as one example of the self-organizational aspects of complex natural systems. Archivists documenting organizations must rely on the members of those organizations to set aside and preserve the vital records necessary for permanent documentation. And they will do it for us if we make it easy for them. Records management is really just a subset of people management, and thus more difficult than we give it credit for.

In the meantime, while we wait for new tools to help us select, describe and make available electronic records, what resources are available to us now? Here are my "interim" suggestions; hard and fast rules they are not, subject as they are to rapidly changing technologies.

First, one solution is to put it on paper and treat it as any other documentary record. Better yet, let the agency staff put it on paper and file it. According to Cyndi Merritt, an archivist with the State of Washington, their policy on e-mail messages is to schedule them into three classes, only part of which are considered permanent records.[14] Of those designated permanent, paper copies are to be created and filed. No electronic copies are maintained. While I believe this is a common practice, comparable to notes on telephone conversations, not all agree that it is sufficient. In the federal court ruling regarding the White House PROFS files, Judge Richey found that "the printing of the records onto hard copy does not satisfy the requirements of the Federal Records Act since it does not show who received the information and when they received it."[15]

I am less in favor of this as a solution than I was previously. Software systems to manage electronic records are increasingly becoming integral to the record. In addition, it is clear that the impact of electronic messages is different from messages on paper. The intemperate postings on the ARCHIVES listserv are but one example of this common phenomenon. Any reformatting of the message then should only be undertaken as an extreme measure to preserve the content, much as society has expended millions of dollars microfilming newspapers.

I am coming to expect a technological answer to this problem. The many arguments for maintaining the electronic format suggest to us that -- whether it is optical disks or magnetic regeneration, which is the systematic copying of old files to new tapes or disks -- some long-term solutions are close at hand. In the meantime, I too have ten-year-old floppies that so far still seem readable.
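The "regeneration" mentioned above, systematic copying of old files to new media, might be sketched as follows. This is only an illustration under stated assumptions: the function names `sha256_of` and `regenerate` are hypothetical, and the checksum comparison after each copy is an added safeguard that these remarks do not describe.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def regenerate(old_root: Path, new_root: Path) -> list[Path]:
    """Copy every file under old_root to new_root.

    Returns the source files whose copies did not verify.
    Assumes new_root is not located inside old_root.
    """
    failures = []
    for source in sorted(old_root.rglob("*")):
        if not source.is_file():
            continue
        target = new_root / source.relative_to(old_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and timestamps
        # Assumption: compare checksums to confirm the copy is faithful.
        if sha256_of(source) != sha256_of(target):
            failures.append(source)
    return failures
```

A sketch like this copies the bits only; it does nothing about the separate problem of software that can no longer interpret them.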

Second, encourage the use of document management techniques already built into the system now being used. Disks can be labeled internally and externally. Hard disks can be subdivided into archival materials and temporary files. Major word-processing software includes document summaries that keep track of the document version, its date of creation and revision, and allow searches to be made on the document's title, author, operator, key words and comments. If a comment field includes the term "Archival," or some other codeword, then it is an easy matter to use the searching mechanisms that permit copying or deleting selected files. These will become even more important as offices migrate to networked PCs and abandon the limitations of stand-alone units. As Fred Stielow has pointed out on the LISTSERV, there are software and hardware combinations that "have transparent and automatic controls that reduce [archival] decision making and save energy."[16] Such "recordkeeping systems" are just beginning to be addressed in the archival literature.[17]
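The codeword approach can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `DocumentSummary` is a hypothetical stand-in for a word processor's document summary, not the format of any actual product, and the file names are invented.

```python
from dataclasses import dataclass


@dataclass
class DocumentSummary:
    """Hypothetical stand-in for a word processor's document summary."""
    filename: str
    title: str
    author: str
    keywords: str
    comments: str


def select_by_codeword(summaries, codeword="Archival"):
    """Return summaries whose comment field contains the codeword (case-insensitive)."""
    needle = codeword.lower()
    return [s for s in summaries if needle in s.comments.lower()]


# Invented sample data for illustration only.
summaries = [
    DocumentSummary("ANNRPT93.DOC", "Annual report", "J. Doe", "report", "Archival; final version"),
    DocumentSummary("LUNCH.DOC", "Lunch order", "J. Doe", "", "temporary"),
    DocumentSummary("MINUTES.DOC", "Committee minutes", "R. Roe", "minutes", "archival"),
]

to_keep = [s.filename for s in select_by_codeword(summaries)]
print(to_keep)
```

The selected file names could then be handed to whatever copying or deleting mechanism the system provides. Note that a plain substring match would also catch a comment such as "non-archival," so an office would need to agree on a codeword that cannot be mistaken.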

Third, if the cost of memory and storage continues to decline as it has been, "archiving" electronic records will rely more on software searching attributes and less on physical storage capabilities. This also indicates that access will be more a function of the searching tools and less of the order of one piece of paper next to another. Physical location, here in Ottawa, or on a server in Idaho, will also be meaningless. While a lot of less-than-archival junk will be retained, it will probably not be any more than is kept in paper-based systems. But the effort to organize and retrieve material should be reduced by automated methods. Newly developed tools, Gopher, Archie, Veronica, and World-Wide-Web, are the beginnings of a transformation of the electronic frontier. There are products available today under the term "electronic document management technologies" that offer "document imaging; archival/retrieval to an optical disk jukebox; O[ptical] C[haracter] R[ecognition]; natural language text search with fuzzy-search-by-word, fuzzy-search-by-concept, and query by meaning; annotations; forms overlay/dropout; and fax input/output ...; and variable scanner and printer support."[18] And, while I have only a fuzzy idea of what these terms represent, I can tell you what it means; it means that cyberspace is increasingly where archivists, like the rest of the information society, will be living full-time. And if we do not accept and embrace this role, we will have reached a dead end. We will be like the two dinosaurs in the cartoon; they are out walking and notice many other dinosaurs, feet up, dead. One says to the other: "Oh-oh, I don't like the looks of this."

Based, in part, on "When the bits hit the fan: Archives and electronic records," introductory comments for a session entitled Electronic Records Roundtable at the joint meeting of the Northwest Archivists and the Conference of Intermountain Archivists, Boise, Idaho, May 1, 1993.

[1] Cook, Terry. "Easy to byte, harder to chew: the second generation of electronic records archives." Archivaria 33(Winter 1991-92)203.

[2] Date: Thu, 10 Feb 1994 08:34:01 EST Sender: Archives & Archivists <ARCHIVES@MIAMIU.ACS.MUOHIO.EDU> From: David Saumweber <dsaumweb@NAS.EDU>

[3] SB1048

[4] Steve Outing, quoted by John Byczkowski, "Private words on Internet are for public's eyes." Idaho Statesman, May 12, 1994, p. 6D.

[5] Date: Fri, 4 Feb 1994 11:40:04 CST Sender: Archives & Archivists <ARCHIVES@MIAMIU.ACS.MUOHIO.EDU> From: Jim Parks, College Librarian <parksjf@OKRA.MILLSAPS.EDU>

[6] Date: Wed, 9 Feb 1994 22:35:34 PST Sender: Archives & Archivists <ARCHIVES@MIAMIU.ACS.MUOHIO.EDU> From: Thomas A. La Porte <tlaporte@UMICH.EDU>

[7] This point, and some of those following, were recently asserted "as basic operating assumptions" by Richard M. Kesner in his "Teaching archivists about information technology concepts: a needs assessment," American Archivist, 56:3(Summer 1993)435.

[8] Date: Mon, 11 Apr 1994 09:45:43 -500 Sender: Archives & Archivists <ARCHIVES@MIAMIU.ACS.MUOHIO.EDU> From: "Nina M. Osier" <nmosier@SATURN.CAPS.MAINE.EDU> Subject: Re: Renaming labels

[9] Geoffrey Giles, University of Florida and chair of the German Studies Association's archive committee, quoted by Gerald Posner, Secrets of the files. New Yorker, March 14, 1994. p. 43.

[10] Date: Mon, 14 Feb 1994 10:21:01 -0500 Sender: Archives & Archivists <ARCHIVES@MIAMIU.ACS.MUOHIO.EDU> From: Rob Spindler (iacrps@ASUACAD) <IACRPS%ASUACAD.BITNET@MIAMIU.ACS.MUOHIO.EDU>

[11] Abraham, Terry. "Entropy and archival disorder." Provenance 11:1(Spring 1984)94-99.

[12] 3,000 SAA members x $50,000 = $150,000,000; $150,000,000 / $9,000,000,000 x 100 = 1.67%

[13] Time, Feb 22, 1993, p. 62. See also Cowan, George. "Simple words about the new science of complexity." Whole Earth Review 63(Summer 1989)94. Davies, Paul. "A new science of complexity." New Scientist 120:1640(November 26, 1988)48. [Summary: For years, people thought that systems such as living organisms were too complex to quantify. Information theory may shed new light on the mathematics of biology.]

[14] Cyndi Merritt (ARCHIVES@INDYCMS Listserv posting 7 Jul 1992 16:29:48)

[15] Summary posted to ARCHIVES@INDYCMS by David A. Wallace Mon, 11 Jan 1993 20:33:56

[16] Fred Stielow, ARCHIVES Listserv post of 8 Apr 93 15:13:36

[17] See, for instance, David Bearman, "Recordkeeping Systems," Archivaria 36 (Autumn 1993), pp. 16-36, as cited by Date: Thu, 12 May 1994 15:53:51 -0400 Reply-To: Archives & Archivists <ARCHIVES@MIAMIU.ACS.MUOHIO.EDU> Sender: Archives & Archivists <ARCHIVES@MIAMIU.ACS.MUOHIO.EDU> From: "David A. Wallace" <davidw@LIS.PITT.EDU> Subject: Re: EMAIL RECORDKEEPING SYSTEM

[18] Date: Wed, 23 Mar 1994 13:28:24 EST Sender: Archives & Archivists <ARCHIVES@MIAMIU.ACS.MUOHIO.EDU> From: Karen Board <KARBOARD%VTVM1.BITNET@MIAMIU.ACS.MUOHIO.EDU> Subject: Document scanning and retrieval systems

erecs94.htm / June 1995 / tabraham@uidaho.edu