Introduction

If the generalizations, abstractions and reductions we’re making do not actually solicit more extensive and nuanced textual analysis, then they no longer fulfill their basic thematic function. To thematize is, after all, to risk the singularity of a text in order to disseminate an idea more broadly – providing greater access to and context for discrete passages that might otherwise go unremembered and unremarked. If 93.1 percent of our texts remain uncited or unread within 5 years of their publication, then the mnemotechnical infrastructure of our institution is clearly broken. But how can we make the most exacting work on the most singular linguistic objects more accessible when locating this kind of work (let alone synthesizing it meaningfully into our own) requires more time and energy than we currently have at our disposal?

The task of the digital scholar, as I have argued, is not to do away with thematization but to thematize as effectively as possible. How might technology enable us to do so? The theories described in the previous chapters have consistently pointed towards utopian systems of knowledge, but have failed to clarify the practical steps we can individually take towards such ends. Rather than starting with a theoretical analysis of the most promising technologies available today, I want to emphasize the granular tensions that emerge when we change the substrate on which we read, write and remember texts. These tensions are largely overlooked in utopian projections about the future of digital scholarship even though they are essential for developing a workflow that might accommodate the largest community of users regardless of their technical expertise.

In order to take full advantage of the technology we have at our disposal, we must check the impulse to drastically change the interface for which many of us still harbor palpable nostalgia. The workflow one develops over the course of a lifetime in dealing with printed and written texts is an armature that will not easily yield to a radical questioning. At some point, the potential time saved by learning a new one becomes more vertiginous than it is motivating. If the new workflow were really that much better than the old, then the time already wasted not using it becomes that much greater. Such “progress” threatens to miniaturize our work – making everything we’ve accomplished so far a shade of what we might have achieved under more ideal conditions. Such thinking is detrimental because even the illusion of progress is self-fulfilling – a feedback loop wherein the happiness of reaching certain aims, however virtual they may be, improves the disposition with which we approach new ones and, thus, the overall efficacy of our work. I do not believe our current technology justifies such anxiety, but if we really want to enhance the way we read, write and remember texts then we need to appreciate the inherent tensions of these practices as we strive to redesign them.

I have opted here for an auto-ethnographic account of my shift from a print to digital library because I hope to make myself more accessible to scholars contemplating this shift and programmers interested in facilitating it. The development of a technology adequate to our needs requires collaboration and transcends divisions in expertise.

This chapter can be seen as a kind of user guide for enhanced citation and annotation that begins with the limitations of handwritten notes and ends with the possibilities of a collective knowledgebase. It emerges from a continuous shuttling between organizing digital text and reflecting on this process in writing. While I have tried to capture as many antitheses between colleagues, students and former selves as possible, I realize how often the questions that arise between design and reflection are obliterated by conventional written narrative. This is why I have tried to supplement this text with videos whenever possible.

My aim is to humanize the need for citation infrastructure within our discipline – something that might too easily be regarded as a technocratic fiat. If we want to find a digital workflow that will allow us to thematize more effectively (and overcome scholars’ inherent resistance to this workflow), then we must first understand the mnemotechnical forces at play in the predigital media of citation and annotation – how the need for a knowledgebase emerges from institutional and methodological problems in which we all participate and in which our current knowledge is enframed. Of central importance, here, are:

  • the shifting relationship between citation and annotation
  • the extent to which the conceptual rigidity of themes corresponds with the formal rigidity of the medium on which they are stored
  • the ways in which hierarchical outlines provide the foundational structure of thematic thought and how they might be supplemented or surpassed by other forms of organization

I want to show that knowledgebases are not a radical departure from traditional mnemotechnologies even though they are the first form of mnemotechnology capable of generating the kind of infrastructure required for a more robust semantic web and borderless mode of textual production.

Written / Digital Annotation

Nearing the end of my undergraduate degree, when it was time to produce an original thesis not explicitly framed by the questions of any particular class, I was troubled by the apparent uselessness of my handwritten notes. I distinctly recall the sinking feeling I had when I began to question whether I had wasted hundreds of hours writing hundreds of pages of notes. It wasn’t as if I had inscribed them all into my memory, and yet I only ever consulted them a handful of times. Why did I bother? More often than not, when I did refer back to them, what I really wanted was the citation around which the discussions and lectures had been based. But these citations were conspicuously absent – probably because they would have taken too long to transcribe in the moment I was trying to comprehend what was being said. There were citations for some of the passages I was seeking, but I was surprised by how inconsistently these helped clarify what I had been trying to capture in the notes. Often the page range of the citation was ill-defined, or the specific language being discussed was not immediately obvious (e.g. when a passage discussed earlier in class was being compared, tangentially, with something later on). Even when I was able to locate the citation and recall its connection with the notes, I typically found that there was enough to say from out of the immediate context of the pages I had just reread. Perhaps I was anxious that the notes might prove embarrassingly basic or utterly incongruous with the new line of thought that I was just then in the moment of developing. The prospect of reconciling them with this new line of thought was not only tedious; it also seemed to risk muddling the vitality of what I had yet to write down. Once I’d been relayed to the original text, I seldom returned to the notes and I certainly didn’t update them with whatever new insights I might have had while rereading the cited passage.

This familiar predicament raises several questions about the inherent mnemotechnical tensions of academic work:

  • Is annotation most effective as an independent, hierarchical outline or as a marginal shorthand that is both typographically and conceptually close to the original text?
    • Should an outline record the momentary connections that flicker in the mind of the student or transcribe more or less accurately what the teacher is saying?
    • Should teachers distribute such outlines? Would the value of the class be undermined by this?
      • Do the answers to these questions differ with regard to lecture and discussion-based notes?
  • How often do class notes actually serve as a spur for formal essays?
    • How often do essay outlines deviate irreconcilably from class notes?
    • How often and to what extent are both forms of annotation more or less expunged from the memory of the student upon completion of the essay / class?
  • How much collective mnemonic energy is wasted when every student in a class is simultaneously and independently attempting to transcribe some version of what is being said?
    • Is it better for students to take notes independently and compare them afterwards or collectively in an interface that allows for real-time comparison?
      • Obviously there’s something to be said for each individual’s ability to excise and organize the information to best suit their needs; each individual’s notes might be said to surpass an audio/visual recording of the class in some respect. Nevertheless, is the total isolation of each annotator entirely necessary?
    • Would it be beneficial for the teacher to see this collective annotation in real-time and make interventions and emendations whenever necessary?
      • Would such a pedagogical environment continue to require the continuous physical presence of all of those involved?

Many of these questions (especially those concerning the possibility of collective annotation) I will return to in the following sections. Here I want to point out that, while it may be easier to write an essay from a structured outline than it is to improvise one from out of the margins of annotated passages, the citations in and of themselves often have a greater variety of uses and an enduring value beyond the immediate requirements of a class.

In my experience taking and teaching classes involving textual analysis, I’ve found that citations can be shared in a way that allows discussion to flow spontaneously, openly and with more potential for collaborative construction. Outlines, on the other hand, present immediate hermeneutic discrepancies between possible readings that can easily stifle conversation. One of the reasons students end up regurgitating an interpretation that has been clearly outlined ahead of time is that they struggle to question an interpretive structure after the original text has been truncated and subordinated to it. Too much structure creates the illusion that interpretation is a process of decryption – a textual riddle that can be “solved” with the “right” evaluation of the “right” pieces of information.  Too little structure, by contrast, tends to generate elliptical and fragmented discussion (e.g. if I were to begin by asking the class’ thoughts about the assigned reading in its entirety). A curated discussion of carefully selected citations, however, can produce interpretations similar to what might have been presented at the outset in outline form.

The difference is that, when students read citations from the original text first,  not only can they derive the interpretation themselves, they can begin to experience interpretation as a process that depends fundamentally upon a continued questioning and refinement of the tentative distinctions and evaluations hazarded by peers. These will inevitably differ from what might have been outlined beforehand, but this should be seen as a sign of productivity rather than inefficiency. If we use these deviations to revise and expand our thinking and teaching, then they are a compelling reason for repeating the course in real time and living space rather than recording it once and distributing it asynchronously online. Even if the interpretive structure arrived at collectively during the discussion of a citation did not deviate a jot from the way it was outlined in the lesson plan, it’s still advantageous for students to participate in the construction of this knowledge from the ground up rather than having it imposed in a hierarchical form and authoritative fashion. This is, perhaps, what students mean when they describe something as ‘relatable’ (rather than relevant). Certainly all of the information we could distribute to them in the form of an outline would be ‘relevant’ (insofar as the content of this outline is directly related to what they need to learn), but it would still lack the ethos of nonprofessionals relating to one another to construct a complex form of knowledge from a finite piece of information. What’s missing from the outline provided beforehand is all of the discursive and dialogical tension that goes into its construction (e.g. why certain distinctions and connections could not be sustained, what unmade them, how and why they were reformulated etc.).

The hierarchical structure of the outline is too often the first and only form of knowledge organization even though it is the mode of annotation that seems least likely to endure – the most likely to be discarded once the individual composing it has achieved the goal at hand. The redundancy of so much academic notetaking suggests great potential for collaboration, but the idea of a more systematic collaboration gets stigmatized as a sign of laziness or even academic dishonesty. Why is it that so few courses ever provide guidelines on how to take notes effectively for the objectives at hand, let alone notes that can become part of a collaborative structure that endures beyond the term of study? This is not so much a failure of hierarchical organization as such as it is a failure of the medium in which these hierarchies are composed. Before the difficulty of coming to a consensus regarding the way ideas are subordinated to one another comes into play, there is an even more systemic problem with our inability to keep the links between citation and annotation alive. This is why alleviating the redundancy of academic coursework and generating more collaborative modes of learning will, at the very least, require the complete digitization of notes and the texts to which they refer.

After I acknowledged the extent to which my notes would be useless without clear and direct citational links, I began experimenting with different methods for improving these links using exclusively digital text and was far more successful thanks to their searchability and modularity.

Searchability

Even if I only vaguely remembered some phrases or key words associated with an idea I was trying to recall, I could now instantly search the entire body of my notes and the primary texts themselves. As a result, direct quotations copy-and-pasted from the primary texts began to replace the vague paraphrases transcribed in the moment. This kind of instant searchability allows us more direct access to what we’re thinking about while we’re thinking about it, enabling us to take notes closer to the moment in which our minds are saturated with the relevant context and possibilities of new connections. By freeing up our short-term memory to think about the text rather than locate it, instant searchability keeps these momentary connections alive long enough for us to figure out what we were after when we first began reaching for them. Though the act of moving through physical space to consult a passage cited in a physical book seems to require only a modicum of effort, this may amount to something rather significant when we are not sure where the idea is leading us in the first place (especially if there are several competing connections struggling to breach a level of consciousness upon which they can be preserved in memory).
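
To make this concrete: the sketch below shows what instant searchability across an entire digital library can look like in practice, assuming the texts have already been rendered searchable. The PyMuPDF library and the folder name are illustrative choices on my part, not a record of the particular tools I used.

```python
# A minimal sketch of instant full-text search across a folder of searchable
# PDFs. PyMuPDF is used purely for illustration; the folder name is a placeholder.
import pathlib

import fitz  # PyMuPDF


def search_library(library_dir: str, phrase: str) -> None:
    """Print every page on which `phrase` appears, across all PDFs in a folder."""
    for pdf_path in pathlib.Path(library_dir).expanduser().rglob("*.pdf"):
        with fitz.open(pdf_path) as doc:
            for page in doc:
                if page.search_for(phrase):  # list of hit rectangles, empty if none
                    print(f"{pdf_path.name}, p. {page.number + 1}")


search_library("~/library", "mnemotechnical")
```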

Is it possible that, while immersed in thought, our minds censor many of the links that would actually require us to consult a book because engaging the perceptual-motor systems necessary to find and search it might obliterate any number of inchoate links (perhaps even the links that prompted the search in the first place)? While we are, of course, “free” to pursue whichever links we see fit within the archives available to us, this freedom only really applies to the links of which we are conscious. If there were, indeed, a mnemotechnical mechanism designed to economize the preservation of preconscious links, then our freedom to browse digital text is in some way greater than that of written or printed text because the instant searchability of the former frees up the time and mental energy necessary for these links to become conscious. My point is that our freedom to browse is not as free as it might at first seem, since we only tend to connect what’s outlined in our minds with actual citations when the ratio of invested time to projected payoff is sufficiently low. Having shifted to digital notes and texts, I cannot help but wonder how my writing might have changed if looking things up never required that I physically turn away from the page in the middle of reading or writing it. How many links went unexplored over the years because pursuing them would have taken a few minutes rather than a few seconds?

The thematic impulse may extend all the way down to this preconscious level on which links are forgotten before they’ve had a chance to become memories (to save us from the madness of Borges’ Funes) and all the way up to the superego of the academic institution that requires us to be more and more merciless in our demarcation of what is relevant.

Modularity

Another great limitation of handwritten notes is that they make it prohibitively difficult and time-consuming to properly integrate something that occurs as an afterthought or to compile an early intimation of an idea with its subsequent iterations. The slightest whisper of inconvenience is often sufficient to sideline our intentions to revise – solidifying the inadequacies of a handwritten outline indefinitely.

While this disinclination to compile threatens the overall efficacy of notes, it can actually be quite productive in other contexts. Sometimes it seems easier to handwrite an essay under time constraint than to type one without any form of deadline because the slower speed and greater fixity of the medium drastically restrict our inclination to revise. The syntax of our thought, impeded by the physicality of writing, more closely approximates that of a spoken conversation where redundancy, generalization and delay provide a rhythm that allows thought more time to connect what is currently being said with what ought to be said next. Typing, especially digital touch typing, makes it possible for language to outrun thought – depleting the temporal buffer that allows the logical flow of ideas to solidify. This often results in a rhythm of writing defined by fits and starts, where the idea gets evacuated prematurely. Rather than looking forward to the next line, the mind grows preoccupied with fixing what was just written. Too often the typed idea makes a muddle of what could have been spoken or handwritten clearly if a bit more slowly. This isn’t to say that ideas do not also get muddled during conversation or while writing by hand, only that the syntax of thought that typing promotes has the potential to distort ideas that are quite easy to articulate in other media. The freedom and ease with which we can manipulate anything at any time create a level of revision anxiety that speech and handwriting can often attenuate. If you qualify too much of what you say in conversation, your interlocutor will lose focus or interest and may spur you to get to the point. If you erase or cross out too many times, you will run out of space or the paper will rip. These material limitations, subtle as they may be, are radically absent from digital type. Even a manual typewriter balances enhanced speed with the physical inconvenience of revision. Digital type, however, while in many ways superior in its ability to begin again without any material or social resistance, also suffers from this lack of resistance. Beneath the anxiety that comes with the freedom to revise infinitely, there is a more fundamental disruption to the syntax of our thought when it is deprived of this resistance and loses its rhythm.

I was only able to take full advantage of digital text once I had adjusted to the new rhythm and syntax of thinking that it required. After struggling to type a paragraph from beginning to end I began composing fragmentary outlines that I would later compile into more coherent blocks of text. Here the ‘enter’ and ‘tab’ keys used to create new lines and indentations began to replace the pauses that had formerly been marked by punctuation. My ideas could, thus, be divided into clauses that kept in sync with my actual thought process similar to the way in which instant messengers compel us to send fragments of our thought as they occur to us rather than later, when they can be articulated in paragraph form. There is a good deal of folk wisdom in the stigma against “writing novels” in our text messages. Not only because doing so keeps our interlocutors in limbo as ellipses flash on the screen, but also because trying to write this way with digital text inherently distorts the rhythm and syntax of our thoughts in a way that handwriting does not. The main difference between typed outlines and instant messaging, however, is the ability to regroup and rearrange these fragments of thought after they have been transcribed.

I found that this added modularity promoted more exhaustive, paratactical exploration of my ideas than I was able to achieve writing by hand. It encouraged me to experiment with sentence structures that I would otherwise have found tiresome, convoluted and Latinate. As it turns out, however, the genius of a Miltonic or Proustian sentence shines even more brilliantly when its clauses are parsed and subordinated in this manner. Outlining this way allowed me to compare and qualify more examples and contradictions in a more graphically intuitive manner than I ever could have while writing linearly. I knew I could pursue more tangential threads without losing sight of the overall structure of the idea. I could conceive entirely new ideas in the labor of delivering others without having to abort either. The ease of excising and grafting these new growths elsewhere made cultivating them more appealing. Being able to do so in medias res actually seemed to enhance my writerly rhythm more than disrupt it. Thus, we can see how the shift from handwriting to digital type has the power to naturalize what would otherwise appear aberrant (e.g. the shift here from natal to horticultural metaphors).

The structural effects of digital type extended far beyond the individual moment of writing. Rather than turning over a new leaf each day and letting the calendar tacitly impose itself on the structure of my notes, I was able to review and rearrange them asynchronously without necessarily having to rewrite them. Many ideas that appeared new and disconnected during class later revealed themselves as estranged relatives of things mentioned earlier or later on. This reorganization often involved categories that were never introduced explicitly during discussion – themes that served to bridge different interpretations of the same materials between classes and themes that helped me distinguish citations that were of personal relevance from those that were of general relevance to the class. I feel that I began to develop the metathematic awareness necessary for making such complex organizational decisions only after the modularity of digital text made it more feasible. With enough time, even my most eccentric passages and ideas began to congeal into meaningful themes, pushing back against the typically reductive movement of themes and suggesting that their flexibility is greatly dependent on the medium in which they are formed.

Self-generated themes can easily be regarded as the ideological blinders which, inevitably, they are. If my professors in college were to have looked at the thematic headings of my notes they would have seen, as if radioscopically, the ideological structure of my thought. But is such clarity part of the problem or the solution? If no one can purge their thinking of ideology, then why not don it with pride (or at least candor)? Presumably, because ideological presuppositions have the potential to undermine the integrity of the entire method of interpretation. Masking ideology in our own writing and exposing it in others’ may be the ur-trope of academic credibility, but it is also one of the greatest impediments to collaboration. Would it not be easier to confess the ideological blindness of our thematic choices from the outset by revealing the hierarchical structures of our notes and drafts? This question would seem to presume the visibility of the very structure that blinds us, but my point is not that any one of us can ever atone for ideological bias in toto, only that it might be easier to navigate this kind of bias collectively and strategically if we could easily see the organizational infrastructure behind the published draft – how certain themes are being coordinated hierarchically and how citations are being subordinated to these themes (or resisting this subordination).

As paradoxical as it sounds, I believe the mnemotechnology I’ll be discussing later can actually help us see this ideological blindness more clearly, but only if we embrace the imperative that citational infrastructure be digital. The infrastructure of citation must be digital in order to move beyond the twofold insularity of scholars working with private texts within semi-private institutional niches. Thus buffered, our ideologies will continue to hide and be hidden. As digital communities, however, we might construct an infrastructure capable of indexing the most nuanced articulations of our ideological framework – a knowledgebase that links citations, annotations and publications in a way that helps sustain more rigorous debate about more exigent and specific topics while lessening our inclination to skirmish trivially over the validity of ideologies in general. I will expand on all of this later, but my main point is that printed and written annotations prevent us from inscribing ideology collectively within the kind of knowledgebase that could actually allow for a more transparent and productive discussion of method.

On a more personal level, greater citational infrastructure promotes a more meaningful and autonomous relationship with the themes which, in a healthy academic body, serve as a kind of backbone – a structure that is stable without being ossified. As I mentioned earlier, the themes I constructed after migrating to digital notes enabled me to reflect upon my methodological and citational choices and to distinguish the interests of the class from my interests overall.  I learned how they must expand and contract as the body of information grows. Seeing my own generalizations swell and burst into more particular categories was far more educational than having these tensions pointed out by others. While I might have suspected my teachers of pushing me to adopt their interpretive frames, seeing generality as an informational, organizational problem of my own making pushed me to take more responsibility for it.

When I handwrote my notes I tended to adhere more closely to the substance of the lecture or discussion, purging them of the occasional connections that might have occurred to me at the time. But when it was time to write the essay I felt compelled to pursue the very connections I had previously excluded, simply because using my notes as an outline seemed like regurgitating everything we had already discussed in class. Often I ended up staring dazedly at the quotes I intended to analyze with only a vague sense of how they related to the assignment and, more often than not, my efforts resulted in an awkward grafting of materials we had discussed briefly (if at all) onto more general questions raised during the discussion of passages we had treated more extensively. The feedback I received typically suggested that the conceptual leaps I was making between various passages were as interesting as they were confusing, but this only made me try that much harder to bridge this gap – amassing ever more obscurities hoping, perversely, that they would somehow illuminate one another perfectly. I mistook the difficulty of the work for its sophistication. Really, my writing would probably have been easier and smarter had I a keener sense of what might be achieved within so limited a scope. The vagueness and sloppiness of those early analyses stemmed from an inability to distinguish and balance what was important for the class and what was of interest to me.

I know many teachers who see this kind of anguished indecision as a necessary moment in our intellectual coming-of-age. While I do think there should always be room to question and doubt our own approach vis-à-vis the established methodology, I cannot help but regard this rather romantic outlook as something of a cop out. Is it not this glib embrace of formlessness that scares many students away from the discipline in the first place? Is there really nothing that we can do to make the organizational needs of such formal assignments more transparent?

I, for one, found it much easier to extract essay topics from class discussions once I began outlining my notes on a computer. Having already worked up a kind of running categorical scheme for most classes, I think I was starting to see for myself how some things just couldn’t be accomplished in a few thousand words. When faced with the reality that many of the categories I had created already contained more material than the assignment would allow, I was forced to check my impulse to synthesize everything under the sun and try my hand at selective excision. While this process of selection will never be entirely free from indecision, the anguish of this decision can be greatly alleviated with even the most rudimentary form of citational infrastructure. Teaching and administrating this infrastructure on an institutional level, however, will require the additional capacities of a knowledgebase.

§

Returning to the original question about the basic function of notes – whether their purpose is to transcribe a class accurately or to capture insights that might have greater relevance beyond the class – I admit that digital notes are not really a solution per se. Even though they make it easier to record more of what transpires in a class, we must also account for our tendency to manipulate or dismiss each other’s ideas when they begin to encroach upon our own or to challenge the thematic infrastructure on which our ideas are constructed. The modularity I’ve been praising also makes it easier to efface such tensions. When the various pieces of information in an outline can easily be rearranged and reassigned to different categories, it is easier to lose track of why they did not fit in the first place. It would be unwise, then, to just dispense with these traces for many of the same reasons that it would be unwise to just disseminate the outline of a lesson without animating it through live discussion.

How, then, should we regard the various lacunae that appear in the initial drafts of notes taken under time constraint? The lapse may be the arbitrary result of distraction or sleep deprivation, but it may also be worthy of further examination. Lines of thought that terminate in unanswered questions are not necessarily a sign of poor discussion, instruction or understanding. What would we lose if the silence following such questions were effaced? How, in a system of collective and continual annotation, do we maintain the digital equivalent of this silence? Individual lapses in the record will probably remain inscrutable, but collective lapses may speak volumes for those with the (prosthetic) ears and eyes to detect them. Considering the significant role these lapses play in the rhythms of private reflection and public discussion and their general obscurity in digital text, we should insist that any mnemotechnology worthy of the name provide some means of tagging and engaging them.

Print / Digital Text / Folder Tree

During my undergraduate degree, I tended to view PDFs as a necessary evil.  Like most of us, I loved printed books even for their most superficial and sentimental trappings and I was in no way willing to give them up for the dubious convenience of a PDF. Moreover, the PDFs circulated for class ranged widely in quality from illegible scans of damaged books to digital type directly issued from the publisher. This disparity led me to use them largely as a means to an end – useful for filling in the gaps in my library, but in no way capable of replacing it. My attitude has changed significantly for two reasons which I will discuss below.

Politics of Digitization

While the digitization of the public record has been underway for quite some time, it remains one of the more divisive issues because the formatting of the digital text has the power to create and eliminate levels of accessibility and functionality. This gives rise to a myriad of approaches and redundant labor even within disciplines where the general standards of fidelity to the original text are more or less clear.

Hosting all of our texts on the web in HTML format seems, at first, the most logical choice. But when we consider how susceptible HTML versions of scholarly editions are to copyright law, we can see how semi-private digital editions like PDFs do, in fact, have a considerable strategic value. As I have argued, our mnemotechnology is most likely to advance if it does not deviate too rapidly and drastically from the technology of the book. By preserving the facsimile of the printed page, PDFs do not presume to “revolutionize” the book nor do they restrict us to a methodology or workflow tacitly encoded in file types of more limited compatibility. They are clunkier than other digital formats, but this remains a valuable protective mechanism that has not yet been rendered vestigial in the slow evolution of text.

While many academic publishers offer digital editions in formats other than PDF (e.g. epub, mobi, azw), we should not be so quick to view these as legitimate alternatives. Not only do many of them have digital rights management (DRM) restrictions that would interfere with our ability to extract and share larger selections of text, they also tend to compromise (if not butcher) the typography and aesthetic of the page. Many seem to have been converted into digital text in a rather desultory manner by some third party with little regard for the layout choices made by the publisher. Their ISBNs are sometimes missing or inaccurate, and the page divisions, which, for now at least, remain the fundamental unit of academic citation, have often been done away with entirely. Even if we eventually start cataloging and citing digital texts in some other way, this lack of compatibility with current academic methodology does little to facilitate the shift.

The poor quality of digital editions, and the low opinion we consequently have of them, serve the interests of the academic publishers who still depend, to a great extent, on the sale of individual copies of works. It’s important to realize, however, that this precarious situation cannot be sustained for much longer; the rather sad attempts to come up with a viable alternative to the facsimile PDF should be seen as a last-ditch effort to cling to a univocal copyright (often at the expense of the individuals and institutions whose intellectual property is nominally protected). While, visually speaking, facsimile PDFs appear faithful to this model of the book, they are, in fact, quite treacherous; the full-resolution page images of which they are composed can always, eventually, be ripped from the DRM structures that contain them, making it more and more difficult to price publications based on the quality of the text itself.

Pirate Libraries

The other reason for my shift to a PDF-based library is far more pragmatic. I really had no idea just how many high-quality, full-text PDFs were available on illicit digital libraries hosted on servers abroad. These were so expansive that, upon discovering them, I found myself binge-downloading books for hours (sometimes days) on end. I knew that what I was doing was in some way impacting humanities publishing houses and, thus, myself. But, as a typically destitute English graduate, I couldn’t resist the temptation to get while the getting was good. I was ever wary that someone was going to blow the whistle that would inevitably stanch this seemingly endless reservoir of free text.

I remember that, when one of the earliest and largest of these libraries was taken down, a blogger likened it to the burning of the Library of Alexandria – a statement we should not immediately dismiss as overblown, especially when we consider how revolutionary the digital liberation of the entire print archive would be. There really is something rather epic (and epoch-defining) about the battle between pirates and publishers. Were it not for all of the priceless, original manuscripts purportedly enveloped by the blaze at Alexandria, the loss of such a massive and free digital library would have to be regarded as greater in magnitude (at least as far as the sheer quantity of information is concerned).

Although numerous sites have risen and fallen over the years under legal pressure from the academic presses, the overall breadth of the free digital archive has only continued to grow. It shows the same tenacity as the popular torrent site, The Pirate Bay, which, interestingly, has contributed to its own modern day mythopoiesis by adding the figure of the hydra to its original pirate ship insignia. Each of the beast’s many heads stands for a mirrored server in a different country outside the jurisdiction of western copyright law, emblematizing the reality that, even as some instances of the free digital archive are inevitably cut down, many more will rise to replace them. I will not name any of these directly here, but suffice it to say that there are at least a few people in every department capable of pointing curious readers in the right direction.

Folder Tree

In a matter of months, I had amassed over one thousand digital texts and was beginning to struggle to manage them. It was not as if I had PDFs strewn haphazardly across my hard drive either. The directory of subfolders I created to organize them was robust – too robust, in fact. The various branching file paths were, in some cases, long enough to overload the processing power of my computer (especially during large-scale backup and file transfer operations). One might ask why, with the availability of indexed searching, I would even bother to create such a folder tree. Why not just place all texts in the same folder and search for them by name? As easy as it might be to recall a text by its title or the name of its author, this requires that these names be readily accessible within our biological memory. Searching by name would work well enough for all of the texts that had formerly resided on my physical bookshelf – the texts I had consulted so frequently that their very position on this shelf had its own mnemonic value – but when my library grew several times larger than what could reasonably be shelved on my mnemonic bookshelf, this manner of searching became far less practical. I needed to keep track of texts I only considered reading for just a few moments while skimming and downloading others.

The folder tree I kept on my hard drive, while far from ideal, was an attempt to deal with the dramatic expansion of my library. It was divided into three major branches: ‘Literature,’ ‘Theory-Philosophy,’ and ‘Assorted.’ This last category is of particular interest because it marks the failure of the two more dominant categories and archives the traces of an organizational problem that I will return to later in the discussion of the personal knowledgebase. It was in this ‘Assorted’ folder that I found it useful to group texts according to the more contingent, thematic interests of my classes and research projects (rather than by the names of the authors, as I had in the other two categories). But this meant that I could either keep duplicate copies of texts in both the author folder and the assorted folder or decide which of these locations was most relevant. I eventually compromised by placing shortcuts to files in the assorted folder and keeping the original texts in the author folders, but not before I realized how the locations of texts in this system could neither be restricted nor duplicated without introducing structural tensions – how, on the one hand, populating these subfolders inconsistently with shortcuts would never fully preserve the associative pathways and thematic designations that helped me remember them and how, on the other, discarding the system would erase all of the pathways it did store, however imperfectly.
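
For readers curious about the mechanics of that compromise, the sketch below shows one way of keeping a single canonical copy of each text in its author folder while letting it appear under any number of thematic folders via links rather than duplicates. The layout and file names are illustrative, not a record of my actual library, and on Windows true symbolic links may require elevated permissions.

```python
# A sketch of the compromise described above: one canonical copy of each PDF
# lives in its author folder, while thematic folders hold links rather than
# duplicates. Folder and file names are illustrative only.
import os
from pathlib import Path

LIBRARY = Path("~/Library").expanduser()


def link_into_theme(original: Path, theme: str) -> None:
    """Create a symlink to `original` inside Assorted/<theme>/ (skipped if present)."""
    theme_dir = LIBRARY / "Assorted" / theme
    theme_dir.mkdir(parents=True, exist_ok=True)
    link = theme_dir / original.name
    if not link.exists():
        os.symlink(original, link)  # on Windows this may require elevated rights


# One canonical copy, visible under two different themes.
faulkner = LIBRARY / "Literature" / "Faulkner" / "Absalom, Absalom!.pdf"
link_into_theme(faulkner, "Modernism seminar")
link_into_theme(faulkner, "Qualifying exams")
```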

As I accumulated the oeuvres of nearly every author of major and minor importance to me and my Amazon wish list was halved and halved again, I began to entertain the possibility of abandoning the printed book entirely. Being something of an absolutist, I didn’t want half of my library vaulting into the 21st century with the other half lagging behind in the Gutenberg era. At the time, I was facing the brutal impracticality of lugging my entire physical library across the country for graduate school and, while I wasn’t quite sold on the idea of reading everything on a screen, after considering how much time and effort I already spent transcribing citations from printed texts and how much time I might save copy-and-pasting them, I decided just to go for it.

A large part of this decision was based on the realization that roughly 80% of the texts that I owned or wanted to own were available as high-resolution, low file size PDFs from the digital libraries I mentioned above. So it was only the small fraction of my library for which I couldn’t find a decent, preexisting PDF copy that would need to be converted manually. How hard could it be to make a decent PDF? Wouldn’t the time I spent digitizing print books be paltry when compared with the time saved copy-and-pasting quotations?

The whole process turns out to be remarkably simple if not particularly cheap. We must first be willing to spend several hundred dollars on a high-speed, sheet-fed scanner, which entails the willingness to cut some of our beloved books to pieces. (I imagine that for many of us the affective “cost” of the latter far outweighs the monetary cost of the former.) The Fujitsu ScanSnap sheet-fed document scanner is capable of scanning books at ~50 pages/minute once their binding has been removed with a paper slicer (provided that the pages themselves are not problematically thin, warped or glued). Once scanned, optical character recognition (OCR) enables us to convert any high-quality scan of any conventionally formatted book into digital text in a matter of minutes with more than 95% accuracy. Any errors or artifacts introduced in the scanning process can then be corrected in Acrobat. This means that, with the necessary equipment, it’s possible for anyone with a modicum of experience to generate a professional-grade PDF of a full-length book in anywhere from 20 minutes to an hour.
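
For those who would rather not rely on Acrobat, the OCR step can also be scripted with open-source tools. The sketch below uses the OCRmyPDF command line as an alternative to the Acrobat-based workflow described here; the file names are placeholders.

```python
# A sketch of the OCR step using the open-source OCRmyPDF command line rather
# than Acrobat (the tool I actually used); file names are placeholders.
import subprocess


def ocr_scanned_book(raw_scan: str, searchable_pdf: str) -> None:
    """Add a searchable text layer to a raw page-image scan."""
    subprocess.run(
        [
            "ocrmypdf",
            "--deskew",             # straighten pages fed at a slight angle
            "--language", "eng",
            raw_scan,
            searchable_pdf,
        ],
        check=True,
    )


ocr_scanned_book("raw_scan.pdf", "searchable.pdf")
```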

I’ve come to think of digitization as a kind of mummification: after the book is gutted – its vital organs excised and scanned, their digital immortality assured – the pages are reinserted into the outer cover. The process is almost undetectable until some unknowing bibliophile plucks a book from the shelf only to have its pages fall out and scatter across the floor, which, for better or worse, hasn’t really been much of a problem. (I’ve even begun plastic-wrapping stacks of digitized books I don’t want to display in order to prevent them from falling apart.) It is also worth noting that it is often not even necessary to purchase new copies at retail value because of the abundance of decommissioned library copies available for a fraction of the price on Amazon. While I am not without qualms about the violence of the digitization process, I think that the very existence of decommissioned library books that, so far as I can tell, were never even read is a more tragic reality than the need to dismember them in order to expedite the digitization process. At least their digital spirit actually gets read. It lives on in an infinitely reproducible, intrinsically shareable form. Despite the mummification, the digital copy is really less mum than ever before.

In retrospect, I can confidently say that the benefits of digitizing my library have more than outweighed the difficulties of learning how to digitize texts. The time saved by keyword searching and extracting citations allows me greater depth of coverage and annotation of each individual text. I have attempted, in the supplemental videos, to capture the nuances of this entire process from start to finish in order to reduce the learning curve for anyone interested in migrating from print to digital text. While it might seem rather mundane, I believe that our lack of awareness about the relative ease of this digitization process is one of the greatest impediments to actualizing some of the utopian visions of collective scholarship in the humanities that still remain ‘theoretical’ almost a century after they were first articulated. These tutorials should, thus, be seen as a practical and political intervention in the mnemotechnical infrastructure that prevails in many of our institutions.

Intra / Extratextual Annotation / Citation Tree

The chances of actually making use of our notes, as I’ve argued earlier, hinge heavily upon their immanent visibility – being able to see which passages have been marked and annotated within a text without having to cross-reference separate documents. Shifting to digital text not only achieves the enhanced searchability and modularity we saw in the shift to digital notes, it also allows us to streamline the two by layering annotation and citation within the same document.

Once I got accustomed to reading on a screen, I began taking notes in Adobe Acrobat, which enabled me to write paragraphs’ worth of notes on a specific page or within a specific highlighted passage and to save these annotations directly to the PDF file itself (rather than to some proprietary layer of metadata, as many of the off-brand PDF annotators do). The marginalia, in other words, was no longer “marginal.” The most ephemeral traces of every reading could be inscribed in the moment, and with the greatest possible detail, within the text itself.
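
The practical upshot is that these annotations are stored within the PDF according to an open standard rather than in a vendor’s sidecar database, so any compliant tool can read them back out. The sketch below, which uses the PyMuPDF library purely for illustration (it was not part of my own workflow), dumps every note and highlighted passage from an annotated file.

```python
# A minimal sketch showing that Acrobat's annotations travel with the PDF itself
# and can be read back out by any standards-compliant tool. PyMuPDF is used here
# purely for illustration; it was not part of my own workflow.
import fitz  # PyMuPDF


def dump_annotations(pdf_path: str) -> None:
    """Print every note and (approximately) every highlighted passage in a PDF."""
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for annot in page.annots():
                kind = annot.type[1]                  # e.g. 'Highlight', 'Text'
                note = annot.info.get("content", "")  # the typed marginal note
                quoted = ""
                if kind == "Highlight":
                    # text lying under the annotation's bounding box
                    quoted = page.get_text("text", clip=annot.rect).strip()
                print(f"p.{page.number + 1} [{kind}] {note!r} {quoted[:80]!r}")


dump_annotations("annotated_text.pdf")
```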

This shift to intratextual annotation was largely solicited by my coursework, which was heavily invested in the close reading of individual texts. While this method served me quite well for a few years, I eventually rediscovered the antitheses of intra- and extratextual annotation when it came time for qualifying exams. Intratextual annotation sacrifices the ability to categorize citations from multiple sources, an ability necessary for more comparative projects and classes. However detailed and graphically intuitive my PDF annotations might have been, they still relied on the document being opened to the page in which this content was embedded. Skimming through the pages of Absalom, Absalom! I could see acute instances of catachrestic, figural excess in Faulkner, but I could not see how these resembled or differed from what I found in other American and Continental Modernist authors or how Faulkner fit into a larger historical picture. I could expound Freudian motifs in any given work, but I could not easily juxtapose these with the passages to which they were referring or with other works making similar references. Some form of compromise was in order if I was to have any hope of writing coherently about the hundred or so texts on my lists. As devoted as I had been to the askesis of close reading, I realized that my workflow was not sufficient for this more comprehensive task (essentially, the task of thematization at which I had bridled for so long). Faced with the reality of academic evaluation and advancement I needed to thematize…or fail!

This was the birth of what I would later call the citation tree. After I had read and annotated the various PDFs on my list, I would copy-and-paste the citations from the PDFs into a categorical outline scheme structured around the individual author and the specific work. Unlike the intratextual citations I had been making in Acrobat, where all of the thematic associations could be listed together within the embedded annotation, the citation tree required that one of these themes be declared primary and all of the citations associated with that theme, subordinate. I considered the idea of pasting the same citations under multiple categories, but found that this caused more trouble than it was worth (in much the same way that the shortcuts I used to bridge categories in the folder tree strained its overall functionality). The difficulty of manually maintaining and syncing multiple instances of citations made it easier to limit each instance to one thematic heading, even though this compromised the more democratic representation of themes embedded in the PDF notes (where a more nuanced sense of the connections between thematic elements was preserved because they were grouped by the citation itself rather than by a primary theme).
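
The structural trade-off can be stated schematically: inside the PDF, a single annotation can carry as many thematic associations as I like; once pasted into the tree, each citation must hang from exactly one branch. A minimal sketch, with entirely hypothetical names:

```python
# A schematic contrast between the note embedded in the PDF (which can carry
# several thematic associations at once) and the same citation relocated to the
# tree (which must hang from exactly one branch). All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class EmbeddedNote:                 # the annotation living inside the PDF
    passage: str
    themes: list[str] = field(default_factory=list)   # as many as apply


@dataclass
class TreeCitation:                 # the same passage, pasted into the outline
    passage: str
    source: str                     # the author / work branch it hangs under
    primary_theme: str              # exactly one thematic heading


note = EmbeddedNote("…", themes=["catachresis", "genealogy", "Freudian motifs"])
entry = TreeCitation("…", source="Faulkner, Absalom, Absalom!",
                     primary_theme="catachresis")
```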

I was surprised to find that the flexibility and general clarity gained by making these difficult thematic choices overall outweighed (or at least mitigated) the perils of thematization. With a larger sampling of citations from a specific work – about 30 per text – the categories grew more nuanced than they had been in my previous class notes and essays. A larger number of thematic subdivisions applied to a larger sampling of citations allowed for a more concrete negotiation of the conceptual territories involved in each. The citation tree not only forced me to prioritize themes that would easily translate into exam essays, it also required me to justify the inclusion of passages that I felt were important despite their lack of exam relevance.

Transferring notes from inside the texts to an external outline was greatly facilitated by digital text, but it also revealed a number of problems. I must confess the citation tree never really bloomed. One of the simplest reasons for this was that, after a certain point, the mass of text became too large to properly handle in a Word document. The number of possible branches (i.e. subcategories) was limited to the number of indentations that could be accommodated by the standard width of a page. PDF pages that appeared to be a perfect facsimile of the printed book might be riddled with errors generated by the ClearScan OCR I used to convert the page scans into searchable, selectable text. Eventually, the spelling and grammar checking ability of Word gave out altogether and the entire document grew too unwieldy even for me, let alone my exam committee. This all became quite clear when one of my advisers attempted to actually print a draft of the citation tree, a job that would probably have required an actual tree’s worth of paper to complete. He informed me later that his son was the one to intervene and cancel the job, explaining to him that this was not the kind of document intended for print.

Copying these citations into the citation tree also meant effacing the contextual relevance of the intratextual annotations that tended to emphasize more discrete phrases. The tree made it possible to survey and recall far more material than before, but at the expense of the kind of emphasis that might be inferred from within the context of the page. In order to preserve some of this context I had to expand the citation range to the paragraph level or beyond. While I tried to preserve the emphasis of the previously highlighted phrases by putting them in boldface, this was a step backwards in several respects.

In short, neither mode of annotation could successfully model the complexity of themes without diminishing context. Faced with this material resistance, we must reconsider how to effectively communicate context to an audience removed from the graphic supplement of the page. This discrepancy between the broader citations made for dissemination (teaching, conferencing, discussion) and the more acute citations needed for rhetorical and linguistic analysis is one of the core infrastructural problems we face. If these tensions are already present in my own relatively small project, how can we hope to accommodate networks of users working on multiple projects with varying styles of citation and annotation? What kind of infrastructure might model the concentricity (and eccentricity) of citations and annotations in a way that would enable many readers to interface with one another in a collective textual environment?


Category / Keyword / Personal Knowledgebase

The citation tree finally reached an impasse when the sheer volume of text overloaded Word’s spelling and grammar checking capacity. Because the OCR scanning method I was using to extract citations required frequent (and occasionally extensive) proofing, what might otherwise be seen as a minor bug actually signaled the system’s failure. Without automatic proofing tools, the small margin of machine error introduced by OCR exceeded my ability to correct it. Pursuing the project meant either severing the branches of the tree into separate documents (which would eventually need to be severed again as the amount of information grew) or working through all of the errors manually (which would virtually cancel out the amount of time saved by copying and pasting citations from OCRed text rather than typing them out myself).

This failure of the citation tree illuminated a symmetrical failure in the folder tree. Both were unable to create thematic categories that were not exclusive – categories that would allow for kinds of growth and connectivity that were not strictly vertical. I could only manage redundancy by allowing each reference or citation to appear just once within its respective tree but, despite my attempts to maintain thematic clarity across the various branches, the branches refused to grow outward without also growing backward and inward upon each other in gnarled trans-thematic clusters.

The real question, I realized, was: how can I remember anything if I can only keep it in one place? Counterintuitive as it may sound, this question is inherent in almost every mnemotechnical failure described thus far. Limiting the number of locations in which texts are stored can be helpful when the mnemotechnology is written, printed or neuronal but, even then, any transfer of information between media must be seen as an act of mnemonic doubling. Although the number of locations that we can manage in the wetware of our minds is relatively low, it is difficult to deny that, within certain limits, we remember things better when they are linked to a variety of places. Much of our memory remains accessible through the iterated pathways between mnemonic places and, thus, it’s important not to think of “place” in an overly literal sense. After a certain point it is no longer productive to regard the process of storing and recalling information in the same way we regard the task of locating a printed book in physical space. The influence of physical space on our memory is considerable, but not absolute; a well-designed digital interface has the potential to radically reconfigure the bookshelves out of which it evolved.

If there is not a finite limit to the number of mnemonic places we can meaningfully query with the assistance of our technology, then the limit must lie in the efficiency and speed with which we navigate these places. Workflow delimits workspace even though the (non-finite) limits imposed by habit often suggest otherwise. This is to say that the space in which we work is ultimately as vast as the space in which we can imagine ourselves working – dependent on how far we can stretch the metaphor of space beyond the physical world through which we came to know it.

Once I grasped that both trees were blighted from the start because their media prevented them from growing together, I began looking for a program that might be able to store both references and the citations within them in a variety of locations without creating the redundancy and inefficiency I had encountered thus far. What I was looking for (though I did not realize it at the time) was a database or, better still, a knowledgebase.

OneNote was a promising option in that it allowed me to divide the various branches of the citation tree into notebooks, sections and pages – separate documents on my hard drive that could be aggregated without overloading the proofing tools. The cloud storage capacities of OneNote and its compatibility with Word were obviously appealing in their own right, but I was still hesitant to embrace OneNote because it seemed to sacrifice much of the layout design and prepublication capabilities of Word. Most importantly, however, OneNote lacks the ability to create and manage metadata that we find in a program like Evernote. But Evernote, too, has some rather damning limitations. It’s not quite a reference manager or a word processor so, when it comes to preparing documents for publication or print, we still end up having to transfer everything into a word processor. It does extend metadata to discrete pieces of text, but this is limited by poor PDF integration. Like many other citation and annotation programs, Evernote still allows citations to fall out of sync with their parent document. The hassle of manually maintaining the connection between the two, in my experience at least, promotes a superficial use of metadata.

I also tried out some of the more popular reference managers like Mendeley and Zotero, which allowed me to manage categories with multiple instances of the same work and offered the ability to automatically generate citations for each text (untangling the file tree and streamlining it with my word processor). I could even attach PDF files to the references in order to view and annotate them natively. I thought, at first, that I had found the workflow I had failed to achieve with either intratextual PDF annotation or the extratextual citation tree. The problem was that the metadata for both of these programs, even with the relevant updates and plugins, did not reach deeply enough into the discrete citations.

The problem of concentric citations returns here in a new shape. Before, I asserted the importance of delineating and annotating phrase/sentence-level citations from the paragraph/page-level citations that contain them. Here, we can see how much of the software fails to even distinguish discrete citations from entire works.  If we want our technology to help us thematize more effectively, we need a program built to handle these concentricities. Without such a program, even the most robust and interactive document cloud will only exacerbate our worst thematic tendencies – reifying them in an infrastructure that only enables us to tag each book by its cover.

While some combination of the aforementioned programs would have resolved some of the problems I was facing, I was generally unimpressed with the way they interfaced with PDF texts. I wanted more reciprocity between the intratextual PDF notes and the external tree outline but was struggling to find a program that could even link citations directly to a specific PDF page. I contacted Adobe about enhancing the annotation capacities of Acrobat and was quickly put in touch with Walter Chang, a principal scientist and member of their natural language and text analytics group, who explained how several of my interests overlapped with those being explored at Adobe. Adobe was interested in mining semantic content on their document cloud, whereas I was trying to find the tools for a more modular classification system for personal use. I am grateful to Walter for being the first to encourage me to consider how these improvements might also fulfill the more universal needs of the information technology industry: the possible synergy between human and machine markup and its implications for linked open data, the semantic web and machine learning that I will discuss later. Walter also advised me to seek out existing models for the kind of enhancements I was proposing. It was in doing so that I came across a beta version of Citavi 5, a program that was quite popular in European academic circles but relatively obscure in the US at the time.

Having used Citavi 5 for over a year I can say with some confidence that it exceeds the programs above because it is, at once, a database for references and the citations (or “knowledge items”) within them. What’s more: the link between “reference” and “knowledge” is preserved on the most fundamental level because each citation is directly anchored to specific lines of specific PDF pages. Citavi has enabled me to remember more by storing more pieces of information in more locations and, thus, to thematize more exactingly within and between works.
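
To make this relationship concrete, here is a minimal sketch, written in Python purely for illustration, of what it means for a knowledge item to remain anchored to its parent reference and to specific lines of a specific PDF page. The field names and example values are my own inventions and do not describe Citavi’s internal schema.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Reference:
        """A work in the knowledgebase (book, article, film script, etc.)."""
        ref_id: str
        author: str
        title: str
        year: int

    @dataclass
    class KnowledgeItem:
        """A discrete citation anchored to specific lines of a specific PDF page."""
        item_id: str
        ref_id: str                    # link back to the parent Reference
        text: str                      # the quoted passage itself
        pdf_page: int                  # page of the attached PDF
        line_range: Tuple[int, int]    # first and last line on that page
        categories: List[str] = field(default_factory=list)
        keywords: List[str] = field(default_factory=list)

    # The anchor is what preserves the link between "reference" and "knowledge":
    benjamin = Reference("ref-001", "Walter Benjamin", "The Arcades Project", 1999)
    item = KnowledgeItem(
        item_id="ki-042",
        ref_id=benjamin.ref_id,
        text="(the quoted passage, copied verbatim from the PDF)",
        pdf_page=204,
        line_range=(12, 15),
        categories=["Authors/Benjamin/Collecting"],
        keywords=["collection / archive"],
    )
    print(item.ref_id == benjamin.ref_id)   # True: the citation knows its source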

The shifts from written annotation of printed text to intratextual annotation of digital text to extratextual annotation of digital text that I have described above each took an exorbitant amount of time because they required a total organizational overhaul within and between documents. With Citavi, however, these large scale reconfigurations can be made precisely and rapidly because its advanced search and batch processing tools make it possible to modify the content or metadata for selections that are as vast as they are specific. Its purpose (qua knowledgebase) is the organization of texts rather than organization within texts.

Before I say any more about the various categories and keywords I’ve created in Citavi, I should clarify the extent to which they participate in the prevailing thematic practice of the university and the extent to which they might resist it. I freely admit that many of my  primary categories work on a level that is more or less equivalent to the familiar and problematic genres we find in the humanities. They are macrothematic: generalizing to the level of the author/work. The various subcategories and subthemes beneath them, however, grow increasingly microthematic: referring to specific citations within the works themselves. Keywords, as I’ve been using them thus far, represent the most microthematic layer of metadata since they help tag the content within each citation.

While I feel that microthematic tagging has the most potential to alter our thematic practices overall, this does not mean that macrothematic categories do not also have some transformative power or that they simply reproduce established genres (as if genre were something transparent and fixed that could easily be represented in hierarchical form). Many of the macrothematic categories I describe here are valuable insofar as they are personalized. In order for themes to contribute something meaningful to the knowledgebase they need to be personalized in such a way that they

  • convey the maximum amount of information with the minimum amount of redundancy
  • avoid placing too few texts in too many categories or too many into too few
  • aggregate less familiar sources more extensively than familiar sources

The result will always be something of a hodgepodge, but one that juxtaposes texts in a way that facilitates the kind of comparative analysis required to delineate knowledge along increasingly nuanced thematic lines. However fraught they may be, we might learn a great deal about the inherent structure of our minds by watching the asymmetry of such themes in motion within a knowledgebase.

You probably will not learn anything profound by looking at the macrothematic levels of my knowledgebase, but you will see the various proclivities and inconsistencies of my knowledge represented far more transparently and concisely than would have been possible with any previous mnemotechnology. At the macrothematic level, the value of themes lies more in the ideology they expose than the ‘truth’ they reveal. A knowledgebase like Citavi, if used collectively, might enable us to see that our teachers, colleagues and students are not necessarily or exclusively the thinkers they appear to be within the context of a lecture, conference or final paper. This is not to say that we should all grant each other access to the deeper recesses of our personal knowledgebases or that, if we did, we would necessarily find some more complex and authentic mind hidden behind the contingencies – not even that we can think anything at all without some degree of contingency – it is only to suggest that whatever contingencies might pervade our organizational structures could be rendered more visible by this kind of mnemotechnology than by any that has come before. Perhaps, if academic institutions were charged with the maddening task of maintaining clarity and consensus on this macrothematic level within a collective knowledgebase, we might all be better equipped to distinguish more exigent classification problems from contingent pseudoproblems (i.e. problems that are exposed as artifacts of a  thematic machinery incapable of administering a good self-diagnostic).

Let us turn now to the revised categorical system that evolved out of the citation and file trees. Rather than restricting all of my texts and citations to one branch of a unified hierarchy, I now sort them into three or more positions across three branches.

Authors

The ability to manage multiple instances in multiple hierarchies means that texts and citations no longer need to be subordinated primarily to themes simply because the medium can only sustain one text per branch. I can preserve the author-work-theme hierarchy of the citation tree alongside any other thematic categories I might devise. PDFs with multiple authors can be assigned to multiple branches and individual citations by each author can be distributed accordingly. After reading and extracting citations from a work, I now review and tag them en masse, creating sub-themes specific to the sampling. If the material is fairly focused, I might use the individual chapter headings. If its subject matter is wide-ranging (or my purposes for using it are varied) I might create these subthemes myself. I typically do this after I have finished reading the work in its entirety because the process of reviewing and categorizing each citation after reading has proven to be a particularly stimulating mnemonic exercise.  I know that the passages I have marked will eventually be revisited when I have more context, so the first read through functions more as a survey of the terrain than anything else (a strategy which has proven particularly effective for systematic  treatises and encyclopedic novels).
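
The principle at work here can be stated more schematically: because the hierarchy is only a set of addresses, one citation can be filed under any number of branches without ever being duplicated. The sketch below (Python, with invented identifiers) is an illustration of that principle rather than a description of Citavi’s storage.

    from collections import defaultdict

    index = defaultdict(set)   # category path -> ids of the items filed there

    def file_item(item_id, *category_paths):
        """Assign one item to any number of branches at once."""
        for path in category_paths:
            index[path].add(item_id)

    file_item("ki-042",
              "Authors/Benjamin/The Arcades Project/Collecting",
              "Projects/Dissertation/Chapter 3",
              "Themes/Modernity-Postmodernity")

    # The item is stored once but remains retrievable from every branch:
    assert "ki-042" in index["Projects/Dissertation/Chapter 3"]
    assert "ki-042" in index["Themes/Modernity-Postmodernity"]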

Projects

The ‘project’ branch allows me to sort texts for more immediate and occasional ends without disrupting the fundamental organizational structure of the other branches or depleting them of content. Essays and classes that adhere closely to a particular author or theme could probably be managed fairly easily within the ‘author’ branch, but I’ve found that organizing a narrower sampling of texts and citations according to the specific needs of a project is especially useful in instances when the essay I’m writing or the class I’m taking/teaching spans several authors, time periods or themes. Without their own dedicated branch these projects would muddle many of the distinctions between various works and citations that I would otherwise like to maintain. If I happen to form categories that have a greater relevance beyond the context of the project, I can easily graft them back onto the author branch. I currently have separate project branches for all of the classes I have taught so far, qualifying exams, the backlog remaining from the citation tree, extra-curricular reading groups, personal reading lists and even this very essay.

The ‘project’ branch also leaves room for me to experiment with different strategies of organization in a non-destructive manner. One example of this is when I wanted to approach Cormac McCarthy’s Blood Meridian from a comparative standpoint – focusing on the conventions of the Western genre for a composition class that I was teaching. This was quite different from how I might view it in my own writing and research. Citavi not only enabled me to create two distinct thematic hierarchies for an overlapping selection of citations, it also allowed me to weave together quotations from the novels and films with screenshots from the latter in the hierarchy devoted to the class. Being able to juxtapose speech, text and image made it much easier to explore elements of mise-en-scène that were difficult to efficiently cite alongside the primary texts or film scripts.

Themes

The theme branch most closely resembles the folder tree I originally created to accommodate the influx of downloaded PDFs. Unsurprisingly, it is the most fraught of the three branches. Unlike the others, I do not incorporate individual citations here – only works themselves. It is purely macrothematic.

If I had to say whether the works of thinkers like Lucretius or Walter Benjamin belonged to “literature” or “philosophy,” I would have to concede that the answer is problematic enough to justify assigning them to both – not because it is impossible to make this distinction in every case, but because the act of actually going through each case (compiled as they are within anthologies and collected works) is simply not worth the time. It’s also easier for me to create a  ‘modernity / postmodernity’ category  than to try to differentiate modernism from postmodernism in strictly historical terms (especially since I have already entered the original publication dates for each reference and can quickly sort texts chronologically within and across categories) or to distinguish the literature of these genres from the criticism thereupon (since so many modernist and postmodernist works are regarded as such insofar as they blur the line between original production and critical reception).

The asymmetrical relationship between “theory,” “criticism” and “critical theory” shows how counterintuitive thematic denomination and correlation can be. Criticism proves to be a critical category as de Man (and Mallarmé) once observed. Indeed, it is in ‘criticism’ – the deceptively inconspicuous vestige of a former system – that de Man would read the ironic allegory of the system’s undoing. The problems it poses within my knowledgebase reflect those that concern our institutional infrastructure more broadly. Even after the subfolders of the original ‘assorted’ folder are reborn as themes, ‘criticism’ remains intransigent as if undead – a revenant.

When, after all, does a work become ‘critical’? For many, the term ‘criticism’ might stand for any number of hermeneutic approaches, but I have already assimilated these to more specific themes. After all of the critical modalities are spoken for and all of the assorted texts get sorted, is there still a place for criticism? Is its primary mnemotechnical function to catch the spillover from other categories? Can it outlive its usefulness? Should this be seen as a problem or a solution?

Eventually, I decided to use it more as a keyword than anything else – something to filter works by an author from works about an author. This at least enables me to add secondary texts to the ‘author’ categories because I know that I can use the ‘criticism’ tag to distinguish them from the primary texts. After all, ‘criticism’ also tends to suggest the subordinate relation of ‘critical’ to ‘primary’ texts. While this is the most pragmatic use of ‘criticism’ I have come up with so far, it is still problematic. Do critical works involve sustained attention to specific works or might they be directed towards more abstract areas of interest (e.g. cultural studies)? How sustained? How specific? How general can the critical object become before it nullifies any pretense of critical evaluation? These questions might be answered provisionally for many texts, but are especially difficult when it comes to those more parasitic, deconstructive works that threaten to usurp their hosts. The works of Derrida, de Man, Benjamin, Deleuze, etc., put criticism in crisis on an informational level since designating them as both ‘criticism’ and ‘theory,’ ‘philosophy’ or ‘literature’ risks undermining the sorting function of ‘criticism’ qua tag. I’m certain that all of the aforementioned authors would be tickled by this; after all, these were the categories they strove so valiantly to dismantle. But is there really no way to preserve the distinction between the text doing the reading and the text being read (even if this requires mapping these shifting relations more precisely than ever before)?

I have considered converting the category of criticism into an actual keyword (rather than a category that functions like a keyword), but soon saw that this would be little more than a half-measure. Really the only functional difference between categories and keywords is that categories allow for subordination while keywords do not – so it’s not like anything would be gained by doing so. There’s also the argument that this additional layer of metadata should not be wasted on the works themselves, but reserved exclusively for the discrete citations within them. While neither categories nor keywords will fully resolve the problem, I still prefer to see this anti-category of ‘criticism’ as a placeholder for another kind of metadata entirely: the bidirectional (one-to-many) link proposed by Vannevar Bush and elaborated by Ted Nelson. This kind of link would enable us to join any citation to any other instance of that citation in such a way that all texts would be tied to each of their descendants and antecedents in a vast web encompassing the remotest references and the farthest reaches of recorded history. But for this to happen we would at least need the kind of infrastructure that a program like Citavi might, eventually, sustain. For the time being then, criticism maintains its odd spectral role within the knowledgebase – a kind of negative categorical function that can no longer be what it was and cannot yet be what it might.
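
To make the proposal slightly less abstract, the following sketch shows, under my own assumptions, what such a bidirectional link might amount to: every act of quotation is registered from both ends, so the relation between the text doing the reading and the text being read can be traversed in either direction. Nothing like this exists in the programs discussed above; the identifiers are invented.

    from collections import defaultdict

    cites = defaultdict(set)      # citing text  -> texts it quotes
    cited_by = defaultdict(set)   # quoted text  -> texts that quote it

    def link(citing_id, quoted_id):
        """Register the relation once; both directions stay in sync."""
        cites[citing_id].add(quoted_id)
        cited_by[quoted_id].add(citing_id)

    link("critical-essay", "primary-poem")
    link("my-draft", "critical-essay")

    # 'Criticism' need no longer be a category: the direction of the link
    # itself tells us which text is reading and which is being read.
    print(cited_by["critical-essay"])   # {'my-draft'}
    print(cites["critical-essay"])      # {'primary-poem'}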


Keywords

I’ve acknowledged numerous times that much of this categorical structuring is traditional (i.e. macrothematic) in its attention to entire works. The real beauty of Citavi is its ability to extend metadata to the microthematic level of the citations themselves. So far, the work-specific themes in the author branch of the database have pushed furthest in this direction but there are still limits to the precision of categories. In order to further anatomize each citation, it is helpful to discard the hierarchical structure of categories altogether in favor of keyword tags. While categories can be made to function like keywords (e.g. ‘criticism’), this blurs the line between the two kinds of metadata and eventually prevents them from functioning optimally. Categories are best suited for macrothematic grouping and the subordination of multiple pieces of information. Keywords, by contrast, are most adept at enumerating the properties of a particular knowledge item in a list-like fashion.
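
The functional difference can be put schematically. In the sketch below (illustrative names only), a category query is hierarchical, returning everything subordinated beneath a node, while a keyword query is flat, simply enumerating the items that bear a given property.

    items = {
        "ki-001": {"category": "Themes/Modernity-Postmodernity/Flanerie",
                   "keywords": {"crowd / multitude", "boredom / distraction"}},
        "ki-002": {"category": "Themes/Modernity-Postmodernity",
                   "keywords": {"crowd / multitude"}},
    }

    def in_category(item, node):
        # subordination: items filed under 'Flanerie' also answer to the parent node
        return item["category"].startswith(node)

    def has_keyword(item, tag):
        # enumeration: either the tag is on the item or it is not
        return tag in item["keywords"]

    subtree = [k for k, v in items.items()
               if in_category(v, "Themes/Modernity-Postmodernity")]
    tagged = [k for k, v in items.items()
              if has_keyword(v, "boredom / distraction")]
    print(subtree, tagged)   # ['ki-001', 'ki-002'] ['ki-001']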

The difference between categories and keywords is much less pronounced with a sentence-length citation than it is with one that spans several pages. The usefulness of keywords increases with the length of the citation. As I discovered during my qualifying exams, longer citations, while they provide more context for a wider variety of occasions, risk generalizing the meaning of the categories under which they are grouped. This is why I tried to preserve some of the sentence-level emphases in the citation tree using boldface text (as I continue to do in Citavi). Highlighting, underlining, circling, boldfacing and marginal annotation can all be seen as more primitive forms of microthematic tagging. Their greatest advantage is their immanent visibility, but this is limited to the outermost layer of the interface. Regardless of whether the interface is a page or a screen, there are only so many visual marks one can add to a passage without muddling the distinctions entirely. Both of my previous attempts at intratextual and extratextual annotation were limited in their microthematic potential because of the visual and spatial economy of the interface. Their most visible layer still needed to remain brief to reduce the likelihood of it being skimmed or skipped.

Perennially inundated with information, our minds are conditioned to pay more heed to macrothematic generalities than microthematic nuances. The potential of keywords lies in their ability to register this nuance without necessarily relying on the outermost layer of the interface. Obviously they must become visible in order for us to read them, but they do not compete with the text of the citation for graphical real-estate. They form an invisible layer that can be searched, sorted and displayed in any number of ways. This means that we no longer need to waste time and space inscribing or digging up microthematic content. In Citavi, at least, deep metadata can be layered beneath the facsimile of the printed page in a way that allows for microthematic annotation to become increasingly visible and central in years to come.

I must admit that when I first began experimenting with keyword tagging I did not fully appreciate its usefulness. Unlike the categorical structure, which I had already developed in the citation tree, the keyword lexicon had to be built from scratch. As I continued to tag a variety of passages from different authors and genres, however, I found myself refining it along similar lines as the themes branch. In order to avoid redundancy, I began aggregating synonymous (and often antonymous) keywords into clusters. Eventually I was able to tag each new passage at something closer to the rate at which I read it (thanks largely to the autocomplete functionality of Citavi which only required me to type the first few letters of each keyword or cluster). The increased fluidity of this workflow promoted deeper and more extensive tagging. More significantly, the repetitive act of reading while tagging eventually embedded the keywords in my mind in a way that, I believe, has fundamentally altered the way that I read. This was at least as powerful as the technique of generating text-specific themes from general samplings of citations and, uncoincidentally, the two procedures have become almost inextricable in my current workflow; exploring the breadth of themes present within a sampling of citations with keywords reveals the coarser-grained themes best suited for categories.
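
For the technically inclined, the clustering-and-autocomplete workflow can be approximated in a few lines. The clusters below are invented examples, and the prefix matching is a stand-in for Citavi’s autocomplete rather than a description of it.

    clusters = {
        "memory / forgetting": {"memory", "forgetting", "mnemotechnics", "amnesia"},
        "surveillance / privacy": {"surveillance", "privacy", "visibility"},
    }

    # invert the clusters so that any member term resolves to its cluster
    lookup = {term: name for name, members in clusters.items() for term in members}

    def autocomplete(prefix):
        """Return the cluster(s) whose member terms begin with the typed prefix."""
        return sorted({name for term, name in lookup.items() if term.startswith(prefix)})

    print(autocomplete("mne"))   # ['memory / forgetting']
    print(autocomplete("vis"))   # ['surveillance / privacy']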

It’s quite difficult to describe the sensation I had after doing this kind of tagging for any length of time. Obviously, it’s easy to work ourselves into a state performing any kind of scholarly or repetitive task for hours on end. I had certainly experienced more than enough of this while building the citation tree. But I remember noticing a distinctly different sensation once I incorporated keywords into my workflow – as if my mind were being siphoned out into some new lateral dimension. Perhaps because, cognitively, this extreme form of technologically assisted, microthematic tagging is at the farthest remove from the more dominant mode of macrothematic reading that seeks to cherry-pick passages for the immediate task at hand. The latter approach lacks the random access memory to even imagine reference-specific categories, let alone an entire lexicon of keywords. With the prosthetic memory of a knowledgebase, however, the stress of retention diminishes, leaving the mind free to explore less hierarchical forms of linkage.

I’ve even found that this kind of deep tagging is a particularly effective way to bootstrap myself out of writer’s block. Often, sitting down to write this dissertation, I am so overwhelmed by all of the possible sequences and combinations of ideas that I find myself utterly nonplussed. But even taking a half hour to tag citations for some relevant text can restore my clarity of purpose and writerly momentum. Perhaps this is because our ability to subordinate ideas in order to construct a narrative suffers when this perceived need for sequentiality begins to restrict our ability to explore the breadth of all possible microthematic associations. This is to suggest that the vertical, linear, hierarchical and sequential dimensions of our imagination might actually be primed by this more lateral, ad hoc, nebulous kind of thinking rather than distracted by it.

Rather than trying to come up with perfectly phrased categories or keywords, I create groups that juxtapose closely related elements, the most relevant of which, in the context of the passage, is easy to infer. I’ve found that the ‘/’ works particularly well as a rough and ready symbol of thematic juxtaposition. But clustering is another partial solution to the problem of marking reciprocal relations between increasingly specific levels of metadata. Like the hybrid, tag-like function of the ‘criticism’ category, it too must be seen as a placeholder for a more advanced form of metadata – for the kind of intertextual links that might directly join works and citations without relying on categories or keywords as intermediaries.

What if there were a way to cluster keywords that also granted us the capacity to apply one or more of the clustered keywords to concentric selections within a general citation? This would offer an interesting alternative to hierarchical categorization insofar as the selection of a specific term from a cluster of related terms would not necessarily subordinate one term to the other. Rather than hierarchy, we might end up with something more like density – something capable of registering linguistic intensity within a web or cloudlike structure.
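
A rough sketch of what density, as opposed to hierarchy, might mean in practice: clustered keywords are applied to concentric spans within a single citation, and the number of tags overlapping at any point registers its intensity. This is speculative; no current program exposes metadata at this depth, and the spans and keywords below are invented.

    citation = "The collector detaches the object from its functional relations."

    # (start, end, keyword) spans; inner spans nest inside the outer one
    tags = [
        (0, len(citation), "collection / archive"),
        (4, 13, "subject / agency"),       # "collector"
        (27, 33, "commodity / object"),    # "object"
    ]

    def density(position):
        """How many tagged spans cover this character position?"""
        return sum(1 for start, end, _ in tags if start <= position < end)

    print(density(5), density(30), density(50))   # 2 2 1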

It is true that metaphors of knowledge as a web and cloud have already been reified by our search engines and social media platforms. But rather than seeing this reification as a sign of alienation, many scholars have argued that it has “democratized” the formerly hierarchical structure of information. Might keyword density and intensity eventually replace thematic hierarchies in academia as well? Would webs and clouds model the mnemotechnology of our minds more accurately than vertical axes of power relations? Would this kind of thinking alienate us further from our work or would it make collaboration between humans and machines less alienating? For now, the keywords work more or less like hashtags, but I continue to cluster them hoping that they might one day evolve into a form of metadata capable of describing linguistic information more precisely, in our own words, but in a manner that is endemic to the semantic web and not alienated from it.

The way in which Citavi has enabled me to home in on the limitations and possibilities of microthematic tagging by observing the friction between categories and keywords is what really opens it to the advent of a digital future in the humanities. It makes it possible for all of us to intervene materially in the mnemotechnology of thematization. In speculating about the ways in which keyword clustering might facilitate a more social text, however, I am already drifting beyond what is currently achievable with Citavi or any other scholarly mnemotechnology.

Utopia / Living Text / Collective Knowledgebase

While describing Citavi to fellow digital humanists, I’m often met with skepticism regarding some of its more significant limitations: Where’s the cloud storage? Where’s the Mac version? Where’s the app? Legitimate questions – especially since Zotero and Mendeley both tout the cross-platform compatibility, web integration and mobile support that Citavi lacks. How, then, can I extol the virtues of live linkage for a program that does not even link us automatically through the web?

As we begin to negotiate the transition from the personal knowledgebase to the web we must proceed with caution. Too much connectivity too early can limit the efficacy and popularity of a mnemotechnology by alienating some of the scholars integral to its responsible design. To a great extent, digital text has still failed to replace or revolutionize the time-honored technology of the page.  There are similar resistances at play in the move from print to digital texts as there are in the move from personal to collective knowledgebases. I have attempted to keep these at the forefront of this narrative while arguing that the personal knowledgebase is, at once, the technology most faithful to the book and most open to the possibility of a collective mind.

The previous section began with the question: how can we remember anything if we can only keep it in one place? There, it was a question of how metadata enhances our memory by allowing for various instances of the same knowledge object to be stored in multiple places. Here, it is a question of where all of this knowledge can be stored and how it can be shared without sacrificing its freedom of form.

What if, after uploading our knowledgebases to the cloud, we found that we were suddenly unable to view our citations in their entirety because of a mandatory update unilaterally imposed? If the discontinuation of features no longer deemed necessary from a corporate standpoint erodes our creative control over the collective knowledgebase, then thematization will have prevailed once more in an even more insidious form. The microthematic precision of the personal knowledgebase must, therefore, be retained until the web (and the laws governing it) are supple enough to host the output of each individual mind.

The reality is that knowledge is only as good as its infrastructure and that the kind of infrastructure best-suited for scholarship is not necessarily the best for the Web (qua economic entity). Enhancing our collective scholarly memory means dismantling the macrothematic machinery of our institution with the help of the microthematic tagging that is now possible within the personal knowledgebase. Insofar as it promotes greater precision and understanding, this is a profoundly unprofitable exercise to which advertising is diametrically opposed. The latter succeeds by generalizing discrete user metadata into common trends and funneling this interest toward commodities. Behind the scenes, advertising pioneers mnemotechnologies that might greatly enhance our collective knowledgebase but, at the end of the day, it remains a science of forgetting. The commodity is as valuable as its technology is obscure. In this regard, it is an object of anti-knowledge from which we stand to learn a great deal if we can only reverse engineer its technologies of seduction into technologies of discernment.

While I acknowledge the value of cloud and mobile integration for digital scholarship in years to come, I must maintain that this is something worth doing right – with the proper infrastructure – if we are to do it at all. As I have stressed from the beginning, the point is to keep as much programming power as possible in the hands of the knowledge workers who are actually using it. Even if we are not writing the code ourselves, we can at least articulate and evaluate features that promote deeper metadata and a more robust infrastructure for our collective knowledge. Before we begin automating the conversion of our metadata, we must understand which categories, keywords and links map onto which so that we can participate in this conversion. It is not that this remapping of idiosyncratic metadata for more general web applications won’t exert any influence over our personal knowledgebases, but rather that the separation of personal and web knowledgebases allows us to resist pressures of standardization and shelters us from the questions of intellectual property that would otherwise quash our potential to experiment with the unrestricted power of deep tagging. The practice of translating our own metadata to the languages of the web would also teach us a great deal about the limitations of our thinking and the limitations of the technologies through which they are reified.
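
One way to picture this translation step, under my own assumptions about how it might work: each idiosyncratic keyword is mapped to a more standard term before anything is exported to the web, and whatever fails to map is flagged for the scholar rather than silently normalized away. The “standard” terms below are placeholders, not an existing controlled vocabulary.

    personal_to_standard = {
        "memory / forgetting": "cultural memory",
        "collection / archive": "archival studies",
    }

    def export_keywords(keywords):
        """Split keywords into (converted, needs_review) before web export."""
        converted, needs_review = [], []
        for kw in keywords:
            if kw in personal_to_standard:
                converted.append(personal_to_standard[kw])
            else:
                needs_review.append(kw)   # nothing is standardized without our say-so
        return converted, needs_review

    print(export_keywords(["memory / forgetting", "surveillance / privacy"]))
    # (['cultural memory'], ['surveillance / privacy'])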

Of course the idiosyncrasies must, necessarily, be standardized and optimized for dissemination across the web, but we cannot allow this informational bottleneck to fundamentally determine our own infrastructures. Each individual knowledgebase must remain a laboratory in which to experiment with levels of linkage and tagging not yet sustainable across the web. Despite, or rather because of, its lack of cloud and mobile integration, Citavi is better suited for multi-user experimentation on copyrighted material because it has the potential to aggregate users’ knowledgebases across semi-private networks in a peer-to-peer fashion and, thus, circumvent the policing of information on cloud servers – a strategy that continues to confound media conglomerates that are far more lucrative and powerful than the academic presses that will inevitably try to restrict this flow of information.

Only after experimenting on smaller, semi-private networks will we attain the sheer volume of data necessary to program and troubleshoot a more global information infrastructure capable of resisting the macrothematic pressures of the knowledge industry. While such academic social networks already exist (e.g. academia.edu) they do not interface with each other or with personal knowledgebases. The recent history of social media makes it clear that we cannot trust any major site to maintain privacy and accessibility parameters that are in the best interest of all users. Networks of any size are too valuable to resist the influence of the market and have tended to evolve in a way that best suits the monetization of user metadata (qua advertising) rather than responsible design from an intellectual, infrastructural standpoint. The real strategic value of the personal knowledgebase is that it ensures our total control over and freedom of experimentation with deep metadata and extends this to every digital text we have in our archive.  If we first learn to sync metadata between personal knowledgebases that are immune to the policy changes of social media sites and the jurisdiction of copyright law, then we can be sure our knowledge will remain sovereign and that we have the leverage and savoir-faire to negotiate structural changes in years to come.

Responsible informational design requires that we all ‘own our masters’ so to speak – especially in the early stages of the network when we are most vulnerable to decisions that might inherently divide us based on privacy and access rights dictated by the global information marketplace. Eventually, we might settle upon some generalized conversion protocol that retains the nuances of microthematic tagging. Here, Google may be one of the most promising patrons given its willingness to take on massive projects of dubious legality and its skill in mitigating the inevitable repercussions after the fact (e.g. Google Books, Google Earth). But this is not to say that each of us should simply render our knowledgebases unto Google, only that Google is perhaps the tech giant most capable of fighting and arbitrating the battles over intellectual property that will inevitably ensue when massive amounts of curated content from knowledgebases across the globe begin to aggregate on the web. Even to get a majority of published work hosted on the web in a way that retains a live linkage with networked knowledgebases would be a remarkable achievement – one that will require all the help we can get.

Suffice it to say that the World Wide Web, which might appear to be the most obvious form of live linkage, is, in many ways, the most convoluted and problematic. In order to keep the links between our minds alive we must not relinquish them to the web without first fortifying them through the trials of responsible design. As I continue to speculate about the possibilities of the Web and the Archive writ large, I will attempt to show how they depend, fundamentally, on some rather practical changes to the technologies of drafting, pedagogy and review.

Drafting

While it’s possible to draft an essay entirely within Citavi, I must confess that I still prefer OneNote for its cloud connectivity, haptic interface and superior aesthetics. Despite its lack of deep metadata and layout control, OneNote smoothly syncs and compiles many pieces of text rather than saving them separately as documents. It currently serves as an archive for all of my professional and personal writing and contains notes and drafts (both typed and scanned) as far back as my undergraduate years. It is also the program in which I am currently writing this very document. Unlike Word, it does not yet offer a Citavi plug-in, but a simple code makes it easy to restore all of the references for each citation. Even with the plug-in, the information used in a Word document does not automatically sync with Citavi; the link is broken once we import our knowledge objects into Word, orphaning any changes made afterwards. More significantly, the side-pane display of the add-on is quite cumbersome and prevents the fluid transition between database, draft and layout.
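
Since the principle matters more than the particular syntax, here is a hedged illustration of the kind of restoration I have in mind: suppose each pasted citation carried a short placeholder such as {ki-042}; a few lines of scripting could then swap every placeholder for its formatted reference. Neither the format nor the lookup table reflects Citavi’s or OneNote’s actual implementation.

    import re

    # hypothetical lookup table exported from the knowledgebase
    references = {"ki-042": "(Benjamin 1999, 204)"}

    def restore(draft_text):
        """Replace each {item-id} placeholder with its formatted reference."""
        return re.sub(r"\{(ki-\d+)\}",
                      lambda m: references.get(m.group(1), m.group(0)),
                      draft_text)

    print(restore("The collector detaches the object from its functional relations {ki-042}."))
    # -> "... functional relations (Benjamin 1999, 204)."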

What if there were a single, continuous link between each stage of academic work so that the kind of thinking proper to each stage were never lost in the shift between documents and programs? What if OneNote could function as a bridge between the knowledgebase and a more publication-oriented software? What if the changes I made while finalizing this document in Word synced automatically with this current draft in OneNote and also with the relevant citations and annotations in Citavi?

As nice as it would be to have these corporate powers working collectively to enhance the structures of our knowledge, the prospect is, admittedly, farfetched. The market discourages the open exchange of “trade secrets” necessary for the kind of software suite I’m describing. The tools developed by for-profit companies will likely surpass those of open-source developers in power and ease of use. It was Citavi, developed by a private company, that first achieved full PDF integration and deep metadata, not Zotero, its open-source competitor. More important than the individual programs or companies actually uniting, however, are compatibility and standardization. The software suite does not even need to combine Citavi, Word, OneNote or any of the other programs as long as it brings together the most necessary functions of each. Microsoft, Apple, Google and Linux could each have their own academic software suites, as long as there were some reasonably effective means for exchanging metadata without sacrificing its depth.

By fusing together the media of scholarly production I believe that we can greatly enhance our individual memories and see that the fruits of our intellectual labor are registered more accurately and efficiently in the ever-expanding archive of our culture. How much more might we remember if we could transcribe our thinking in the heat of the readerly moment? How might we join the acts of envisioning and revising our work into a more continuous and collaborative process in such a way that the earliest organizational stages (which are too often carried out in isolation) are immanently linked with the collective process of revision? How do we design a workflow in which every vision is already a form of revision? Wouldn’t collaboration be more likely and productive if

  • the links between the origins and end products of our intellectual labor were rendered fluid and transparent?
  • we could communicate with one another across the margins of these texts?
  • we could effectively represent the discrete nodes of our common interest immanently within them?

Under the macrothematic pressures of our current infrastructure, these nodes are often truncated and strewn, elliptically, throughout independent, published documents. With the advent of the personal knowledgebase, however, we no longer need to rely on the macrothematic generalities of publication in order to discover common interest. Communication prior to publication on the level of knowledge organization has never been more feasible. Using citations as primary organizational units we can now pursue closer, more collective analyses of increasingly singular texts. And if we can communicate microthematically without having to subordinate citations to themes, then why bother doing so at all? While this might read like a rhetorical question for many of us, the best part is that we do not even need to make such a unilateral decision at all because the metadata is flexible enough to incorporate traditional and progressive infrastructures alike.

A truly (re)visionary workflow would not only collapse the boundaries between documents, it would fundamentally reconfigure the history of each document. The temporal distinctions between drafts would ultimately give way to an infinite process of drafting in which a living record of every letter typed, erased or otherwise modified could be resurrected. Imagine watching the fits and starts of the draft as it evolves in real time, pausing to reread and reformulate itself within and between other texts. Is such insight really necessary or would it be better to have more intuitive and powerful tools for comparing distinct drafts? Here, too, we needn’t decide just yet, especially since the former technology would likely be an extension of the latter. Drafts may well be the preferred nodal points of the revision process, but this doesn’t mean that deeper ways of visualizing the writing process are not worth exploring in their own right (see idea animation below).

In a post-Snowden age, many will recoil at the idea of preserving our writing process with such fidelity. We shudder collectively at the thought of what might happen if all those text messages we decided not to send somehow reached their would-be recipients. These fears are justifiable but, perhaps, a little misplaced. I remember speaking briefly with N. Katherine Hayles about this after her Wellek lecture on writing and extinction. While discussing the political consequences of digitization she made a remark about the relative freedom and privacy of pre-digital writing compared with the oppressive visibility of writing in a digital age. I was curious to see whether she really thought that digital text was more oppressive or if this was a claim being explored in the context of the novel she was reading, but she politely turned the question back upon me – mentioning NSA surveillance as a prime example of the oppression of digital text. If such an eminent historian and theorist of technology fears for the freedom of digital text, then clearly it’s more than popular paranoia.

While I would never deny that the possibilities of surveillance are greater now than they were before digital text, I question whether digital text is unequivocally oppressive. Certainly, we have plenty of reason to suspect that nothing we enter into a networked device falls outside the purview of the international surveillance state. We do not even know for sure whether we have to hit ‘send’ for our writing to appear before unknown eyes. But is this a good enough reason not to explore the depths of visibility (i.e. surveillance) in an academic context? If we were all to revert to pre-digital technologies, would our minds really be liberated or would they be even more oppressed by the knowledge infrastructures that these technologies reify?

Freedom and oppression look quite different when viewed from the mnemotechnical perspective we have been pursuing thus far. Pre-digital text may be relatively free from surveillance, but it is also quite powerless to contest the reality of surveillance. The kind of collective knowledgebase that digital text makes possible, while it might resemble the NSA in the depth of its metadata and power of its search (i.e. surveillance) algorithms, actually has some chance of generating meaningful resistance to the negative influence of surveillance within and beyond the text. A collective knowledgebase is, perhaps, our only real hope of understanding the extent to which our privacy is compromised, coordinating the efforts of those capable of bringing about real political change and enabling the vast majority of us who are baffled by the complexity of these issues to follow these efforts closely enough to overcome our own politico-educational impotence. Ironically, our collective ignorance of what surveillance really means and how it harms us might best be overcome by repurposing these very surveillance technologies for more educational ends – using the knowledgebase we know to decrypt the knowledgebase we don’t. Rather than browsing the web with a vague technophobic paranoia, we could potentially learn what some of the thousands of leaked classified documents actually mean for our freedom and collaborate with those trying to do something about it.

Even with the heightened power of political resistance afforded by such enhanced knowledge infrastructure, restoring the kind of privacy we had in the pre-digital era still seems improbable. It may be that privacy is already, irrevocably and profoundly lost. My point is that whatever restoration remains possible is far more probable with digital technology than without it. And the fact that we can never be absolutely certain that hackers and governmental organizations cannot see everything we do does not mean that we lack substantial control over the extent to which the general public might access our thoughts.

As personal knowledgebases congeal into larger and larger networks, the ethics of visibility will be of utmost concern. The possibility of encountering other minds within a collective text implies access to their private texts and the metadata that joins them. Varying levels of access could be defined by each user in such a way that more conservative scholars could retain privacy up to the point of publication and still take advantage of the collaborative potential afforded by whatever level of metadata and linkage they see fit. Apprehensive users could test the waters first, experimenting with more traditional macrothematic tagging or sharing their draft materials with a close circle of colleagues. As long as we can all get used to working within the same or similar knowledgebases, our individual privacy settings need not limit the overall transparency or effectiveness of the system. What’s essential is that we at least have the option to share the unfolding of our ideas even if not everyone chooses to share in kind. As long as we do not alienate scholars by forcing them to surrender their privacy, the kind of work made possible by those who do should be advertisement enough.

The tensions within and between ideas are often more visible during the drafting process before their rough-hewn edges are smoothed into a publishable argument. If our work were less heavily invested in polishing away these resistances and we had a chance to encounter other minds on a conceptual plane that was not inherently defined by the pressures of publication, would it not be easier to carry out more critical and microthematic discussions of the texts at hand? And, if not, wouldn’t the traces of antipathy between thinkers citing the same texts provide invaluable data for those seeking to design a mnemotechnology that promotes a less thematic form of collaboration? While such encounters would not, in themselves, constitute some “giant leap” for our collective mind, they would, at the very least, provide one of the most powerful diagnostic tools for the mnemotechnical crisis – allowing academics and programmers alike to see, materially, where and how potential connections are being missed. Even if a majority of these connections were hostile, petty and dismissive – bent on preserving the tendencies of our current macrothematic mode of academic production – scholars more open to the (re)visionary workflow would, nevertheless, be able to explore collaborative possibilities that were previously foreclosed by a more univocal regime of publication and copyright.

What new forms of collaboration might be possible if the initial organizational vision together with the embedded history of its continual revision were folded into the “final” publication? Everyone’s text could be woven into what the new media theorist George Landow calls “borderless text” – a web of knowledge that, at last, fulfills the true educational potential of the internet. If every cited text were joined, microthematically, to the text citing it:

  • Any instance of an author’s name could recall all of his/her available works and all of the works in which these works (or their name) are referenced
  • Any instance of a work could recall all of the works in which this work is referenced (down to the specific phrase being quoted wherever direct quotation is made)
  • Any instance of a word could be linked with its definition in any available dictionary
  • Any key words or phrases could be cross-correlated with any other instances of such words and phrases
  • Any translation, version, or draft of a work could be cross-correlated and compared interlinearly or side-by-side

With the greatest possible interfacing of digital libraries across scholarly communities, we might imagine a readerly workflow in which scholars can literally work within and between the texts of Cervantes’ Don Quixote – all its versions, translations, criticisms, translations of criticisms, criticisms of translations of criticisms etc. Obviously such a total library would easily become unmanageable without the proper search and filtering criteria, but we should not automatically assume that it would necessarily become a library of Babel. With a powerful enough algorithm, we would be able to filter the critical reception spatiotemporally (by the historical and geographic origins and proliferations of citations), methodologically (by academic/institutional affiliation) or, perhaps, by an adaptive blend of parameters that weighs our recent movements through the library against the entire history of our itinerary.
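
The “adaptive blend of parameters” can at least be gestured at computationally: in the sketch below, each candidate text is scored by weighing its affinity with our recent movements through the library against its affinity with the itinerary as a whole. The weights and affinity values are invented for the sake of illustration.

    def filter_score(recent_affinity, lifetime_affinity, recency_weight=0.7):
        """Blend short-term and long-term relevance into a single ranking score."""
        return recency_weight * recent_affinity + (1 - recency_weight) * lifetime_affinity

    candidates = {
        "Auerbach on Cervantes": (0.9, 0.3),   # close to this week's reading
        "Foucault on Cervantes": (0.2, 0.8),   # close to the long-term itinerary
    }

    ranked = sorted(candidates,
                    key=lambda name: filter_score(*candidates[name]),
                    reverse=True)
    print(ranked)   # ['Auerbach on Cervantes', 'Foucault on Cervantes']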

Such a knowledgebase will not begin on a global scale. It will begin with many micro-insurgencies brought about by personal knowledgebases – thousands of individuals and groups digitizing texts and weaving them autonomously. This weaving cannot properly begin until there is so much “pirated” intellectual property in circulation that it becomes impossible for publishing houses to effectively litigate against insurgents. After all, once a document is digitized it is nearly impossible to control its dissemination – to track those who possess it and bill or fine them accordingly. Once such a critical mass of free text has been reached our institution will need to find some other means of sustaining itself or collapse entirely. If we’re willing to surrender our univocal model of intellectual property, we might begin to profit from something that is virtually unpiratable because of its scale and the rapidity of its growth. We might make a new living off of live subscriptions to global networks of digital scholarship and the vast tributaries of metadata they contain. We may yet construct a world in which knowledge can be liberated without requiring scholars to work for free.

If this is starting to sound too Quixotic or Borgesian, I must insist that even a more modest infrastructure would enable us at least to:

  • See what our peers are saying about the knowledge objects and citations most relevant to us
  • Carry out more productive and nuanced discussions without having to read essay or book-length arguments in their entirety
  • Trace the emergence of each other’s ideas directly from the citations on which they are grounded and follow them through the various stages of organization to the completed draft
  • Generate and refine citation-specific metadata collectively

Even with this, book-length arguments could be written into the interstices of the texts referenced and cited, thus, allowing us to inscribe our ideas more gravely and materially in the archive than ever before. But for this to happen we must take great care in the way we choose to visualize these interstices – we must insist that the conversations are animated in such a way that they are no longer marginalized.

Pedagogy

While the web has not yet superseded the brick and mortar university, this does not mean that it will not – or should not – especially if we ourselves are the ones negotiating the transition. Breaking down the spatial and temporal boundaries between students and teachers in the classroom would be a vital step towards developing the interfacing and filtering capacities necessary for making this kind of living text feasible on a larger scale. With the kind of infrastructure I’ve been describing, teachers and students could continuously interact from within the course text itself, regardless of whether they were participating in-person or online, within the dedicated term of study or without. Collective annotation, analysis, discussion and review could all be immanently and vitally linked. Rather than writing essays, students could write focused responses to specific citations and then discuss and review them immanently within the text – defining their views in relation to one another, the teacher and the scholarly community at large.

The labor of teaching could be divided much more synergistically. Upper level writing teachers, rather than being torn between the need to address higher level conceptual and structural issues and the compulsion to correct glaring grammatical mistakes, could assign remedial grammar modules in which students would be required to review and apply the rules they were violating. Computer algorithms might alleviate the difficulty of diagnosing and remedying the grammatical weaknesses in a way that could even free up the dedicated grammar teachers and speech therapists for more focused one-on-one time with the students who fail to make progress through the modules (or those who pay extra for it). The final stage of such remedial language training might even require students to proofread other students’ essays with a sufficient degree of accuracy and thoroughness. Such work needn’t remain strictly remedial either. Additional levels of mastery could be pursued for curricular credit or compensation. The ideal, in other words, is that students who initially need to pay extra to remediate themselves can eventually be compensated for remediating others.

The student-teacher boundary may grow increasingly amorphous with such a knowledge infrastructure, with the best students being promoted to more pedagogical roles within their given areas of expertise. As far as grading and evaluation are concerned, this would enable a greater degree of transparency and quality control throughout the review process. We might even develop a reciprocal system of double-blind peer review that pools writing from different institutions offering similar classes and requires students to review their reviewers without knowing whether they are students or teachers. They would just be anonymous voices in the citational cloud surrounding the piece of writing under review. This would not only help with grade norming; it would also help assess the quality of student vs. teacher feedback. This system could, mutatis mutandis, be employed for increasingly higher levels of evaluation and accreditation. It might, however, challenge the fixity and generality of tenure as we know it, replacing it with a more dynamic and democratic process grounded on our manifest competence in more precise areas of expertise.
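
One hypothetical way such a reciprocal, double-blind arrangement might be bootstrapped is sketched below: submissions are pooled across institutions, each is paired with a reviewer from elsewhere, and both parties see only pseudonyms, so neither knows whether the other is a student or a teacher. The names and institutions are invented.

    import random

    # (submission, author) pairs pooled from similar classes at different schools;
    # the institution is encoded after the '@' purely for illustration
    pool = [("essay-A", "student1@uci"), ("essay-B", "student2@uwm"),
            ("essay-C", "teacher1@uci")]

    pseudonyms = {author: f"reviewer-{i:03d}" for i, (_, author) in enumerate(pool)}

    def assign_reviews(pool, seed=0):
        """Pair each submission with a pseudonymous reviewer from another school."""
        rng = random.Random(seed)
        assignments = []
        for essay, author in pool:
            candidates = [a for _, a in pool
                          if a.split("@")[1] != author.split("@")[1]]
            reviewer = rng.choice(candidates)
            # the author later reviews this reviewer's feedback in turn, still
            # knowing only the pseudonym, never the reviewer's rank or name
            assignments.append((essay, pseudonyms[reviewer]))
        return assignments

    print(assign_reviews(pool))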

Applying the core citational infrastructure to the classroom would allow for great advancements in eLearning insofar as each iteration of a class would become the database for the next and each instructor, the curator of this database. Certain sections of collective text could be animated and set to voice recordings so as to restore a degree of human narrativity to the proliferation of written commentary. This would make lecturing less repetitive and more modular – freeing up time for educators to develop new materials as they would no longer have to invest so much time in rehashing the old. I do not mean to discount the value of in-person discussion and the ethos of ‘relatability’ I discussed earlier. I’m merely suggesting that some lectures might, over time, be animated in a way that would adequately reproduce the revisionary junctures arrived at in the most critical discussions, condensing them into a format that could be viewed in a fraction of the time it would take to audit all of these classes in real time. This would be a powerful extension of the educational model of the “flipped classroom,” in which lectures are uploaded and assigned before class in order to take advantage of class time for live discussion.

At present, many digital lectures are poorly recorded, one-off videos. Understandably, they fail to create a more dynamic interplay between audio and visual components since this is a skill few academics have mastered. Editing, animating and producing high-quality video content, it might be argued, is an art in itself. But if the future of textual knowledge requires a higher degree of animation, is it not worth reconsidering these deficiencies on a curricular level? In any case, we would not need extensive training in audio-visual design if there were software capable of automating this process for the least technologically inclined among us. Several existing applications can produce modestly impressive results (e.g. PowerPoint, Keynote, Prezi), but they are not yet adequate for the kind of pedagogical work I have in mind.

We need more intuitive software for animating our ideas that sustains a reciprocal linkage between knowledge objects and the living text of ongoing commentary. Live video locks audio and visual elements into a linear temporality that can be quite challenging to splice and reconfigure in an appealing way. With animation, however, the audio and visual elements are inherently divided, which grants us more free play when it comes to representing ideas through text and images. Certainly animation is, in itself, at least as involved as editing video, but I believe that the kind of animation that most of us would need to animate our ideas and lessons is far easier to automate, more visually appealing and more modular than what we might be able to achieve with live video.

With enough automation, creating and curating a digital class could be almost as easy as diagramming a lecture on paper or illustrating the resulting conversation on a whiteboard. Both of these activities are rudimentary forms of mind mapping: a visual strategy of knowledge organization that does not necessarily follow the vertical or horizontal sequentiality of a written text or outline. Mind maps tend to take the shape of webs and clouds – shapes which, as I’ve mentioned, resist the macrothematic pressures of a more conventional outline without necessarily compromising its mnemotechnical functionality, because they can approximate hierarchical relations through other graphical means (e.g. the relative size or centrality of ideas). The question, then, is how to open our lesson plans and whiteboards to a knowledgebase and transform them into something that no longer gets discarded and erased each term.

Numerous whiteboard animation technologies exist, but they are rather sad approximations of what they might be. They trivialize the power of animation with photorealistic hands that illustrate what we type as a cursive font alongside stock images that are little better than clip art. What we really need is more precise control over the temporality of illustration versus narration. Balancing the timing of voice and illustration before a live class is no simple task. It often requires extensive abbreviation and a steady rhythm of contributions between the teacher and the students. If we simply record the screen of our computer as we handwrite or type out the text of the lecture while trying to speak it, we will soon discover how unnatural it sounds when we try to keep the two processes in sync. If we are trying to condense and compile these live interactions as animations, then we need additional control over the speed at which text appears in relation to what is spoken. The overall aesthetic of the visual content would also benefit from some form of smoothing control, so that the gaps between the appearance of typed letters are less jarring and the lineaments of words written with a stylus appear smoother and more calligraphic. Most importantly, we need this process to be automated in such a way that we can simply delete words and elements without having to re-record them – so that, for instance, we can correct words we’ve typed or handwritten in the middle of a line and the animation will play as if we had written it that way in the first place (rather than animating the correction after the fact). With even these modest forms of automation, we could begin to explore new modes of textuality better able to capture the inherent rhythm of commentary and discussion. Our readings of any given citation could, essentially, be brought to life through an animation directly linked to the source text within the knowledgebase.
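To make this requirement a little more concrete, here is a minimal sketch in Python of the editing model I have in mind. The names and timings are invented for illustration and do not correspond to any existing tool; the point is simply that if the start time of each word is derived rather than recorded, deleting an element closes the gap and playback proceeds as if the correction had never been necessary.

```python
# A minimal sketch of the non-destructive editing model described above.
# All names (WordEvent, Timeline) are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class WordEvent:
    text: str        # the word as typed or handwritten
    duration: float  # seconds the word takes to appear on screen

class Timeline:
    """An ordered list of word events whose start times are always derived,
    never recorded, so a deletion leaves no trace in playback."""
    def __init__(self, events: List[WordEvent]):
        self.events = events

    def delete(self, index: int) -> None:
        # Removing an event requires no re-recording: the schedule below
        # is recomputed as if the word had never been written.
        del self.events[index]

    def playback_schedule(self, smoothing: float = 0.1):
        """Yield (start_time, text) pairs, inserting a uniform 'smoothing'
        pause between words so typed letters do not appear in jarring bursts."""
        t = 0.0
        for ev in self.events:
            yield (round(t, 2), ev.text)
            t += ev.duration + smoothing

lecture = Timeline([WordEvent("Citation", 0.6), WordEvent("is", 0.2),
                    WordEvent("memroy", 0.5), WordEvent("memory", 0.5)])
lecture.delete(2)   # correct the typo without re-recording anything
print(list(lecture.playback_schedule()))
```

The real labor, of course, lies in synchronizing such a schedule with recorded speech, but even this crude model shows why derived timing makes correction trivial where screen recording makes it painful.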

While the relatively small file sizes of text documents make them particularly well-suited to this kind of exploration, there’s no reason that a similar infrastructure might not allow us, eventually, to begin animating visual studies, music and film scholarship as well. Websites like YouTube, SoundCloud and Genius have already begun to explore the possibilities of microthematic tagging within individual media files. Programs such as Dedoose, NVivo and Scalar have also placed such controls more directly in the hands of scholars working on individual and collective projects. So, while there will, of course, be additional levels of difficulty involved in incorporating increasingly complex media formats, it’s not impossible to foresee a collective knowledgebase that weaves together text, image, sound and video seamlessly. This, however, would require a far more robust system of multimedia document review than we currently have at our disposal.

Review

Microsoft has developed one of the more popular modes of proofing and peer review, but there are still significant improvements to be made if we are to keep the links between the various versions of our documents alive. Neither OneDrive nor rival cloud technologies like Google Drive or iCloud have really addressed this problem head on. We need a more effective means of visualizing the review process – one that retains all comments and editorial changes without letting them distract from the overall reading experience.

This would require a more intelligent means of filtering substantive changes from typographical errors. We still rely on OCR because it allows us to integrate texts currently under copyright into our knowledgebases, even though correcting the errors introduced by this process can be a rather time-consuming endeavor. The problem is that, when it comes to cross-referencing instances of words, phrases or citations across personal knowledgebases, a single misread character can corrupt what might have been a very important linkage, since it is impossible to guarantee that all users are working from the exact same copy of a given text. The interface would thus need to be able to index with a certain degree of play – notifying readers of ‘near matches’ which could then be reviewed personally.
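By ‘a certain degree of play’ I mean something like approximate string matching. The following sketch, which uses only the Python standard library, slides a window across a possibly OCR-corrupted text and surfaces any passage that resembles a quotation closely but not exactly; the threshold is an illustrative choice, not a fixed parameter of any real system.

```python
# A rough sketch of indexing "with play": find near matches of a quoted
# passage inside a text that may contain OCR errors.
from difflib import SequenceMatcher

def near_matches(quotation: str, text: str, threshold: float = 0.85):
    """Return (similarity, candidate) pairs for every window of the text
    that resembles the quotation closely but not necessarily exactly."""
    window = len(quotation)
    hits = []
    for start in range(0, max(1, len(text) - window + 1)):
        candidate = text[start:start + window]
        ratio = SequenceMatcher(None, quotation.lower(), candidate.lower()).ratio()
        if ratio >= threshold:
            hits.append((round(ratio, 3), candidate))
    return sorted(hits, reverse=True)

# An OCR misreading ('be' scanned as 'he') would break exact search but not this.
source = "To be, or not to he, that is the question: Whether 'tis nobler in the mind"
print(near_matches("to be, or not to be, that is the question", source))
```

A production system would need a far more efficient index than this brute-force scan, but the principle – tolerate small variations, flag them for human review – remains the same.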

It’s possible, however, to distribute this task across the entire community of scholars. For this, we would need a textual analytic program that could query the web and other networked knowledgebases and automatically detect errors based on near matches in our own version. From there we could inspect these matches in detail or accept them en masse. Our input in this process could then be averaged into the statistical norm for the particular version of the text we are using. In doing so, we would effectively be working collaboratively and algorithmically with both humans and machines to minimize the time wasted proofing machine errors. If a text were popular enough and the network large enough, all such errors might be resolved before we even had a chance to participate. After uploading the text file to our database we would be prompted with all of the corrections appropriate for our version and, as long as we had a facsimile PDF, we could always cross-check the image against the embedded text for good measure. Such a tool would also allow us to remain “on the same page” despite having different versions of the text, and it might even reveal errors that the most skilled copyeditors failed to catch before publication.
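The ‘statistical norm’ I am imagining might work something like the following sketch, in which correction proposals gathered from other users and from matching algorithms are accepted only once they reach a rough consensus. The data layout and thresholds are, again, purely hypothetical.

```python
# A hedged sketch of pooling correction proposals across a community.
from collections import Counter

def consensus_corrections(proposals: dict, min_votes: int = 3, agreement: float = 0.8):
    """proposals maps a character offset in our copy of the text to the list
    of readings proposed by other users and by matching algorithms.
    Corrections that meet the rough statistical norm are accepted en masse;
    the rest are left for the individual scholar to inspect."""
    accepted, pending = {}, {}
    for offset, readings in proposals.items():
        votes = Counter(readings)
        reading, count = votes.most_common(1)[0]
        if count >= min_votes and count / len(readings) >= agreement:
            accepted[offset] = reading
        else:
            pending[offset] = dict(votes)
    return accepted, pending

accepted, pending = consensus_corrections({
    1042: ["memory", "memory", "memory", "memory"],   # clear OCR fix
    2310: ["folio", "follo", "folio"],                # still disputed
})
print(accepted)  # {1042: 'memory'}
print(pending)   # {2310: {'folio': 2, 'follo': 1}}
```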

A certain degree of play in the algorithm would also allow the metadata we generate for one citation to connect with every other ‘near match’ while flagging every minor variant as a potential error. Major variations, of the sort that appear across the folios and quartos of Shakespeare or, for that matter, the modern and Elizabethan English versions of his works, would have to be tagged manually. But this is the very kind of work best-suited for a collective knowledgebase since it allows decisions made by a specialized group to become immanently visible to the public and, hence, open to evaluation.

The good news is that we do not need to build these tools from scratch. Millions of websites have been using this kind of crowd-sourced proofing technology for years. reCAPTCHA, which most of us recognize as a security measure, has long served the dual purpose of differentiating human users from bots and correcting suspect phrases in digitized text. Jerome McGann et al. have spent decades designing tools for collating and annotating different versions of drafts in the Blake, Rossetti and NINES archives at UVA, as have Greg Crane et al. in the Perseus Digital Library at Tufts. It may even be possible to create a plugin for McGann’s Juxta program that could work across networks of personal knowledgebases.

Peer review, while it overlaps with proofing in many cases, will also require its own set of tools. How many conversations is it possible to represent within a document? If we’re talking about balloon comments of the sort we find in Word and Google Docs – not many. While comment balloons may suffice for our current institutional practice, they will become increasingly inadequate in an environment in which citations are immanently woven together throughout a document. Even a handful of users leaving detailed comments on the same text begins to crowd the interface. Comment balloons have to stretch further and further above or below their target reference, making it difficult to see, at a glance, what is referring to what. The way they are stacked in a single, marginal column is inherently and mnemotechnically constricting. More problematically still, they only allow for precise linking between the primary text and the comments. There can be no critical conversation between comments, because the only way of linking them is as ‘replies’ nested under a parent comment. There is no way for a comment to cite a specific part of another comment in reference to a specific citation from the document under discussion, and this prohibits the kind of triangulation between citations necessary for productive critical conversation. Inline comments are even more problematic because they break up the actual flow of the commented text and admit even less room for multiple user entries. While they may be adequate for most proofing tasks, where there is more of a consensus regarding errors or variants, the more substantive commenting needed for peer review renders them impractical. A far better option for proofing and peer review would be the ability to toggle comments between paratextual and intratextual display, so that we could expand comments from marginal notes into primary texts as needed. This is more or less what Ted Nelson was advocating when he proposed expandable links as an alternative to the unidirectional variety that dominates the web today. Such scalability will be essential as the entire history of review, both within and between documents, becomes part of a general texture of citation – when drafting becomes a truly (re)visionary process and no longer needs to be reconstructed from independent documents both online and on personal hard drives.
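A minimal sketch of the comment model I am arguing for might look like this: every comment is itself an addressable document, so another comment can cite a precise span of it alongside a span of the primary text, and any node can be toggled between marginal and inline display. All names here are invented for illustration.

```python
# A hypothetical comment model in which comments are first-class documents.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    document_id: str   # the primary text OR another comment
    start: int
    end: int

@dataclass
class Comment:
    comment_id: str
    author: str
    body: str
    targets: List[Span] = field(default_factory=list)  # may triangulate several spans
    expanded: bool = False                              # intratextual vs. marginal display

    def toggle(self) -> None:
        """Switch between paratextual (collapsed) and intratextual (expanded) display."""
        self.expanded = not self.expanded

# One comment citing both the primary text and a specific phrase of an earlier comment.
c1 = Comment("c1", "Reviewer A", "This reading overlooks the second quarto.",
             targets=[Span("hamlet-folio", 120, 180)])
c2 = Comment("c2", "Reviewer B", "On the contrary: the 'second quarto' point cuts both ways.",
             targets=[Span("hamlet-folio", 120, 180), Span("c1", 23, 40)])
c2.toggle()  # expand Reviewer B's comment into the flow of the text
```

Nothing in this sketch is technically exotic; the difficulty lies in rendering such triangulated references legibly, which is precisely the problem of scale the next paragraph takes up.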

If we could scale up the margins and/or scale down each critical node, we could begin to experiment with far more nuanced, cloud-like representations of scholarly conversation. If every comment could be collapsed to a phrase, an abstract or, perhaps, authors’ names visually coded to represent the basic content of the commentary, then we might be surprised to find how much commentary we can track in the margins of our pages and the peripheries of our minds. On the most macrothematic level, we would be able to see the text we’re working on condensed as the central node of this citational cloud. The cloud would also hover in the margins of the particular text, repopulating from page to page based on the metadata embedded there. We could narrow the range of reference further still by highlighting specific passages, phrases or even words, so that we might see whether a particular line of a text has ever been cited, directly or indirectly, by another author. This all sounds rather miraculous, but condensing and clustering citational relationships in this way should be seen as the telos of the human-generated, microthematic tagging I’ve described in the previous section.
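Mechanically, this repopulating cloud need not be mysterious. Assuming citation metadata of the form (citing work, start offset, end offset, weight), a sketch of the page-by-page filtering might look as follows; the weighting scheme is purely illustrative.

```python
# A sketch of how the marginal cloud might repopulate for the passage in view.
from collections import defaultdict

def citation_cloud(records, view_start, view_end, max_nodes=12):
    """records: iterable of (label, start, end, weight) tuples for one text.
    Return at most max_nodes (label, total_weight) pairs whose spans overlap
    the passage currently in view, heaviest first."""
    weights = defaultdict(float)
    for label, start, end, weight in records:
        if start < view_end and end > view_start:  # span overlaps the view
            weights[label] += weight
    cloud = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return cloud[:max_nodes]

records = [("Derrida 1967", 0, 400, 1.0), ("de Man 1979", 350, 900, 2.0),
           ("Stiegler 1998", 120, 260, 0.5)]
print(citation_cloud(records, view_start=300, view_end=600))
# -> [('de Man 1979', 2.0), ('Derrida 1967', 1.0)]
```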

Obviously, not all texts can be tagged quite so deeply and painstakingly by human knowledge workers. But even those untouched by human minds could still be represented in the cloud by means of an automated content-mining algorithm. This would lack the nuance of human tagging, but it would prevent underrepresented texts from falling off our collective mind map. What’s more, such texts could be flagged accordingly – as areas needing attention. Ideally, the citation cloud would provide the dialectical image of the critical history of a text. It would be capable of directing our energies towards the most relevant and least-worked reaches of the archive, in a way that intrinsically stimulates our collective memory by exposing relevant links between established and unknown areas.
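As a fallback for such untagged texts, even a crude, frequency-based extraction would suffice to keep them on the map, provided its results are clearly marked as machine-generated. A hypothetical sketch:

```python
# A hedged sketch of the machine fallback: naive keyword extraction,
# flagged so the cloud can mark these nodes as needing human attention.
from collections import Counter
import re

STOPWORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "it", "we", "our"}

def machine_tags(text: str, k: int = 5):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return [{"tag": w, "source": "machine", "needs_attention": True}
            for w, _ in counts.most_common(k)]

print(machine_tags("The folio and the quarto differ, and the differences "
                   "between folio and quarto matter."))
```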

Perhaps the greatest potential for collaboration between human and machine intelligence lies in translation. An interface of this sort could also promote relevant connections between languages using a hybrid form of human and machine translation. Machine translation may stumble over many technical nuances but, so long as the human metadata were translated with reasonable accuracy, we could still communicate microthematically with scholars around the world. It is important not to forget that the efficacy of machine translation will improve with more human input and correction. The relationship is reciprocal – machine algorithms directing us to citations we know we should read even if, initially, we haven’t the faintest idea how to read them, then learning from our subsequent attempts to translate and understand them. This would easily be one of the most powerful pedagogical tools ever conceived – a web of language in which we teach machines to teach us to teach each other to teach them, and so on. Imagine being able to peruse every known translation intratextually, paratextually or interlinearly, with the history of commentary ever unfurling beyond the margins.

The richness and depth of this collective metadata would also be the perfect soil for the neural networks currently striving to transcend the limit between human and machine intelligence once and for all. Machine-learning systems have already surpassed human performance in games like chess, Go and Jeopardy! by analyzing vast archives of human play. Natural language acquisition is the final frontier. What better place to learn the nuance of language than in a knowledgebase of such microthematic precision?

Even a relatively “soft” artificial intelligence might save us the effort of citing anything at all. We might simply begin typing a phrase and, after reaching a certain statistical threshold of similarity, be prompted with a drop-down, auto-complete menu enabling us to select however much of the passage we’d like to quote directly or link indirectly. The anti-thematic implications of such an augmented intelligence are especially profound when we consider the possibility that we might not even be consciously trying to cite another work when we find ourselves solicited in this way. A vague constellation of key words and phrases might present us with passages we had read, forgotten, and were in the process of re-membering as our own – perhaps even passages that we had never read, but really ought to have read before presuming to (re)invent them independently. This capacity of a machine algorithm to interrupt and challenge the originality and thematic generality of our work is where the mnemotechnology I’ve been describing thus far appears most amenable to de Man’s deconstructive praxis and closest to his unsettling vision of an “implacable” “text machine.” By inhumanly juxtaposing all the text we find original and unique with all the text in which similar tropes turn up, this machine would provide a vital safeguard against our all too human fantasies of inclusivity and exhaustiveness – helping us to remember more by forcing us to perpetually deconstruct everything we thought we knew.
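For what it’s worth, even the “soft” version of this prompt could be approximated with the same kind of near matching described above: as we type, anything sufficiently similar to a passage already in the knowledgebase is surfaced as a candidate quotation or link. The index, its placeholder passages and the threshold below are all invented for illustration.

```python
# A final sketch, assuming a pre-built index of passages from the knowledgebase.
from difflib import SequenceMatcher

PASSAGE_INDEX = {   # placeholder entries, not real citations
    "doc-001:p12": "the archive remembers what the reader forgets",
    "doc-002:p7": "commentary accumulates faster than any one scholar can read",
}

def suggest_citations(typed: str, threshold: float = 0.6):
    """Return passages whose opening resembles what we are in the middle of typing."""
    suggestions = []
    for key, passage in PASSAGE_INDEX.items():
        ratio = SequenceMatcher(None, typed.lower(), passage[:len(typed)].lower()).ratio()
        if ratio >= threshold:
            suggestions.append((round(ratio, 2), key, passage))
    return sorted(suggestions, reverse=True)

print(suggest_citations("the archive remem"))
```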