The Spread of Searchable Digitized Versions of Chinese Materials and the Study of Pre-modern Chinese History
Atsushi Aoki

Recently, the environment surrounding the study of pre-modern Chinese history in Japan has undergone two revolutionary changes. First, following a series of nationwide reforms, universities in Japan have come to place greater emphasis on administrative and educational activities, at the cost of significantly reducing the amount of time that faculty members in the humanities can spend on research. This change may turn out to be a heavy body blow that lowers the quality of research in the humanities and social sciences in Japan in the long run, depriving the Japanese community of China scholars of its long-held advantages over those in other countries. Putting my own laziness aside, most of my time is devoted to teaching seven to nine classes a week, and I have practically no time left for my own research. However, in the perception of the government, with the backing of public opinion, this is the ideal way to manage universities, and we are at the state's beck and call, without much say in its decisions. Indeed, sacrifices are unavoidable when one is carrying out a "revolution."

           The other revolution, which is much brighter and is the main theme of this essay, concerns the recent advances in information technology, which have dramatically accelerated the digitization of classical Chinese books.

           The pioneering effort that opened the door to the digitization of Chinese books was the Scripta Sinica (repositories of digitized Chinese literature) compiled by Academia Sinica in Taiwan. Initially, in the 1990s, an MS-DOS-based full-text searchable database of the Twenty-Five Dynastic Histories of China was put on the market. Subsequently Academia Sinica placed the database on the web, making it publicly accessible without restriction (and causing anger among universities that had purchased the database, which felt that they had wasted their money). The Scripta Sinica has since grown richer in content, as other collections of Chinese books, including local histories of Taiwan, have been put online one after another.

           Aside from these digitized databases from Taiwan, various books in Chinese have also been published on CD-ROM, primarily in China. Perhaps most stunning to researchers studying pre-modern Chinese history was the publication of the full-text searchable electronic version of the Ssûk'u Ch'üanshu (Complete Collectanea of Four Treasuries), or "e-SKQS" for short. Although it was very expensive when it was first put on the market, it is now available at a price of about one million yen. Other large digitized collections already available on CD-ROM include the Ssûpu Ts'ungk'an (Collection of Books in Four Categories), the Volume on the Sui-T'ang and Five Dynasties of the Chungkuo Litai Chipen Tienchik'u (Collection of Fundamental Classic Books of China's Successive Dynasties), and the Chungkuo Ming-Ch'ing shi Tang'an Wenhsien Kuangp'ank'u (Archival Materials of China's Ming-Ch'ing History on CD-ROM). The Chungkuo Litai Chipen Tienchik'u, which is now being developed and marketed on a subscription basis, is expected, upon completion, to be much greater in size and content than the e-SKQS, and it is rumored that the entire collection will be sold at a price of around 10 million yen.

           Aside from the databases that are accessible free of charge on the web, there are a number that are not yet freely open to the public, including the Ming-shih-lu (Records of the Ming Dynasty) compiled by Academia Sinica. There are also a number of databases in China and Japan, as well as in Taiwan, that have been privately developed by individual researchers for use by small circles of friends. In my own field of specialty, there is a great likelihood that historical documents such as the Ch'ing-Yüan T'iaofa Shulei (Legal Documents of the Ch'ing-Yüan Period in the Southern Sung) will be printed with the addition of punctuation and textual notes, and will later be scanned and digitized. Projects to digitize various historical documents are now underway at the Institute for Research in Humanities, Kyoto University, and there are strong expectations that the Institute will put the digitized databases online.

           Now, the purpose of this essay is not to present an accurate overall report on the progress made by various projects for developing digitized databases of Chinese books, but rather to comment on the possible advantages such digitized databases will bring to historical science, and on the precautions that must be taken in their use. It should be pointed out at the outset that I am not necessarily "in the know" about this theme, and also that any report on the current situation in the computer-related world, where change is measured in "dog years," can become obsolete in just a year or two.

           The fields of Chinese history that are drawing the greatest benefits from the growth of digitized historical documents seem to be the history of pre-Sung China and Taiwanese history in the Ch'ing era. In particular, Chinese historical materials pertinent to the history of Taiwan in the Ch'ing era have been so extensively digitized that, depending on one's research theme, it will soon become possible to gather nearly all basic materials on the web. Much the same is true for the study of the history of pre-Sung China, and in particular the history of China in the T'ang Dynasty and earlier periods, because virtually all the extant materials from these periods that are available in book form, excluding most of the excavated wood and bamboo inscriptions, have been digitized, and are available on the market at a price that is well within the budget range of university libraries and other institutions. Especially noteworthy is the fact that texts of poetry and prose that previously tended to escape the attention of historians, and documents like the Ts'efu Yüankui, which were rather difficult to use for lack of a handy index, can be readily searched in the e-SKQS using a single search operation. Thus the advantages offered by digitized databases can be invaluable.

           With regard to the study of history in the pre-Wei Jin Nanbeichao period, there is no denying that an increasing number of researchers are becoming aware that there are limits to what they can accomplish by relying solely on printed historical materials, and are making greater use of excavated bamboo documents. But it is not clear how extensively these excavated historical materials will be digitized and put on the web. Most of the important sources still remain behind a bamboo curtain, unavailable even as photographs or in printed form.

           Turning to the history of the Sung Dynasty, I remember having once heard a distinguished scholar assert, "It is possible for a scholar to thoroughly go over the extant historical materials from the Sung Dynasty at least twice during his or her lifetime." Given this assertion, the fact that a large number of cumbersome collections of the Southern Sung Dynasty are now readily searchable by means of the e-SKQS is a great boon to historians studying this period. Cross-referencing among basic classic works such as the Hsü Tzûchih T'ongchien Ch'angpien (Collected Data for the Continuation of the Comprehensive Mirror for Aid in Government), the Yühai (Ocean of Jade), and the Ch'ünshu K'aosuo (Exploration in Numerous Books), which constitutes an indispensable basis for the study of institutional history, has had to be carried out by hand, like manual labor; the digitization of these texts will therefore save researchers much tedious work. Even Sudô Yoshiyuki, who completed a staggering amount of work on the basis of meticulous and extensive analyses of the extant historical documents of the Sung Dynasty, is reputed to have been less than thoroughgoing in his exploration of some texts of poetry and prose; however, it may become possible for us to fill the remaining gaps in his studies by availing ourselves of computers and digitized texts. I would like to add, moreover, that it would be of great help to digitize, and make public on the web, documents such as the Sung Huiyao (Essentials of the Sung dynasty history) and the Mingkung Shup'an Ch'ingming Chi (The Enlightened Judgments). In fact, considerable portions of these texts have already been digitized but are not yet publicly accessible. Nonetheless, the rapid progress made in Shanghai Library's ongoing project to reproduce images of well-preserved rare books of the Sung and other periods on CD-ROM is at least helping to improve these documents' accessibility for perusal within the library.

           Turning to local histories, those of the Sung-Yüan period are readily searchable as they are included in the e-SKQS, but local prefectural histories of the Ming and later periods are mostly inaccessible in digital form. In fact, however, this is not much of a cause for concern. For one thing, unlike obscure annalistic historical materials or leishu (Chinese encyclopedias or reference books consisting of extracts from sources extant at the time of compilation), which were edited according to ill-defined editorial policies, local histories, by nature, have contents that make for relatively easy and accurate "guesstimates." In addition, given that people appearing in the local histories of the Ming era housed in the T'ien'ikê have already been indexed, and especially that an index of people of the Sung era appearing in such local histories of the Ming era is available, it is generally possible for us to quickly find our way to a needed passage. Moreover, although this is my own trade secret, I would like to reveal a fine stratagem that sometimes proves useful, which takes advantage of the fact that comprehensive provincial gazetteers of Chekiang, Fukien, Kiangsi, and other provinces of the Ch'ing era are contained in the e-SKQS. If one finds some hits in the e-SKQS, then there is a fair chance that one will find old records contained in local provincial and prefectural gazetteers compiled during the Ming-Ch'ing period. The next step is to choose some names of people who appear in these old records, and search for them in the above-mentioned index of people appearing in local histories of the Ming era. Following this procedure, I often find, in a short amount of time, descriptions close or similar to the original forms in local gazetteers compiled in the Ming era.

           By contrast, documents of the Ming era and later have been digitized only to a rather limited extent. Only small numbers of authentic records, archives, and historical records have been digitized, and very few of the already digitized versions of these records are publicly accessible. However, it is surely only a matter of time before these documents become readily accessible in digitized form. I will be thrilled to see how the infrastructure for research has improved by the year 2024, when I will celebrate my sixtieth birthday.

           Now, there must be some researchers who feel that it is somehow heretical to use digitized texts of Chinese classics in the foregoing manner. Even I myself sense that the "trade secret" I revealed earlier has an element of heresy, because some documents may be missed if they are neither indexed by name nor contained in electronic databases. For now, such documents can be found only by low-tech searching. Researchers who are skeptical about the use of digitized texts may wonder whether we can conduct high-quality research by "choosing words" of three to four characters (the ideal length for a search that yields a number of hits neither too large nor too small) and searching through a digitized database of classical texts for occurrences of those words. It used to be common practice to visit the Seikadô Library in Tokyo day after day, making mountains of notes. The library possesses not a few books, published mainly in the Sung era and after, that are found nowhere else in the world. In the course of doing so, we read many documents at random, pored over others, and thereby made new discoveries. Isn't that the "proper" way of doing research? I think I can understand this skepticism. However, if people criticize electronic searches using this kind of reasoning, I have to wonder if they think it's really possible to write a paper after doing nothing more than an electronic search. Actually, there are not many instances where the argument of a paper depends exclusively on a handful of search results. An electronic search is like consulting a dictionary, an index, or a table of contents. It is no more than a time-saving device. It goes without saying that the basic process of writing a paper is not much different from that in the pre-electronic days.

           Classrooms have undergone sea changes as well. Even if I were to instruct the students in my seminar classes not to use databases, few would comply. I therefore make it a policy to encourage them to make active use of digitized databases, and I believe that it is only natural for students of Chinese history today to prepare an outline for a seminar presentation by copying passages of classical Chinese from digitized databases into the outline with a few clicks of the mouse. I require my students to thoroughly compare all the different versions of the same text accessible on campus before making their presentations. Students who claim, without having done an electronic search, that they "could not find any other case in which the same word occurs" are strongly admonished. The other day, when we were reading an unpunctuated Chinese text written by an author of the Yüan period, we came across a passage that read: "... wu pu pi pei shihch'ên chih yen ..." (無不畢備世臣之言). The presenter, a senior, ended the sentence at "... wu pu pi pei." At that juncture, a junior raised an objection, asking, "Shouldn't we take 'pi pei' as a transitive verb and 'shihch'ên chih yen' as its object, and interpret this part as meaning 'Everyone lets the subject's words become fully prepared'?" Then another senior, who is always careful to prepare thoroughly for classes, immediately intervened and stated: "I did a search for 'wu pu pi pei' and found that in most of the instances in which the phrase occurs, the sentence ends with 'pi pei.' I believe the presenter's interpretation is correct." Yes, indeed, in this passage it is correct to end the sentence with "pi pei." This episode reveals that even undergraduate students, by browsing through gigabyte-sized databases, can sharpen their "perceptiveness" for words to a level that they would otherwise not have been able to attain without years of training.
(It should be easy for readers to do a single search and instantly identify the text that was used in my seminar.)
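The kind of check the student performed in the seminar can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it presumes that punctuated passages containing the phrase have already been exported from some database as plain strings, the set of closing marks is my own choice, the function name is hypothetical, and the toy passages are invented placeholders rather than quotations from any real text.

```python
import re

# Punctuation marks treated as closing a clause or sentence (an assumption;
# adjust to the punctuation conventions of the database being searched).
CLOSING_MARKS = "。，；！？、"

def count_phrase_endings(passages, phrase):
    """Count how often `phrase` is directly followed by a closing mark.

    Returns a tuple (followed_by_mark, total_occurrences).
    """
    followed = 0
    total = 0
    for text in passages:
        for match in re.finditer(re.escape(phrase), text):
            total += 1
            next_char = text[match.end():match.end() + 1]
            # Reaching the end of a passage also counts as closing the clause.
            if next_char == "" or next_char in CLOSING_MARKS:
                followed += 1
    return followed, total

# Invented toy passages, NOT quotations from any real source.
toy_passages = [
    "甲乙丙丁無不畢備。戊己庚辛",
    "子丑寅卯無不畢備，辰巳午未",
    "申酉戌亥無不畢備天地玄黃",
]

followed, total = count_phrase_endings(toy_passages, "無不畢備")
print(f"{followed} of {total} occurrences close a clause")
```

A tally of this kind only suggests where the sentence break is likely to fall; the reading itself still has to be confirmed against the passage in context.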

           Today, in this period of transition, what have we lost and what have we gained? Ironically enough, at universities in Japan today, we have much less time than ever before to absorb ourselves in historical materials. Furthermore, students of Chinese history, who have become addicted to electronic searches since their school days, may lose some of their perceptiveness toward books in Chinese. First of all, sitting in front of a screen is bad for the eyes, and we have to be careful about this. At the same time, however, there is no denying that there are enormous merits, inconceivable in the past, to using searches to make repeated cross-sectional comparisons among results from various historical documents, including memorials to the throne, books on political and economic institutions, and encyclopedias. For example, when we conduct a search using a certain financial term, we might discern a pattern peculiar to the theory of government finance of that period. By examining sentences with certain collocations of administrative terms, we may begin to notice that a certain term occurs in a fixed pattern. By looking through a huge number of related historical documents in close comparison with each other, we may begin to see, in bold outline, the mode of thinking of people at a certain time. In looking up vocabulary in areas I am not familiar with, I have lately fallen into the habit of first making a "large-scale observation" using an electronic search, before searching for and reading existing studies. This "large-scale observation" makes existing studies easier to read and understand. Electronic searches also have the advantage of being able to demonstrate, in just a few minutes, that the occurrence of a certain word in a certain sense dates from a certain period, or increased or decreased significantly from a certain period. For example, it might turn out that the word "chiensung" (litigiousness) was never used prior to the Sung Dynasty, and that mentions of General Hupo (Ma Yüan, the Han dynasty's conqueror of barbarians) decreased significantly from the Sung Dynasty onward.
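A "large-scale observation" of how a term is distributed across periods might be sketched as follows. This is a hedged illustration only: it assumes that search results are available as pairs of a period label and a text, the function name is hypothetical, and the characters 健訟 are my assumption for the term romanized above as "chiensung." The toy records use neutral period labels and invented strings, so they say nothing about any real distribution.

```python
from collections import Counter

def hits_by_period(records, term):
    """Tally occurrences of `term` per period label.

    `records` is an iterable of (period_label, text) pairs.
    """
    counts = Counter()
    for period, text in records:
        # Adding zero still registers the period, so periods with no
        # occurrences appear in the result with a count of 0.
        counts[period] += text.count(term)
    return dict(counts)

# Invented toy records with neutral period labels; they are placeholders,
# not search results from any real database.
toy_records = [
    ("period A", "甲乙丙丁戊己"),
    ("period B", "庚辛健訟壬癸"),
    ("period B", "健訟子丑健訟"),
    ("period C", "寅卯辰巳健訟"),
]

print(hits_by_period(toy_records, "健訟"))
```

Raw counts like these depend heavily on how much text survives from each period, so in practice they would need to be weighed against the size of the corpus for each period before any conclusion is drawn.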

           Thus, it is not a question of whether it is good or bad to make use of electronic searches. We never have enough time. Time-saving devices have been developed, and we can make use of them. In any case, that is the time we are living in. (I would invite interested readers to visit my website of research sources: http://t_links.at.infoseek.co.jp/)