Sanskrit
The Sanskrit textual corpus used in BuddhaNexus was obtained from the Göttingen Register of Electronic Texts in Indian Languages (GRETIL, Georg-August-Universität Göttingen), the Digital Sanskrit Buddhist Canon (DSBC, University of the West), some files from SuttaCentral (SC), and a few files provided by individual researchers.
Due to the huge amount of material, some texts from the original databases have been omitted (cumulative pāda indexes and duplicate texts from the same source). Moreover, BuddhaNexus has made no attempt to improve the quality of the texts (e.g. by removing typos, harmonizing conventions, and the like). Some minor changes have nonetheless been made for the sake of standardization. In order to make the matching process feasible, some markup information in the original files has been disregarded.
The Buddhist Sanskrit files are structured in accordance with the organizational scheme of the Tibetan Buddhist Canon, whereby “Buddhist Scriptures” corresponds to the Kangyur and “Buddhist Non-Scriptures” to the Tengyur. For the non-Buddhist material taken over from GRETIL, the data structure has been slightly altered. The folders 4_rellit/buddh and 6_sastra/3_phil/buddh/ have been removed and their content divided between “Buddhist Scriptures” and “Buddhist Non-Scriptures.” Moreover, a numbering scheme was introduced to simplify access to the different GRETIL categories. For example, "GV01" represents GRETIL Veda 1.
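As an illustration of how such an identifier can be read, here is a minimal Python sketch. It assumes that identifiers consist of an uppercase letter prefix followed by a number, which is inferred only from the single example "GV01". The function name describe_identifier and the lookup table are hypothetical, and only the "GV" prefix is filled in because it is the only one documented here.

```python
import re

# Only the mapping "GV" -> "GRETIL Veda" is stated in the text above.
# Other prefixes are deliberately left out rather than guessed.
KNOWN_CATEGORIES = {"GV": "GRETIL Veda"}

# Assumed shape: uppercase letters followed by digits, e.g. "GV01".
_ID_PATTERN = re.compile(r"^([A-Z]+)(\d+)$")


def describe_identifier(identifier):
    """Return a readable label for an identifier, or None if the prefix is unknown."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    prefix = match.group(1)
    number = int(match.group(2))  # int() drops the leading zero: "01" -> 1
    category = KNOWN_CATEGORIES.get(prefix)
    if category is None:
        return None
    return f"{category} {number}"


print(describe_identifier("GV01"))
```

Under these assumptions the call prints "GRETIL Veda 1". Identifiers with a prefix that is not in the table return None rather than an invented label.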
For the calculation of the Sanskrit matches and for the global search function, a stemming algorithm has been used. The stemming algorithm is accessible as a standalone application.
The minimum possible length for a match has been set to 25 characters.
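The effect of this threshold can be sketched in a few lines of Python. This is only an illustration, not the BuddhaNexus implementation: the names MIN_MATCH_LENGTH and filter_matches are hypothetical, and it assumes that length is counted with Python's len(), i.e. in Unicode code points. How BuddhaNexus itself counts characters (for instance whitespace or combining diacritics) is not stated here.

```python
# Threshold taken from the text above; the constant name is hypothetical.
MIN_MATCH_LENGTH = 25


def filter_matches(candidates, min_length=MIN_MATCH_LENGTH):
    """Keep only candidate match strings with at least min_length characters."""
    # len() counts Unicode code points, which is an assumption about
    # how "characters" are measured.
    return [text for text in candidates if len(text) >= min_length]


candidates = [
    "a" * 24,  # one character too short, dropped
    "a" * 25,  # exactly at the threshold, kept
    "a" * 40,  # longer than the threshold, kept
]
kept = filter_matches(candidates)
print(len(kept))
```

The default can be overridden, for example filter_matches(candidates, min_length=30), to show only longer matches.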
Background image: Courtesy of the Nepal-German Manuscript Preservation Project (NGMPP).