Hello Developers of SuttaCentral,
I hope this message finds you well. My name is Lianghao Lu, and I am a metadata editor at Atla, formerly the American Theological Library Association. I am posting here at the recommendation of Bhante Sujato to ask about the possibility of setting up OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting, for SuttaCentral.
Atla is currently developing a new project called Eureka: https://www.atla.com/eureka/. Eureka aims to enhance the accessibility and discoverability of valuable content in religion and theology. It is intended to become a centralized hub for unique and often hard-to-find resources sought by researchers, practitioners, instructors, students, and others, serving as a search and retrieval platform designed specifically for theology and religious studies. We would therefore be delighted if Eureka could incorporate SuttaCentral’s resources and promote them to the wider academic community.
I noticed that SuttaCentral has already developed a comprehensive API, and we hope that SuttaCentral might consider building an additional OAI-PMH layer on top of it. Specifically, we are interested in harvesting metadata for SuttaCentral’s published English translations, not the full text.
The OAI-PMH implementation could support metadataPrefix=oai_dc, with records limited to published English translations. Each record could use an identifier pattern such as oai:suttacentral.net:{uid}.en.{author_uid} and include title, translator name and UID, SuttaCentral UID, canonical hierarchy, root language, translation language, publication title, first-published date, license, and the public SuttaCentral URL. Including subject headings or topical keywords, where available, would be especially helpful for improving discovery in theological and religious studies research platforms such as Eureka.
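To make the request more concrete, here is a minimal Python sketch of how one such oai_dc record might be assembled. It is only an illustration under our own assumptions: the input dictionary keys (`translator_name`, `first_published`, `publication_title`, `subjects`) are hypothetical names rather than fields of the existing SuttaCentral API, the sample values are placeholders, and the mapping onto Dublin Core elements is one possible choice. Fields such as canonical hierarchy and root language have no obvious home in unqualified Dublin Core and are left out here. A real response would also wrap this element in the OAI-PMH record envelope and add a schema location.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the oai_dc metadata format.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_oai_dc(meta: dict) -> str:
    """Serialize one translation's metadata as an <oai_dc:dc> element.

    All keys of `meta` are hypothetical; they are not taken from the
    SuttaCentral API.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    # The public page URL is used as dc:identifier, as requested in this post.
    url = f"https://suttacentral.net/{meta['uid']}/{meta['lang']}/{meta['author_uid']}"
    fields = [
        ("title", meta.get("title")),
        ("creator", meta.get("translator_name")),
        ("identifier", url),
        ("language", meta.get("lang")),
        ("date", meta.get("first_published")),
        ("rights", meta.get("license")),
        ("source", meta.get("publication_title")),  # assumed mapping
    ]
    fields += [("subject", s) for s in meta.get("subjects", [])]
    for tag, value in fields:
        if value:  # skip fields that are missing or empty
            ET.SubElement(root, f"{{{DC_NS}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; the uid and author_uid come from the example URL
# in this post, everything else is invented for illustration.
sample = {
    "uid": "pli-tv-bu-vb-pc1",
    "lang": "en",
    "author_uid": "brahmali",
    "title": "Placeholder title",
    "translator_name": "Placeholder translator",
    "license": "Placeholder license statement",
    "subjects": ["placeholder keyword"],
}
record_xml = build_oai_dc(sample)
```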
Useful OAI sets might include lang:en, type:translation, pitaka:vinaya, pitaka:sutta, collection sets such as collection:dn, collection:mn, and collection:sn, and creator sets such as creator:sujato or creator:brahmali.
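As a rough sketch of how these sets could be derived and used, the Python below computes the setSpec values for one record and builds a selective harvest request. Several things here are assumptions on our part: the dictionary keys are hypothetical, the endpoint `https://suttacentral.net/oai` does not exist as far as we know and is only a stand-in, and the validation pattern reflects our reading of the OAI-PMH schema, in which a colon separates levels of a set hierarchy.

```python
import re
from urllib.parse import urlencode

# Our understanding of the setSpec syntax in the OAI-PMH 2.0 schema:
# colon-separated tokens made of unreserved URI characters.
SETSPEC_RE = re.compile(r"^[A-Za-z0-9\-_.!~*'()]+(:[A-Za-z0-9\-_.!~*'()]+)*$")

# Hypothetical endpoint, used only to show the shape of a request.
BASE_URL = "https://suttacentral.net/oai"


def set_specs(meta: dict) -> list[str]:
    """Return the setSpec values one translation record would belong to."""
    specs = [f"lang:{meta['lang']}", "type:translation"]
    for key in ("pitaka", "collection"):
        if meta.get(key):  # optional, hypothetical fields
            specs.append(f"{key}:{meta[key]}")
    specs.append(f"creator:{meta['author_uid']}")
    invalid = [s for s in specs if not SETSPEC_RE.match(s)]
    if invalid:
        raise ValueError(f"invalid setSpec values: {invalid}")
    return specs


def list_records_url(set_spec: str) -> str:
    """Build a ListRecords request restricted to one set."""
    query = urlencode(
        {"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": set_spec}
    )
    return f"{BASE_URL}?{query}"


specs = set_specs(
    {"lang": "en", "pitaka": "sutta", "collection": "mn", "author_uid": "sujato"}
)
harvest_url = list_records_url("collection:mn")
```

With sets like these, a harvester such as Eureka could request only one collection or one translator at a time instead of the whole repository.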
For each record, the primary public URL should be the SuttaCentral website URL, for example https://suttacentral.net/pli-tv-bu-vb-pc1/en/brahmali, rather than a GitHub source URL. GitHub or API paths could remain internal source data, but the harvested dc:identifier should direct users to the public SuttaCentral page. Full text does not need to be exposed through OAI-PMH. We only need lightweight metadata records with stable links to the corresponding public SuttaCentral pages.
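The relationship between the proposed OAI identifier and the public URL could be sketched in Python as follows. The function names are our own and purely illustrative. One detail worth noting: some SuttaCentral UIDs contain dots (sn1.1, for instance), so this sketch splits the identifier from the right and assumes that the language code and author UID themselves contain no dots.

```python
PREFIX = "oai:suttacentral.net:"
SITE = "https://suttacentral.net"


def oai_identifier(uid: str, lang: str, author_uid: str) -> str:
    """Build an identifier following the pattern proposed in this post."""
    return f"{PREFIX}{uid}.{lang}.{author_uid}"


def public_url(oai_id: str) -> str:
    """Map an OAI identifier back to the public SuttaCentral page URL."""
    if not oai_id.startswith(PREFIX):
        raise ValueError(f"unexpected identifier: {oai_id!r}")
    local_part = oai_id[len(PREFIX):]
    # Split from the right so that dots inside the uid are preserved.
    parts = local_part.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected identifier: {oai_id!r}")
    uid, lang, author_uid = parts
    return f"{SITE}/{uid}/{lang}/{author_uid}"


example_id = oai_identifier("pli-tv-bu-vb-pc1", "en", "brahmali")
example_url = public_url(example_id)
```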
Thank you very much for considering this possibility.