Data standards

Through a survey of the fields, this page provides information about the data standards used in Trismegistos Texts. For a more general survey of the database structure, click here. For a survey of the criteria used to determine what constitutes an individual record identified by a unique, stable Trismegistos number, click here.

Traditional identifiers

Although the TM number is increasingly used as an identifier, its use in a non-digital or human-readable environment is not yet very widespread. Depending on the discipline, scholars traditionally rely on three types of identifiers: publications, inventory numbers, and names.


Publications of texts are stored in a separate related table, connected to the main table by means of the TM number. Because of the great diversity of standards between the different disciplines and even within each discipline, this table follows a very eclectic set of rules.

Very often these publication references take the form of sigla, short abbreviations for publication series, followed by numbers identifying the volume (optional) and the text, e.g.:
P. Oxy. 19 2232 bis
SEG 8 664 a
In many cases we have opted to provide the full title followed by the most common abbreviation, e.g.:
Rix, Etruskische Texte [ET] p. 149-168 no. AS 1.336
Monumenta Linguae Messapicae [MLM] 1 Ad
For a tool trying to cope with the variation of publication sigla used in Greek and Latin papyrology and epigraphy, click here.

Other publications, not taking the form of sigla, normally take the following form:
JHS 57 (1937), p. 30-32 no. 6
Studies Quaegebeur 1 (OLA 84) p. 441-454
But there may be exceptions and idiosyncrasies ...

TM has separate fields in the publications database for the date of publication and the editor (i.e. the person responsible for reading the text), but these are not always shown.

Because the bibliographic abbreviations do not follow strict rules, it would be ideal if each reference pointed to a bibliographic database with the non-abbreviated form. TM has only done this systematically for Demotic, with links to the Demotistische Literaturübersicht, and to some extent for Aramaic and Hieratic (in the TM Bibliography). For other scripts and languages, TM looks forward to linking to other bibliographies in the future, such as the Bibliographie Papyrologique or Arachne.


The numbers assigned to objects with texts in museums or collections are stored in a separate database, connected on the one hand to the main table by means of the TM number, and on the other hand to Trismegistos Collections.

The following are examples of collection information:
Vienna, Kunsthistorisches Museum dem. 6052
Berlin, Ägyptisches Museum S. 85 Abth. VIII Nr. 8
Oxford, Private collection Crum number unknown
This information has three components: the collection, a specification of the numbering system (optional), and the number itself.

The first part of the information, normally the (English) name of the city or town followed by the (local) name of the collection, is pulled from TM Collections. In this database of collections of ancient texts, each collection has its own numeric id, e.g. 357 for Vienna, Kunsthistorisches Museum. A full list of collections is accessible here; TM encourages its use by partner databases and would be happy to add missing collections on request. Both public and private collections are included, provided there is sufficient information for their identification. Preservation information about texts that are not in collections (e.g. those in situ or 'lost') is given in a separate field ('inventory_temp'), which is joined with the compound field for the online edition.

The second (optional) part specifies which numbering system is used. Some large collections are split up into subcollections for the different areas and cultures they cover, and in some museums a wide variety of numbering systems is in use (or has been used in the past). TM tries to cover most of these, but obviously there is scope for improvement. For each museum, the information on numbering systems is also listed here.

The third element is the number itself. Although a number (or numbers) almost invariably constitutes the core, there are many possible additions, including full stops, commas, 'a', 'bis', etc., which may make searching and sorting problematic.
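The sorting problem mentioned above can be illustrated with a minimal sketch of a "natural" sort key. This is not how Trismegistos itself sorts; the function name `inventory_sort_key` and the decision to treat full stops and commas as separators are assumptions made for illustration only.

```python
import re

def inventory_sort_key(number: str):
    """Hypothetical helper: split an inventory number into numeric and
    textual chunks so that '9 a' sorts before '10'."""
    # Runs of digits become integers; other runs (e.g. 'a', 'bis') stay text.
    # Whitespace, full stops and commas are treated as separators (assumption).
    chunks = re.findall(r"\d+|[^\d\s.,]+", number.lower())
    # Tag each chunk so integers and strings are never compared directly.
    return [(0, int(c), "") if c.isdigit() else (1, 0, c) for c in chunks]

numbers = ["6052 bis", "10", "6052", "9 a", "9"]
print(sorted(numbers, key=inventory_sort_key))
```

A plain string sort would place '10' before '9'; the key above compares the numeric core as a number and only then looks at suffixes such as 'a' or 'bis'.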


Although names such as 'the Rosetta stone' or 'the Book of Armagh' are very common in everyday use, they are not always easy to process in a database. TM has opted to include information of this kind in the non-standardized field for collection and inventory ('inventory_temp'), which makes names searchable online.

Information about the writing process

Trismegistos contains information about some material aspects of the text and the surface on which it was written. A distinction is made between the material itself, the form of the material, the tool used, and the possible reuse for another text.


All kinds of material could be used to write upon, but some were obviously more common than others. The following list is not exhaustive, but provides the main categories: bone
More specific information, e.g. about the type of stone or the type of wood is added following the main category, e.g. 'stone: limestone' or 'wood: tamarisk'.
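Compound entries of this kind can be split mechanically at the first colon. The following is a minimal sketch; the function name `split_compound` is hypothetical and not part of any Trismegistos interface.

```python
def split_compound(value: str) -> tuple[str, str]:
    """Split 'main: detail' into (main, detail); detail is '' if absent."""
    main, _, detail = value.partition(":")
    return main.strip(), detail.strip()

print(split_compound("stone: limestone"))
print(split_compound("bone"))
```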

Trismegistos is aware that standards to describe material are developing and is contemplating their use in a future consolidation period.


There is no real standardized vocabulary to describe the form a writing material takes, and developing one would imply consultation of all disciplines and fields involved. This seems a daunting task, which may explain why Trismegistos is not currently attempting to actually standardize this information, although attempts have been made in the past, leading to compound entries such as 'architecture: door' or 'statue: naophoros'.


The tool with which the text was written is not systematically specified in Trismegistos. The field was originally created to provide information about whether an Egyptian 'rush' was used or a Greek 'reed', but with the expansion to epigraphy 'chisel' became a third possibility, and the occasional inclusion of numismatic and sigillographic sources led to 'mint' and 'mould'.


In principle, a single Trismegistos number (TM_id) is assigned to 'multiple (sub)texts written on what was in antiquity a single writing surface, unless there are good reasons to believe that the only (and unintended) relation between the texts is the writing surface itself'. This implies that if a writing surface was reused as 'old paper' or otherwise recycled for an unrelated text, two TM-ids are assigned, and Trismegistos then connects these two (or more) records in the 'reuse' section.

The connection between the texts is normally specified by expressions such as:
blank side reused, new text is:
reuse of blank side, old text is:
blank space reused, new text is:
reuse of blank space, old text is:
palimpsest old, new text is:
palimpsest new, old text is:
tomos synkollesimos, other texts are:
In many cases, however, TM uses a generic 'another text on papyrus is:' or in cases of complicated reuse the procedure is explained in a free text field.

Information about the contents

Apart from language and script, Trismegistos also contains general information about the type of text with occasionally more details.

Language and script

Although language and script are two separate aspects of a text, they are currently not systematically split up in Trismegistos. For the time being, TM assumes for most languages and scripts that they form an organic whole, in the sense that the Greek language is normally written in the Greek (alphabetical) script, while Demotic is the script which is normally combined with the stage of the Egyptian language commonly called Demotic. Only for languages which are not more or less stably associated with a script is the script itself normally indicated. The following scripts and languages are currently attested in Trismegistos:
Abnormal Hieratic
Brittonic (with indication of the script)
Coptic (often with indication of the dialect)
Gaulish (with indication of the script)
Goidelic (with indication of the script)
Italic (language group; with indication of the script)
Meroitic cursive
Meroitic glyphs
Middle Persian
Norse (with indication of the script)
North Picene
Old Coptic
Old Nubian
Old South-Arabian
Pseudo-Hieroglyphic (i.e. imitation of script)
Pseudo-Runic (i.e. imitation of Runic)

When more than one language or script is used in a single TM record, the text is 'multilingual' and the various languages or scripts used are enumerated, separated by slashes, e.g. multilingual: Greek / Coptic

If an entire text or a longer passage in a text is written in a non-standard combination of script and language, this is indicated by the prefix 'anomalous: ' followed by a description of the combination, e.g. anomalous: Greek written in Latin characters
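The 'multilingual: ' and 'anomalous: ' prefixes make the field easy to take apart. The sketch below is only an illustration: the function name `parse_language_field` and the 'kind' labels in the returned dictionary are hypothetical, not Trismegistos field names.

```python
def parse_language_field(value: str) -> dict:
    """Classify a language/script entry by its prefix (illustrative only)."""
    value = value.strip()
    if value.startswith("multilingual:"):
        # Languages or scripts are enumerated, separated by slashes.
        parts = value[len("multilingual:"):].split("/")
        return {"kind": "multilingual",
                "values": [p.strip() for p in parts if p.strip()]}
    if value.startswith("anomalous:"):
        # The remainder is a free description of the unusual combination.
        return {"kind": "anomalous",
                "values": [value[len("anomalous:"):].strip()]}
    # No prefix: a single language/script entry.
    return {"kind": "single", "values": [value]}

print(parse_language_field("multilingual: Greek / Coptic"))
print(parse_language_field("anomalous: Greek written in Latin characters"))
```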


The field 'type' describes the genre the text belongs to and its contents. This is without doubt the least standardized field in the database, and it is currently not normally shown in the online version. An exception is the DAHT database, where the information is somewhat more standardized. Nevertheless, much work remains to be done, and this will require much interdisciplinary cooperation.

Chronological and geographical information

Two essential items of metadata are date and provenance, each stored in separate databases.


In Trismegistos, a separate database stores the dates on which the text was written (not the date of the original in the case of copies!). Each date is linked to the main database by means of the TM-id, and this relational system allows us to have e.g. multiple possible precise dates attached to a single text (e.g. when a year 4 can refer to multiple kings or emperors), or multiple dates attached to sections of the document (e.g. one for the manuscript itself and another for glosses dating to two centuries later). As a rule, Trismegistos does not currently use this system to record different dates suggested by different authors, e.g. AD 100-199 by X and AD 300-399 by Y.

For database-historic reasons, Trismegistos converts 'second century AD' to AD 100-199 rather than the more correct AD 101-200. For 'middle second century' AD 125-175 is used, for 'early' AD 100-125 and for 'late' AD 175-199. Of course other projects use 30 years for early, and others still 33. Trismegistos will be happy to adapt to any standards that develop, but there are no clear signs that any are forthcoming.
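The conversion described above can be written out as a small function. This is a sketch under stated assumptions: the text only gives the figures for the second century AD, so applying the same offsets to other AD centuries is an extrapolation, and the function name `century_to_range` is hypothetical. The first century AD and BC centuries are deliberately left out because the paragraph above does not specify them.

```python
def century_to_range(century: int, part: str = "") -> tuple[int, int]:
    """Convert 'Nth century AD' (optionally early/middle/late) to a year range,
    following the AD 100-199 convention for the second century."""
    if century < 2:
        # Not specified in the paragraph above, so not guessed here.
        raise ValueError("this sketch only covers AD centuries from the second onward")
    start = (century - 1) * 100  # second century AD starts at 100, not 101
    offsets = {
        "": (0, 99),         # whole century: AD 100-199
        "early": (0, 25),    # AD 100-125
        "middle": (25, 75),  # AD 125-175
        "late": (75, 99),    # AD 175-199
    }
    lo, hi = offsets[part]
    return start + lo, start + hi

print(century_to_range(2))            # whole second century AD
print(century_to_range(2, "middle"))  # middle second century AD
```

A project that uses 30 or 33 years for 'early' would only need different values in the `offsets` table.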

The dates are based on multiple, mostly numeric fields, which are converted to something more humanly readable. The use of figures is potentially confusing, since some of the vagueness inherent in the words is lost, but it has many advantages for searching and sorting. For BC dates we use negative numbers, e.g. -99 to -1 for the first century BC (note that, because of the nature of our system, this century is one year shorter than the other centuries). The fields are the following:
y1 : the earliest year of the range
y2 : the latest year of the range
m1, m2 : the same, but for the month
d1, d2 : the same, but for the day
uncertain : 0 or 1, to express uncertainty
extra : further specifications such as 'before', 'after', 'shortly before', 'not earlier than', etc.
If there is an exact date, the same figure is used in y1 and y2 (and if necessary also in m1 and m2 and d1 and d2), e.g. '25 January 435 BC' is:
y1 = -435, m1 = 1, d1 = 25
y2 = -435, m2 = 1, d2 = 25
and 'between 25 Jan 435 BC and 27 February 434 BC' is:
y1 = -435, m1 = 1, d1 = 25
y2 = -434, m2 = 2, d2 = 27
Trismegistos always tries to assign a date to a text: when, for example, a Demotic text is undated and the editor provides no indication whatsoever, y1 is set to -699 and y2 to 499.
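The field layout above maps naturally onto a small record type. The following sketch is not the actual Trismegistos schema or conversion routine: the class `DateRecord`, the helpers `format_point` and `describe`, and the exact wording of the readable output are assumptions for illustration. The 'uncertain' and 'extra' fields are stored but not rendered here.

```python
from dataclasses import dataclass
from typing import Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

@dataclass
class DateRecord:
    """One date attached to a text; BC years are negative numbers."""
    y1: int
    y2: int
    m1: Optional[int] = None
    d1: Optional[int] = None
    m2: Optional[int] = None
    d2: Optional[int] = None
    uncertain: int = 0  # 0 or 1
    extra: str = ""     # e.g. 'before', 'after', 'not earlier than'

    def is_exact(self) -> bool:
        # Exact date: both ends of the range carry the same figures.
        return (self.y1, self.m1, self.d1) == (self.y2, self.m2, self.d2)

def format_point(year: int, month: Optional[int], day: Optional[int]) -> str:
    """Render one end of the range in words (output style is an assumption)."""
    parts = []
    if day is not None:
        parts.append(str(day))
    if month is not None:
        parts.append(MONTHS[month - 1])
    parts.append(f"{-year} BC" if year < 0 else f"AD {year}")
    return " ".join(parts)

def describe(rec: DateRecord) -> str:
    start = format_point(rec.y1, rec.m1, rec.d1)
    if rec.is_exact():
        return start
    return f"between {start} and {format_point(rec.y2, rec.m2, rec.d2)}"

exact = DateRecord(y1=-435, m1=1, d1=25, y2=-435, m2=1, d2=25)
span = DateRecord(y1=-435, m1=1, d1=25, y2=-434, m2=2, d2=27)
undated_demotic = DateRecord(y1=-699, y2=499)
for rec in (exact, span, undated_demotic):
    print(describe(rec))
```

Keeping the two ends of the range as separate numeric fields is what makes range queries and chronological sorting straightforward, at the cost of the vagueness that a verbal date would preserve.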


Information about where a text was written or found is also stored in a separate database (GEOTEX), with records that are linked on the one hand to the text database by the TM number, and on the other hand to the Trismegistos Places database. The latter is a geographical database designed to be used for ancient texts, and is currently being mapped to databases such as Pleiades and Geonames. Each site has its own numeric id, e.g. 332 for Egypt, 00 - Arsinoites (Fayum). Again, this was originally limited to Egypt, but is now being expanded.
The related database structure allows easy differentiation between where a text was 'found' or 'written', and for letters sometimes also its 'destination' is added. Trismegistos is currently also including the current whereabouts ('preserved') in this database, provided the text is not kept in a collection (for which see above).
A few exceptions notwithstanding, Trismegistos does not want to provide information about the provenance of a document on a level below the settlement as a whole. This implies that GEOTEX records are not linked with more specific GEO records, even if these are available.
If the precise provenance of a document is not known, it is ascribed to a modern country, a region or a provincia as a whole, depending on the information available.