Thesaurus Linguarum Hiberniae

How to submit documents to the project

Submissions and donations of suitable texts are very welcome and will be appropriately acknowledged in the file headers. The creation of the CURIA corpus is a cooperative effort, and users' contributions are encouraged.

File formats

Text may be submitted in almost any machine-readable form, as facilities for conversion from many formats are available, but the following formats are suggested. They are shown in descending order of preference from the project's point of view:

Physical medium

File transfer

By Internet anonymous ftp into pub/net/doc/uploads on curia.ucc.ie.

Arrangements can also be made for the project to collect files by Internet ftp or by Janet (Coloured Book) file transfer.
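An anonymous ftp upload can be done by hand, but for a batch of files it can also be scripted. The sketch below uses Python's standard ftplib module and assumes that curia.ucc.ie still accepts anonymous logins with write access to pub/net/doc/uploads; the helper name upload_submission is hypothetical.

```python
import ftplib
from pathlib import Path

# Host and directory as given on this page; that the server still
# accepts anonymous uploads there is an assumption.
HOST = "curia.ucc.ie"
UPLOAD_DIR = "pub/net/doc/uploads"


def upload_submission(path, host=HOST, directory=UPLOAD_DIR,
                      ftp_factory=ftplib.FTP):
    """Send one file by anonymous ftp in binary mode (hypothetical helper).

    ftp_factory lets a caller substitute a stand-in object, e.g. for testing.
    """
    path = Path(path)
    ftp = ftp_factory(host)
    try:
        ftp.login()          # no arguments: log in as user 'anonymous'
        ftp.cwd(directory)   # change to the uploads directory
        with path.open("rb") as fp:
            # Binary mode leaves 8-bit characters and line endings untouched.
            ftp.storbinary(f"STOR {path.name}", fp)
    finally:
        ftp.quit()
    return path.name
```

Binary mode is used because an ASCII-mode transfer may rewrite line endings, which would alter SGML or wordprocessor files.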

Electronic mail

To curia@curia.ucc.ie

Floppy disk

IBM PC 5.25'' or 3.5'' disks, high or low density;

Apple Macintosh 3.5'' disks, high or low density.

Magnetic tape

QIC (1/4'') 6150-format tape (QIC-24) cartridge (UNIX tar format);

1/2'' TK70-format (CompacTape II) cartridge (as on DEC VAX machines);

1/2'' open-reel 9-track ANSI labelled magnetic tape.
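For the QIC cartridge, files are expected in UNIX tar format. As a sketch, Python's standard tarfile module can bundle the files of a submission into a single uncompressed archive before it is written to tape. The helper make_tar is hypothetical, and choosing the POSIX ustar format is an assumption about what older tar programs can read.

```python
import tarfile
from pathlib import Path


def make_tar(archive, files):
    """Bundle files into an uncompressed tar archive (hypothetical helper).

    ustar is chosen on the assumption that older UNIX tar programs are
    more likely to read it than Python's newer default (pax) format.
    """
    with tarfile.open(archive, "w", format=tarfile.USTAR_FORMAT) as tar:
        for f in files:
            f = Path(f)
            # Store only the base name so no absolute paths end up on tape.
            tar.add(f, arcname=f.name)
    return archive
```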

File format

SGML files

Any valid, parseable format (please include the SGML Declaration if it is not the Reference Concrete Syntax, and the DTD if it is not TEI).

Plain text files

Files that preferably use only 7-bit characters (ISO 646 IRV or ASCII). Files containing 8-bit characters can also be used, provided the encoding is stated (e.g. ISO, Windows, DOS or Mac).
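Whether a file is pure 7-bit text is easy to check, and an 8-bit file can only be read correctly once its encoding is known. The Python sketch below shows both steps. The mapping from the labels above to specific codecs (ISO 8859-1, Windows code page 1252, DOS code page 437, Mac Roman) is an assumption; these are common Western European choices, but a submitter should name the exact code page.

```python
# Assumed mapping from the labels on this page to Python codec names;
# the actual code page of a given file may differ.
ASSUMED_CODECS = {
    "ISO": "iso8859_1",
    "Windows": "cp1252",
    "DOS": "cp437",
    "Mac": "mac_roman",
}


def is_seven_bit(data: bytes) -> bool:
    """True if every byte is below 128 (the ASCII / ISO 646 IRV range)."""
    return all(b < 128 for b in data)


def decode_submission(data: bytes, encoding_label=None) -> str:
    """Decode a submitted file, requiring a stated encoding for 8-bit data."""
    if is_seven_bit(data):
        return data.decode("ascii")
    if encoding_label is None:
        raise ValueError("8-bit characters found: please state the encoding")
    return data.decode(ASSUMED_CODECS[encoding_label])
```

The same Irish word "Slán" is stored as a different byte for "á" in each of these encodings (0xE1 in ISO 8859-1 and cp1252, 0xA0 in cp437, 0x87 in Mac Roman), which is why the encoding must accompany 8-bit files, while 7-bit files read identically under all of them.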

Wordprocessor files

Microsoft Word, WordPerfect, Nota Bene, DisplayWrite, PC-Write, Windows Write, MacWrite or similar formats.

DTP files

Files from desktop publishing systems in proprietary binary formats cannot be used unless they can be exported as ASCII text.

Text markup

TEI-conformant SGML

This is the standard to which all text will be marked up for loading into the corpus database.
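To give a rough idea of the shape of a TEI document, the Python sketch below wraps blank-line-separated paragraphs of plain text in a bare TEI.2 skeleton: a header with title, publication and source statements, followed by text, body and p elements. The helper to_tei_skeleton is hypothetical and is not the project's conversion procedure. The document type declaration, which selects the TEI tag sets, is deliberately left out, and any real output would still need checking with an SGML parser against the TEI DTD.

```python
from html import escape


def to_tei_skeleton(title: str, raw_text: str, source_note: str) -> str:
    """Wrap blank-line-separated paragraphs in a bare TEI.2 skeleton.

    Hypothetical helper: no DOCTYPE is emitted, and escape(..., quote=False)
    only replaces &, < and > with entity references.
    """
    # One paragraph per blank-line-separated block, internal whitespace collapsed.
    paras = [" ".join(p.split()) for p in raw_text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{escape(p, quote=False)}</p>" for p in paras)
    return "\n".join([
        "<TEI.2>",
        "<teiHeader>",
        "<fileDesc>",
        f"<titleStmt><title>{escape(title, quote=False)}</title></titleStmt>",
        "<publicationStmt><p>Unpublished submission.</p></publicationStmt>",
        f"<sourceDesc><p>{escape(source_note, quote=False)}</p></sourceDesc>",
        "</fileDesc>",
        "</teiHeader>",
        "<text>",
        "<body>",
        body,
        "</body>",
        "</text>",
        "</TEI.2>",
    ])
```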

Other SGML

For example, non-TEI markup such as DocBook, CALS, HTML 2.0, HTML 3 or TextTool.

Non-SGML markup

Similar systems of markup which identify text structure, such as COCOA or LaTeX.

Other systems which use unambiguous markup.

Unmarked (raw) text is of course also welcome.

Making a submission

If you are able to donate text to the project, please let the Managing Editor or the Project Director know its sources, with references where possible, so that appropriate acknowledgement can be made.

To discuss technical problems, please contact Peter Flynn, Computer Centre, University College, Cork, Ireland (Phone +353 21 902609, fax +353 21 277194, email pflynn@curia.ucc.ie).

Page last updated: 6 August 1996