Proposal to TMR Network
Precis: The practice of implementing grammars on the computer has improved the quality, and especially the reliability, of syntactic description, and it has opened the door to a number of applications in natural language processing (NLP). The best of these systems---whether theoretical or applied---are invariably the result of person-decades of specialized work and become difficult to extend. The focus of the network Extending Computational Grammars by Learning (ECGL) is the investigation of ways to improve these state-of-the-art systems by machine learning applied to current best practice.
Background: Computational grammar models have been developed both for linguistic generality and elaboration and for computational implementation. A grammar is a precise linguistic description, and clearly the domain of linguists. A computational grammar involves at least the implementation of these grammars in a machine-interpretable formalism. To a large extent this should not replace purely linguistic work, and it must be linguistically well-informed. The needs of implementation immediately improve the precision, detail and reliability of grammatical description. Novel theoretical proposals have also grown out of mixed linguistic and computational roots. Thus [Kaplan 88a] develop a novel theory of unbounded movement motivated by the wish to constrain the LFG formalism; [Abeille 91] derives original analyses of idioms in a way transparently motivated in the TAG framework, which was developed almost exclusively by computer scientists; and [Pollard 85] originally motivated the abandonment of metarules in GPSG for computational reasons. Computational grammars are now being applied to increasingly complex fragments of natural language, normally with the goal of supporting the analysis (parsing) and production (generation) of language.
There is nonetheless a complexity bottleneck in grammar development. Grammars that are developed for wide coverage tend to use dozens of multivalued features (in relatively free combination). Their lexicons need tens of thousands of items, each of which may appear in dozens of inflectional variants, and the number of grammatical constructions tends to run in the hundreds (but this number may be hidden in further lexical complexity). In point of fact, it is very difficult to train new people to work on further development of these grammars simply due to the complexity. While there has been a resurgence of interest in nonlinguistic statistical methods for language processing in the last five years, it is safe to say that none of these promises to replace linguistically informed grammars. This suggests that alternatives need to be sought.
Computational grammar also allows applications which were unthinkable in the recent past. These include information retrieval (IR), computer-assisted language learning (CALL), grammar checking, language aids for the handicapped, translation tools, documentation standards tools, speech control and speech query---all of which may be found in primitive forms on the market today.
Beyond the importance of commercial exploitation, language technology holds the cultural promise of easing communication among the peoples of Europe. This has already begun to take place as language technology finds its way into CALL, multilingual IR, and software supporting reading and translation (intelligent dictionaries).
The project will apply several of the currently interesting techniques for machine learning of natural language to a common problem, that of learning noun-phrase syntax.
Common Base: In order to function well, the network must build on state-of-the-art work in several areas, including writing and implementing extended grammars (for different languages), the linguistic theory informing such computational grammars, and language processing algorithms and implementations. The partners within the proposed network are separately involved in these efforts so that they are well-positioned to focus on the critical problem of extending grammars via methods from machine learning.
Common Foci: The network will be focused by its tasks: the common application and evaluation of machine-learning techniques as used to learn natural language syntax from a given knowledge base. To further focus the network, a subarea of syntax will be chosen, preferably NP syntax. Test material is increasingly available through the work of the Linguistic Data Consortium, the Text Encoding Initiative, the Parallel Grammar Project, and the LRE project ``Test Suites for NLP'', and it can be used within the normal work of the consortium. Network partner ISSCO has supported (and in some cases led) projects in corpus collection and standardization, and measures for diagnosis, evaluation and assessment.
Evaluation: The degree to which objectives are reached may be measured. The appropriate measures are given by the task: recognizing and analyzing the noun phrases (NP's) in free text. The success of the methods may be measured in standard ways: for the recognition task, recall (% of NP's from text found) and precision (% of items found that are NP's); and for the analysis task, the degree to which the correct constituent structure is assigned.
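The recognition measures named above can be illustrated with a minimal sketch. It assumes that NP's are represented as (start, end) token spans and that a found NP counts as correct only on an exact span match; the function name `precision_recall` and the sample spans are hypothetical and not part of the proposal.

```python
def precision_recall(gold_nps, found_nps):
    """Return (precision, recall) for NP recognition.

    Assumption: each NP is a (start, end) token span, and a found
    span is correct only if it matches a gold span exactly.
    """
    gold = set(gold_nps)
    found = set(found_nps)
    correct = len(gold & found)
    # Precision: share of items found that really are NP's.
    precision = correct / len(found) if found else 0.0
    # Recall: share of the NP's in the text that were found.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: four NP's in the text, three proposed by a learner.
gold = [(0, 2), (3, 5), (6, 9), (10, 11)]
found = [(0, 2), (3, 5), (7, 9)]
precision, recall = precision_recall(gold, found)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact span matching is only one possible criterion; a partial-overlap criterion would give different figures, and the analysis task (constituent structure) would need a separate measure.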
Expected Results: The network expects to contribute to the knowledge of whether and which machine learning techniques are well suited to the task of learning language.
An especially interesting task is that of combining techniques from statistical and symbolic machine learning. This is especially focused in the subprojects FE, GEN and TBEDL (see § 4 ``Research Method'' for an explanation of subproject names). But MBL, SDL and DOSP also proceed (partially) from annotated data, providing a potential link to symbolic categories. NN cannot be expected to proceed from knowledge-based grammars, but is regarded as an interesting area to follow because of its affinity to human neuroprocessing, its sensitivity to higher-order correlation, and its current dynamism in the learning field.
Previous Approaches: The consortium can build on previous work on automating lexical acquisition. There are a number of proposals for how this can be done, including those associated with AQUILEX ([Briscoe 93]) and DATR ([Evans 90]). These efforts have focused on the existing patterns of structure within the lexicon (e.g., the relation between predicative adjectives used with the verb ``be'', etc. and attributive adjectives used to modify nouns), on the processes available to create new words (e.g., using ``re'' with a verb such as ``fax'' to create a new word, perhaps for the first time), and on the theoretical apparatus needed for a correct description. The AQUILEX effort was particularly important because it combined a well-founded theoretical view of lexical processes with practical experiments in AQUIring LEXical information automatically (or semi-automatically) from MRDs. Such efforts may be more successful if combined with the learning techniques mentioned above.
Promise: Learning techniques constitute a promising area: not only because purely knowledge-based approaches have been difficult to extend, but also because the knowledge they derive is less sensitive to expert knowledge, and expert error, something rule-based approaches often founder on. The current project is well-poised not only to experiment in machine learning techniques, but also to evaluate more exactly the quality of the results (in the context of grammar extension, this means both the absolute quality and the degree to which it is faithful to the original). GEN is intended to investigate applications of genetic algorithms, which is novel but promising as long as a locally sensible fitness function is possible.
Benefits of Collaboration: The benefits of collaboration are the control in evaluating the very varied approaches being taken currently to the machine-learning of natural language. None of the laboratories has the resources (or expertise) to experiment simultaneously in all of these areas. The benefits may be seen concretely as well in the opportunity for sharing of resources such as data, information about development systems (present at all sites), and through the sharpening of questions over the problem role of specification vis-à-vis learning/training in the syntax problem.
Specific Innovations are discussed in the following section, in which the various techniques to be applied are presented.
We describe the projects in turn.
| Title | Site | Lang. | Key Techniques |
| ----- | ----- | ----- | ---------- |
| TBEDL | Groningen | Eng, Ger | Finite-State Methods |
| FE | Tübingen | Ger, Eng | Parameter Estimation |
| GEN | SRI | Eng | Variation Control |
| MBL | Antwerp | Eng | Case-Based Learning |
| NN | Dublin | Eng | Bootstrapping |
| SDL | ISSCO | Fr, Eng | Maximum Entropy |
| DOSP | Rank Xerox | Eng, Fr, Ger | Full-Structure Sensitivity |
Schedule: The first year of each project should emphasize the following: (i) the extension and codification of existing methods, especially as these have been applied to the problem of lexical extension; (ii) theoretical investigation of the local methodology; and (iii) the detailed design of an experiment involving NP syntax. The second two years focus on (i) running and refining the experiments; and (ii) analysis of results.
Milestones: Given the project definitions specified at three years each, checkpoints are at 15 months---in time for the mid-term review, and 33 months (with a final three months for wrap-up, reporting, responding to criticism, if needed). The major milestones are thus a bit less than halfway through the project, when all of the initial two-year subprojects may be evaluated. The definitions of the milestones are not given here, but are implicit in the ``goals'' noted in the table above. In every case the initial milestone should show:
[Abeille 91] Anne Abeillé. Une grammaire lexicalisée d'arbres adjoints pour le français. PhD thesis, Université de Paris, 1991.
[Bod 95] Rens Bod. Enriching linguistics with statistics: Performance models of natural language. PhD thesis, University of Amsterdam, 1995.
[Bod 96a] Rens Bod, Remko Bonnema, and Remko Scha. A data-oriented approach to semantic interpretation. In Proceedings of the Workshop on Corpus-Oriented Semantic Analysis, Budapest, 1996. ECAI-96.
[Bod 96b] Rens Bod, Ronald Kaplan, Remko Scha, and Khalil Sima'an. A data-oriented approach to lexical functional grammar. In Jan Landsbergen, editor, Computational Linguistics in the Netherlands, Eindhoven, 1996. IPO, Philips.
[Brill 95] Eric Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543--566, 1995.
[Briscoe 93] Ted Briscoe. Introduction. In Ted Briscoe, Ann Copestake, and Valeria de Paiva, editors, Inheritance, Defaults and the Lexicon, pages 1--12. Cambridge University Press, Cambridge, 1993.
[Evans 90] Roger Evans and Gerald Gazdar. The DATR papers. Technical Report SCRP 139, School of Cognitive and Computing Sciences, University of Sussex, 1990.
[Gecseg 84] Ferenc Gecseg and Magnus Steinby. Tree automata. Akademiai Kiado, Budapest, 1984.
[Hutchens 95] Jason L. Hutchens. Natural Language Grammatical Inference. PhD thesis, Univ. of Western Australia, 1995. Dept. of Information Technology.
[Kaplan 96] Ronald Kaplan. A probabilistic approach to lexical functional grammar. Paper Presented at the LFG Colloquium and Workshops, 1996.
[Kaplan 88a] Ronald Kaplan and Annie Zaenen. Long-distance dependencies, constituent structure, and functional uncertainty. In Mark Baltin and Anthony Kroch, editors, Alternative Conceptions of Phrase Structure, pages xxx--yyy. University of Chicago Press, Chicago, 1988.
[Lankhorst 94] Marc M. Lankhorst. A genetic algorithm for the induction of context-free grammars. In Gosse Bouma and Gertjan van Noord, editors, Proc. of CLIN IV, pages 87--100, Groningen, 1994. Alfa-informatica.
[Lankhorst 96] Marc M. Lankhorst. Genetic Algorithms in Data Analysis. PhD thesis, University of Groningen, 1996.
[Christ 94] M. H. Christiansen. Infinite languages, finite minds: Connectionism, learning, and linguistic structure. PhD thesis, University of Edinburgh, 1994.
[Pollard 85] Carl Pollard. Phrase structure grammar without metarules. In Proc. of the 4th Annual Meeting of the West Coast Conference on Formal Linguistics, pages 246--261, 1985.
[Rayner 88] Manny Rayner. Applying explanation-based generalization to natural language processing. In Fifth Generation Computer Systems 1988, volume 3. Springer Verlag, 1988.
[Rounds 70] William Rounds. Mappings and grammars on trees. Math. Systems Theory, 4(3):257--87, 1970.
[Zavrel 96] Jacob Zavrel and Jorn Veenstra. The language environment and syntactic word-class acquisition. In Charlotte Koster and Frank Wijnen, editors, Proc. of the Groningen Assembly on Language Acquisition (GALA95), Groningen, 1996. Groningen University.
See individual pages for description of institution and mission, key personnel and their experience and expertise, and recent publications. These are marked ``Extending Computational Grammar'', Partner 1, etc.
The host institution for the TMR Network Proposal is the Centre for Language and Cognition, University of Groningen. The host group is Computational Linguistics. Computational Linguistics is one branch of Alfa-Informatica (Humanities Computing), founded in the Faculty of Arts at the University of Groningen in 1986. The work in computational linguistics is concentrated on grammar and parsing, esp. as these are applied in speech understanding, computer-assisted language learning, information systems, and text representation and processing. Foci of this work are the use of computers as laboratories for linguistic research, particularly in syntax and semantics, but with significant efforts in lexical structure, phonology, morphology and language learning. The research takes place under the auspices both of Groningen's Centre for Behavioral and Cognitive Neurosciences, and of the Dutch graduate school in logic. @Alfa is a spin-off focusing on WWW applications.
For more detailed information, including personnel, opportunities and requirements for study, etc. see our World-Wide Web page at http://www.let.rug.nl/
Gertjan van Noord. ``An Efficient Implementation of the Head-Corner Parser.''
Accepted to appear in Computational Linguistics, 1997.
Gosse Bouma and Gertjan van Noord, ``Constraint-Based Categorial Grammar'' in: Proc. 32nd ACL, 1994, 147-54.
John Nerbonne, ``A Feature-Based Syntax/Semantics Interface'' in: Annals of Mathematics and Artificial Intelligence, 1993, 107-132.
John Nerbonne, ``Nominal Comparatives and Generalized Quantifiers'' in Journal of Logic, Language and Information 4, 1995, 273-300.
Mark-Jan Nederhof and Ewald Bertsch. ``Linear-time suffix parsing for deterministic languages.'' Journal of the ACM, 43(3), 1996, 524-554.
The University of Tübingen Department of Linguistics (Seminar für Sprachwissenschaft) incorporates three sections: Computational Linguistics (Prof. Erhard Hinrichs), Mathematical Linguistics (Prof. Uwe Mönnich) and General Theoretical Linguistics (Prof. Arnim von Stechow). Cooperating professors in other departments include Prof. Marga Reis (German) and Prof. Bernard Drubig (English Linguistics).
Steven Abney. Parsing by chunks. In Berwick, Abney, and Tenny, editors, Principle-Based Parsing, pages 257--278. Kluwer, 1991.
Dale Gerdemann and Paul John King. The correct and efficient implementation of appropriateness specifications for typed feature structures. In COLING 94, Proceedings, pages 956--960, 1994.
Thilo Götz and Walt Detmar Meurers. Compiling HPSG type constraints into definite clause programs. In Proceedings of ACL 1995,
Erhard W. Hinrichs and Tsuneko Nakazawa. Linearizing AUXs in German verbal complexes. In Nerbonne et al., editor, German in Head-Driven Phrase Structure Grammar, CSLI, 1994.
Bob Carpenter and Gerald Penn. Compiling Typed Attribute-Value Logic Grammars. In H. Bunt and M. Tomita, eds., Recent Advances in Parsing Technology, Kluwer, (1996)
SRI International is a not-for-profit research organisation founded over 50 years ago by Stanford University. SRI Cambridge was SRI's first research laboratory outside California, and has been in existence for 10 years, concentrating on natural language processing and formal methods. It carries out research and consulting for government and commercial clients, and has very close links with the University of Cambridge Computer Laboratory.
Alshawi, H. and Carter, D. 1994, `Training and Scaling Preference Functions for Disambiguation', Computational Linguistics, 20:4, pp 635-648.
Lewin, I. 1995, `Indexical Dynamics', in Polos, L. and Masuch M. (eds), Applied Logic: How, What and Why, Kluwer, pp 121-152.
Milward, D. 1994, 'Dynamic Dependency Grammar', Linguistics and Philosophy, 17, pp 561-605.
Pulman, S. 1996, 'Unification Encodings of Grammatical Notations', Computational Linguistics, 22:3, pp 295-327.
Rayner, M., and Carter, D. 1996, 'Fast Parsing using Pruning and Grammar Specialisation', in Proceedings of the 34th ACL, Santa Cruz, pp 223-230.
Current central research topics of CNTS include machine learning of natural language (symbolic induction of lexical and grammatical knowledge), speech synthesis, lexical organisation and acquisition (design of object-oriented lexical databases, acquisition of lexical knowledge from corpora), data-oriented parsing and part-of-speech tagging, machine learning of user models and pragmatic knowledge, and intelligent text processing (spelling checking, hyphenation, report generation).
CNTS is an associate node of ELSnet since 1993, member of Erasmus ICP NL-1022/09 on Natural Language Processing since 1991, and its follow-up ACO*HUM (Computing in the Humanities thematic network), and coordinator of CLIF (Computational Linguistics in Flanders), the Flemish FWO-funded research community on Computational Linguistics and Language Technology since 1995. The group is also the European headquarters of CHILDES (electronic archive of corpora) and participates in the development of data retrieval and manipulation tools for these corpora. CNTS is currently involved in several externally funded research projects (FWO, VNC, IWT etc.).
Daelemans, W., Van den Bosch, A., & Weijters, A. `IGTree: Using Trees for Compression and Classification in Lazy Learning Algorithms.' To appear in D. Aha (ed.) Artificial Intelligence Review, special issue on Lazy Learning, 1997.
Daelemans, W., J. Zavrel, P. Berck, S. Gillis. `MBT: A Memory-Based Part of Speech Tagger-Generator'. In: E. Ejerhed and I. Dagan (eds.) Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, 14-27, 1996.
Daelemans, W. `Memory-Based Lexical Acquisition and Processing.' In: P. Steffens (ed.) Machine Translation and the Lexicon, Springer Lecture Notes in Artificial Intelligence 898, 85-98, 1995.
Daelemans, W., S. Gillis and G. Durieux. `The Acquisition of Stress, a data-oriented approach.' Computational Linguistics 20 (3), 421-451, 1994.
Daelemans, W. and K. De Smedt. `Inheritance in an Object-Oriented Representation of Linguistic Categories.' International Journal Human-Computer Studies, 41, 149-177, 1994.
The largest research group in the Department of Computer Science at UCD is the AI group, and work on natural language processing accounts for a significant portion of the group's research effort. The dominant research themes are connectionist NLP, discourse modelling, and robust parsing. There are contacts with the Antwerp group, with whom Ronan Reilly spent a year at NIAS, and with the Groningen Behavioural and Cognitive Neuroscience center.
Arthur Cater and Dermot McLoughlin. Compound noun interpretation using taxonomic links: An experiment in progress. Cognitive Science of Natural Language Processing Proceedings, Dublin City University, 1996.
Arthur Cater. Lexical knowledge required for natural language processing. In Cheng-Ming Guo, ed., Machine tractable dictionaries: Design and construction, Ablex, 1995.
Gemma Lyons. Presupposition: Its use in Discourse. In Cognitive Science of Natural Language Processing Proceedings, Dublin City University, 1992.
Gemma Lyons. Natural Language Generation: An Intermediate Step. In Proceedings of AICS'94 Conference, Trinity College Dublin, 1994.
Ronan Reilly. Sandy ideas and coloured days: The computational implications of embodiment. Artificial Intelligence Review, 9, pages 305--322, 1995.
Ronan Reilly. A connectionist technique for on-line parsing. Network, 3, pages 37--45, 1992.
Ronan Reilly and Noel Sharkey, eds., Connectionist approaches to natural language processing. Hillsdale, NJ: Erlbaum, 1993.
ISSCO was established by the Fondazione Dalle Molle and is affiliated with the University of Geneva. It has been active in NLP for twenty years, especially focusing on multilingual language processing, evaluation of NLP systems and products, and the development of corpus-based techniques. ISSCO members have participated in numerous EC-funded projects, among which are TEMAA, TSNLP, MULTEXT, and EAGLES (Evaluation and Corpus Groups). ISSCO is a member of the European network of excellence for language and speech (ELSNET) and maintains the secretariat for the European Chapter of the Association for Computational Linguistics (EACL) and the European Association for Machine Translation (EAMT). Corpus collection initiatives in which ISSCO has played a major role include the European Corpus Initiative (ECI) and the Multilingual Corpora for Cooperation (MLCC), which have resulted in two of the largest collections of parallel and multilingual data currently available to the NLP research community. This data provides the necessary resource for many of the learning methods currently under development.
Recent work directly related to this proposal focuses on the extraction of semantic information from text corpora. The work attempts to recognize semantic equivalences across portions of texts.
As an ``External Partner'' ISSCO will seek funding from Swiss sources.
S. Armstrong-Warwick. Acquisition and Exploitation of Textual Resources for NLP. In: Proceedings of the KB & KS Workshop, Tokyo, 1994.
Susan Armstrong. Using Large Corpora, MIT Press, Cambridge, 1995.
S. Armstrong, G. Russell, D. Petitpierre, and G. Robert. An Open Architecture for Multilingual Text Processing. In: Proceedings of the ACL Sigdat Workshop -- From Texts to Tags: Issues in Multilingual Language Analysis. Dublin, 1995. pp. 30--34.
P. Bouillon, S. Lehmann, D. Petitpierre, G. Russell. Definition and Exploitation of Sublanguage Descriptions for MT in a Finite Domain. Final Scientific Report - Projet FNRS no. 1213-42173.94, 1996.
P. Bouillon, S. Lehmann, D. Petitpierre. Inférence statistique de structures sémantiques. In: Journées Scientifiques et Techniques 1997, Avignon, France, 15-16 April, forthcoming.
Rank Xerox European Research Centre comprises two laboratories, one in Cambridge (UK) in existence since 1988 and one in Grenoble (France), created in September 1993. The research will be conducted in the laboratory in France.
The Grenoble Laboratory's aim is to enhance the understanding of business processes in a multilingual, distributed environment around multimedia documents and to create technology which helps businesses and individuals to become more efficient in these environments. MLTT concentrates on developing technologies which support effectively the work of individuals and groups of individuals in multilingual settings: creation, manipulation, modification, translation of the natural language content of documents. Most relevant to this project is the work done on 'light parsing' and on constraint-based grammar development (LFG). Some of this work is carried out in collaboration with PARC.
Breidt, Lisa and Segond Frédérique, 1995, "Compréhension automatique des expressions à mots multiples en français et en allemand", to be presented at the Quatrièmes journées scientifiques de Lyon.
Chanod J.-P., Tapanainen P. "Tagging French: comparing a statistical and a constraint-based method" Proc. 1995 EACL, Dublin, 1995.
Chanod J.-P., Tapanainen P. "Creating a Tagset, Lexicon and Guesser for a French Tagger" From texts to tags: issues in multilingual language analysis, ACL SIGDAT Workshop, Dublin, 1995
Zaenen, Annie and Mary Dalrymple, 1995, "Polymorphic Causatives", to appear in Klavans (ed.) Representation and Acquisition of Lexical Knowledge, Proceedings of the AAAI symposium.
Karttunen, L., R. Kaplan and A. Zaenen, "Finite State Morphology with Composition", in Proceedings of COLING, 1992.
The teams will collaborate and interact regularly through email, and further through three network meetings. A kick-off meeting must guarantee that the task is well-defined and that common training and test material are available. This will be the primary responsibility of ISSCO and Groningen. At this point the ``common base'' mentioned in § 2 must be inventoried (informally) to avoid duplication of effort. A midterm meeting around 16 months into the project must review milestones, providing feedback and course-correction where needed. The final meeting is to be devoted to evaluation of results, dissemination and discussion of plans for potential further exploitation.
As noted in § 3, ``Originality'', the benefits of collaboration are the control in evaluating the very varied approaches being taken currently to the machine-learning of natural language. None of the laboratories has the resources (or expertise) to experiment simultaneously in all of these areas. The benefits may be seen concretely as well in the opportunity for sharing of resources such as data, information about development systems (present at all sites), and through the sharpening of questions over the problem role of specification vis-à-vis learning/training in the syntax problem.
The network is organized as seven teams with a single coordinator. There is a single focus problem which is to be attacked by related methods. Travel has been budgeted to allow visits by postdocs and also by permanent staff within the focal groups.
The network consists of very experienced teams who require no micro-management. The projects have been defined to be of the same temporal span so that review and presentation can be done simultaneously at annual meetings. This was deliberately done in order to ensure that attendance would be maximally attractive. We should regard the annual meetings as particularly attractive if they attract other site members not directly attached to ECGL. In order to further enhance the quality of meetings, we propose to invite leading researchers for extended presentations of their work. We propose to consider tutorial-like presentations among these, given opportunity and cooperation.
The teams in the proposed network know each other to some degree. They are involved in European projects together (DYANA, COMPASS, GLOSSER), participate in exchange programs (ERASMUS), share duties in the European ACL and in the Foundation for Logic, Language and Information, and have experience in organizing professional meetings and summer schools. This, too, should ease the management task.
In order to facilitate communication, each postdoc will report quarterly on progress. We have in mind reports for initiates of approx. one page in length. In order to promote dialogue, the quarterly reports will be distributed throughout the network. We do not foresee a formal reporting procedure for ``reviewing'' these, although the coordinator will report, naturally.
The coordinator, John Nerbonne, is experienced in project management. Having managed groups in industry (Hewlett-Packard Labs) and contract research (German AI Center, Saarbrücken), he is currently the chair of a department of five permanent staff and six on temporary contracts. He furthermore serves as the chair of the European chapter of the ACL ('97-98), and is on the board of the Foundation for Logic, Language and Information, and the Dutch National Science Foundation. He has organized conferences (HPSG '91, Linguistic Databases '95), ACL tutorials and ESSLLI and ELSNET summer school sections.
The particular training need in the area of applying machine-learning techniques to NLP arises because the techniques are novel and not widely understood, and because the growing demand for NLP in the marketplace is competing with research for the small number of trained professionals.
Each of the postdocs will work in one of the leading labs of the continent. The site descriptions detail the highly qualified personnel related to ECGL's project goals at each of the sites. We estimate the project supervision---all of which represents donated time---at 10% of project time, totalling 18 person months.
Furthermore, the labs involved are almost all located at or associated with universities where advanced courses in natural language processing, descriptive linguistics, and machine learning---the main feeder disciplines for ECGL---are given. If each postdoc takes a single course (estimated at 0.5 person months) each year, this contributes a further 7.5 person months of training.
At several of the participating sites, postdocs also have the opportunity to conduct graduate courses. If half of them avail themselves of this opportunity, each holding a course for five graduate students (counting as 0.5 person-months of training each), this adds a further 7.5 person-months of training.
Finally, the participants in the ECGL network are committed to professional services such as the teaching of summer school courses and professional tutorials. If we estimate that three of these are given over the three-year time span, for thirty participants each at 0.25 person-months, this adds 22.5 person-months. These participants tend to be young postdocs. (One such course will be given in the 1997 European Summer School in Logic, Language and Information.)
We estimate the training effect at approx. 55.5 person-months, about half of this directly to the special target group (postdoctoral researchers).
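The arithmetic behind this estimate can be reconstructed as a rough check. The sketch below rests on assumptions that the text does not state outright: five funded postdocs and a 36-month project (inferred from the 18 and 7.5 person-month figures), three postdocs giving graduate courses (the reading of "half of them" that reproduces 7.5), and "directly to the target group" taken to mean supervision plus courses taken. All variable names are illustrative.

```python
# Assumed inputs, inferred from the figures in the text rather than stated there.
POSTDOCS = 5            # assumption: number of funded postdocs
PROJECT_MONTHS = 36     # three-year projects
YEARS = 3
TEACHING_POSTDOCS = 3   # assumption: "half" of the postdocs, rounded up

# Supervision: 10% of total postdoc project time.
supervision = POSTDOCS * PROJECT_MONTHS * 10 / 100

# Courses taken: one course of 0.5 person-months per postdoc per year.
courses_taken = POSTDOCS * YEARS * 0.5

# Courses given: five graduate students per course, 0.5 person-months each.
courses_given = TEACHING_POSTDOCS * 5 * 0.5

# Summer schools and tutorials: three courses, thirty participants, 0.25 each.
summer_schools = 3 * 30 * 0.25

total = supervision + courses_taken + courses_given + summer_schools
direct_to_postdocs = supervision + courses_taken

print(f"supervision        {supervision:5.1f}")
print(f"courses taken      {courses_taken:5.1f}")
print(f"courses given      {courses_given:5.1f}")
print(f"summer schools     {summer_schools:5.1f}")
print(f"total              {total:5.1f}")
print(f"direct to postdocs {direct_to_postdocs:5.1f} ({direct_to_postdocs / total:.0%})")
```

Under these assumptions the four components sum to the 55.5 person-months quoted above, with a little under half going directly to the postdoctoral researchers.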
There is direct industrial involvement through the participation of Rank Xerox, perhaps the world's leader in marketing natural language technology. SRI, one of the leading private laboratories working on language technology, also participates. It is part of SRI's charter to perform contract research for industrial clients, so that the network has a direct channel to industry. Finally, several of the other partners have personnel with industrial experience (e.g., Nerbonne in Groningen and Hinrichs in Tübingen) who will seek industrial application of the technology developed here wherever this seems promising. The partners in the network collaborate with dozens of companies in other projects, most of whom are very interested in applying this technology.
All the partners have budgeted money for postdocs, including their salaries and benefits, computers, incidentals such as copying and telephone, and overhead. There is furthermore a generous budget for travel since collaboration of the different partners will require a good deal of it.
All figures in kECU/annum.
| Partner | Budget/annum |
| ------ | ---------- |
| 1. Groningen | 69 |
| 2. Tübingen | 68 |
| 3. SRI Cambridge | 70 |
| 4. Antwerp | 70 |
| 5. Dublin | 70 |
| 6. ISSCO | 0 |
| 7. Rank Xerox | 70 |
| Total | 417 |
| Average | ~69 |
(ISSCO, which does not seek funding, is not included in the average. With ISSCO, the average would be ~60 kECU/annum.)
| Breakdown into main titles/annum | |
|---|---|
| Salaries and Benefits | 270 |
| Overhead (20%) | 54 |
| Computing & Incidentals | 54 |
| Travel & Networking | 39 |
| Total | 417 |
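As a cross-check on the two budget tables, the following sketch recomputes the totals and averages from the listed figures (all in kECU/annum). The dictionary names are illustrative, and the treatment of overhead as 20% of salaries and benefits is an assumption that happens to match the figures.

```python
# Annual budget per partner in kECU, copied from the partner table.
partner_budgets = {
    "Groningen": 69,
    "Tübingen": 68,
    "SRI Cambridge": 70,
    "Antwerp": 70,
    "Dublin": 70,
    "ISSCO": 0,        # external partner, seeks no funding here
    "Rank Xerox": 70,
}

# Annual breakdown by main title in kECU, copied from the breakdown table.
breakdown = {
    "Salaries and Benefits": 270,
    "Overhead (20%)": 54,
    "Computing & Incidentals": 54,
    "Travel & Networking": 39,
}

total = sum(partner_budgets.values())
funded = [b for b in partner_budgets.values() if b > 0]

avg_funded = total / len(funded)        # average over funded partners only
avg_all = total / len(partner_budgets)  # average including ISSCO

# Assumption: overhead is computed on salaries and benefits.
overhead = breakdown["Salaries and Benefits"] * 20 / 100

print(f"total: {total} kECU/annum")
print(f"average (funded partners): {avg_funded:.1f}")
print(f"average (all partners): {avg_all:.1f}")
print(f"breakdown sums to total: {sum(breakdown.values()) == total}")
print(f"overhead at 20% of salaries: {overhead:.0f}")
```

The two tables agree on the annual total, and the averages come out at 69.5 and roughly 60 kECU/annum respectively.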