Institut für Informatik
Augustusplatz 10
04109 Leipzig


Gerhard Heyer is Professor Emeritus of Natural Language Processing. Before joining the university, he worked as a systems specialist and head of research and development in industry and at a company of his own. His research focuses on, among other things, the automatic semantic processing of text, research data infrastructures and applications of text mining in the digital humanities. His book on text mining (new edition 2022) is the standard German textbook on the subject. He is the principal investigator (PI) at the Saxon Academy of Sciences and Humanities in charge of the NFDI project Text+. Prof. Heyer is a co-opted member of the board of directors of the Institute for Applied Informatics (InfAI), an affiliate institute of the University of Leipzig for knowledge and technology transfer in informatics, which he co-founded in 2006. He is also a member of the advisory board on Digital Value Creation of the Saxon Ministry of Economics and Labour (SMWA).

Professional career

  • since 01/1985
    For many years, he worked as a systems specialist and head of research and development for language and multimedia products, both in industry and at a company of his own that developed one of the first translation memory systems. In 1994 he was appointed professor at the University of Leipzig, becoming one of the first professors of Natural Language Processing in a Computer Science Department. The chair's fundamental research approach has always emphasized a close connection between data and processes.


  • 08/1973 - 12/1984
    Gerhard Heyer studied philosophy, mathematical logic and linguistics at the Universities of Cambridge (M.A. 1980) and Bochum (Dr. phil. 1983). He then spent a year researching natural language processing at the University of Michigan, Ann Arbor (USA), as a Visiting Assistant Professor and Feodor Lynen Research Fellow of the Alexander von Humboldt Foundation.

Panel Memberships

  • since 01/2018
    Prof. Heyer is a member of numerous scientific advisory boards and a co-opted member of the board of directors of the Institute for Applied Informatics (InfAI), an affiliate institute of the University of Leipzig. From 1997 to 2006 he was a member of the scientific advisory board of the Information Science Centre (IZ) in the GESIS network, and from 2006 to 2007 he was also a member of the GESIS Board of Trustees.

His research focuses on the automatic semantic processing of text. In addition to numerous publications on this topic, including the German-language text mining textbook "Text Mining: Wissensrohstoff Text" (W3L-Verlag, 3rd edition 2011; revised edition, Springer Campus, 2022), he has carried out a large number of research projects in this field. Noteworthy recent projects include his work on research infrastructures (CLARIN-D, CLARIAH and Text+), on information and relation extraction in the iLCM project (interactive Leipzig Corpus Miner, funded by the DFG together with GESIS), and on the application of machine learning methods to OCR and HTR in the DFG joint project OCR-D (Coordinated Funding Initiative for the Further Development of Methods for Optical Character Recognition).

Projects

  • iLCM - A virtual research environment for large-scale qualitative data
    Heyer, Gerhard
    Duration: 02/2017 – 12/2022
    Funded by: DFG Deutsche Forschungsgemeinschaft
    Involved organisational units of Leipzig University: Automatische Sprachverarbeitung
  • Recognition and Enrichment of Archival Documents - READ
    Heyer, Gerhard
    Duration: 01/2016 – 06/2021
    Funded by: EU Europäische Union
    Involved organisational units of Leipzig University: Automatische Sprachverarbeitung
  • Language data resources: Deutscher Wortschatz, multilingual corpora and Words of the Day (Sprachdatenressourcen – Deutscher Wortschatz, multilinguale Corpora und Wörter-des-Tages)
    Heyer, Gerhard
    Duration: 05/2011 – ongoing
    Involved organisational units of Leipzig University: Automatische Sprachverarbeitung
  • Information distribution and language structure - correlation of grammatical expressions of the noun/verb distinction and lexical information content in Tagalog, Indonesian and German
    Heyer, Gerhard
    Duration: 07/2020 – 06/2022
    Funded by: DFG Deutsche Forschungsgemeinschaft
    Involved organisational units of Leipzig University: Automatische Sprachverarbeitung
  • Heyer, Gerhard
    Duration: 05/2011 – 09/2020
    Funded by: BMBF Bundesministerium für Bildung und Forschung
    Involved organisational units of Leipzig University: Automatische Sprachverarbeitung

Publications

  • Niekler, A.; Bleier, A.; Kahmann, C.; Posch, L.; Wiedemann, G.; Erdogan, K.; Heyer, G.; Strohmaier, M.
    iLCM - A Virtual Research Infrastructure for Large-Scale Qualitative Data
    Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). 2018.
  • Heyer, G.; Eckart, T.; Goldhahn, D.
    Was sind IT-basierte Forschungsinfrastrukturen für die Geistes- und Sozialwissenschaften und wie können sie genutzt werden?
    Information - Wissenschaft & Praxis. 2015. 66 (5/6). pp. 295–303.
  • Remus, R.; Quasthoff, U.; Heyer, G.
    SentiWS - a publicly available German-language resource for sentiment analysis
    In: Calzolari, N.; Choukri, K.; Maegaard, B. (Eds.)
    Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). Valletta: ELRA. 2010. pp. 1168–1171.
  • Yousef, T.; Schlaf, A.; Borst-Graetz, J.; Niekler, A.; Heyer, G.
    Press Freedom Monitor: Detection of Reported Press and Media Freedom Violations in Twitter and News Articles
    2021. pp. 153–159.
  • Schroeder, C.; Bürgl, K.; Annanias, Y.; Niekler, A.; Müller, L.; Wiegreffe, D.; Bender, C.; Mengs, C.; Scheuermann, G.; Heyer, G.
    Supporting Land Reuse of Former Open Pit Mining Sites using Text Classification and Active Learning
    2021. pp. 4141–4152.
