Natural Language


Information Retrieval (Sem. 2)

Learning outcomes

Know how to build an index for a document collection;
Understand the difficulties of NLP-based approaches to information retrieval;
Define and implement scoring models and rankings.

Syllabus

Construction and compression of indexes;
Boolean retrieval;
Scoring and ranking (see the sketch after this list):
TF-IDF;
Vector space models;
Evaluation in information retrieval;
Relevance feedback and query expansion;
XML retrieval;
Probabilistic retrieval;
Flat and hierarchical clustering;
Latent semantics.
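
The scoring topics above lend themselves to small worked examples. Below is a minimal sketch, in Python, of TF-IDF weighting and cosine similarity in a vector space model; the toy corpus and all function names are illustrative assumptions, not material from the course:

    import math
    from collections import Counter

    def tf_idf_vectors(docs):
        """Build a sparse TF-IDF weight vector for each tokenized document."""
        n = len(docs)
        # Document frequency: in how many documents each term occurs.
        df = Counter(term for doc in docs for term in set(doc))
        # Weight of a term = raw term frequency * log inverse document frequency.
        return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
                for doc in docs]

    def cosine(u, v):
        """Cosine similarity between two sparse term-weight vectors."""
        dot = sum(w * v[t] for t, w in u.items() if t in v)
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    docs = [["cat", "sat", "mat"], ["cat", "cat", "hat"], ["dog", "sat", "log"]]
    vectors = tf_idf_vectors(docs)
    print(cosine(vectors[0], vectors[1]))  # > 0: both documents share "cat"

Note that a term occurring in every document gets weight log(1) = 0, which is exactly the TF-IDF intuition: terms that do not discriminate between documents should not contribute to the score.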

Teaching methodologies and evaluation

A project-oriented methodology will be followed in this course.
At the beginning of each class a theoretical introduction will be given; students are expected to study the subjects in greater depth at home. The remaining time will be used to develop a series of small projects that address the theme of the lesson.
The assessment involves two instruments: an experimental development and written report, carried out in groups, and an individual practical test.
Both the individual and the group components have a well-defined time limit, never exceeding the academic year.
The final classification is computed as follows:
• 40% of the grade comes from the group practical component;
• 60% of the grade comes from the individual component.

Bibliography

Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology Behind Search, Addison-Wesley, 2010.
Ian H. Witten, Alistair Moffat, and Timothy C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann, 1999.

Machine Learning for Natural Language (Sem. 1)

Learning outcomes

Build tools to train language models from text;
Understand the differences between the main training methods and their applications;
Understand the importance of evaluation and of the main processes used.

Syllabus

• Language Models
o Word probabilities and sequence probabilities: n-grams (see the sketch after this list).
o Entropy and perplexity.
• Smoothed language models;
• Text classification and sentiment analysis;
• Naïve Bayes;
• Maximum entropy models;
• Markov Models;
• Evaluation
o Concepts: training set, development set, evaluation set.
o Precision, recall, and F-measure.
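
A minimal sketch, in Python, of the first two syllabus items: a bigram language model with add-one (Laplace) smoothing, evaluated by perplexity. The toy corpus and function names are assumptions made for illustration:

    import math
    from collections import Counter

    def train_bigram(sentences):
        """Count unigrams and bigrams over sentences padded with <s>/</s>."""
        unigrams, bigrams = Counter(), Counter()
        for s in sentences:
            tokens = ["<s>"] + s + ["</s>"]
            unigrams.update(tokens)
            bigrams.update(zip(tokens, tokens[1:]))
        return unigrams, bigrams

    def bigram_prob(w1, w2, unigrams, bigrams, vocab_size):
        """Add-one smoothed estimate of P(w2 | w1)."""
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

    def perplexity(sentence, unigrams, bigrams, vocab_size):
        """exp of the negative average log-probability per predicted token."""
        tokens = ["<s>"] + sentence + ["</s>"]
        log_p = sum(math.log(bigram_prob(w1, w2, unigrams, bigrams, vocab_size))
                    for w1, w2 in zip(tokens, tokens[1:]))
        return math.exp(-log_p / (len(tokens) - 1))

    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    unigrams, bigrams = train_bigram(corpus)
    print(perplexity(["the", "cat", "sat"], unigrams, bigrams, len(unigrams)))

Lower perplexity on held-out data indicates a better model; without smoothing, any unseen bigram would receive probability zero and make the perplexity infinite, which is why the smoothed models of the third item matter.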

Teaching methodologies and evaluation

A project-oriented methodology will be followed in this course.
At the beginning of each class a theoretical introduction will be given; students are expected to study the subjects in greater depth at home. The remaining time will be used to develop a series of small projects that address the theme of the lesson.
The evaluation will be based on projects. During the first five weeks, five small projects will be assigned to students, who will develop them as homework and deliver them for evaluation. Solving each project should take no more than two hours. This set of small projects accounts for 40% of the grade. The remaining 60% will be obtained through a medium-sized project that starts in the middle of the semester and is supervised in class until the end of the semester.

Bibliography

Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press.
Cambridge, MA: May 1999.
Nitin Indurkhya, Fred J. Damerau (Editors), Handbook of Natural Language Processing, Chapman & Hall/CRC, 2010.
Anne Kao, Steve R. Poteet (Editors), Natural Language Processing and Text Mining, Springer, 2010.
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An introduction to natural language
processing, computational linguistics, and speech recognition. Pearson Prentice Hall, 2008.
Alexander Clark, Chris Fox, Shalom Lappin (Editors), The Handbook of Computational Linguistics and Natural
Language Processing, Wiley-Blackwell, 2010.

Natural Language Processing (Sem. 1)

Learning outcomes

Understand the relevance and composition of the basic levels of an NLP application;
Understand the function of a morphological analyzer;
Know how to define extraction patterns for conceptual relations;
Learn how to use machine learning techniques for disambiguation and machine translation.

Syllabus

• Text segmentation and tokenization (see the sketch after this list);
• Spell checking;
• Morphological analysis;
• Entity recognition;
• Extraction of relations;
• Part-of-speech tagging;
• Disambiguation;
• Dependency parsers;
• Machine translation.
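
As a minimal sketch of the first syllabus item, the following Python fragment segments text into sentences and tokenizes each one with regular expressions; the patterns are deliberate simplifications for illustration (a real tokenizer must also handle abbreviations, numbers, URLs, and multiword units):

    import re

    # Tokens: words with an optional internal apostrophe, or any single
    # non-space punctuation character.
    TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")
    # Naive sentence boundary: whitespace preceded by ., ! or ?.
    SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

    def segment(text):
        """Split text into sentences, then each sentence into tokens."""
        return [TOKEN.findall(s) for s in SENTENCE_END.split(text) if s]

    print(segment("The analyzer ran. Didn't it work?"))
    # [['The', 'analyzer', 'ran', '.'], ["Didn't", 'it', 'work', '?']]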

Teaching methodologies and evaluation

This course is practical: students will be asked to solve specific problems in collaboration with the teacher. Tasks to be solved between classes will be proposed to students so that they can deepen their knowledge.
The evaluation is based on a test and on small practical problems to be solved outside the classroom.

Bibliography

Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press.
Cambridge, MA: May 1999.
Nitin Indurkhya, Fred J. Damerau (Editors), Handbook of Natural Language Processing, Chapman & Hall/CRC, 2010.
Anne Kao, Steve R. Poteet (Editors), Natural Language Processing and Text Mining, Springer, 2010.
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An introduction to natural language
processing, computational linguistics, and speech recognition. Pearson Prentice Hall, 2008.
Alexander Clark, Chris Fox, Shalom Lappin (Editors), The Handbook of Computational Linguistics and Natural
Language Processing, Wiley-Blackwell, 2010.
Jean Véronis (Editor), Parallel Text Processing: Alignment and Use of Translation Corpora, Springer, 2010.
Jörg Tiedemann, Bitext Alignment, Morgan & Claypool, 2011.

Formal Language Theory (Sem. 1)

Learning outcomes

Gain expertise in writing regular expressions;
Know how to write grammars and how to detect and deal with ambiguities;
Discuss the difficulty of using unambiguous grammars for NLP, and how probabilistic context-free grammars can be used to address this problem.

Syllabus

• Regular Languages
o Regular expressions
o Finite state automata
• Formal languages
o Grammars
o Grammars with ambiguity
o Context-free grammars
• Probabilistic context-free grammars
o Dealing with ambiguity
• Parsing algorithms (see the sketch after this list)
o Bottom-up parsing
o Top-down parsing
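
A minimal sketch, in Python, of the bottom-up strategy: a CYK recognizer for a context-free grammar in Chomsky normal form (every rule is A -> B C or A -> terminal). The toy grammar and sentence are assumptions made for illustration; storing the best rule probability in each chart cell instead of a plain set gives the usual probabilistic (PCFG) variant:

    def cyk(words, lexical, binary):
        """Return True if the grammar derives words; chart[i][j] holds the
        nonterminals that derive the span words[i..j]."""
        n = len(words)
        chart = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):                # lexical rules A -> w
            chart[i][i] = set(lexical.get(w, ()))
        for span in range(2, n + 1):                 # span length
            for i in range(n - span + 1):            # span start
                j = i + span - 1                     # span end
                for k in range(i, j):                # split point
                    for b in chart[i][k]:
                        for c in chart[k + 1][j]:
                            # binary rules A -> B C
                            chart[i][j] |= binary.get((b, c), set())
        return "S" in chart[0][n - 1]

    lexical = {"the": {"Det"}, "cat": {"N"}, "dog": {"N"}, "saw": {"V"}}
    binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
    print(cyk("the cat saw the dog".split(), lexical, binary))  # True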

Teaching methodologies and evaluation

All classes have theoretical and practical parts. At the beginning of each class a short briefing is given on the previous session, and the proposed work or questions are discussed; then some time is devoted to raising questions and presenting the different theoretical approaches and methodologies that support their resolution; this part, although theoretical, is always developed in dialogue with the students.
The lesson then continues with a more practical part, in which students are invited to research and summarize what has already been written on the topic, or to develop experimental tools or implementations of algorithms.
The assessment is based on two written tests (or one final appeal exam) and on various biweekly assignments delivered by the students, solved outside class but defended before the classroom. Assessed work is of two kinds: monographs, giving overviews of specific topics, and implementation work.

Bibliography

Jeffrey E. F. Friedl, Mastering Regular Expressions, O'Reilly Media, 2006.
Hopcroft, John E. and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley Series in Computer Science. Addison-Wesley Publishing Company, Reading, Mass., 1979.
Karttunen, Lauri, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular Expressions for Language Engineering. Natural Language Engineering, 2(4):305–328, 1996.
Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press.
Cambridge, MA: May 1999.
Aho, Sethi and Ullman. Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986.