Know how to build an index for a document collection; learn the difficulties of information retrieval based on NLP approaches; define and implement scoring models and ranking.
Construction and compression of indexes; Boolean retrieval; Scoring and term weighting: TF-IDF; Vector space models; Evaluation in information retrieval; Relevance feedback and query expansion; XML retrieval; Probabilistic retrieval; Flat and hierarchical clustering; Latent semantic indexing.
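As a minimal sketch of the TF-IDF weighting listed in the syllabus (illustrative only; the toy corpus and the raw-count/log-idf variant are assumptions, not course material):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf = raw term count in the document; idf = log(N / df),
    where df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts a term at most once
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
w = tf_idf(docs)
# "the" occurs in every document, so its idf (and thus its weight) is 0
```

Terms that appear in every document receive zero weight, which is exactly the discriminative behaviour TF-IDF is designed to provide.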
A project-oriented methodology will be followed during this course. At the beginning of each class a theoretical introduction will be made; students are expected to go deeper into the subjects at home. The remaining time will be used to develop a battery of small projects that address the theme of the lesson. The assessment of learning involves two instruments: an experimental development and writing assignment, carried out in groups, and an individual practical test. Both the individual and the group components have a well-defined time limit, never exceeding the academic year. The final classification is given as follows: • 40% of the grade comes from the group practical component; • 60% of the grade comes from the individual practical component.
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology Behind Search, Addison-Wesley, 2010.
Ian H. Witten, Alistair Moffat, and Timothy C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann Publishing, 1999.
Build tools to train language models from text. Understand the differences between the main training methods and their applications. Understand the importance of evaluation and the main processes used.
• Language models
o Word probability and sequence probabilities: n-grams.
o Entropy and perplexity.
• Smoothing of language models;
• Text classification and sentiment analysis;
• Naïve Bayes;
• Maximum entropy models;
• Markov models;
• Evaluation
o Concepts: development set, training set, evaluation set.
o Precision, recall, and F-measure.
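The n-gram, smoothing, and perplexity topics above can be sketched with a tiny bigram model; add-one (Laplace) smoothing and the toy sentences are assumptions chosen for brevity:

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s>/</s>."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def bigram_prob(uni, bi, vocab_size, w1, w2):
    """Add-one (Laplace) smoothed bigram probability P(w2 | w1)."""
    return (bi[(w1, w2)] + 1) / (uni[w1] + vocab_size)

def perplexity(uni, bi, vocab_size, sentence):
    """Perplexity of a sentence under the smoothed bigram model."""
    toks = ["<s>"] + sentence + ["</s>"]
    logp = sum(math.log(bigram_prob(uni, bi, vocab_size, a, b))
               for a, b in zip(toks, toks[1:]))
    return math.exp(-logp / (len(toks) - 1))

sents = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(sents)
V = len(uni)  # vocabulary size, including the <s> and </s> markers
```

Smoothing reserves probability mass for unseen bigrams, so perplexity stays finite even for word pairs absent from the training data.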
A project-oriented methodology will be followed during this course. At the beginning of each class a theoretical introduction will be made; students are expected to go deeper into the subjects at home. The remaining time will be used to develop a battery of small projects that address the theme of the lesson. Evaluation will be undertaken through projects. During the first five weeks, five small projects will be given to students, who will develop them as homework and deliver them for evaluation. The resolution of each project will not exceed 2 hours. This set of small projects corresponds to 40% of the grade. The remaining 60% will be obtained through a medium-sized project that starts in the middle of the semester and is accompanied in class until the end of the semester.
Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999.
Nitin Indurkhya, Fred J. Damerau (Editors), Handbook of Natural Language Processing, Chapman & Hall/CRC, 2010.
Anne Kao, Steve R. Poteet (Editors), Natural Language Processing and Text Mining, Springer, 2010.
Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Pearson Prentice Hall, 2008.
Alexander Clark, Chris Fox, Shalom Lappin (Editors), The Handbook of Computational Linguistics and Natural Language Processing, Wiley-Blackwell, 2010.
Understand the relevance and constitution of the basic levels of an NLP application. Understand the function of the morphological analyzer. Know how to define extraction patterns for conceptual relations. Learn how to use machine learning techniques for disambiguation and machine translation.
• Text segmentation and tokenization;
• Spell checking;
• Morphological analysis;
• Entity recognition;
• Extraction of relations;
• Part-of-speech tagging;
• Disambiguation;
• Dependency parsers;
• Machine translation.
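The first topic, segmentation and tokenization, can be illustrated with a naive regex-based sketch (the patterns are assumptions; real tokenizers handle abbreviations, clitics, and other cases these rules ignore):

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text, re.UNICODE)

def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
```

A case like "Dr. Smith" immediately breaks the sentence splitter, which is a useful way to motivate the more robust techniques covered in the course.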
This course is practical. Students will be asked to solve specific problems in collaboration with the teacher. Tasks to be solved between classes will be proposed to students to deepen their knowledge. Assessment is based on a test and on small practical problems solved outside the classroom.
Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999.
Nitin Indurkhya, Fred J. Damerau (Editors), Handbook of Natural Language Processing, Chapman & Hall/CRC, 2010.
Anne Kao, Steve R. Poteet (Editors), Natural Language Processing and Text Mining, Springer, 2010.
Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Pearson Prentice Hall, 2008.
Alexander Clark, Chris Fox, Shalom Lappin (Editors), The Handbook of Computational Linguistics and Natural Language Processing, Wiley-Blackwell, 2010.
Jean Véronis (Editor), Parallel Text Processing: Alignment and Use of Translation Corpora, Springer, 2010.
Jörg Tiedemann, Bitext Alignment, Morgan & Claypool, 2011.
Gain expertise in writing regular expressions. Know how to write grammars and detect and deal with ambiguities. Discuss the problem of ambiguous grammars in NLP, and how probabilistic context-free grammars can be used to address it.
• Regular languages
o Regular expressions
o Finite state automata
• Formal languages
o Grammars
o Grammars with ambiguity
o Context-free grammars
• Probabilistic context-free grammars
o Dealing with ambiguity
• Parsing algorithms
o Bottom-up parsing
o Top-down parsing
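The probabilistic context-free grammar and parsing topics above can be sketched with a probabilistic CYK parser (a bottom-up algorithm); the toy grammar in Chomsky normal form and its rule probabilities are made up for illustration:

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form; probabilities are illustrative assumptions.
LEXICAL = {"cats": [("NP", 0.5)], "dogs": [("NP", 0.5)], "chase": [("V", 1.0)]}
BINARY = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 1.0)]

def cyk(words):
    """Probabilistic CYK: best probability of deriving S over the word span."""
    n = len(words)
    chart = defaultdict(float)  # (i, j, symbol) -> best derivation probability
    for i, w in enumerate(words):
        for sym, p in LEXICAL.get(w, []):
            chart[(i, i + 1, sym)] = p
    for span in range(2, n + 1):          # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # split point
                for parent, left, right, p in BINARY:
                    cand = p * chart[(i, k, left)] * chart[(k, j, right)]
                    if cand > chart[(i, j, parent)]:
                        chart[(i, j, parent)] = cand
    return chart[(0, n, "S")]
```

When a grammar is ambiguous, the chart keeps only the highest-probability derivation for each span and symbol, which is how a PCFG resolves the ambiguity discussed in the learning outcomes.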
All classes have theoretical and practical parts. At the beginning of each class a short briefing reviews the previous session and the proposed work or questions are discussed; then some time is devoted to raising questions and presenting the different theoretical approaches and methodologies that support their resolution; this part, although theoretical, is always developed in dialogue with the students. The lesson then continues with a more practical part in which students are invited to research and summarize what has already been written about the topic, or to develop experimental tools and implementations of algorithms. Assessment is based on two written tests (or one final appeal exam) and on various biweekly assignments delivered by the students, solved outside class but defended before the class. Assessment work is of two kinds: monographs, with overviews of specific topics; and implementation work.
Jeffrey E. F. Friedl, Mastering Regular Expressions, O'Reilly Media, 2006.
John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley Series in Computer Science, Addison-Wesley Publishing Company, Reading, MA, 1979.
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller, Regular expressions for language engineering, Natural Language Engineering, 2(4):305–328, 1996.
Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999.
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986.