Knowledge and Language Processing


Grammars in Software Comprehension (Sem. 1)

Learning outcomes

By the end of the course, students should be able to: develop specifications of the syntax and semantics of languages
in general, and of problems, using grammars; generate programs (prototypes) with grammar-based automatic tools;
build powerful front-ends for the analysis of programming languages; design and implement complex data structures
to represent the information extracted from the intermediate analysis of code; create visual representations that
make the complex knowledge obtained easy to understand; turn a processing task and/or a specification into an
efficient software implementation; use processing techniques to optimize programs (e.g., partial evaluation and
dead-code detection), to debug programs and to improve their structure; and develop tools that aid program
comprehension.

Syllabus

Program comprehension: cognitive models, approaches, concepts and knowledge domain;
Languages and grammars: characterization of these concepts and knowledge domains;
Attribute grammars (AG): formal definition, development;
Grammar-based language processing: grammatical notations, translation grammars (TG) and attribute
grammars (AG); syntax-directed translation versus semantics-directed translation; program generation from
TGs / AGs;
Program analysis;
Program transformation;
Program visualization paradigms.
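As an illustration of the attribute grammar concepts listed above, the following is a minimal sketch in Python (an assumption; the course does not prescribe a language or tool) of Knuth's classic attribute grammar for binary numerals such as 1101.01. It is hand-coded rather than generated by an attribute grammar tool: each function plays the role of one nonterminal, the inherited attribute scale is passed down as a parameter, and the synthesized attributes length and value are returned. The function names are hypothetical.

```python
from fractions import Fraction

# Knuth's attribute grammar for binary numerals, as implemented below:
#   N -> L1 . L2  N.value = L1.value + L2.value;  L1.scale = 0;  L2.scale = -L2.length
#   N -> L        N.value = L.value;  L.scale = 0
#   L -> L1 B     L.value = L1.value + B.value;  L.length = L1.length + 1
#                 L1.scale = L.scale + 1;  B.scale = L.scale
#   L -> B        L.value = B.value;  L.length = 1;  B.scale = L.scale
#   B -> 0 | 1    B.value = 0 | 2 ** B.scale


def bit_value(bit: str, scale: int) -> Fraction:
    """B: synthesized 'value', computed from the inherited 'scale'."""
    if bit not in ("0", "1"):
        raise ValueError(f"not a binary digit: {bit!r}")
    return Fraction(2) ** scale if bit == "1" else Fraction(0)


def list_length(bits: str) -> int:
    """L: synthesized 'length'."""
    if not bits:
        raise ValueError("empty digit list")
    if len(bits) == 1:                       # L -> B
        return 1
    return list_length(bits[:-1]) + 1        # L -> L1 B


def list_value(bits: str, scale: int) -> Fraction:
    """L: synthesized 'value', given the inherited 'scale' of its rightmost bit."""
    if not bits:
        raise ValueError("empty digit list")
    if len(bits) == 1:                       # L -> B
        return bit_value(bits, scale)
    # L -> L1 B: L1 ends one position further left, so its scale is one higher
    return list_value(bits[:-1], scale + 1) + bit_value(bits[-1], scale)


def number_value(text: str) -> Fraction:
    """N: evaluate a numeral such as '1101.01'."""
    integer, dot, fraction = text.partition(".")
    value = list_value(integer, 0)
    if dot:                                  # N -> L1 . L2 (L2 must be non-empty)
        value += list_value(fraction, -list_length(fraction))
    return value


print(number_value("1101.01"))  # 53/4, i.e. 13.25
```

Because the scale of the fractional part depends on its length, length has to be evaluated before value on that side; scheduling such dependencies between attributes is exactly what attribute grammar evaluators do automatically.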

Teaching methodologies and evaluation

All classes have a theoretical and a practical part. Each class begins with a short briefing on the previous
session, in which the proposed work or questions are discussed; then some time is devoted to raising questions and
presenting the different theoretical approaches and methodologies that support their resolution. This part,
although theoretical, is always developed in dialogue with the students.
The lesson continues with a more practical part in which students are invited to research and summarize what has
already been written on the topic, or to develop experimental tools or implement algorithms.
Assessment is based on two written tests (or one final resit exam) and on several assignments (biweekly), solved
by the students outside class but defended in the classroom. Assessed work is of two kinds: monographs, giving
overviews of specific topics; and implementation projects.

Bibliography

Aho, Sethi and Ullman, "Compilers: Principles, Techniques, and Tools", Addison-Wesley, 1986;
Pierre Deransart, M. Jourdan, and B. Lorho, "Attribute grammars: Main results, existing systems and
bibliography". In LNCS 341. Springer-Verlag, 1988;
A. van Deursen and Paul Klint, "Little languages: Little maintenance?". Journal of Software Maintenance,
10:75-92, 1998;
Pedro Rangel Henriques, "Atributos e Modularidade na Especificação de Linguagens Formais". PhD thesis,
Universidade do Minho, Dec. 1992;
C. A. R. Hoare, "Hints on programming language design". Technical Report CS-TR-73-403, Stanford University,
CA, USA, 1973;
John E. Hopcroft, Rajeev Motwani, and Jeffrey Ullman, "Introduction to Automata Theory, Languages, and
Computation". Addison-Wesley, 3rd edition, 2006.

Information Representation and Processing (Sem. 1)

Learning outcomes

By the end of the course, students should: know the document life cycle and be able to identify its stages and the
technologies to be used in each of them; be able to specify a markup language that meets a given set of
requirements; be able to process documents for various purposes (knowledge extraction, Web publishing,
information exchange); know and use storage solutions for marked-up documents; be able to define the layers needed
to integrate and exchange information between different information systems; be able to implement an electronic
publishing project using open international standards (XML, XSL and XSL-FO); be able to program the automatic
generation of Web sites from a repository of XML documents; and be able to use markup languages and the
corresponding tools developed by others.

Syllabus

Information representation: historical evolution, ASCII, Unicode;
Text as a fundamental way of representing information;
Descriptive markup languages: introduction and some history, SGML, HTML, XML, JSON;
Structured documentation and annotation;
Some markup languages for the Web: XML, HTML, WML, WSDL, SVG;
XML documents: structure and concepts, document life cycle, development of DTDs and schemas;
Processing markup languages: the abstract document tree, navigating and searching it with XPath;
processing models: DOM and SAX;
XSL: Extensible Stylesheet Language;
XQuery: XML Query Language;
XML and databases: storing structured and semi-structured data;
Integration and information exchange between systems: data cleaning, construction of validators, migration
tools and protocols (e.g., OAI-PMH);
Electronic publishing: the XSL-FO standard;
Document processing: HTML, CGI, XML.
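To contrast the processing models named above, the sketch below extracts the same information from a small XML document using only the Python standard library: with DOM (xml.dom.minidom), which builds the whole tree in memory; with SAX (xml.sax), which reacts to parse events as the document is read; and with the limited XPath subset supported by xml.etree.ElementTree. The sample document and function names are hypothetical, and ElementTree's path support covers only a small part of XPath.

```python
from __future__ import annotations

import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

# Hypothetical sample document, for illustration only.
DOC = b"""<library>
  <book lang="en"><title>XML in a Nutshell</title></book>
  <book lang="pt"><title>XML e XSL</title></book>
</library>"""


def titles_dom(data: bytes) -> list[str]:
    """DOM: parse the whole document into an in-memory tree, then walk it."""
    document = xml.dom.minidom.parseString(data)
    return [
        "".join(n.data for n in title.childNodes if n.nodeType == n.TEXT_NODE)
        for title in document.getElementsByTagName("title")
    ]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to parse events while the document is being read."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: list[str] = []
        self._buffer: list[str] | None = None   # set while inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)         # text may arrive in several chunks

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_sax(data: bytes) -> list[str]:
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


def english_titles_xpath(data: bytes) -> list[str]:
    """ElementTree's XPath subset: titles of books whose lang attribute is 'en'."""
    root = ET.fromstring(data)
    return [title.text for title in root.findall("./book[@lang='en']/title")]


print(titles_dom(DOC))            # ['XML in a Nutshell', 'XML e XSL']
print(titles_sax(DOC))            # ['XML in a Nutshell', 'XML e XSL']
print(english_titles_xpath(DOC))  # ['XML in a Nutshell']
```

The DOM version is simpler to write, because the whole tree is available at once; the SAX version keeps only the current title in memory, which is what makes it suitable for very large documents.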

Teaching methodologies and evaluation

This course follows a project-oriented methodology.
Each class begins with a theoretical introduction, and students are expected to study the subjects in more depth
at home. The remaining time is used to develop a series of small projects that put the theme of the lesson into
practice.
Evaluation is project-based. During the first five weeks, students are given five small projects, which they
develop as homework and submit for evaluation; solving each project should take no more than 2 hours. Together,
these small projects account for 40% of the grade. The remaining 60% comes from a medium-sized project that
starts in the middle of the semester and is followed in class until the end of the semester.

Bibliography

For most of the topics covered, the information available on the Internet is more than sufficient. Nevertheless,
the following references have proved very helpful in the past:
Harold, Elliotte Rusty. "XML in a Nutshell: a desktop quick reference". 3rd ed. Sebastopol: O'Reilly, 2005;
Ramalho, J. C. and Henriques, P. R. "XML & XSL: da teoria à prática". Lisboa: FCA - Editora de Informática, 2002
(Tecnologias de informação). ISBN 972-722-347-8;
Carlos Serrão and Joaquim Marques. "Programação com PHP 5.3". Lisboa: FCA - Editora de Informática, 2010.
ISBN: 978-972-722-341-1.

Knowledge Representation and Processing (Sem. 2)

Learning outcomes

By the end of the course, students will be able to: formally specify knowledge using various methodologies
(taxonomies, thesauri and ontologies); add descriptive semantics to digital objects; specify taxonomies in SKOS;
specify OWL ontologies; process ontologies; and add semantics to websites using standards such as RDFa and
Linked Open Data.

Syllabus

Origin and evolution of formal knowledge representation;
Formal knowledge specification: taxonomies, thesauri and ontologies;
Descriptive Semantics in digital objects: RDF - Resource Description Framework;
SKOS - Simple Knowledge Organization System: specification of taxonomies, thesauri and classification
systems;
Ontology specification: main entities (classes/concepts, properties and individuals); class hierarchies;
Reasoning: knowledge inference, first-order logic and description logics;
OWL - Web Ontology Language: OWL Lite and OWL DL;
OWL editors and browsers;
Ontology processing and construction tools: reasoners, editors and browsers;
Web 2.0 and social software;
Web 3.0: the semantic web (OWL, RDF);
Transformation of Information Systems into Knowledge Systems: RDFa - Resource Description Framework in
Attributes, Linked Open Data.
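To give a feel for knowledge inference over RDF-style data, the sketch below stores SKOS statements as plain (subject, predicate, object) tuples and derives skos:broaderTransitive as the transitive closure of skos:broader, following the SKOS rule that skos:broader is a sub-property of the transitive property skos:broaderTransitive. It is a naive fixpoint computation in Python with hypothetical ex: concepts, not a substitute for an RDF library or an OWL reasoner.

```python
# Triples as (subject, predicate, object) strings using prefixed names.
# The ex: concepts are hypothetical examples.
TRIPLES = {
    ("ex:Perl", "skos:broader", "ex:ScriptingLanguage"),
    ("ex:ScriptingLanguage", "skos:broader", "ex:ProgrammingLanguage"),
    ("ex:ProgrammingLanguage", "skos:broader", "ex:FormalLanguage"),
    ("ex:Perl", "skos:prefLabel", "Perl"),
}


def broader_transitive(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Infer skos:broaderTransitive statements by naive fixpoint iteration."""
    # Every skos:broader statement also holds for its super-property.
    inferred = {(s, "skos:broaderTransitive", o)
                for s, p, o in triples if p == "skos:broader"}
    while True:
        # Transitivity: a -> b and b -> d entail a -> d.
        new = {(a, "skos:broaderTransitive", d)
               for a, _, b in inferred
               for c, _, d in inferred
               if b == c} - inferred
        if not new:
            return inferred
        inferred |= new


for s, _, o in sorted(broader_transitive(TRIPLES)):
    print(s, "broaderTransitive", o)
```

The three asserted skos:broader links yield six inferred skos:broaderTransitive statements, including ex:Perl broaderTransitive ex:FormalLanguage, which was never stated directly.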

Teaching methodologies and evaluation

This course follows a project-oriented methodology.
Each class begins with a theoretical introduction, and students are expected to study the subjects in more depth
at home. The remaining time is used to develop a series of small projects that put the theme of the lesson into
practice.
Evaluation is project-based. During the first five weeks, students are given five small projects, which they
develop as homework and submit for evaluation; solving each project should take no more than 2 hours. Together,
these small projects account for 40% of the grade. The remaining 60% comes from a medium-sized project that
starts in the middle of the semester and is followed in class until the end of the semester.

Bibliography

Santos, Cláudia da Silva Amaral, "Terminologia e ontologias: metodologias para representação do
conhecimento", PhD thesis in Linguistics, Universidade de Aveiro, 2010;
Geroimenko, Vladimir. "Dictionary of XML technologies and the semantic web". London: Springer, 2004.
(Springer professional computing). ISBN 1-85233-768-0;
Natalya F. Noy and Deborah L. McGuinness. "Ontology Development 101: A Guide to Creating Your First
Ontology". In Development, vol. 32, Nr. 1, pp. 1-25. 2001;
S. Grimm. "Knowledge Representation and Ontologies". In Scientific Data Mining and Knowledge Discovery:
Principles and Foundations, 2009;
Ivo Serra and Rosario Girardi. "A Process for Extracting Non-Taxonomic Relations of Ontologies from Text". In
Intelligent Information Management, vol. 3, Nr. 4, pp. 119-124. July, 2009.

Scripting in Natural Language Processing (Sem. 2)

Learning outcomes

By the end of the course, students will be able to: write scripts that automate a variety of tasks and
transformations; solve problems using transformations based on regular expressions; understand how systems
driven by production rules (condition-reaction) work and what their advantages are; build concrete DSLs; build
and use corpora; extract information from different corpora; build electronic dictionaries; and build small
prototypes that model natural language.

Syllabus

Scripting languages: characteristics, goals and concepts; introduction to a scripting language (e.g., Perl);
Regular language processors and regular-expression-oriented programming;
Design patterns in language processing;
Rule-based languages: rewriting, textual domain-specific languages (DSLs) based on rewrite rules, production
systems;
Processing of structural trees and DSLs based on these processors (e.g., XML::DT, Lingua::Treebank);
Natural language processing: morphosyntactic analysis of natural language texts: morphological models,
Definite Clause Grammars and logic grammars, robust parsing;
Natural language processing: semantics and pragmatics;
Multi-source dictionaries and thesauri;
Knowledge Extraction from Texts;
Summarization and Classification;
Introduction to machine translation.
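As a rough illustration of rewrite-rule systems driven by regular expressions, the sketch below implements a tiny production system: each rule pairs a condition (a regular expression) with a reaction (its replacement), the first rule whose condition matches fires once, and the process repeats until no rule changes the text. The course uses a scripting language such as Perl; Python is used here only for consistency with the other sketches, and the three rules are hypothetical examples. The thousands-separator rule mirrors the well-known idiom of repeatedly applying a substitution until it no longer matches.

```python
import re

# Hypothetical rule set: (name, condition, reaction). Order matters, because
# the first rule whose condition matches is the one that fires.
RULES = [
    ("collapse blanks", re.compile(r"[ \t]{2,}"), " "),
    ("no space before punctuation", re.compile(r"\s+([,.;:!?])"), r"\1"),
    ("thousands separator", re.compile(r"(\d)(\d{3})(?!\d)"), r"\1,\2"),
]


def rewrite(text: str, rules=RULES, max_steps: int = 1000) -> str:
    """Fire the first applicable rule once, and repeat until no rule applies."""
    for _ in range(max_steps):
        for _name, condition, reaction in rules:
            new_text = condition.sub(reaction, text, count=1)
            if new_text != text:       # condition matched and the reaction changed the text
                text = new_text
                break                  # restart from the first rule
        else:                          # no rule fired: a fixed point has been reached
            return text
    raise RuntimeError("rules did not reach a fixed point within max_steps")


print(rewrite("The  corpus has 1234567 tokens ."))  # The corpus has 1,234,567 tokens.
```

Note that the thousands-separator rule would also rewrite a four-digit year such as 2024, which is typical of the rule-interaction problems that production systems have to manage through rule ordering and more precise conditions.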

Teaching methodologies and evaluation

This course is practical: students are asked to solve specific problems using a scripting language, working
together with the teacher. Tasks are also proposed for students to solve between classes, to deepen their
knowledge.
Evaluation is based on a test and on small practical problems solved outside the classroom.

Bibliography

Jeffrey E. F. Friedl, "Mastering Regular Expressions, Powerful Techniques for Perl and Other Tools", O'Reilly
Media, 2006;
Mark Jason Dominus, "Higher-Order Perl", Morgan Kaufmann, 2005;
Daniel Jurafsky and James H. Martin. "Speech and Language Processing: An introduction to natural language
processing, computational linguistics, and speech recognition". Pearson Prentice Hall, 2008;
Igor A. Bolshakov and Alexander Gelbukh, "Computational Linguistics: Models, Resources, Applications";
Steven Bird, Ewan Klein, and Edward Loper, "Natural Language Processing with Python", O'Reilly Media, 2009;
Jon Loeliger, Matthew McCullough, "Version Control with Git, 2nd Edition", O'Reilly Media, 2012.