Introduction to Computational Linguistics for Non-Linguists (Italian Course)

Welcome to the course "Introduction to Computational Linguistics for Non-Linguists" (taught in Italian)!

This short course is designed for students with little or no background in linguistics. It introduces key concepts in computational linguistics in an accessible, engaging way, mixing slides, cultural references, and hands-on examples.


📚 Lessons

📖 Lesson 1 — What is Computational Linguistics?

Date: July 9, 2025
Description:
In the movie Her, Theodore, a professional writer of love letters, starts a relationship with Samantha, a charming voice-powered AI assistant. It's science fiction… but how far are we from that reality?
This lesson introduces computational linguistics: what it is, how it began, and why it matters.

👉 📑 View the slides


📖 Lesson 2 — Applications, Italian Research, and Language Modeling

Date: July 15, 2025
Description:
This lesson explores how computational linguistics is used in everyday tools like translation, assistants, and grammar checkers. It introduces major Italian initiatives like CLIC-it, EVALITA, and IJCoL, and explains what it means to build a linguistic model, including abstraction, generalization, and the role of corpora in NLP research.
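To make the idea of a statistical language model concrete, here is a minimal sketch (not taken from the course materials; the toy corpus is invented) of estimating bigram probabilities from word counts:

```python
from collections import Counter

# Toy corpus: a statistical language model abstracts over data like
# this to estimate how likely one word is to follow another.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

# "the" occurs 4 times, once followed by "cat" -> 0.25
print(bigram_prob("the", "cat"))
```

Real language models generalize far beyond raw counts like these, which is exactly where the abstraction discussed in the lesson comes in.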

👉 📑 View the slides


📖 Lesson 3 — Corpus Analysis, Word Vectors, and Language Algorithms

Date: July 22, 2025
Description:
This lesson introduces the statistical properties of linguistic corpora, focusing on Zipf's law and its impact on word frequency analysis. It then explores how word meanings can be modeled through co-occurrence statistics and vector space representations. The lesson closes with an introduction to algorithms for disambiguating meaning, including decision trees and neural networks like the Multilayer Perceptron.
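The co-occurrence idea can be sketched in a few lines of Python (the sentences, window size, and similarity measure here are illustrative choices, not the lesson's exact setup): words are represented by the counts of their context words, and words used in similar contexts get similar vectors.

```python
from collections import Counter, defaultdict

# Distributional sketch: represent each word by counts of the words
# seen within a +/-2 token window. Toy data for illustration only.
sentences = [
    "the cat chased the mouse".split(),
    "the dog chased the ball".split(),
    "the cat ate fish".split(),
    "the dog ate meat".split(),
]

window = 2
vectors = defaultdict(Counter)
for sent in sentences:
    for i, word in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                vectors[word][sent[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda c: sum(x * x for x in c.values()) ** 0.5
    return dot / (norm(u) * norm(v))

print(cosine(vectors["cat"], vectors["dog"]))   # high: similar contexts
print(cosine(vectors["cat"], vectors["ball"]))  # lower: fewer shared contexts
```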

👉 📑 View the slides


📖 Lesson 4 — Linguistic Knowledge and Annotation Strategies

Date: July 29, 2025
Description:
This lesson explores what it means to have linguistic knowledge from both explicit and implicit perspectives. It discusses distinctions between strict and broad definitions of linguistic competence, and how these distinctions affect computational modeling. It then presents the concept of linguistic annotation, outlining how explicit linguistic information is encoded to train algorithms. The lesson contrasts symbolic and statistical approaches to representation, explains how annotation pipelines work, and introduces language models as a way to encode implicit knowledge through statistical abstraction.
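Explicit annotation can be made tangible with a tiny example (the Italian sentence and the coarse tagset are invented for illustration, loosely in the spirit of CoNLL-style column formats): each token carries a layer of linguistic information that an algorithm can be trained on.

```python
# Explicit linguistic annotation: each token is paired with a
# part-of-speech label. Illustrative sentence and tagset only.
annotated = [
    ("Il",    "DET"),    # determiner
    ("gatto", "NOUN"),   # noun
    ("dorme", "VERB"),   # verb
    (".",     "PUNCT"),  # punctuation
]

# A CoNLL-like rendering: one token per line, tab-separated columns.
for index, (token, pos) in enumerate(annotated, start=1):
    print(f"{index}\t{token}\t{pos}")
```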

👉 📑 View the slides


📖 Lesson 5 — Data Analysis

Date: July 30, 2025
Description:
This lesson introduces the practical workflow of building and evaluating models for emotion detection in text. It starts by explaining the role of annotated data (gold standard) for both training and evaluation, and contrasts it with automatically derived labels (silver standard). The lecture emphasizes the importance of separating training, validation, and test data to prevent overfitting, and introduces cross-validation as a robust evaluation method. The lesson then explores how large language models can be fine-tuned for specific tasks, and presents key evaluation metrics: accuracy, baseline comparisons, precision, recall, and F-score, with attention to class imbalance. It extends to regression tasks, introducing mean squared error (MSE) as a measure of prediction quality. Finally, the lecture discusses qualitative error analysis, feature contributions, and the importance of reproducibility in research, highlighting differences between supervised and unsupervised approaches.
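The evaluation metrics mentioned above can be computed by hand; here is a minimal sketch with invented gold and predicted labels (note the class imbalance, with "neutral" dominating, which is exactly why accuracy alone can be misleading):

```python
# Precision, recall, and F-score for one class, from gold labels vs.
# system predictions. Label lists are invented for illustration.
gold = ["joy", "neutral", "neutral", "anger", "neutral", "joy",     "neutral"]
pred = ["joy", "neutral", "joy",     "anger", "neutral", "neutral", "neutral"]

def prf(gold, pred, label):
    """Precision, recall, F1 for a single class label."""
    tp = sum(g == p == label for g, p in zip(gold, pred))  # true positives
    fp = sum(p == label != g for g, p in zip(gold, pred))  # false positives
    fn = sum(g == label != p for g, p in zip(gold, pred))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print("accuracy:", accuracy)                 # 5 of 7 correct
print("joy P/R/F1:", prf(gold, pred, "joy"))  # (0.5, 0.5, 0.5)
```

On real data one would also report per-class scores for every label and compare against a majority-class baseline, as the lesson discusses.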

👉 📑 View the slides


📖 Lesson 6 — Case Study

Date: July 31, 2025
Description:
This lesson introduces bleaching for gender prediction in text, an approach that replaces lexical information with abstract features (frequency, shape, vowel/consonant patterns, punctuation). Unlike lexical models, which overfit to language and topic, bleached models generalize better across languages and perform comparably to humans when lexical cues are unavailable. The lecture covers experimental results with lexical, bleached, and multilingual models, discusses accuracy as the main evaluation metric, and explains Support Vector Machines (SVMs) as the chosen algorithm. It concludes with reflections on limitations such as bias, misinformation, and hallucination in computational profiling.
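The bleaching idea can be sketched as a small feature extractor (a loose illustration of the approach described above, not the exact feature set used in the experiments): each word is replaced by abstract, largely language-independent features.

```python
# "Bleaching" sketch: replace a word with its shape (case/digit
# pattern), vowel/consonant pattern, and length, instead of the
# word itself. Feature names are illustrative.
def bleach(token):
    shape = "".join(
        "X" if ch.isupper() else "x" if ch.islower()
        else "0" if ch.isdigit() else ch
        for ch in token
    )
    vc = "".join(
        "V" if ch.lower() in "aeiou" else "C" if ch.isalpha() else ch
        for ch in token
    )
    return {"shape": shape, "vc": vc, "length": len(token)}

print(bleach("Ciao"))  # {'shape': 'Xxxx', 'vc': 'CVVV', 'length': 4}
```

Because the features carry no lexical content, a classifier trained on them in one language can be applied to another, which is the cross-lingual generalization the lesson highlights.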

👉 📑 View the slides


📖 Lesson 7 — Support Vector Machines

Date: July 31, 2025

👉 📑 View the slides


ℹ️ Course Information

  • Language: Italian
  • Target audience: Beginners, non-linguists
  • Structure: Short slide-based lessons