Computer-assisted text analysis

COURSE CARD
Name: Analiza tekstu metodami komputerowymi
Name in English: Computer-assisted text analysis: from plagiarism detection to distant reading
Code:
Coordinator: dr hab. Maciej Eder
ECTS credits*: 2
Teaching team: dr hab. Maciej Eder
Course description (learning objectives)
The course will be taught in English.
It aims to give a concise introduction to the methods, tools and applications of corpus
linguistics and computational stylistics. The survey will cover several topics, ranging
from non-traditional authorship attribution to assessing large text corpora, with special
attention paid to questions of style differentiation and development. Although some topics in
statistical inference and supervised machine-learning techniques will be discussed, no
expert knowledge is required.
At the end of the course, the participants should have a general idea of natural language
processing, text classification, and large-scale approaches to the history of literature.
Prerequisites
At least an intermediate level of English is required. Some basic knowledge of computational
linguistics, as well as an understanding of the basic concepts of probability, will be helpful.
Learning outcomes
On completing the course, the student:
Columns: outcome number; outcome; reference to programme learning outcomes
KNOWLEDGE
W01: The student has a basic knowledge of different aspects of
computer-assisted text analysis, including natural language
processing (NLP), language laws, word frequency
distributions, explanatory text classification, machine-learning
classification, authorship attribution, gender
recognition, and network analysis as applied to the study of
style evolution.
SKILLS
U01: The student is able to present his/her own point of view,
to participate actively in a discussion, and to build
arguments based on the topics introduced during the
course.
SOCIAL COMPETENCES
K01: The student understands the idea of lifelong learning.
The student is able to think independently and is ready to
extend his/her knowledge.
K01
Organization
Form of classes: Lecture (W); Group classes (A, K, L, S, P, E)
Number of hours: Lecture (W): 15
Teaching methods
A number of lectures, each of them followed by a short discussion.
Methods of assessing learning outcomes
Other
Written exam
Oral exam
Written assignment (essay)
Presentation
Participation in discussion
Group project
Individual project
Laboratory work
Field classes
Classes at school
Educational games
E-learning
x
x
x
W01
U01
K01
Assessment criteria
The evaluation will be two-fold, based on (1) attendance and (2) active participation.
Remarks
Course content (list of topics)
1. Quantitative approaches to language and style: from Ancient Greece to Father Roberto Busa.
2. How to read 5 million books? OCR, Google and Culturomics.
3. Linguistic laws, or why some words are rare and others very frequent.
4. She speaks / he speaks, or the secret life of pronouns.
5. Authorship attribution, or how to trace a stylistic fingerprint.
6. Big Data, stylometry and distant reading, or large-scale history of literature.
7. Does it make any sense, really? Reliability issues in computational stylistics.
Supplementary reading
Baayen, H. (2001). Word Frequency Distributions. Dordrecht: Kluwer.
Burrows, J. (1987). Computation into Criticism: A Study of Jane Austen’s Novels and an
Experiment in Method. Oxford: Clarendon Press.
Burrows, J. (2002). ‘Delta’: a measure of stylistic difference and a guide to likely authorship.
Literary and Linguistic Computing 17(3): 267–87.
Burrows, J. (2004). Textual analysis. In: Schreibman, S., Siemens, R. and Unsworth, J. (eds),
A Companion to Digital Humanities. Oxford: Blackwell, pp. 323–47.
Craig, H. and Kinney, A. F. (2009). Shakespeare, Computers, and the Mystery of Authorship.
Cambridge: Cambridge University Press.
Eder, M. (2011). Style-markers in authorship attribution: a cross-language study of the
authorial fingerprint. Studies in Polish Linguistics 6: 99–114.
Eder, M. (2013). Mind your corpus: Systematic errors in authorship attribution. Literary and
Linguistic Computing 28(4): 603–14.
Hammerl, R. and Sambor, J. (1993). O statystycznych prawach językowych. Warszawa.
Holmes, D. (1998). The evolution of stylometry in humanities scholarship. Literary and
Linguistic Computing 13(3): 111–17.
Hoover, D. L. (2001). Statistical stylistics and authorship attribution: an empirical
investigation. Literary and Linguistic Computing 16: 421–44.
Hoover, D. L. (2009). Modes of composition in Henry James: dictation, style, and ‘What Maisie
Knew’. In Digital Humanities 2009: Conference Abstracts. University of Maryland, College
Park, MD, pp. 145–48.
Jockers, M. (2013). Macroanalysis: Digital Methods and Literary History. Champaign:
University of Illinois Press.
Juola, P. (2006). Authorship Attribution. Foundations and Trends in Information Retrieval 1:
233–334 [available at: http://www.mathcs.duq.edu/~juola/].
Kuraszkiewicz, K. (1963). La richesse du vocabulaire dans quelques grands textes polonais
en vers. Wrocław.
Love, H. (2002). Attributing Authorship: An Introduction. Cambridge: Cambridge University
Press.
Lutosławski, W. (1897). The Origin and Growth of Plato’s Logic: With an Account of Plato’s
Style and the Chronology of his Writings. London: Longman.
Pennebaker, J. (2011). The Secret Life of Pronouns: What Our Words Say about Us. New
York etc.: Bloomsbury Press.
Rudman, J. (1998). Non-traditional authorship attribution studies in the ‘Historia Augusta’:
some caveats. Literary and Linguistic Computing 13: 151–57.
Rybicki, J. (2006). Burrowing into translation: character idiolects in Henryk Sienkiewicz’s
‘Trilogy’ and its two English translations. Literary and Linguistic Computing 21(1): 91–103.
Rybicki, J. and Heydel, M. (2013). The stylistics and stylometry of collaborative translation:
Woolf’s ‘Night and Day’. Literary and Linguistic Computing 28, Advance Access 27 May 2013
(doi: 10.1093/llc/fqt027).
Sambor, J. (1972). Słowa i liczby. Zagadnienia językoznawstwa statystycznego. Wrocław.
Tweedie, J. F. and Baayen, R. H. (1998). How variable may a constant be? Measures of
lexical richness in perspective. Computers and the Humanities 32: 323–52.
Student workload balance, in line with CNPS (Całkowity Nakład Pracy Studenta, total student workload)
Contact hours with the instructors:
Lecture: 15
Seminar (classes, laboratory, etc.):
Individual consultations: 2
Participation in the exam/final assessment:
Hours of student work without instructor contact:
Reading in preparation for classes: 15
Preparing a short written assignment or presentation after studying the required literature:
Preparing a project or presentation on a given topic (group work):
Preparing for the exam:
Total workload: 32
ECTS credits, depending on the adopted conversion rate: 2
