Computer-assisted text analysis
COURSE CARD

Course title: Analiza tekstu metodami komputerowymi
Title in English: Computer-assisted text analysis: from plagiarism detection to distant reading
Code: -
Coordinator: dr hab. Maciej Eder
ECTS credits: 2
Teaching staff: dr hab. Maciej Eder

Course description (learning objectives)

The course will be taught in English. It aims to give a concise introduction to the methods, tools and applications of corpus linguistics and computational stylistics. The survey will cover several topics, ranging from non-traditional authorship attribution to the assessment of large text corpora, with special attention paid to questions of stylistic differentiation and development. Although some topics in statistical inference and supervised machine-learning techniques will be discussed, no expert knowledge is required. By the end of the course, participants should have a general idea of natural language processing, text classification, and large-scale approaches to the history of literature.

Prerequisites

At least an intermediate level of English is required. Some basic knowledge of computational linguistics and an understanding of basic probability concepts will be helpful.

Learning outcomes

On completing the course, the student:

Knowledge
W01: The student has a basic knowledge of different aspects of computer-assisted text analysis, including natural language processing (NLP), language laws, word frequency distributions, explanatory text classification, machine-learning classification, authorship attribution, gender recognition, and network analysis as applied to the study of style evolution.

Skills
U01: The student is able to present his/her own point of view, to participate actively in a discussion, and to develop arguments based on the topics introduced during the course.

Social competences
K01: The student understands the idea of lifelong learning. The student is able to think independently and is ready to extend his/her knowledge.

Reference to programme-level outcomes: -

Organisation

Class format: Lecture (W)
Number of hours: 15
Group classes (A, K, L, S, P, E): -

Teaching methods

A number of lectures, each followed by a short discussion.

Methods of assessing learning outcomes

Assessment forms listed on the card: e-learning, didactic games, classes at school, field classes, laboratory work, individual project, group project, participation in discussion, oral presentation, written paper (essay), oral examination, written examination, other.
Outcomes marked (x): W01, U01, K01.

Assessment criteria

The evaluation will be two-fold; the following factors will be taken into consideration: (1) attendance; (2) active participation.

Remarks: -

Course content (list of topics)

1. Quantitative approaches to language and style: from Ancient Greece to Father Roberto Busa.
2. How to read 5 million books? OCR, Google and Culturomics.
3. Linguistic laws, or why some words are rare and others very frequent.
4. She speaks / he speaks, or the secret life of pronouns.
5. Authorship attribution, or how to trace a stylistic fingerprint.
6. Big Data, stylometry and distant reading, or large-scale history of literature.
7. Does it make any sense, really? Reliability issues in computational stylistics.

Supplementary reading list

Baayen, H. (2001). Word Frequency Distributions. Dordrecht: Kluwer.
Burrows, J. (1987). Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method. Oxford: Clarendon Press.
Burrows, J. (2002). ‘Delta’: a measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing 17: 267–287.
Burrows, J. (2004). Textual analysis. In: Schreibman, S., Siemens, R. and Unsworth, J. (eds), A Companion to Digital Humanities. Oxford: Blackwell, pp. 323–47.
Craig, H. and Kinney, A. F. (2009). Shakespeare, Computers, and the Mystery of Authorship. Cambridge: Cambridge University Press.
Eder, M. (2011). Style-markers in authorship attribution: a cross-language study of the authorial fingerprint. Studies in Polish Linguistics 6: 99–114.
Eder, M. (2013). Mind your corpus: Systematic errors in authorship attribution. Literary and Linguistic Computing 28(4): 603–14.
Hammerl, R. and Sambor, J. (1993). O statystycznych prawach językowych. Warszawa.
Holmes, D. (1998). The evolution of stylometry in humanities scholarship. Literary and Linguistic Computing 13(3): 111–17.
Hoover, D. L. (2001). Statistical stylistics and authorship attribution: an empirical investigation. Literary and Linguistic Computing 16: 421–44.
Hoover, D. L. (2009). Modes of composition in Henry James: dictation, style, and ‘What Maisie Knew’. In: Digital Humanities 2009: Conference Abstracts. University of Maryland, College Park, MD, pp. 145–48.
Jockers, M. (2013). Macroanalysis: Digital Methods and Literary History. Champaign: University of Illinois Press.
Juola, P. (2006). Authorship Attribution. Foundations and Trends in Information Retrieval 1: 233–334 [available at: http://www.mathcs.duq.edu/~juola/].
Kuraszkiewicz, K. (1963). La richesse du vocabulaire dans quelques grands textes polonais en vers. Wrocław.
Love, H. (2002). Attributing Authorship: An Introduction. Cambridge: Cambridge University Press.
Lutosławski, W. (1897). The Origin and Growth of Plato’s Logic: With an Account of Plato’s Style and the Chronology of his Writings. London: Longman.
Pennebaker, J. (2011). The Secret Life of Pronouns: What Our Words Say about Us. New York etc.: Bloomsbury Press.
Rudman, J. (1998). Non-traditional authorship attribution studies in the ‘Historia Augusta’: some caveats. Literary and Linguistic Computing 13: 151–57.
Rybicki, J. (2006). Burrowing into translation: character idiolects in Henryk Sienkiewicz’s ‘Trilogy’ and its two English translations. Literary and Linguistic Computing 21(1): 91–103.
Rybicki, J. and Heydel, M. (2013). The stylistics and stylometry of collaborative translation: Woolf’s ‘Night and Day’. Literary and Linguistic Computing 28, Advance Access 27 May 2013 (doi: 10.1093/llc/fqt027).
Sambor, J. (1972). Słowa i liczby. Zagadnienia językoznawstwa statystycznego. Wrocław.
Tweedie, F. J. and Baayen, R. H. (1998). How variable may a constant be? Measures of lexical richness in perspective. Computers and the Humanities 32: 323–52.

Workload balance in accordance with CNPS (Całkowity Nakład Pracy Studenta, total student workload)

Contact hours with teaching staff
  Lecture: 15
  Seminar (classes, laboratory, etc.): -
  Individual consultations: 2
  Participation in the examination / final assessment: -

Hours of student work without contact with teaching staff
  Reading in preparation for classes: 15
  Preparing a short written paper or presentation after studying the required literature: -
  Preparing a project or presentation on a given topic (group work): -
  Preparing for the examination: -

Total workload: 32 hours
ECTS credits according to the adopted conversion rate: 2