{Repo} Text Processing
Scripts to batch-process textual data for analysis in R, Python, TXM and IRaMuTeQ. The repo contains tools to extract text, along with its metadata, from digital sources (PDF, HTML, SRT), clean it (layout and OCR corrections), and format it as CSV+TXT for analysis.
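
To illustrate the extract, clean, format flow described above, here is a minimal Python sketch. It is not taken from the repo's scripts: the function names (`srt_to_text`, `clean_text`, `write_corpus`), the `metadata.csv` file name, and the column layout (`id`, `title`, `file`) are all assumptions chosen for illustration, and the cleaning rules cover only two simple cases.

```python
import csv
import re
from pathlib import Path


def srt_to_text(srt: str) -> str:
    """Keep only the subtitle text from SRT content (hypothetical helper).

    Drops blank lines, cue numbers and timestamp lines. Limitation: a
    subtitle line made only of digits would also be dropped.
    """
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)


def clean_text(text: str) -> str:
    """Apply two simple layout corrections (hypothetical helper)."""
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse runs of whitespace, including line breaks, into one space.
    return re.sub(r"\s+", " ", text).strip()


def write_corpus(docs, out_dir):
    """Write one TXT file per document plus a metadata.csv index.

    `docs` is an iterable of dicts with the assumed keys
    "id", "title" and "text".
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "file"])
        for doc in docs:
            name = f"{doc['id']}.txt"
            (out / name).write_text(clean_text(doc["text"]), encoding="utf-8")
            writer.writerow([doc["id"], doc["title"], name])
```

A call such as `write_corpus([{"id": "doc1", "title": "Sample", "text": raw}], "corpus")` would produce `corpus/doc1.txt` and `corpus/metadata.csv`. Keeping the text in separate TXT files and the metadata in a CSV index is one plausible reading of "CSV+TXT"; the exact layout expected by TXM or IRaMuTeQ may differ.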