{Repo} Text Processing
by J-T.M. · Published July 8, 2020 · Updated July 16, 2020
Here are archived scripts to batch-process textual data for analysis in R, Python, TXM and IRaMuTeQ. The repo contains tools to extract text (and its metadata) from digital sources (PDFs, HTML, SRT), clean it (layout and OCR corrections) and format it as CSV+TXT for analysis. Click on the images to access the GitHub repo.
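To give a rough idea of the extract, clean and format steps, here is a minimal Python sketch for a single SRT subtitle file. It is not taken from the repo: the sample input, the function names (`srt_to_text`, `metadata_csv`) and the metadata columns (`id`, `source`, `n_words`) are all assumptions made for illustration, and the exact CSV+TXT layout the repo's scripts produce may differ.

```python
import csv
import io
import re

# Hypothetical sample input; a real script would read .srt files from disk.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Hello,  world!

2
00:00:04,000 --> 00:00:06,000
This is a <i>test</i>.
"""

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt):
    """Drop cue numbers, timing lines and simple tags; return one cleaned string."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        # Caveat: a subtitle line made only of digits is also dropped here.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        line = re.sub(r"<[^>]+>", "", line)  # strip simple formatting tags
        kept.append(line)
    # Collapse runs of whitespace left over from the original layout.
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def metadata_csv(rows):
    """Serialize metadata rows (assumed columns) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["id", "source", "n_words"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


text = srt_to_text(SAMPLE_SRT)  # body that would go into the .txt file
row = {"id": "doc001", "source": "srt", "n_words": len(text.split())}
csv_out = metadata_csv([row])  # metadata that would go into the .csv file
```

In this sketch the cleaned text and its metadata are kept as two separate outputs linked by a shared `id`, which is one plausible way to pair a TXT corpus with a CSV metadata table.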