Text mining 100+ years of Kanton Zürich's referenda and initiatives
* Peter has some nice papers with previous research.
Main data sources:
* The Kantonal-level CSV contains URLs to machine-readable PDFs with voting information.
* The Gemeinde-level CSV contains per-Gemeinde historical voting records.
* The two CSVs are joined on a unique vote ID (STAT_VORLAGE_ID).
* The PDFs are converted to TXT via pdftotext and can be joined to the CSV files on the field ABSTIMMUNGSTAG.
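To make the join concrete, here is a minimal sketch of combining the two CSVs on STAT_VORLAGE_ID using only the standard library. The column names other than STAT_VORLAGE_ID and ABSTIMMUNGSTAG, and all sample values, are hypothetical; the real files have more columns.

```python
import csv
import io

# Hypothetical miniature versions of the two CSVs (real files have more columns).
kanton_csv = """STAT_VORLAGE_ID,ABSTIMMUNGSTAG,TITEL
101,1920-05-02,Beispielvorlage A
102,1933-11-12,Beispielvorlage B
"""
gemeinde_csv = """STAT_VORLAGE_ID,GEMEINDE,JA,NEIN
101,Winterthur,1200,800
101,Uster,500,700
102,Winterthur,900,1100
"""

def index_by_vote_id(text):
    """Read CSV text and group rows by the shared vote ID."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows.setdefault(row["STAT_VORLAGE_ID"], []).append(row)
    return rows

# One Kantonal row per vote ID; possibly many Gemeinde rows per vote ID.
kanton = {r["STAT_VORLAGE_ID"]: r for r in csv.DictReader(io.StringIO(kanton_csv))}
gemeinde = index_by_vote_id(gemeinde_csv)

# Join: attach the Kantonal metadata to every per-Gemeinde row.
joined = [
    dict(g, **kanton[vote_id])
    for vote_id, g_rows in gemeinde.items()
    for g in g_rows
]
```

The same pattern (a dict keyed on ABSTIMMUNGSTAG) works for attaching the converted TXT files to the CSV rows.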
Using the code and data
(mostly Python 2.7 or bash)
get_pdfs.py scrapes the URLs from the Kantonal CSV file and saves the PDFs locally. (In practice we got the PDFs from the organizers on a USB stick, because the scraper was getting IP-blocked.) Note that the Bundesamt.pdf files are not URL-linked in the CSV files.
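A minimal sketch of the URL-extraction step, assuming the Kantonal CSV has a column holding the PDF link (the column name `URL` and the sample rows are assumptions, not taken from the real file):

```python
import csv
import io

# Hypothetical excerpt of the Kantonal CSV; the URL column name is an assumption.
sample = """STAT_VORLAGE_ID,URL
101,https://example.org/docs/abstimmung_101.pdf
102,https://example.org/docs/abstimmung_102.pdf
"""

def extract_pdf_urls(csv_text):
    """Collect the per-vote PDF URLs, skipping rows without one
    (e.g. the Bundesamt.pdf files, which are not URL-linked)."""
    return [row["URL"] for row in csv.DictReader(io.StringIO(csv_text)) if row.get("URL")]

urls = extract_pdf_urls(sample)

# The download step would then be roughly:
#   import urllib.request
#   for url in urls:
#       urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
# (kept as a comment here to avoid network access; the live scraper was IP-blocked anyway)
```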
vote_mapping.py (experimental) reads the combined text from full_text.csv, as well as the metadata from the Kantonal CSV file. It attempts to split each TXT file into multiple elements, one per ballot measure, using some file-specific keywords. The code then maps measures to text based on their rank (position) in this split array. The output file is fulltextmapped.csv.
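The split-and-map-by-rank idea can be sketched as follows. The keyword pattern `Vorlage \d+` and the sample booklet text are assumptions; the real keywords vary per file, which is what makes this step experimental.

```python
import re

# Hypothetical TXT content covering two ballot measures on one voting day.
full_text = (
    "Vorlage 1: Beispielgesetz A\nArgumente des Regierungsrates ...\n"
    "Vorlage 2: Beispielinitiative B\nArgumente des Komitees ...\n"
)

def split_by_keyword(text, keyword=r"Vorlage \d+"):
    """Split the booklet text into one chunk per ballot measure,
    using a file-specific keyword pattern as the delimiter."""
    # Lookahead keeps the keyword at the start of each chunk.
    parts = re.split(rf"(?={keyword})", text)
    return [p for p in parts if p.strip()]

# Metadata titles for the same ABSTIMMUNGSTAG, assumed to be in ballot order.
measures = ["Beispielgesetz A", "Beispielinitiative B"]

chunks = split_by_keyword(full_text)
# Map by rank: the i-th chunk is assigned to the i-th measure.
mapping = dict(zip(measures, chunks))
```

Note that this mapping is only correct if the order of measures in the metadata matches the order in the booklet, which is the main fragility of the rank-based approach.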
sentiment.py reads fulltextmapped.csv and calculates the polarity (in [-1, 1]) and the subjectivity (in [0, 1]) with textblob_de, plus a readability score. The output file is fulltextmapped_sentiment.csv; the three scores are appended as the last three columns.
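The specific readability metric used by sentiment.py is not stated above; as an illustration, here is a sketch of the LIX index, one common choice for German text. The threshold of 7 letters for a "long word" and the example sentences are part of the LIX convention and this sketch, not taken from the project code.

```python
import re

def lix(text, long_word_len=7):
    """LIX readability index: words/sentences + 100 * long_words/words.
    Higher scores mean harder text; often used for German."""
    words = re.findall(r"\w+", text, flags=re.UNICODE)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = [w for w in words if len(w) >= long_word_len]
    return len(words) / len(sentences) + 100.0 * len(long_words) / len(words)

easy = lix("Der Hund bellt. Die Katze schläft.")
hard = lix("Verfassungsänderungsvorlagen erfordern ausserordentliche Stimmbeteiligungsquoten.")

# The polarity/subjectivity half would use textblob_de, roughly:
#   from textblob_de import TextBlobDE
#   blob = TextBlobDE(text)
#   polarity, subjectivity = blob.sentiment
# (left as a comment; requires the textblob-de package)
```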