This dataset was created on 2019-08-05, and includes the results
from the 2019 election.  Previous versions:
    2015-09-01: Original.
    2015-11-27: Revisions to some TPP estimates.
    2016-10-06: Update to 2016 election.

The "base" set of raw data is stored in the exact_data folder.  These
files comprise
    - prim_$year.csv, containing the primary votes;
    - tcp_$year.csv or tpp_$year.csv, being one-row-per-seat files with
      two-candidate-preferred results;
    - prefs_$year.csv, containing distributed preferences;
    - flow_$year.csv, containing TCP preference flows from each
      candidate whose preferences were distributed (1996-2013 only);
    - turnout_$year.csv, containing enrolment, turnout, and informal
      vote figures.
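
As a rough illustration of the naming scheme, the Python sketch below collects whichever raw files exist for one year. The helper name raw_files_for_year is hypothetical. It assumes that every file sits directly in exact_data with exactly the prefixes listed above, and it makes no assumption about the columns inside the files.

```python
from pathlib import Path

# Prefixes as listed above; flow_ files only exist for 1996-2013, and a
# given year has either a tcp_ or a tpp_ file.
KINDS = ("prim", "tcp", "tpp", "prefs", "flow", "turnout")


def raw_files_for_year(year, root="exact_data"):
    """Return {kind: path} for the raw files present for `year`.

    Hypothetical helper: assumes the $kind_$year.csv naming described
    above, with all files directly inside `root`.
    """
    root = Path(root)
    found = {}
    for kind in KINDS:
        path = root / f"{kind}_{year}.csv"
        if path.exists():
            found[kind] = path
    return found
```

Missing kinds are simply absent from the result, so a caller can test for "tcp" versus "tpp" without knowing in advance which one a year uses.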

The name "exact_data" is a slight misnomer, as the TCP files contain
Psephos estimates.

The final dataset is in the consolidated folder.  It can be reproduced
with the following steps.  (For all R scripts, the setwd() call at the
start will need to be adapted to your own computer.  I don't care what
Hadley says about setwd().)

0. (optional) ~/tpp_estimates/make_minimal_pref_dataset.R
    - Creates a dataset in ~/tpp_estimates/minimal_pref_data where
      preferences are not distributed once a candidate has more than 50%
      of the vote.  This dataset is used for benchmarking the TPP
      estimates.
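
The truncation rule in step 0 can be sketched in Python as follows. This is not the code in make_minimal_pref_dataset.R. It assumes a hypothetical in-memory layout in which a seat's distribution is a list of counts, each a dict from candidate to votes. It keeps every count up to and including the first one in which some candidate holds more than 50% of the votes at that count.

```python
def truncate_at_majority(counts):
    """Keep counts up to and including the first with a majority leader.

    `counts` is a list of dicts mapping candidate -> votes at each count
    (a hypothetical layout; the real prefs_$year.csv columns may differ).
    Later counts are dropped, mimicking a distribution that stops once a
    candidate has more than 50% of the vote.
    """
    kept = []
    for count in counts:
        kept.append(count)
        total = sum(count.values())
        # Strictly more than half: an exact 50% tie is not a majority.
        if any(2 * votes > total for votes in count.values()):
            break
    return kept
```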

1. ~/flow_estimates/estimate_flows.R
    - Compares true preference flows with flows estimated from
      preference distributions for 1996-2013, printing correlations to
      the screen and writing data and scatter plots to file.
    - If use_minimal_data == 0, then creates flow_$year.csv with 
      estimated or exact preference flows between 1919 and 1993.
    - If use_minimal_data == 1, then does likewise but outputs to
      ~/tpp_estimates/minimal_pref_data/ and estimates flows based on
      the minimal data through to 2013.
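
The following hypothetical Python sketch shows the kind of comparison step 1 performs. naive_flow is a deliberately simple proxy and is not necessarily the estimator used in estimate_flows.R. It takes the share of an excluded candidate's transferred votes that went to one TCP candidate, counting only votes that went directly to either TCP candidate. The correlation is also an assumption: it uses Pearson's coefficient, since the kind of correlation the script prints is not specified here.

```python
from math import sqrt


def naive_flow(transfers, tcp_a, tcp_b):
    """Share of transferred votes reaching tcp_a, among those that went
    directly to tcp_a or tcp_b (hypothetical proxy estimator).

    transfers: candidate -> votes received when one candidate's
    preferences were distributed.
    """
    to_a = transfers.get(tcp_a, 0)
    to_b = transfers.get(tcp_b, 0)
    if to_a + to_b == 0:
        return None  # no direct evidence about the flow
    return to_a / (to_a + to_b)


def pearson(xs, ys):
    """Pearson correlation between true and estimated flows."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Votes that pass through a third candidate before reaching a TCP candidate are ignored by this proxy, which is one reason an estimate can differ from the true flow.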

2. ~/tpp_estimates/estimate_tpp.R
    - If use_minimal_data == 0, then creates
        - party_flow_all_$party.csv, containing all (exact or estimated)
          preference flows for each party;
        - party_flow_summary_$party.csv, containing smoothed averages
          for each party's preference flows;
        - missing_flow_$year.csv, containing the candidates belonging to
          parties without any preference data for that year;
        - tpp_$year.csv, containing the TPP values, estimated or exact.
    - If use_minimal_data == 1, then, in addition to the above, creates
      scatter plots and CSV files comparing true and estimated TPPs for
      1983-2013, and prints some statistics to screen.
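
A common way to turn party flows into a TPP estimate is to split each party's primary vote between the two major sides in proportion to its flow. The minimal Python sketch below assumes flows are expressed as the fraction reaching Labor and uses hypothetical party codes (ALP, LIB, GRN, ...). The 0.5 fallback for parties without flow data is an assumption. Mean absolute error is likewise only an example benchmark statistic, because the statistics estimate_tpp.R prints are not listed here.

```python
def estimate_tpp(primaries, flow_to_alp, default_flow=0.5):
    """Estimated ALP share of the two-party-preferred vote in one seat.

    primaries:   party code -> primary votes (hypothetical codes).
    flow_to_alp: party code -> fraction of that party's votes that end up
                 with ALP (1.0 for ALP itself, 0.0 for Coalition parties).
    Parties missing from flow_to_alp fall back to default_flow, an
    assumed 50/50 split.
    """
    total = sum(primaries.values())
    alp_votes = sum(votes * flow_to_alp.get(party, default_flow)
                    for party, votes in primaries.items())
    return alp_votes / total


def mean_abs_error(true_vals, est_vals):
    """Mean absolute difference between true and estimated TPP shares."""
    pairs = list(zip(true_vals, est_vals))
    return sum(abs(t - e) for t, e in pairs) / len(pairs)
```

For example, with primaries ALP 40000, LIB 45000, GRN 10000 and IND 5000, a Green flow of 0.8 to ALP and no data for IND, the estimate is (40000 + 8000 + 2500) / 100000 = 0.505.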

The missing_flow_$year.csv files are for inspection only -- the idea is
to check that there aren't any critical minor parties whose preference
flows would be needed to make good TPP estimates.
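
Inspecting the missing_flow_$year.csv files amounts to a filter like the Python sketch below. The tuple layout and the 5% threshold are hypothetical choices made for illustration. Deciding which minor parties are actually "critical" remains a judgement call.

```python
def critical_missing(candidates, parties_with_flows, threshold=0.05):
    """Candidates whose party has no preference-flow data and whose
    primary share exceeds `threshold` (a hypothetical cut-off),
    largest share first.

    candidates: iterable of (seat, party, primary_share) tuples.
    """
    flagged = [c for c in candidates
               if c[1] not in parties_with_flows and c[2] > threshold]
    return sorted(flagged, key=lambda c: c[2], reverse=True)
```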

3. ~/consolidate_files.R
    - Collects the various files in exact_data, flow_estimates, and
      tpp_estimates, sometimes re-arranges the columns, and outputs the
      final set of data files to the consolidated folder:
          - prim_$year.csv, containing primary votes;
          - prefs_$year.csv, containing preference distributions;
          - t(c|p)p_$year.csv, containing TCP or TPP values, possibly
            Psephos estimates;
          - just_tpp_$year.csv, containing TPP values, either exact or
            my estimates;
          - flow_$year.csv, containing preference flows, either exact
            or estimated;
          - turnout_$year.csv, containing enrolment, turnout, and
            informal vote figures;
          - unopposed_divisions.csv, containing candidates who won
            seats unopposed.
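
The t(c|p)p_$year.csv name is regular-expression shorthand for "tcp_$year.csv or tpp_$year.csv". The Python sketch below locates those files in a folder. It assumes at most one of the two exists per year; if both did, it would keep whichever it encountered last. The function name is hypothetical. Because the pattern is anchored at the start, just_tpp_$year.csv does not match.

```python
import re
from pathlib import Path

# ^ anchor keeps just_tpp_$year.csv from matching.
TCP_TPP = re.compile(r"^t(c|p)p_(\d{4})\.csv$")


def tcp_or_tpp_files(folder):
    """Map year -> ("tcp" or "tpp", path) for t(c|p)p_$year.csv files."""
    out = {}
    for path in Path(folder).iterdir():
        m = TCP_TPP.match(path.name)
        if m:
            out[int(m.group(2))] = ("t" + m.group(1) + "p", path)
    return out
```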

The election_data.js file can then be reproduced by running
~/misc/create_js_data.R, which also reads some of the other files in
the misc folder.
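
The structure of election_data.js is defined by create_js_data.R and is not described here. One common way to ship data to a browser page is a script that assigns a JSON literal to a global variable. The Python sketch below assumes that form purely as an illustration. Both the variable name election_data and the data layout are hypothetical.

```python
import json


def to_js_data(data, var_name="election_data"):
    """Render `data` as JavaScript source assigning a global variable.

    Hypothetical format: the real file layout comes from create_js_data.R.
    """
    body = json.dumps(data, separators=(",", ":"))
    return f"var {var_name} = {body};\n"


def write_js_data(data, path="election_data.js"):
    """Write the rendered script to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_js_data(data))
```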
