Geospatial Interface

Code, documentation and instructions for constructing the geospatial interface.

  1. Choose stories based on the frequency of placename mentions across the entire novel. The shortlist was developed by identifying the novels, novellas and short stories by Australian authors with the highest frequency of placename mentions. The CSV for this information is located here. Locations for the spreadsheet were extracted using Stanford’s NER 3-class classifier (NER code included in repository).
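
A minimal Python sketch of the frequency count, assuming the NER output has been saved in Stanford NER's default slashTags format (`word/TAG` tokens, where the 3-class model tags `LOCATION`, `PERSON` and `ORGANIZATION`). The names `location_counts` and `rank_stories` are hypothetical; this is not the NER code included in the repository.

```python
from collections import Counter


def location_counts(tagged_text: str) -> Counter:
    """Count LOCATION entities in slashTags output ('word/TAG' tokens).

    Consecutive LOCATION tokens are merged into one multi-word placename.
    """
    counts: Counter = Counter()
    current: list[str] = []
    for token in tagged_text.split():
        word, _, tag = token.rpartition("/")  # rpartition tolerates '/' inside the word
        if tag == "LOCATION":
            current.append(word)
        elif current:
            counts[" ".join(current)] += 1
            current = []
    if current:  # flush a placename that ends the text
        counts[" ".join(current)] += 1
    return counts


def rank_stories(tagged_stories: dict[str, str], top_n: int = 50) -> list[tuple[str, int]]:
    """Rank stories (title -> tagged text) by total placename mentions."""
    totals = {title: sum(location_counts(text).values())
              for title, text in tagged_stories.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Merging consecutive `LOCATION` tokens keeps names such as "New South Wales" together, at the cost of occasionally joining two adjacent but separate places.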

  2. Compose a CSV shortlist of 50 novels, novellas and short stories, and assess each according to the quality of its references to ‘place’, whether place is a feature of the narrative, and whether the narrative is appropriate for high school students. The selection criteria are as follows:

  • Length: 3000 words is desirable.
  • Mentions of place: something is said about the place beyond its name (a description or an attitude, even if only a couple of sentences).
  • Places do not have to be Australian.
  • OCR: if the OCR is poor, find out whether there is another version, or versions in different newspapers, that are more legible. The link to the spreadsheet is located here.
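
Of these criteria, only length is easy to check automatically. The sketch below assumes the shortlist CSV has hypothetical `title` and `word_count` columns and treats 3000 words as a soft minimum. It flags shorter texts rather than deleting them, so they can still be judged by hand on place, suitability and OCR quality.

```python
import csv
import io

MIN_WORDS = 3000  # 'desirable' length from the criteria, treated here as a soft minimum


def flag_short_texts(csv_text: str) -> list[dict[str, str]]:
    """Return shortlist rows with an added 'meets_length' column ('yes'/'no').

    Assumes hypothetical 'title' and 'word_count' columns.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["meets_length"] = "yes" if int(row["word_count"]) >= MIN_WORDS else "no"
    return rows
```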
  3. Create a new spreadsheet in Google Docs of the 28 narratives selected from the shortlist. The link to the spreadsheet is located here.

  4. Manually clean the placenames and extracts:

  • Identify false positives among the placenames, e.g. ‘Miss’ or ‘French’, and remove those rows.
  • Correct OCR and spelling errors in the extracts.
  • Create a column that links to bibliographic information about the author, preferably http://adb.anu.edu.au/biography, or AustLit if this is not available.
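
A sketch of the false-positive filter, assuming each row is a dict with a hypothetical `placename` column. The stoplist starts with the two examples given above and would grow as cleaning proceeds; correcting OCR and spelling in the extracts remains a manual task.

```python
# Non-place words the NER tagged as LOCATION; extend this set during cleaning.
FALSE_POSITIVES = {"Miss", "French"}


def drop_false_positives(rows: list[dict[str, str]], column: str = "placename") -> list[dict[str, str]]:
    """Keep only rows whose placename is not on the stoplist (exact match after trimming)."""
    return [row for row in rows if row[column].strip() not in FALSE_POSITIVES]
```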
  5. Develop extracts from the narratives that mention placenames (see the document ‘Extract_Rule’) and integrate them into the spreadsheet (extraction code included in repository).
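
The actual extraction rule is defined in ‘Extract_Rule’ and the repository's extraction code, and the sketch below does not reproduce it. Purely for illustration, it assumes an extract is the sentence containing the placename plus one sentence on either side, with sentences split naively on terminal punctuation (so abbreviations such as "Mr." will cause false splits). The function name `extracts_for` is hypothetical.

```python
import re


def extracts_for(text: str, placename: str, context: int = 1) -> list[str]:
    """Return one extract per sentence mentioning placename, padded with
    `context` neighbouring sentences on each side (hypothetical rule)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(placename) + r"\b")  # whole-word match
    extracts = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            start, end = max(0, i - context), min(len(sentences), i + context + 1)
            extracts.append(" ".join(sentences[start:end]))
    return extracts
```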

  6. Geocode the locations using EZ-Geocode, a Google Sheets add-on that allows 250 queries per day.
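
Because EZ-Geocode allows 250 queries per day, a longer list of locations has to be geocoded over several days. A minimal sketch of splitting the locations into daily batches, assuming duplicates are removed first so each distinct placename costs only one query (the function name `daily_batches` is hypothetical):

```python
DAILY_QUOTA = 250  # EZ-Geocode's daily query allowance


def daily_batches(placenames: list[str], quota: int = DAILY_QUOTA) -> list[list[str]]:
    """De-duplicate placenames (preserving first-seen order) and chunk them by quota."""
    unique = list(dict.fromkeys(p.strip() for p in placenames if p.strip()))
    return [unique[i:i + quota] for i in range(0, len(unique), quota)]
```

Geocoding each distinct name once and joining the results back onto every row that mentions it keeps the number of days needed to a minimum.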

  7. Visualise the locations in an ArcGIS Developers account and create in-browser pop-ups that link to Trove and to the To Be Continued website (the code for the website is in ‘index.html’).
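
One way to hand the geocoded rows to ArcGIS is as a GeoJSON FeatureCollection, whose feature properties can then be displayed in pop-ups. The sketch assumes hypothetical `title`, `placename`, `extract`, `trove_url`, `latitude` and `longitude` columns; it is not the code in ‘index.html’.

```python
import json


def to_geojson(rows: list[dict[str, str]]) -> str:
    """Serialise geocoded rows as a GeoJSON FeatureCollection string."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude]
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            # Properties available to the pop-up template
            "properties": {
                "title": row["title"],
                "placename": row["placename"],
                "extract": row["extract"],
                "trove_url": row["trove_url"],
            },
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

The longitude-first order is the most common source of points landing in the wrong hemisphere, which is worth checking before the manual clean-up.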

  8. Manually clean placenames that were geocoded to incorrect locations.
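
Incorrect geolocations still have to be fixed by hand, but a script can narrow down which rows to inspect. This sketch, assuming hypothetical `placename`, `latitude` and `longitude` columns, flags rows whose coordinates are missing, out of range or exactly (0, 0), and rows where the same placename was geocoded to more than one location. A flagged name is not necessarily wrong: since places do not have to be Australian, two different Sydneys may both be legitimate.

```python
from collections import defaultdict


def suspect_rows(rows: list[dict[str, str]]) -> list[int]:
    """Return indices of rows whose coordinates deserve a manual check."""
    suspects = set()
    coords_by_name: dict[str, set] = defaultdict(set)
    for i, row in enumerate(rows):
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            suspects.add(i)  # missing or non-numeric geocoding result
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180) or (lat, lon) == (0.0, 0.0):
            suspects.add(i)  # impossible or default (0, 0) coordinates
        # Rounding to 2 decimals ignores tiny differences between results
        coords_by_name[row.get("placename", "")].add((round(lat, 2), round(lon, 2)))
    for i, row in enumerate(rows):
        if len(coords_by_name.get(row.get("placename", ""), ())) > 1:
            suspects.add(i)  # same name geocoded to different places
    return sorted(suspects)
```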

Link to Google Drive: