Skip to content

Latest commit

 

History

History
52 lines (43 loc) · 4.13 KB

import_readme.md

File metadata and controls

52 lines (43 loc) · 4.13 KB

Importing Dictionary Data from Microsoft Excel

Author: Rainer Hind ([email protected])

The goal of this process is to take the XLSX document provided by Alex Medcalfe at the Borthwick, which has a tab for each dictionary letter, and export it to a CSV with all the entries. This is problematic since the ordering/naming of the headers in each tab is inconsistent. The VBA script mentioned below takes the set of all header names, and puts them in a single tab, and then places the fields from the rows in the appropriate column.

The import rake task then does some normalisation on the header names, so that source1 place is seen as equivalent to source 1 Place (note the capitalisation and spacing differences). It will also treat source X archival ref the same as source X ref, since the two are intended to be equivalent, and are both used in the source document.

For reference, the process was completed on a 2012 Macbook Pro running OSX 10.12.6, with Excel version 16.12. Since Excel doesn't include the same VBA libraries on OSX as it does on Windows, a Dictionary VBA class is included which emulates functionality used in the VBA script for OSX users.

Preparatory Steps (in original XLSX file)

  1. Delete any empty sheets/tabs in the Excel doc
  2. Delete the 'Cut Entries' tab if it exists

CSV Export Steps

  1. Using the source .xlsx document, enable the developer tab
    • File->Options->Customize ribbon->Customize the Ribbon->Main Tabs->Developer at time of writing, Google it if not
  2. Go to Developer tab, select Visual Basic
  3. In the opened window, File -> Import file, browse to import_files in the YHD directory & select YHD_Excel_Tab_Merger.bas
  4. OSX ONLY
    • File -> Import, browse to import_files in the YHD directory and select Dictionary.cls from VBA-Dictionary-1.4.1 directory
    • This emulates Microsoft VBA functionality which is not included in OSX Excel version Note: This step was not possible to replicate on SP MacOS.
  5. Returning to the regular Excel window, select developer tab
  6. Click Macros in ribbon at the top
  7. Select CombineSheetsWithDifferentHeaders and press Run
  • Ensure Microsoft Runtime Routine is available. Tools -> References -> select Microsoft Runtime Routine
  • The macro will run and create a new tab. This process may take several minutes, and probably won't appear to be doing anything whilst it runs. When it completes, it will display a pop up message
  1. Switch to the new, merged tab
  2. Save the merged tab to a CSV:
    • File->Save As
    • Set File Format: Select CSV UTF-8 (Comma delimited) (.csv) Note: this is not saving files in UTF-8. See following note about yhd.csv UTF-8 conversion.
    • Save As Filename: yhd
    • Press Save
    • A pop up should open explaining the workbook cannot be saved as there are multiple sheets. Press OK.
  3. You now should have yhd.csv, which can be placed in the YHD project directory (conventionally it should go in /import_files, although the import script should locate it anywhere in the project dir.)

NOTE: Be careful opening the .csv file in Excel. The default CSV parser will automatically convert fields beginning with - to a sum, which results in those fields being corrupted. If saved after opening in Excel, the CSV will contain invalid values and the import script. If you really need to open it in Excel after exporting, use the Import function of Excel instead, and preferably avoid saving it after.

Note bibliography.csv - import script might failed if the value in last column Type contains space character after the value. Run following conversion in vim to fic this issue

vim bibliography.csv
ESC:
%s/\s*$//g

Note about yhd.csv UTF-8 conversion Excell on Windows will by default export csv file encoded in CP1250. This will confuse ruby import script. One quick fix is to convert an exported csv file to expected UTF-8 encoding with the following command line

iconv -f cp1250 -t utf-8 <in-cpo1250.csv >out-utf8.csv

Importing the CSV

  1. After placing yhd.csv in the YHD project directory, open a command prompt in the YHD directory
  2. Run rails yhd:import which will import the data from the CSV