Author: Rainer Hind ([email protected])
The goal of this process is to take the XLSX document provided by Alex Medcalfe at the Borthwick, which has a tab for each dictionary letter, and export it to a CSV with all the entries. This is problematic since the ordering/naming of the headers in each tab is inconsistent. The VBA script mentioned below takes the set of all header names, and puts them in a single tab, and then places the fields from the rows in the appropriate column.
The import rake task then does some normalisation on the header names, so that source1 place
is seen as equivalent to source 1 Place
(note the capitalisation and spacing differences). It will also treat source X archival ref
the same as source X ref
, since the two are intended to be equivalent, and are both used in the source document.
For reference, the process was completed on a 2012 Macbook Pro running OSX 10.12.6
, with Excel version 16.12
. Since Excel doesn't include the same VBA libraries on OSX as it does on Windows, a Dictionary
VBA class is included which emulates functionality used in the VBA script for OSX users.
Preparatory Steps (in original XLSX file)
- Delete any empty sheets/tabs in the Excel doc
- Delete the 'Cut Entries' tab if it exists
CSV Export Steps
- Using the source .xlsx document, enable the developer tab
File->Options->Customize ribbon->Customize the Ribbon->Main Tabs->Developer
at time of writing, Google it if not
- Go to
Developer
tab, selectVisual Basic
- In the opened window,
File -> Import file
, browse toimport_files
in the YHD directory & selectYHD_Excel_Tab_Merger.bas
- OSX ONLY
File -> Import
, browse toimport_files
in the YHD directory and select Dictionary.cls fromVBA-Dictionary-1.4.1
directory- This emulates Microsoft VBA functionality which is not included in OSX Excel version Note: This step was not possible to replicate on SP MacOS.
- Returning to the regular Excel window, select developer tab
- Click
Macros
in ribbon at the top - Select
CombineSheetsWithDifferentHeaders
and pressRun
- Ensure
Microsoft Runtime Routine
is available.Tools -> References -> select Microsoft Runtime Routine
- The macro will run and create a new tab. This process may take several minutes, and probably won't appear to be doing anything whilst it runs. When it completes, it will display a pop up message
- Switch to the new, merged tab
- Save the merged tab to a CSV:
File->Save As
- Set
File Format: Select CSV UTF-8 (Comma delimited) (.csv)
Note: this is not saving files in UTF-8. See following note about yhd.csv UTF-8 conversion. - Save As
Filename: yhd
- Press
Save
- A pop up should open explaining the workbook cannot be saved as there are multiple sheets. Press
OK
.
- You now should have yhd.csv, which can be placed in the YHD project directory (conventionally it should go in
/import_files
, although the import script should locate it anywhere in the project dir.)
NOTE: Be careful opening the .csv file in Excel. The default CSV parser will automatically convert fields beginning with -
to a sum, which results in those fields being corrupted. If saved after opening in Excel, the CSV will contain invalid values and the import script. If you really need to open it in Excel after exporting, use the Import function of Excel instead, and preferably avoid saving it after.
Note bibliography.csv - import script might failed if the value in last column Type contains space character after the value. Run following conversion in vim to fic this issue
vim bibliography.csv
ESC:
%s/\s*$//g
Note about yhd.csv UTF-8 conversion Excell on Windows will by default export csv file encoded in CP1250. This will confuse ruby import script. One quick fix is to convert an exported csv file to expected UTF-8 encoding with the following command line
iconv -f cp1250 -t utf-8 <in-cpo1250.csv >out-utf8.csv
Importing the CSV
- After placing
yhd.csv
in the YHD project directory, open a command prompt in the YHD directory - Run
rails yhd:import
which will import the data from the CSV