A program to create archives of articles from microPublication.org for sending to Portico.
Authors: Michael Hucka, Tom Morrell
Repository: https://github.com/caltechlibrary/microarchiver
License: BSD/MIT derivative – see the LICENSE file for more information
- Introduction
- Installation
- Usage
- Known issues and limitations
- Getting help and support
- Contributing
- License
- Authors and history
- Acknowledgments
The Caltech Library is the publisher of the online journal microPublication and provides services to the journal that include archiving the journal in a dark archive (specifically, Portico). The archiving process involves pulling down articles from microPublication and packaging them up in a format suitable for sending to Portico. Microarchiver is a program to automate this process.
On Linux, macOS, and Windows operating systems, you should be able to install Microarchiver directly from the GitHub repository using pip. If you don't have the pip package or are uncertain if you do, first run the following command in a terminal:
sudo python3 -m ensurepip
Then, install this software by running the following command on your computer:
python3 -m pip install git+https://github.com/caltechlibrary/microarchiver.git --user --upgrade
Alternatively, you can clone this GitHub repository and then install it from the local copy using pip:
git clone https://github.com/caltechlibrary/microarchiver.git
cd microarchiver
python3 -m pip install . --user --upgrade
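After installing, one quick way to confirm that the program ended up on your search path is a check like the one below. This is only a sketch using Python's standard library; the helper name find_program is hypothetical and is not part of Microarchiver.

```python
import shutil


def find_program(name):
    """Return the full path of an executable on the search path, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_program("microarchiver")
    if location is None:
        # With --user installs, the scripts directory may not be on PATH.
        print("microarchiver was not found on the search path")
    else:
        print("microarchiver is installed at", location)
```

If the program is not found, the usual cause is that the per-user scripts directory used by pip is missing from your PATH.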
Microarchiver is a command-line program. The installation process should put a program named microarchiver in a location normally searched by your shell interpreter. For help with usage at any time, run microarchiver with the option -h (or /h on Windows).
microarchiver -h
The simplest use of microarchiver involves running it without any arguments. This will make it contact microPublication.org to get a list of current articles, and create an archive of all the articles in a subdirectory of the current directory.
microarchiver
If given the argument -o (or /o on Windows), the output will be written to the directory named after the -o. For example:
microarchiver -o /tmp/micropublication-archive
The following is a screen recording of an actual run of microarchiver:
If given the argument -a (or /a on Windows) followed by a file name, the given file will be read for the list of articles instead of getting the list from the server. The contents of the file must be in the same XML format as the list obtained from microPublication.org; see option -g, described below, for a way to get the current article list from the server.
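If you edit an article list by hand before passing it to -a, it can help to confirm that the file is still well-formed XML. The sketch below does only that, using Python's standard library; it makes no assumptions about the element names in the list, and the helper name is_well_formed_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(path):
    """Return True if the file at `path` parses as XML, False otherwise."""
    try:
        ET.parse(path)
    except (ET.ParseError, OSError):
        # ParseError: malformed XML; OSError: missing or unreadable file.
        return False
    return True
```

A file that passes this check is not guaranteed to match the format microarchiver expects; it only rules out basic syntax mistakes such as unclosed tags.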
If the option -d is given, microarchiver will download only articles whose publication dates are after the given date. Valid date descriptors are those accepted by the Python dateparser library. Make sure to enclose descriptions within single or double quotes. Examples:
microarchiver -d "2014-08-29" ....
microarchiver -d "12 Dec 2014" ....
microarchiver -d "July 4, 2013" ....
microarchiver -d "2 weeks ago" ....
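To see why the quotes matter, the sketch below uses Python's shlex module, which splits a command line following POSIX shell rules, to show how a date is divided into arguments with and without quotes. It only illustrates shell word splitting; it does not call microarchiver.

```python
import shlex

# With quotes, the whole date reaches the program as a single argument.
quoted = shlex.split('microarchiver -d "12 Dec 2014"')

# Without quotes, the shell splits the date into three separate arguments.
unquoted = shlex.split('microarchiver -d 12 Dec 2014')

print(quoted)    # ['microarchiver', '-d', '12 Dec 2014']
print(unquoted)  # ['microarchiver', '-d', '12', 'Dec', '2014']
```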
As it works, microarchiver writes information to the terminal about the articles it puts into the archive, including whether any problems are encountered. To save this information to a file, use the argument -r (or /r on Windows) followed by a file name.
The output will be put into a single-file archive in ZIP format unless the argument -Z (or /Z on Windows) is given to prevent creation of the compressed archive file.
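To check what ended up in the ZIP file, you can list its contents with Python's standard zipfile module. This is a minimal sketch; the helper name list_archive is hypothetical, and the path you pass is whatever archive file microarchiver produced on your system.

```python
import zipfile


def list_archive(path):
    """Return the names of all entries in the ZIP file at `path`."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```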
microarchiver will print informational messages as it works. To reduce messages to only warnings and errors, use the argument -q (or /q on Windows). Also, output is color-coded by default unless the -C argument (or /C on Windows) is given; this argument can be helpful if the color control sequences create problems for your terminal emulator.
If given the argument -p (or /p on Windows), microarchiver will only print a list of articles it will archive and stop short of creating the archive. This is useful to see what would be produced without actually doing it.
If given the argument -g (or /g on Windows), microarchiver will only print the complete current article list from the micropublication.org server, in XML format, and exit without doing anything else. This is useful as a starting point for creating the file used by option -a. It's probably a good idea to redirect the output to a file, such as one named article-list.xml; e.g.,
microarchiver -g > article-list.xml
If given the -@ argument (/@ on Windows), this program will output a detailed trace of what it is doing, and will also drop into a debugger upon the occurrence of any errors. The debug trace will be written to the given destination, which can be a dash character (-) to indicate console output, or a file path.
The following table summarizes all the command line options available. (Note: on Windows computers, / must be used as the prefix character instead of -):
Short | Long form opt | Meaning | Default | |
---|---|---|---|---|
-a A | --articles A | Get list of articles from file A | Get list from server | |
-C | --no-color | Don't color-code the output | Use colors in the terminal output | |
-d D | --after-date D | Only get articles published after date D | Get all articles | ⬥ |
-g | --get-xml | Print the server's article list & exit | Do other actions instead | |
-o O | --output-dir O | Write output in directory O | Write in current dir | |
-p | --preview | Preview what would be obtained | Obtain the articles | |
-q | --quiet | Only print important messages | Be chatty while working | |
-r R | --report R | Write list of articles & results in file R | Don't write a report | |
-V | --version | Print program version info and exit | Do other actions instead | |
-Z | --no-zip | Don't put output into one ZIP archive | ZIP up the output | |
-@ OUT | --debug OUT | Debugging mode; write trace to OUT | Normal mode | ⚑ |
⬥ Enclose the date in quotes if it contains space characters; e.g., "12 Dec 2014".
⚑ To write to the console, use the character - as the value of OUT; otherwise, OUT must be the name of a file where the output should be written.
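If you want to drive microarchiver from a Python script, the options above can be assembled into an argument list for the standard subprocess module. The following is a sketch under stated assumptions, not part of Microarchiver: the function names build_command and run_microarchiver and their parameter names are hypothetical, and only short options documented in the table are used.

```python
import subprocess


def build_command(output_dir=None, after_date=None, articles_file=None,
                  report_file=None, preview=False, quiet=False,
                  no_zip=False, no_color=False, prefix="-"):
    """Assemble a microarchiver argument list from the documented short options.

    `prefix` is "-" by default; pass "/" on Windows, as noted above the table.
    """
    command = ["microarchiver"]
    # Options that take a value.
    for flag, value in (("a", articles_file), ("d", after_date),
                        ("o", output_dir), ("r", report_file)):
        if value is not None:
            command += [prefix + flag, str(value)]
    # Options that are simple switches.
    for flag, enabled in (("C", no_color), ("p", preview),
                          ("q", quiet), ("Z", no_zip)):
        if enabled:
            command.append(prefix + flag)
    return command


def run_microarchiver(**options):
    """Run microarchiver with the given options; raises if it exits with an error."""
    return subprocess.run(build_command(**options), check=True)
```

For example, build_command(after_date="2 weeks ago", preview=True) yields the list for previewing recent articles. Because the arguments are passed as a list rather than a single string, the date does not need extra quoting here.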
Currently, the only way to indicate that a subset of articles should be obtained from microPublication.org is to use the argument -a in combination with a file that contains the list of desired articles, or the -d option to indicate a cut-off for the article publication date.
If you find an issue, please submit it in the GitHub issue tracker for this repository.
We would be happy to receive your help and participation with enhancing microarchiver! Please visit the guidelines for contributing for some tips on getting started.
Copyright © 2019, Caltech. This software is freely distributed under a BSD/MIT type license. Please see the LICENSE file for more information.
Tom Morrell developed the original algorithm for extracting metadata from DataCite and creating XML files for use with Portico submissions of microPublication.org articles. Mike Hucka created the much-expanded second version now known as Microarchiver.
The vector artwork used as a starting point for the logo for this repository was created by Thomas Helbig for the Noun Project. It is licensed under the Creative Commons Attribution 3.0 Unported license. The vector graphic was modified by Mike Hucka to change the color.
Microarchiver makes use of numerous open-source packages, without which it would have been effectively impossible to develop Microarchiver with the resources we had. We want to acknowledge this debt. In alphabetical order, the packages are:
- colorama – makes ANSI escape character sequences work under MS Windows terminals
- dateparser – parser for human-readable dates
- humanize – make numbers more easily readable by humans
- ipdb – the IPython debugger
- lxml – an XML parsing library for Python
- plac – a command line argument parser
- recordclass – a mutable version of Python named tuples
- requests – an HTTP library for Python
- setuptools – library for setup.py
- termcolor – ANSI color formatting for output in terminal
- urllib3 – a powerful HTTP library for Python
- xmltodict – a module to make working with XML feel like working with JSON
Finally, we are grateful for computing & institutional resources made available by the California Institute of Technology.