Add option to process a single directory #29

Open
kwurst opened this issue Jun 13, 2014 · 5 comments

@kwurst
Owner

kwurst commented Jun 13, 2014

There should be an option to process a single directory, rather than all subdirectories of the main directory.
The most common case will be to process all subdirectories, but sometimes a student's repository will be pulled late or have to be reprocessed after an error, and so will need to be processed individually, without reprocessing all the other directories.

@StoneyJackson
Collaborator

Off the top of my head... two ways to go:

  1. Allow sub-directories to be specified on the command-line. convert path/to/config.json subdir1 subdir2
  2. Track which directories have been processed with the converter. Then only process sub-directories that have not yet been processed. Then provide a command line switch to re-process all. convert -A path/to/config.json.

With the first approach, specifying sub-directories will be tricky for the user. When specifying a sub-directory, is it a path relative to the assignment folder, or is it relative to the caller's location? The latter is easier for the user, since s/he can use tab completion. The former is easier for the program, since it does not have to do any tricky path resolution.
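
The path resolution may be less tricky than it sounds. As a minimal sketch, each argument could be resolved relative to the caller's location and then checked against the assignment folder. The helper name `resolve_subdir` and its parameters are hypothetical, not part of the converter.

```python
import os


def resolve_subdir(arg, assignment_dir, cwd=None):
    """Resolve a command-line sub-directory argument (hypothetical helper).

    `arg` is interpreted relative to the caller's location (`cwd`), so tab
    completion works, and is rejected unless it lies inside `assignment_dir`.
    Returns the path relative to the assignment folder.
    """
    cwd = os.getcwd() if cwd is None else cwd
    root = os.path.realpath(assignment_dir)
    candidate = os.path.realpath(os.path.join(cwd, arg))
    try:
        inside = os.path.commonpath([root, candidate]) == root
    except ValueError:
        # Raised when the paths cannot be compared (e.g. different drives).
        inside = False
    if not inside or candidate == root:
        raise ValueError("%s is not a sub-directory of %s" % (arg, assignment_dir))
    return os.path.relpath(candidate, root)
```

The program then only ever sees paths relative to the assignment folder, while the user types whatever the shell completes.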

Going back to the numbered list above, I think the second approach would be easier for the user, and not so difficult to implement. We could store a list of the files (or directories) processed in a file next to the config file.
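
For reference, here is a rough sketch of what the command line could look like with `argparse`. It accepts both forms from the numbered list, purely for illustration. The `-A` switch comes from the example above; the long name `--all`, the `subdirs` argument name, and `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line parser for the converter."""
    parser = argparse.ArgumentParser(prog="convert")
    parser.add_argument("config", help="path to the JSON config file")
    # Option 1: zero or more sub-directories named explicitly.
    parser.add_argument("subdirs", nargs="*",
                        help="sub-directories to process (default: all unprocessed)")
    # Option 2: process only unprocessed sub-directories unless -A is given.
    parser.add_argument("-A", "--all", action="store_true",
                        help="re-process every sub-directory")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.config, args.subdirs, args.all)
```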

@kwurst kwurst mentioned this issue Jun 13, 2014
@kwurst
Owner Author

kwurst commented Jun 13, 2014

If we go with option 2, what format would we use for the file tracking the directories that have been processed? JSON?
And I think it only needs to be a list of directories processed. A directory should not be processed if it is missing any of the required files.
We could have a flag that would force processing of only the files that exist, for students who have not turned in all the required files, so we could produce a PDF of the existing files so that there is something to grade...
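
To make the idea concrete, here is a minimal sketch of what the JSON tracking file and the required-files check might look like. The layout `{"processed": [...]}`, the function names, and the `force` parameter are all assumptions for illustration; nothing here is settled.

```python
import json
import os


def load_processed(tracking_path):
    """Return the set of processed directories, or an empty set if no file exists."""
    try:
        with open(tracking_path) as f:
            return set(json.load(f)["processed"])
    except IOError:
        return set()


def save_processed(tracking_path, processed):
    """Write the processed directories as a sorted JSON list."""
    with open(tracking_path, "w") as f:
        json.dump({"processed": sorted(processed)}, f, indent=2)


def missing_files(directory, required):
    """List the required file names that are absent from the directory."""
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]


def should_process(directory, required, processed, force=False):
    """Skip processed directories; skip incomplete ones unless force is set."""
    if directory in processed:
        return False
    if missing_files(directory, required) and not force:
        return False
    return True
```

With `force=True`, a directory that is missing required files would still be processed, so a PDF of the existing files can be produced.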

@kwurst
Owner Author

kwurst commented Jun 13, 2014

And we probably need a way to unmark a directory from the processed list.
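
A rough sketch of unmarking, assuming the processed list is stored as JSON in the form `{"processed": ["dir1", "dir2"]}`. That layout and the function name `unmark` are assumptions.

```python
import json


def unmark(tracking_path, directory):
    """Remove a directory from the processed list.

    Returns True if the directory was listed and has been removed,
    False if it was not in the list (the file is left untouched).
    """
    with open(tracking_path) as f:
        data = json.load(f)
    if directory not in data["processed"]:
        return False
    data["processed"].remove(directory)
    with open(tracking_path, "w") as f:
        json.dump(data, f, indent=2)
    return True
```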

@StoneyJackson
Collaborator

A submission must be reprocessed if any file to be processed was changed or added since the last time the submission was processed.

For example, suppose Alice submits one file f1. Now suppose we are to process f1 and f2. When Alice's submission is processed, f1 is processed. Later, Alice turns in f2. Rerunning, we now must reprocess Alice's submission: both f1 and f2. Similarly if Alice turns in a new version of f1, we must reprocess Alice's entire submission.

@StoneyJackson
Collaborator

For each file processed, save its path and hash to a "cache file" (better names welcome). Then when asked to process files again, check this cache file to see which files have been modified.

Below is a rough sketch of how to implement this in Python.

    # source: http://stackoverflow.com/questions/1912567/python-library-to-detect-if-a-file-has-changed-
    import hashlib  # instead of the old md5 module
    import pickle

    # Load the path -> checksum cache saved by the previous run, if any.
    try:
        with open("db", "rb") as f:
            db = dict(pickle.load(f))
    except IOError:
        db = {}

    path = "/etc/hosts"
    # Hash the file's bytes; hexdigest() converts the hash to text.
    with open(path, "rb") as f:
        checksum = hashlib.md5(f.read()).hexdigest()
    if db.get(path) != checksum:
        print("file changed")
        db[path] = checksum

    # Save the cache for the next run.
    with open("db", "wb") as f:
        pickle.dump(list(db.items()), f)
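
The same idea can be applied to a whole submission, so that Alice's case (a new f2, or a new version of f1) triggers reprocessing of everything. This is only a sketch: the function names are hypothetical, and it uses SHA-256 rather than MD5, which is a substitution on my part.

```python
import hashlib
import os


def snapshot(submission_dir, filenames):
    """Map each expected file that exists to the SHA-256 hex digest of its contents."""
    digests = {}
    for name in filenames:
        path = os.path.join(submission_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                digests[name] = hashlib.sha256(f.read()).hexdigest()
    return digests


def needs_reprocessing(submission_dir, filenames, previous):
    """Compare the current snapshot with the one saved after the last run.

    Returns (changed, current). `changed` is True if any expected file was
    added, modified, or removed since `previous` was taken.
    """
    current = snapshot(submission_dir, filenames)
    return current != previous, current
```

The `current` snapshot would be stored in the cache file after a successful run and passed back in as `previous` next time.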
