Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #16 from o19s/remove_the_monty
Remove the monty - refactor data indexing to run w/o python
Showing 47 changed files with 158 additions and 4,034 deletions.
@@ -0,0 +1,33 @@
# Generating the TMDB dataset

Periodically we update the TMDB dataset as new movies come out or new data sources are added.

1. Get the latest TMDB dump using the https://github.com/o19s/tmdb_dump project.

2. Create the Solr schema formatted JSON file:

Pass in the TMDB extract file and the name of the resulting Solr JSON file.

```
python3 createSolrTmdbDataset.py tmdb_2020-08-10.json tmdb_solr.json
```

3. Zip the file and store it in the root directory:

```
zip tmdb_solr.json.zip tmdb_solr.json
cp tmdb_solr.json.zip ../../
```


https://raw.githubusercontent.com/o19s/tmdb_dump/master/tmdb_dataflows.png

# Understanding Data Structure

You can use `jq` to parse the JSON. Just unzip a chunk and then do:

> cat tmdb_solr_2020-08-11.json | jq .

Or, to look at a specific movie, look it up by id:

> jq '.[] | select(.id=="87381")' tmdb_solr_2020-08-11.json
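
For readers who would rather inspect the file from Python than from `jq`, the id lookup can be sketched roughly as follows. This is a minimal sketch, not code from the repository: it assumes the Solr JSON file is a top-level array of documents (or an object whose values are documents), each with a string `id` field, as the `jq` filter suggests. The names `find_movie` and `find_movie_in_file` are hypothetical.

```python
import json


def find_movie(docs, movie_id):
    """Return the first document whose `id` equals movie_id, or None.

    Loosely mirrors `jq '.[] | select(.id=="87381")'`: `.[]` iterates the
    values of either a JSON array or a JSON object, so both are accepted.
    Unlike jq, this stops at the first match instead of emitting all matches.
    """
    values = docs.values() if isinstance(docs, dict) else docs
    for doc in values:
        if isinstance(doc, dict) and doc.get("id") == movie_id:
            return doc
    return None


def find_movie_in_file(path, movie_id):
    # Loads the whole file at once; assumes the dataset fits in memory.
    with open(path) as f:
        return find_movie(json.load(f), movie_id)
```

The id is compared as a string, matching the quoted `"87381"` in the filter; passing the integer `87381` would not match under this assumption.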
@@ -0,0 +1,15 @@
import json


def rawTmdbMovies(tmdb_source_file):
    with open(tmdb_source_file) as f:
        return json.load(f)


def writeTmdbMovies(rawMoviesJson, path):
    with open(path, 'w') as f:
        json.dump(rawMoviesJson, f)


def tmdbMovies(tmdb_source_file):
    yield from rawTmdbMovies(tmdb_source_file).items()
@@ -0,0 +1,35 @@
# Testing TLRE examples

TLRE examples are vulnerable to changes in external tooling (Splainer, Quepid) and in Solr itself. To make sure everything is ready for training, we've scripted these "tests" to check all of the examples.

## Splainer

These tests check that changes to Splainer don't damage TLRE examples.

Splainer links from the slides are stored in `splainer_links_solr.csv`. The script `splainer_puppet_solr.py` visits each of the links and reports the HTTP status code back.

These tests assume you are running the local Solr TMDB setup.

Set up your virtual environment:
```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Run the regression tests:
```
python3 splainer_puppet_solr.py
```

This records the status code in the CSV file and prints the number of failed queries to the console.

## Newman

These tests check that version changes in Solr don't damage TLRE examples.

[Newman](https://github.com/postmanlabs/newman) is the command-line tool for running Postman collections. All examples from the class, beyond just the links to Splainer, are included in the collection `../solr-postman-collection.json`.

```
newman run --global-var "solr_host=localhost:8983" ../../solr-postman-collection.json
```
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,3 @@
#!/bin/bash

curl 'http://localhost:8983/solr/tmdb/update?commit=true' --data-binary @tmdb_solr.json -H 'Content-type:application/json'