- JavaScript Object Notation (JSON)
- Lightweight data-interchange format
- Easy to read and write
- Language-independent
- Two universal data structures
- A collection of name/value pairs (object)
- An ordered list of values (array)
- Square brackets [] hold arrays
- Curly braces {} hold objects
- Data is in name/value pairs
- Data is separated by commas
- Configuration
- Serializing structured data
- Storing and exchanging data in NoSQL databases
- Logging and debugging
- Human-readable and writable
- Object
- Array
- String
- Number (integer or floating point; JSON itself has a single number type)
- Boolean
- Null
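
As a rough illustration of how these six types surface in Python, the standard-library `json` module maps each one to a built-in type. The sample document below is made up for this sketch.

```python
import json

# A made-up document that uses every JSON type at least once.
text = """
{
  "title": "Dune",
  "pages": 412,
  "rating": 4.5,
  "in_print": true,
  "sequel": null,
  "tags": ["sci-fi", "classic"],
  "author": {"name": "Frank Herbert"}
}
"""

book = json.loads(text)

# Print the Python type chosen for each JSON value.
for key, value in book.items():
    print(f"{key}: {type(value).__name__}")
```
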
- Basic key/name entity
- List of entities
- Complex/referential entity
Feature | JSON | Python dict |
---|---|---|
Purpose | Data exchange | In-memory data structure |
Data types | Limited set of data types | Wide range of data types |
Keys | Strings only | Any hashable type |
Ordering | Order not guaranteed | Insertion order guaranteed (Python 3.7+) |
Absence of value | null | None |
Booleans | true, false | True, False |
Comments | Not supported | Supported (in Python source) |
Trailing commas | Not supported | Supported |
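
A few rows of the table can be checked directly with the standard-library `json` module. This is a small sketch with a made-up `record`; it shows key coercion, the spelling of null and booleans, and the rejection of a trailing comma.

```python
import json

# A Python dict may use non-string keys, tuples, None, and True/False.
record = {1: "one", "coords": (3, 4), "missing": None, "active": True}

encoded = json.dumps(record)
print(encoded)
# {"1": "one", "coords": [3, 4], "missing": null, "active": true}

decoded = json.loads(encoded)
# The int key comes back as a string and the tuple as a list.
print(decoded)

# A trailing comma is legal in a Python literal but not in JSON text.
try:
    json.loads('{"a": 1,}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False
print(trailing_comma_ok)
```
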
- Serialization: Python Object -> JSON
- Deserialization: JSON -> Python Object
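
Both directions can be sketched with `json.dump` and `json.load`, which work on file objects (`json.dumps` and `json.loads` are the string counterparts). The `settings` dict and the file name are placeholders for this example.

```python
import json
import tempfile
from pathlib import Path

settings = {"theme": "dark", "font_size": 12, "plugins": ["git", "lint"]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"

    # Serialization: Python object -> JSON text on disk.
    with path.open("w", encoding="utf-8") as fh:
        json.dump(settings, fh, indent=2)

    # Deserialization: JSON text on disk -> Python object.
    with path.open(encoding="utf-8") as fh:
        restored = json.load(fh)

print(restored == settings)
```
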
- HTTP is used for transmitting requests and information between servers and browsers.
- Client makes the request, the server responds with some data.
- A better alternative: the third-party `requests` library
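
One possible shape for the request/response step, using only the standard library so it runs without extra installs. `fetch_json` is a name chosen for this sketch; the commented lines show how the same step is commonly written with the `requests` package.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Send a GET request to `url` and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)


# With the third-party requests package the same step is usually shorter:
#     import requests
#     data = requests.get(url, timeout=10).json()
```
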
- Limitations to be aware of:
- Some classes or objects cannot be serialized out of the box (extend serialization for specific types)
- Serializing User-Defined Classes (subclassing)
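
A minimal sketch of the subclassing approach: override `json.JSONEncoder.default` to teach the encoder about extra types. `Book` and `BookEncoder` are hypothetical names invented for this example.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Book:
    # Hypothetical user-defined class for this sketch.
    title: str
    published: date


class BookEncoder(json.JSONEncoder):
    def default(self, o):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(o, Book):
            return {"title": o.title, "published": o.published}
        if isinstance(o, date):
            return o.isoformat()
        return super().default(o)  # raises TypeError for unknown types


book = Book("Dune", date(1965, 8, 1))
print(json.dumps(book, cls=BookEncoder))
# {"title": "Dune", "published": "1965-08-01"}
```

Without `cls=BookEncoder`, `json.dumps(book)` raises `TypeError`, which is the limitation the bullets above refer to.
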
- Retrieve and process JSON data from a URL containing information about books. Follow the steps:
- Use the provided URL: https://www.andybek.com/api/data/books to fetch all the books.
- Save the raw JSON data to a file named "books-original.json".
- Deserialize the JSON data and remove the "ranks" and "release dates" from each book entry.
- Save the modified books data to a new file named "books-cleaned.json".
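
One way the steps above could be approached, as a sketch rather than a reference solution. It assumes the endpoint returns a JSON array of objects and that the fields are literally named `ranks` and `release_dates`; both are guesses to check against the real payload. `strip_keys` and `main` are names chosen here.

```python
import json
from pathlib import Path
from urllib.request import urlopen

BOOKS_URL = "https://www.andybek.com/api/data/books"
# Assumed field names; adjust after inspecting the downloaded data.
UNWANTED_KEYS = ("ranks", "release_dates")


def strip_keys(books, unwanted=UNWANTED_KEYS):
    """Return copies of the book dicts without the unwanted keys."""
    return [
        {key: value for key, value in book.items() if key not in unwanted}
        for book in books
    ]


def main(out_dir="."):
    out_dir = Path(out_dir)

    # Steps 1 and 2: fetch the data and save the raw text unchanged.
    with urlopen(BOOKS_URL, timeout=10) as response:
        raw = response.read().decode("utf-8")
    (out_dir / "books-original.json").write_text(raw, encoding="utf-8")

    # Steps 3 and 4: deserialize, drop the fields, save the cleaned copy.
    cleaned = strip_keys(json.loads(raw))
    (out_dir / "books-cleaned.json").write_text(
        json.dumps(cleaned, indent=2), encoding="utf-8"
    )
```

Calling `main()` performs the download; `strip_keys` can be tried on its own with any list of dicts.
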
- Schema is a formal description of the structure of a dataset (types, constraints, relationship between attributes).
- Validates that data is consistent.
- The dominant schema standard in JSON is https://json-schema.org
- Interactive definition of JSON schema: https://www.jsonschemavalidator.net/
- $ref enables schema modularization and reusability
- $defs is the conventional named section for holding definitions in a schema
- Remote review definition: https://www.andybek.com/api/data/review-schema
- Applicators allow us to apply subschemas to specific parts of the model.
- They could be used to define highly specific and conditional validation conditions.
- Define a schema that restrictively validates the following JSON document: https://www.andybek.com/api/data/contentItems
- Focus on the "type" and "enum" keywords and on the "array" and "object" types
- Consider using the $defs keyword to organize your schema
- For "image/jpeg" content, contentEncoding: "base64" declares that the string holds base64-encoded image data. For example:
- { "type": "string", "contentEncoding": "base64" }
- Define schemas more modularly, combine them by reference.
- $id is used to assign unique identifiers to schemas
- Registry: Collection of resources, where each resource is a schema or a part of schema that has a unique identifier.
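
A sketch of how two separately identified schemas might be combined through a registry. It assumes a recent `jsonschema` release together with its companion `referencing` package; the `https://example.com/...` identifiers and the `build_validator` name are placeholders, and nothing is fetched over the network.

```python
review_schema = {
    "$id": "https://example.com/schemas/review",
    "type": "object",
    "properties": {"stars": {"type": "integer", "minimum": 1, "maximum": 5}},
    "required": ["stars"],
}

book_schema = {
    "$id": "https://example.com/schemas/book",
    "type": "object",
    "properties": {
        "reviews": {
            "type": "array",
            # Reference the other schema by its $id instead of inlining it.
            "items": {"$ref": "https://example.com/schemas/review"},
        }
    },
}


def build_validator():
    # Third-party dependencies: pip install jsonschema referencing
    from jsonschema import Draft202012Validator
    from referencing import Registry
    from referencing.jsonschema import DRAFT202012

    # The registry maps each identifier to the schema resource it names.
    registry = Registry().with_resource(
        uri=review_schema["$id"],
        resource=DRAFT202012.create_resource(review_schema),
    )
    return Draft202012Validator(book_schema, registry=registry)
```
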
- Inspect the following JSON document, which contains USD and CAD stock price information:
- Define a restrictive schema for the data that will identify invalid records
- Using Python, read in the JSON document and validate it against the schema
- Generate a report that indicates which records are invalid, e.g.
-> Invalid Records:
- Record: ('currency' is a required property) { "ticker": "GOOGL", "price": 100.2 }
- JSONPath is a query language for JSON documents
- Inspired by XPath, its older sibling for XML
- Interactive exploration: https://www.jsonpath.com/
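
In Python, JSONPath queries need a third-party package. The sketch below assumes `jsonpath-ng`, one of several options; the `store` data and the `find_values` helper are made up for this example.

```python
store = {
    "books": [
        {"title": "Dune", "price": 9.99},
        {"title": "Emma", "price": 5.49},
    ]
}


def find_values(document, expression):
    # Third-party dependency: pip install jsonpath-ng
    from jsonpath_ng import parse

    # parse() compiles the expression; find() returns match objects.
    return [match.value for match in parse(expression).find(document)]
```

A call such as `find_values(store, "$.books[*].title")` selects the title of every book in the list.
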
- How do we compare two JSON documents?
- One (not so great) idea: compare them as strings using the built-in `difflib` module
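
A short sketch of why plain string comparison falls short: two documents with the same data but different key order look different as text. Re-serializing both with sorted keys before diffing removes that noise. The sample strings and the `canonical` helper are invented for illustration.

```python
import difflib
import json

a = '{"ticker": "GOOGL", "price": 100.2}'
b = '{"price": 100.2, "ticker": "GOOGL"}'

# The raw text differs even though both documents hold the same data.
raw_diff = list(difflib.unified_diff([a], [b], lineterm=""))


def canonical(text):
    """Re-serialize with sorted keys and fixed indentation, one item per line."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


canonical_diff = list(difflib.unified_diff(canonical(a), canonical(b), lineterm=""))

print(len(raw_diff) > 0)  # True: the strings are not identical
print(canonical_diff)     # []: no differences after normalizing
```
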
- Easier diffing with JSONDiff (lightweight alternative)
- Using DeepDiff: designed for deep comparison of complex Python objects
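
A minimal sketch of a structural comparison with the third-party `deepdiff` package, applied to already-deserialized documents. The sample data and the `deep_changes` helper are made up for this example.

```python
old = {"ticker": "GOOGL", "price": 100.2, "tags": ["tech", "ads"]}
new = {"ticker": "GOOGL", "price": 101.0, "tags": ["ads", "tech"]}


def deep_changes(before, after):
    # Third-party dependency: pip install deepdiff
    from deepdiff import DeepDiff

    # ignore_order=True treats lists as unordered, so reordered tags do not count.
    return DeepDiff(before, after, ignore_order=True)
```

The result behaves like a dict grouped by kind of change; for the data above, `deep_changes(old, new)` should report the price under `values_changed` and nothing for the reordered tags.
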