[Kavita] New Feature - Amazon KFX Format Support #3491

rainrdx · 2025-01-04T21:48:47Z

Idea Description

Hi,

Thank you so much for creating and maintaining Kavita. I feel I might be the only one who needs KFX support.

KFX is a proprietary format used by the Amazon / Kindle ecosystem to serve ebooks, including graphic novels and comics. It is special in that, due to the history of Comixology, it has retained some really high-resolution comics that cannot be obtained legally elsewhere. For that reason, I purchase primarily from Amazon and keep the files myself.

As far as I know, there is no KFX comic reader other than Kindle itself. I implemented a rudimentary reader myself, although it falls far short of the features that Kavita offers.

KFX is an Amazon Ion package. The Ion library has official implementations in many languages, including C# (https://github.com/amazon-ion/ion-dotnet). There is also a third-party Python implementation for reading KFX (https://github.com/kluyg/calibre-kfx-input). I wrote a Go implementation myself, and I will provide a rough flow of the general structure (to the degree it's relevant to comic books) at the end of this FR.

KFX, in the end, is a container that supports at least:

BMP, GIF, JPG, JXR, PBM, PDF, PNG, PObject, TIFF, BPG, WebP

That said, almost all the comics I personally have are PNG or JPG files. With the transcoding feature now available, I feel it may be more feasible to support the formats listed here.

Again thank you!

Regards,
R

The flow of decoding KFX structure is as follows:

I. KFX Container Processing:

Read Container Header:

Read signature (4 bytes) - Verify it's "CONT"

Read version (2 bytes) - Check if it's 1 or 2

Read header length (4 bytes)

Read container info offset (4 bytes)

Read container info length (4 bytes)
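The header read above can be sketched in a few lines of Python. This is only an illustration: the field sizes come from the list above, but the little-endian byte order, the back-to-back packing, and the names `ContainerHeader` and `read_container_header` are assumptions of this sketch, not something confirmed here.

```python
import struct
from typing import NamedTuple


class ContainerHeader(NamedTuple):
    version: int
    header_len: int
    info_offset: int
    info_length: int


# Assumption: fields are little-endian, unsigned, and packed with no padding.
# 4-byte signature, 2-byte version, then three 4-byte integers (18 bytes).
_CONTAINER_HEADER = struct.Struct("<4sHLLL")


def read_container_header(data: bytes) -> ContainerHeader:
    """Parse the fixed-size part of a KFX container header."""
    if len(data) < _CONTAINER_HEADER.size:
        raise ValueError("data is too short for a container header")
    signature, version, header_len, info_offset, info_length = (
        _CONTAINER_HEADER.unpack_from(data, 0)
    )
    if signature != b"CONT":
        raise ValueError(f"unexpected signature {signature!r}")
    if version not in (1, 2):
        raise ValueError(f"unsupported container version {version}")
    return ContainerHeader(version, header_len, info_offset, info_length)
```

Anything that is not a "CONT" container, or that has a version other than 1 or 2, is rejected early so later steps never see a bad offset.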

Read Container Info (Ion struct):

Deserialize the Ion data at the container info offset into an IonStruct.

Extract:

container_id

compression_type (default 0)

drm_scheme (default 0)

doc_symbol_offset (optional)

doc_symbol_length (optional)

chunk_size (default 4096)

format_capabilities_offset (optional, version > 1)

format_capabilities_length (optional, version > 1)

index_table_offset

index_table_length
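To make the defaults above concrete, here is a small Python sketch. It assumes the Ion struct has already been decoded into a plain `dict` keyed by the field names used in this list (an actual file may key these differently), and `normalize_container_info` is a hypothetical helper name. Treating a missing length as 0 is also a choice of this sketch, made so the later "length > 0" checks work unchanged.

```python
def normalize_container_info(info: dict, version: int) -> dict:
    """Apply the defaults listed above to a decoded container info struct."""
    out = {
        # Read without a fallback: no default is listed for these fields.
        "container_id": info["container_id"],
        "index_table_offset": info["index_table_offset"],
        "index_table_length": info["index_table_length"],
        # Fields with documented defaults.
        "compression_type": info.get("compression_type", 0),
        "drm_scheme": info.get("drm_scheme", 0),
        "chunk_size": info.get("chunk_size", 4096),
        # Optional fields: a missing length is treated as 0 (sketch choice).
        "doc_symbol_offset": info.get("doc_symbol_offset"),
        "doc_symbol_length": info.get("doc_symbol_length", 0),
    }
    if version > 1:
        # Format capabilities are only read for version 2 containers.
        out["format_capabilities_offset"] = info.get("format_capabilities_offset")
        out["format_capabilities_length"] = info.get("format_capabilities_length", 0)
    return out
```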

Read Document Symbols (Ion annotated value): (Note: this is not important, as the vast majority of comic books have an empty internal symbol table)

If doc_symbol_length > 0:

Deserialize the Ion data at doc_symbol_offset as an IonAnnotation.

Verify the annotation is $ion_symbol_table.

Adjust the max_id values of any imports in the symbol table by adding the number of system symbol table entries.

Create a new local symbol table based on this data.

Read Format Capabilities (Ion annotated value):

If format_capabilities_length > 0 (and version > 1):

Deserialize the Ion data at format_capabilities_offset as an IonAnnotation.

Verify the annotation is $593.

Read KFXGen Info (JSON): (Note: also mostly irrelevant to comic books, which are primarily just images)

Extract the JSON string between container info and header end.

Deserialize the JSON to get kfxgen_package_version, kfxgen_application_version, kfxgen_payload_sha1, and kfxgen_acr.

Verify kfxgen_payload_sha1.

Read Index Table: (Note: this is where we read the content of embedded entities from the container)

Deserialize the data at index_table_offset into a list of entity entries:

For each entry:

Read id_idnum (4 bytes)

Read type_idnum (4 bytes)

Read entity_offset (8 bytes) - This is relative to the end of the header.

Read entity_len (8 bytes)
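A rough Python sketch of the index table walk described above. The entry layout (4 + 4 + 8 + 8 = 24 bytes) follows the list; the little-endian byte order, the reading of `index_table_offset` as an absolute file offset, and the names `IndexEntry` and `read_index_table` are assumptions of this sketch.

```python
import struct
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    id_idnum: int
    type_idnum: int
    entity_offset: int  # absolute offset, already adjusted by header length
    entity_len: int


# Assumption: little-endian, two 4-byte and two 8-byte unsigned integers.
_INDEX_ENTRY = struct.Struct("<LLQQ")


def read_index_table(data: bytes, table_offset: int, table_length: int,
                     header_len: int) -> List[IndexEntry]:
    """Return one IndexEntry per 24-byte record in the index table."""
    if table_length % _INDEX_ENTRY.size != 0:
        raise ValueError("index table length is not a multiple of 24")
    if table_offset + table_length > len(data):
        raise ValueError("index table runs past the end of the data")
    entries = []
    for pos in range(table_offset, table_offset + table_length, _INDEX_ENTRY.size):
        id_idnum, type_idnum, rel_offset, length = _INDEX_ENTRY.unpack_from(data, pos)
        # entity_offset is stored relative to the end of the header.
        entries.append(IndexEntry(id_idnum, type_idnum, header_len + rel_offset, length))
    return entries
```

Each entity's bytes can then be sliced as `data[e.entity_offset:e.entity_offset + e.entity_len]`.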

Determine Container Format:

Based on the type_idnums found in the index table entries, or if there were any document symbols, determine if it's KFX_MAIN, KFX_METADATA, or KFX_ATTACHABLE.

Create Container Info Fragment:

Create an annotation fragment ($270), an Ion struct containing all the metadata extracted earlier, the version number, and the list of entities in [[type_idnum, id_idnum], ...] format.

Deserialize Entities:

For each entity entry in the index table:

Create a KfxContainerEntity object.

Call deserialize() on the entity.

II. Entity Processing

Read Entity Header:

Read signature (4 bytes) - Verify it's "ENTY"

Read version (2 bytes) - Check if it's 1

Read header length (4 bytes)

Read Entity Info (Ion struct):

Deserialize the Ion data at the beginning of the entity (up to header length) into an IonStruct.

Extract:

compression_type (default 0)

drm_scheme (default 0)

Read Entity Data:

Extract the remaining bytes after the header as entity_data.
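The entity framing above can be sketched without any Ion parsing: split one entity into its still-encoded entity info and its payload. As before, the little-endian byte order is an assumption. So are the reading that the header length counts from the start of the entity (including the 10 fixed bytes) and the name `split_entity`.

```python
import struct
from typing import Tuple

# Assumption: little-endian; 4-byte signature, 2-byte version, 4-byte length.
_ENTITY_HEADER = struct.Struct("<4sHL")


def split_entity(blob: bytes) -> Tuple[bytes, bytes]:
    """Return (entity_info_bytes, entity_data) for one "ENTY" record."""
    if len(blob) < _ENTITY_HEADER.size:
        raise ValueError("data is too short for an entity header")
    signature, version, header_len = _ENTITY_HEADER.unpack_from(blob, 0)
    if signature != b"ENTY":
        raise ValueError(f"unexpected signature {signature!r}")
    if version != 1:
        raise ValueError(f"unsupported entity version {version}")
    if not _ENTITY_HEADER.size <= header_len <= len(blob):
        raise ValueError("header length is out of range")
    # Ion-encoded entity info sits between the fixed fields and header_len.
    info_bytes = blob[_ENTITY_HEADER.size:header_len]
    # Everything after the header is the entity payload.
    entity_data = blob[header_len:]
    return info_bytes, entity_data
```

The `info_bytes` part would still need to go through an Ion reader to obtain compression_type and drm_scheme.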

Deserialize Entity Data (based on type):

Get fid (field ID) and ftype (fragment type) from the symbol table using id_idnum and type_idnum.

If ftype is in RAW_FRAGMENT_TYPES:

Treat entity_data as an IonBLOB.

Otherwise:

Deserialize entity_data using IonBinary.deserialize_single_value().

If the deserialized value is an IonAnnotation:

If the annotation is ftype and fid is $348, replace the value with the annotation's inner value and update fid to ftype.

III. Ion Binary Deserialization (Note: this is the general routine used to deserialize any Ion value)

Read Descriptor:

Read the first byte as the descriptor.

Extract signature (top 4 bits) and flag (bottom 4 bits).

Determine Length:

If flag is VARIABLE_LEN_FLAG (14):

Read a variable-length unsigned integer (deserialize_vluint) to get the length.

Otherwise:

Length is equal to flag.

Deserialize based on Signature:

Use VALUE_DESERIALIZERS to find the appropriate deserialization function based on signature.

Call the function, passing the flag (or length) and the Deserializer object.
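The descriptor and length steps above look roughly like this in Python. The variable-length integer follows the Ion binary VarUInt encoding (7 payload bits per byte, most significant group first, high bit set on the final byte). The function names are hypothetical, and the per-type functions behind VALUE_DESERIALIZERS are left out.

```python
from typing import Tuple

VARIABLE_LEN_FLAG = 14


def read_varuint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an Ion VarUInt starting at pos; return (value, next_pos)."""
    value = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated VarUInt")
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:  # the high bit marks the last byte
            return value, pos


def read_descriptor(buf: bytes, pos: int) -> Tuple[int, int, int, int]:
    """Return (signature, flag, length, next_pos) for the value at pos."""
    descriptor = buf[pos]
    signature = descriptor >> 4  # top 4 bits: value type
    flag = descriptor & 0x0F     # bottom 4 bits: length or special marker
    pos += 1
    if flag == VARIABLE_LEN_FLAG:
        length, pos = read_varuint(buf, pos)
    else:
        length = flag
    return signature, flag, length, pos
```

For example, `read_descriptor(bytes([0x8E, 0x01, 0x80]), 0)` returns `(8, 14, 128, 3)`. A complete reader needs more than this: in Ion binary a flag of 15 marks a null, and for booleans the flag holds the value itself, which is presumably why the flag is passed through to the per-type function.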

Idea Category

Feature Enhancement

Duration of Using Kavita

No response

Before submitting

I've already searched for existing ideas before posting.
