hashing
Base
- pydantic model Chunk
A single unit of data, usually a 16KiB block, but it can be a whole piece, e.g. in v1 hashing
JSON schema:
{ "title": "Chunk", "description": "A single unit of data, usually a 16KiB block, but can be a whole piece e.g. in v1 hashing", "type": "object", "properties": { "path": { "format": "path", "title": "Path", "type": "string" }, "chunk": { "format": "binary", "title": "Chunk", "type": "string" }, "idx": { "title": "Idx", "type": "integer" } }, "required": [ "path", "chunk", "idx" ] }
- pydantic model Hash
Hash of a block or piece
JSON schema:
{ "title": "Hash", "description": "Hash of a block or piece", "type": "object", "properties": { "type": { "enum": [ "block", "v1_piece", "v2_piece" ], "title": "Type", "type": "string" }, "path": { "format": "path", "title": "Path", "type": "string" }, "hash": { "format": "binary", "title": "Hash", "type": "string" }, "idx": { "description": "\n The index of the block for ordering.\n \n For v1 hashes, the absolute index of piece across all files.\n For v2 block and piece hashes, index within the given file\n ", "title": "Idx", "type": "integer" } }, "required": [ "type", "path", "hash", "idx" ] }
- iter_blocks(path: Path, read_size: int = 16384) → Generator[Chunk, None]
Iterate over the file at path in blocks of read_size bytes (16KiB by default), yielding each block as a Chunk
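A minimal sketch of this kind of block iteration, not the library's implementation. `ChunkSketch` and `iter_blocks_sketch` are hypothetical stand-ins that mirror only the three fields in the Chunk schema above (path, chunk, idx).

```python
from collections.abc import Iterator
from dataclasses import dataclass
from pathlib import Path

BLOCK_SIZE = 16 * 1024  # 16 KiB, matching the documented default read_size


@dataclass
class ChunkSketch:
    """Hypothetical stand-in for Chunk with the same three fields."""

    path: Path
    chunk: bytes
    idx: int


def iter_blocks_sketch(path: Path, read_size: int = BLOCK_SIZE) -> Iterator[ChunkSketch]:
    """Yield consecutive read_size-byte chunks of a file, numbered from 0."""
    with open(path, "rb") as f:
        idx = 0
        # f.read returns b"" at end of file, which ends the loop
        while data := f.read(read_size):
            yield ChunkSketch(path=path, chunk=data, idx=idx)
            idx += 1
```

The final chunk is shorter than read_size whenever the file length is not a multiple of it.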
- pydantic model HasherBase
JSON schema:
{ "title": "HasherBase", "type": "object", "properties": { "paths": { "items": { "format": "path", "type": "string" }, "title": "Paths", "type": "array" }, "path_root": { "format": "path", "title": "Path Root", "type": "string" }, "piece_length": { "title": "Piece Length", "type": "integer" }, "n_processes": { "default": 1, "title": "N Processes", "type": "integer" }, "progress": { "default": false, "title": "Progress", "type": "boolean" }, "read_size": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "title": "Read Size" }, "memory_limit": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "title": "Memory Limit" } }, "required": [ "paths", "path_root", "piece_length" ] }
- Validators:
read_size_defaults_piece_size » all fields
- field memory_limit: int | None = None
Rough cap on outstanding memory usage (in bytes): reading pauses until the chunks still waiting to be processed take up less than this amount
- field path_root: Annotated[Path, AfterValidator(func=_is_abs)] [Required]
Directory containing paths to hash
- Constraints:
func = _is_abs
- field paths: list[Annotated[Path, AfterValidator(func=_is_rel)]] [Required]
Relative paths beneath path_root to hash.
Paths should already be sorted in the order they are to appear in the torrent
- field piece_length: Annotated[int, AfterValidator(func=_power_of_two)] | Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] [Required]
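The two alternatives in the piece_length annotation differ only in the extra _divisible_by_16kib check. A rough sketch of what checks with those names might do follows. The function bodies are assumptions inferred from the validator names, not the library's code, and the `_sketch` names are hypothetical.

```python
BLOCK_SIZE = 16 * 1024  # 16 KiB


def power_of_two_sketch(value: int) -> int:
    """Accept only positive powers of two (assumed behaviour of _power_of_two)."""
    if value <= 0 or (value & (value - 1)) != 0:
        raise ValueError(f"{value} is not a power of two")
    return value


def divisible_by_16kib_sketch(value: int) -> int:
    """Accept only multiples of 16 KiB (assumed behaviour of _divisible_by_16kib)."""
    if value % BLOCK_SIZE != 0:
        raise ValueError(f"{value} is not divisible by {BLOCK_SIZE}")
    return value
```

Under these assumptions, 32768 passes both checks, while 4096 is a power of two but fails the 16 KiB check.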
- field read_size: int | None = None
How much of a file should be read in a single read call.
If None, set to the piece_length
- complete(hashes: list[Hash]) → list[Hash]
After hashing, do any postprocessing to yield the desired output
- abstractmethod update(chunk: Chunk, pool: Pool) → list[ApplyResult]
- abstractmethod update(chunk: Chunk, pool: None) → list[Hash]
Update the hasher with a new chunk of data. When a pool is given, returns a list of ApplyResults (AsyncResults) from which the hashes can be fetched; when pool is None, returns the hashes directly
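The two update overloads describe a pooled and an unpooled path. The sketch below illustrates that pattern only and is not the library's implementation: `update_sketch` and `sha1_digest` are hypothetical names, raw bytes stand in for Chunk and Hash, and a `ThreadPool` is used in place of a process pool so the example runs anywhere.

```python
import hashlib
from multiprocessing.pool import ThreadPool
from typing import Optional


def sha1_digest(data: bytes) -> bytes:
    """Hash one chunk of data (SHA-1, as used for v1 pieces)."""
    return hashlib.sha1(data).digest()


def update_sketch(data: bytes, pool: Optional[ThreadPool]) -> list:
    """Return hashes directly, or pending results when a pool is supplied."""
    if pool is None:
        # No pool: hash synchronously and return the digest itself
        return [sha1_digest(data)]
    # Pool given: schedule the work and return a handle to fetch with .get()
    return [pool.apply_async(sha1_digest, (data,))]
```

With a pool, the caller collects digests later by calling `.get()` on each returned result, which lets reading continue while hashing runs.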
V1
- pydantic model V1Hasher
JSON schema:
{ "title": "V1Hasher", "type": "object", "properties": { "paths": { "items": { "format": "path", "type": "string" }, "title": "Paths", "type": "array" }, "path_root": { "format": "path", "title": "Path Root", "type": "string" }, "piece_length": { "title": "Piece Length", "type": "integer" }, "n_processes": { "default": 1, "title": "N Processes", "type": "integer" }, "progress": { "default": false, "title": "Progress", "type": "boolean" }, "read_size": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "title": "Read Size" }, "memory_limit": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "title": "Memory Limit" } }, "required": [ "paths", "path_root", "piece_length" ] }
- Validators:
read_size_is_piece_length » read_size
sort_paths » paths
- field piece_length: Annotated[int, AfterValidator(func=_power_of_two)] [Required]
- Constraints:
func = _power_of_two
- validator read_size_is_piece_length » read_size
If read_size is not passed, set it to piece_length
- validator sort_paths » paths
v1 torrents have arbitrary file sorting, but we mimic libtorrent/qbittorrent’s sort order for consistency’s sake
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
V2
- pydantic model V2Hasher
JSON schema:
{ "title": "V2Hasher", "type": "object", "properties": { "paths": { "items": { "format": "path", "type": "string" }, "title": "Paths", "type": "array" }, "path_root": { "format": "path", "title": "Path Root", "type": "string" }, "piece_length": { "title": "Piece Length", "type": "integer" }, "n_processes": { "default": 1, "title": "N Processes", "type": "integer" }, "progress": { "default": false, "title": "Progress", "type": "boolean" }, "read_size": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "title": "Read Size" }, "memory_limit": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "title": "Memory Limit" } }, "required": [ "paths", "path_root", "piece_length" ] }
- field piece_length: Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] [Required]
- Constraints:
func = _power_of_two
- field read_size: Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] | None = None
How much of a file should be read in a single read call.
If None, set to the piece_length
- classmethod hash_root(hashes: list[Annotated[bytes, Len(min_length=32, max_length=32), PlainSerializer(func=_serialize_hash, return_type=PydanticUndefined, when_used=always)]]) → bytes
Given hashes within a v2 merkle tree, compute their root.
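As an illustration of computing a merkle root from a layer of 32-byte hashes, here is a minimal sketch. It assumes the BitTorrent v2 convention of SHA-256 over concatenated sibling pairs and requires a power-of-two number of inputs. Whether hash_root itself pads or validates its input is not stated here, and `merkle_root_sketch` is a hypothetical name.

```python
import hashlib


def merkle_root_sketch(hashes: list[bytes]) -> bytes:
    """Reduce a power-of-two-sized layer of hashes to a single root."""
    n = len(hashes)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("expected a non-empty, power-of-two number of hashes")
    layer = list(hashes)
    while len(layer) > 1:
        # Each parent is the SHA-256 of its two children, left then right
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]
```

A single input hash is its own root under this sketch, since there are no siblings to combine.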
- classmethod read_size_is_block_size()
Wrap a classmethod, staticmethod, property or unbound function and act as a descriptor that allows us to detect decorated items from the class’ attributes.
This class’ __get__ returns the wrapped item’s __get__ result, which makes it transparent for classmethods and staticmethods.
- wrapped¶
The decorator that has to be wrapped.
- decorator_info¶
The decorator info.
- shim¶
A wrapper function to wrap V1 style function.
- finish_trees(hashes: list[Hash]) → list[MerkleTree]
Create merkle trees from a collection of leaf hashes.
If leaf hashes from multiple paths are found, return a list of merkle trees.
This method does not check that the trees are correct and complete: it assumes that the collection of leaf hashes passed to it is already complete. For example, it does not validate that the number of leaf hashes matches what would be expected given the file size.
- get_root_hash(piece_hashes: list[Annotated[bytes, Len(min_length=32, max_length=32), PlainSerializer(func=_serialize_hash, return_type=PydanticUndefined, when_used=always)]], shape: MerkleTreeShape) → bytes
Compute the root hash, including any zero-padding pieces needed to balance the tree.
If shape.n_pieces == 0, there are no piece hashes, so the hashes passed in should be the leaf hashes. The root hash is then just the hash tree of the blocks, padded with all-zero blocks to have enough blocks for a full piece.
- hash_pieces(leaf_hashes: list[Annotated[bytes, Len(min_length=32, max_length=32), PlainSerializer(func=_serialize_hash, return_type=PydanticUndefined, when_used=always)]], shape: MerkleTreeShape) → list[bytes] | None
Compute the piece hashes for the layer dict
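One way to picture piece hashing is to group leaf hashes into runs of one piece each and take the merkle root of every run. The sketch below does that under stated assumptions: `blocks_per_piece` is a hypothetical power-of-two parameter standing in for whatever MerkleTreeShape carries, a short final run is padded with all-zero 32-byte hashes, and none of the names are the library's own.

```python
import hashlib

ZERO_HASH = bytes(32)  # padding leaf: 32 zero bytes


def _root(layer: list[bytes]) -> bytes:
    """Merkle root of a power-of-two-sized layer (SHA-256 over sibling pairs)."""
    while len(layer) > 1:
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]


def piece_hashes_sketch(leaf_hashes: list[bytes], blocks_per_piece: int) -> list[bytes]:
    """Return one hash per piece, padding the last piece with zero hashes."""
    if blocks_per_piece <= 0 or (blocks_per_piece & (blocks_per_piece - 1)) != 0:
        raise ValueError("blocks_per_piece must be a power of two")
    pieces = []
    for start in range(0, len(leaf_hashes), blocks_per_piece):
        group = leaf_hashes[start:start + blocks_per_piece]
        # Fill an incomplete final piece so its subtree is balanced
        group = group + [ZERO_HASH] * (blocks_per_piece - len(group))
        pieces.append(_root(group))
    return pieces
```

The documented hash_pieces can return None, presumably when a file has no piece layer; this sketch does not model that case.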
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
Hybrid
Hybrid v1/v2 torrent creation
This is not a straightforward combination of v1 and v2 hashing, since each torrent version has different optimization requirements.
Since v1 is just a linear set of hashes, and its pieces are much larger units, we can read a larger buffer and feed the whole thing into a hashing process at once. v2 always works on 16KiB chunks, so the tradeoff between reading and processing time is a bit different.
Hybrid torrents require us to do both, as well as generate padfiles, so we use routines from the v1 and v2 hashers but build on top of them.
- add_padfiles(files: list[FileItem], piece_length: int) → list[FileItem]
Modify a v1 file list to intersperse .pad files
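A sketch of how padfiles might be interspersed so that each following file starts on a piece boundary. This is illustrative only: `FileItemSketch` is a hypothetical stand-in for FileItem, the `.pad/<length>` naming and the choice not to pad after the last file are assumptions, and the real add_padfiles may differ on both.

```python
from dataclasses import dataclass


@dataclass
class FileItemSketch:
    """Hypothetical stand-in for FileItem."""

    path: str
    length: int
    is_pad: bool = False


def add_padfiles_sketch(files: list[FileItemSketch], piece_length: int) -> list[FileItemSketch]:
    """Insert a padfile after each file that does not end on a piece boundary."""
    out: list[FileItemSketch] = []
    for i, item in enumerate(files):
        out.append(item)
        remainder = item.length % piece_length
        is_last = i == len(files) - 1
        if remainder and not is_last:
            # Pad up to the next multiple of piece_length
            pad = piece_length - remainder
            out.append(FileItemSketch(path=f".pad/{pad}", length=pad, is_pad=True))
    return out
```

With this layout, every file except possibly the last occupies a whole number of pieces together with its padfile, which keeps v1 piece boundaries aligned with v2's per-file trees.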
- pydantic model HybridHasher
JSON schema:
{ "title": "HybridHasher", "type": "object", "properties": { "paths": { "items": { "format": "path", "type": "string" }, "title": "Paths", "type": "array" }, "path_root": { "format": "path", "title": "Path Root", "type": "string" }, "piece_length": { "title": "Piece Length", "type": "integer" }, "n_processes": { "default": 1, "title": "N Processes", "type": "integer" }, "progress": { "default": false, "title": "Progress", "type": "boolean" }, "read_size": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "title": "Read Size" }, "memory_limit": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "title": "Memory Limit" } }, "required": [ "paths", "path_root", "piece_length" ] }
- Validators:
sort_paths » paths
- field piece_length: Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] [Required]
- Constraints:
func = _power_of_two
- field read_size: Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] | None = None
How much of a file should be read in a single read call.
If None, set to the piece_length
- validator sort_paths » paths
v1 torrents have arbitrary file sorting, but we mimic libtorrent/qbittorrent’s sort order for consistency’s sake
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- split_v1_v2(hashes: list[Hash]) → tuple[PieceLayers, list[bytes]]
Split v1 and v2 hashes, returning the v2 piece layers and the sorted v1 pieces