hashing

Base

pydantic model Chunk[source]

A single unit of data, usually a 16KiB block, but can be a whole piece e.g. in v1 hashing

Show JSON schema
{
   "title": "Chunk",
   "description": "A single unit of data, usually a 16KiB block, but can be a whole piece e.g. in v1 hashing",
   "type": "object",
   "properties": {
      "path": {
         "format": "path",
         "title": "Path",
         "type": "string"
      },
      "chunk": {
         "format": "binary",
         "title": "Chunk",
         "type": "string"
      },
      "idx": {
         "title": "Idx",
         "type": "integer"
      }
   },
   "required": [
      "path",
      "chunk",
      "idx"
   ]
}

Fields:
field chunk: bytes [Required]
field idx: int [Required]
field path: Path [Required]

Absolute path

pydantic model Hash[source]

Hash of a block or piece

Show JSON schema
{
   "title": "Hash",
   "description": "Hash of a block or piece",
   "type": "object",
   "properties": {
      "type": {
         "enum": [
            "block",
            "v1_piece",
            "v2_piece"
         ],
         "title": "Type",
         "type": "string"
      },
      "path": {
         "format": "path",
         "title": "Path",
         "type": "string"
      },
      "hash": {
         "format": "binary",
         "title": "Hash",
         "type": "string"
      },
      "idx": {
         "description": "\n    The index of the block for ordering.\n    \n    For v1 hashes, the absolute index of piece across all files.\n    For v2 block and piece hashes, index within the given file\n    ",
         "title": "Idx",
         "type": "integer"
      }
   },
   "required": [
      "type",
      "path",
      "hash",
      "idx"
   ]
}

Fields:
field hash: bytes [Required]
field idx: int [Required]

The index of the block for ordering.

For v1 hashes, the absolute index of the piece across all files. For v2 block and piece hashes, the index within the given file.

field path: Path [Required]
field type: Literal['block', 'v1_piece', 'v2_piece'] [Required]
iter_blocks(path: Path, read_size: int = 16384) Generator[Chunk, None][source]

Iterate 16KiB blocks
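As a rough illustration of block iteration, the sketch below reads a file in fixed-size pieces and yields each one with its index. The name iter_fixed_blocks and the (index, bytes) return shape are assumptions made for this example; the documented iter_blocks yields Chunk objects instead.

```python
from pathlib import Path
from typing import Iterator, Tuple

BLOCK_SIZE = 16 * 1024  # 16 KiB, the same value as the documented read_size default


def iter_fixed_blocks(path: Path, read_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs; only the final block may be shorter than read_size."""
    with path.open("rb") as f:
        idx = 0
        # read() returns b"" at end of file, which ends the loop
        while data := f.read(read_size):
            yield idx, data
            idx += 1
```

The index plays the same role as the idx field on Chunk: it lets hashes computed out of order be put back in file order.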

pydantic model HasherBase[source]

Show JSON schema
{
   "title": "HasherBase",
   "type": "object",
   "properties": {
      "paths": {
         "items": {
            "format": "path",
            "type": "string"
         },
         "title": "Paths",
         "type": "array"
      },
      "path_root": {
         "format": "path",
         "title": "Path Root",
         "type": "string"
      },
      "piece_length": {
         "title": "Piece Length",
         "type": "integer"
      },
      "n_processes": {
         "default": 1,
         "title": "N Processes",
         "type": "integer"
      },
      "progress": {
         "default": false,
         "title": "Progress",
         "type": "boolean"
      },
      "read_size": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Read Size"
      },
      "memory_limit": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Memory Limit"
      }
   },
   "required": [
      "paths",
      "path_root",
      "piece_length"
   ]
}

Fields:
field memory_limit: int | None = None

Rough cap on outstanding memory usage (in bytes): reading pauses until the outstanding chunks awaiting processing total less than this size

field n_processes: int = 1
field path_root: Annotated[Path, AfterValidator(func=_is_abs)] [Required]

Directory containing paths to hash

Constraints:
  • func = _is_abs

field paths: list[Annotated[Path, AfterValidator(func=_is_rel)]] [Required]

Relative paths beneath path_root to hash.

Paths should already be sorted in the order they are to appear in the torrent
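The path_root and paths fields are constrained by validators named _is_abs and _is_rel. Their bodies are not shown here, so the following is only a guess at the kind of check such validators might perform, written as plain functions with hypothetical names (is_abs, is_rel) rather than as pydantic validators.

```python
from pathlib import Path


def is_abs(path: Path) -> Path:
    """Accept only absolute paths (assumed behaviour of an _is_abs-style check)."""
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {path}")
    return path


def is_rel(path: Path) -> Path:
    """Accept only relative paths (assumed behaviour of an _is_rel-style check)."""
    if path.is_absolute():
        raise ValueError(f"expected a relative path, got {path}")
    return path
```

With both checks in place, each file to hash can be located unambiguously as path_root / path.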

field piece_length: Annotated[int, AfterValidator(func=_power_of_two)] | Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] [Required]
field progress: bool = False

Show progress

field read_size: int | None = None

How much of a file should be read in a single read call.

If None, set to the piece_length

complete(hashes: list[Hash]) list[Hash][source]

After hashing, do any postprocessing to yield the desired output

hash() list[Hash][source]

Hash all files

process() list[Hash][source]
validator read_size_defaults_piece_size  »  all fields[source]
abstractmethod update(chunk: Chunk, pool: Pool) list[ApplyResult][source]
abstractmethod update(chunk: Chunk, pool: None) list[Hash]

Update the hasher with a new chunk of data. When a pool is given, returns a list of AsyncResults from which the hashes are fetched; when pool is None, returns the hashes directly.

property file_sizes: list[tuple[Path, int]][source]
property max_outstanding_results: int | None[source]

Total number of async result objects that can be outstanding, to limit memory usage
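One way to picture how a byte budget such as memory_limit could bound outstanding results is sketched below. Everything here is an assumption for illustration: the formula memory_limit // read_size, the function names, and the use of ThreadPool (chosen to keep the example light) in place of a process pool.

```python
import hashlib
from collections import deque
from multiprocessing.pool import ThreadPool
from typing import Iterable, List, Optional


def _sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def max_outstanding(memory_limit: Optional[int], read_size: int) -> Optional[int]:
    """Hypothetical conversion of a byte budget into a count of in-flight chunks."""
    if memory_limit is None:
        return None
    return max(1, memory_limit // read_size)


def hash_with_backpressure(chunks: Iterable[bytes], pool, limit: Optional[int]) -> List[bytes]:
    """Submit chunks to the pool, draining the oldest result once `limit` are pending."""
    pending = deque()
    digests = []
    for chunk in chunks:
        if limit is not None and len(pending) >= limit:
            # Blocks until the oldest hash is done, which pauses further reading
            digests.append(pending.popleft().get())
        pending.append(pool.apply_async(_sha1, (chunk,)))
    # Collect whatever is still in flight, preserving submission order
    digests.extend(result.get() for result in pending)
    return digests
```

Because results are drained in first-in, first-out order, the returned digests line up with the input chunks.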

property total_chunks: int[source]

Total read_size chunks in all files

property total_hashes: int[source]

Total hashes that need to be computed

property total_size: int[source]
class DummyPbar(*args: Any, **kwargs: Any)[source]

pbar that does nothing so we don’t get fined by mypy

update(n: int = 1) None[source]
close() None[source]
set_description(*args: Any, **kwargs: Any) None[source]

V1

pydantic model V1Hasher[source]

Show JSON schema
{
   "title": "V1Hasher",
   "type": "object",
   "properties": {
      "paths": {
         "items": {
            "format": "path",
            "type": "string"
         },
         "title": "Paths",
         "type": "array"
      },
      "path_root": {
         "format": "path",
         "title": "Path Root",
         "type": "string"
      },
      "piece_length": {
         "title": "Piece Length",
         "type": "integer"
      },
      "n_processes": {
         "default": 1,
         "title": "N Processes",
         "type": "integer"
      },
      "progress": {
         "default": false,
         "title": "Progress",
         "type": "boolean"
      },
      "read_size": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Read Size"
      },
      "memory_limit": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Memory Limit"
      }
   },
   "required": [
      "paths",
      "path_root",
      "piece_length"
   ]
}

Fields:
field piece_length: Annotated[int, AfterValidator(func=_power_of_two)] [Required]
Constraints:
  • func = _power_of_two

validator read_size_is_piece_length  »  read_size[source]

If read_size not passed, make it piece_length

validator sort_paths  »  paths[source]

v1 torrents have arbitrary file sorting, but we mimic libtorrent/qbittorrent’s sort order for consistency’s sake

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

update(chunk: Chunk, pool: Pool) list[ApplyResult][source]
update(chunk: Chunk, pool: None) list[Hash]

Update the hasher with a new chunk of data. When a pool is given, returns a list of AsyncResults from which the hashes are fetched; when pool is None, returns the hashes directly.

property total_hashes: int[source]

Total hashes that need to be computed

sort_v1(paths: list[Path]) list[Path][source]

v1 sorts top-level files first, then alphabetically within each division. See https://github.com/alanmcgovern/monotorrent/issues/563
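A minimal sketch of this ordering, under one possible reading of it: files directly in the root come first, and each group is then ordered by its path string. The function name and the case-sensitive string comparison are assumptions; the real sort_v1 may differ in details.

```python
from pathlib import PurePosixPath
from typing import List


def sort_top_level_first(paths: List[PurePosixPath]) -> List[PurePosixPath]:
    """Order top-level files before nested ones, then alphabetically in each group."""
    # False sorts before True, so single-part (top-level) paths come first
    return sorted(paths, key=lambda p: (len(p.parts) > 1, p.as_posix()))
```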

V2

pydantic model V2Hasher[source]

Show JSON schema
{
   "title": "V2Hasher",
   "type": "object",
   "properties": {
      "paths": {
         "items": {
            "format": "path",
            "type": "string"
         },
         "title": "Paths",
         "type": "array"
      },
      "path_root": {
         "format": "path",
         "title": "Path Root",
         "type": "string"
      },
      "piece_length": {
         "title": "Piece Length",
         "type": "integer"
      },
      "n_processes": {
         "default": 1,
         "title": "N Processes",
         "type": "integer"
      },
      "progress": {
         "default": false,
         "title": "Progress",
         "type": "boolean"
      },
      "read_size": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Read Size"
      },
      "memory_limit": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Memory Limit"
      }
   },
   "required": [
      "paths",
      "path_root",
      "piece_length"
   ]
}

Fields:
field piece_length: Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] [Required]
Constraints:
  • func = _power_of_two
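The two constraints on piece_length can be pictured with the checks below. The bodies of _divisible_by_16kib and _power_of_two are not shown in this reference, so these are stand-ins with hypothetical names, written as plain functions in the raise-or-return style that pydantic's AfterValidator expects.

```python
BLOCK_SIZE = 16 * 1024  # 16 KiB


def divisible_by_16kib(value: int) -> int:
    """Reject sizes that are not a whole number of 16 KiB blocks."""
    if value % BLOCK_SIZE != 0:
        raise ValueError(f"{value} is not divisible by {BLOCK_SIZE}")
    return value


def power_of_two(value: int) -> int:
    """Reject sizes that are not a positive power of two."""
    # A power of two has exactly one bit set, so value & (value - 1) is zero
    if value <= 0 or value & (value - 1) != 0:
        raise ValueError(f"{value} is not a power of two")
    return value
```

Applying both means the smallest accepted piece length is 16384, and every valid piece holds a power-of-two number of blocks.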

field read_size: Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] | None = None

How much of a file should be read in a single read call.

If None, set to the piece_length

classmethod hash_root(hashes: list[Annotated[bytes, Len(min_length=32, max_length=32), PlainSerializer(func=_serialize_hash, return_type=PydanticUndefined, when_used=always)]]) bytes[source]

Given hashes within a v2 merkle tree, compute their root.
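For orientation, a merkle root over 32-byte SHA-256 hashes can be computed by hashing adjacent pairs until a single hash remains. The sketch below assumes the input already has a power-of-two length and raises otherwise; whether hash_root pads or rejects other lengths is not stated here, and the name merkle_root is hypothetical.

```python
import hashlib
from typing import List


def merkle_root(hashes: List[bytes]) -> bytes:
    """Reduce a power-of-two list of hashes to a single root by pairwise SHA-256."""
    n = len(hashes)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("expected a non-empty, power-of-two number of hashes")
    layer = list(hashes)
    while len(layer) > 1:
        # Each parent is the hash of its two children concatenated
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]
```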

classmethod read_size_is_block_size()

finish_trees(hashes: list[Hash]) list[MerkleTree][source]

Create merkle trees from a collection of leaf hashes.

If leaf hashes from multiple paths are found, return a list of merkle trees.

This method does not check that the trees are correct and complete - it assumes that the collection of leaf hashes passed to it is already complete. So e.g. it does not validate that the number of leaf hashes matches that which would be expected given the file size.

Parameters:

hashes (list[Hash]) – collection of leaf hashes, from a single or multiple files
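The grouping step described above might look roughly like the following. LeafHash is a hypothetical stand-in for Hash, and the function returns ordered leaf hashes per path rather than MerkleTree objects, since their construction is not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LeafHash:
    """Hypothetical stand-in for the documented Hash model."""
    path: str
    idx: int
    hash: bytes


def group_leaves(hashes: List[LeafHash]) -> Dict[str, List[bytes]]:
    """Group leaf hashes by file and order each group by block index."""
    by_path: Dict[str, List[LeafHash]] = defaultdict(list)
    for leaf in hashes:
        by_path[leaf.path].append(leaf)
    return {
        path: [leaf.hash for leaf in sorted(leaves, key=lambda l: l.idx)]
        for path, leaves in by_path.items()
    }
```

As in the method above, nothing here checks that a file's leaves are complete; a missing index simply goes unnoticed.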

get_root_hash(piece_hashes: list[Annotated[bytes, Len(min_length=32, max_length=32), PlainSerializer(func=_serialize_hash, return_type=PydanticUndefined, when_used=always)]], shape: MerkleTreeShape) bytes[source]

Compute the root hash, including any zero-padding pieces needed to balance the tree.

If n_pieces == 0, the root hash is just the hash tree of the blocks, padded with all-zero blocks to have enough blocks for a full piece.

So if shape.n_pieces == 0, then the hashes passed in should be the leaf hashes (since there are no piece hashes)

hash_pieces(leaf_hashes: list[Annotated[bytes, Len(min_length=32, max_length=32), PlainSerializer(func=_serialize_hash, return_type=PydanticUndefined, when_used=always)]], shape: MerkleTreeShape) list[bytes] | None[source]

Compute the piece hashes for the layer dict
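A piece hash is the root of the small merkle tree over that piece's block hashes. The sketch below groups leaf hashes into runs of blocks_per_piece, pads a short final run with all-zero hashes (an assumption based on BEP 52), and reduces each run to its root. The function name is hypothetical, and the None return of hash_pieces is not modeled.

```python
import hashlib
from typing import List


def piece_hashes_from_leaves(leaf_hashes: List[bytes], blocks_per_piece: int) -> List[bytes]:
    """Reduce each run of blocks_per_piece leaf hashes to one piece hash."""
    pieces = []
    for start in range(0, len(leaf_hashes), blocks_per_piece):
        layer = list(leaf_hashes[start:start + blocks_per_piece])
        # Pad a short final run so the subtree is balanced
        layer += [bytes(32)] * (blocks_per_piece - len(layer))
        while len(layer) > 1:
            layer = [
                hashlib.sha256(layer[i] + layer[i + 1]).digest()
                for i in range(0, len(layer), 2)
            ]
        pieces.append(layer[0])
    return pieces
```

This assumes blocks_per_piece is a power of two, which follows from the constraints on piece_length.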

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

update(chunk: Chunk, pool: Pool) list[ApplyResult][source]
update(chunk: Chunk, pool: None) list[Hash]

Update the hasher with a new chunk of data. When a pool is given, returns a list of AsyncResults from which the hashes are fetched; when pool is None, returns the hashes directly.

property total_hashes: int[source]

Total hashes that need to be computed

sort_v2(paths: list[Path]) list[Path][source]

V2 paths are sorted in tree order, alphabetically.

Mostly important for hybrid torrents, because v2 file trees are intrinsically sorted by the bencoding format
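Tree order differs from sorting whole path strings, because bencoded dictionaries sort keys as raw byte strings at each directory level. A minimal sketch, assuming UTF-8 path components and a hypothetical function name:

```python
from pathlib import PurePosixPath
from typing import List


def sort_tree_order(paths: List[PurePosixPath]) -> List[PurePosixPath]:
    """Sort by the byte value of each path component, level by level."""
    return sorted(paths, key=lambda p: [part.encode("utf-8") for part in p.parts])
```

For example, "a/b" sorts before "a.b" in tree order because the component "a" is a prefix of "a.b", whereas a plain string comparison would put "a.b" first.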

Hybrid

Hybrid v1/v2 torrent creation

This is not a straightforward combination of v1 and v2 hashing since each version of torrent has different optimization requirements.

Since v1 is just a linear set of hashes, and the pieces are much larger units, we can read a larger buffer and feed the whole thing into a hashing process at once. v2 works on 16KiB chunks always, so the tradeoff of reading and processing time is a bit different.

Hybrid torrents require us to do both, as well as generate padfiles, so we use routines from the v1 and v2 hashers but build on top of them.

add_padfiles(files: list[FileItem], piece_length: int) list[FileItem][source]

Modify a v1 file list to intersperse .pad files
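The idea behind padfiles is that every file except the last is followed by enough padding to reach the next piece boundary, so v1 pieces never span two real files. The sketch below illustrates the arithmetic only: FileEntry is a hypothetical stand-in for FileItem, and the ".pad/N" naming follows a common client convention that is assumed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileEntry:
    """Hypothetical stand-in for the documented FileItem."""
    path: str
    length: int
    is_pad: bool = False


def intersperse_padfiles(files: List[FileEntry], piece_length: int) -> List[FileEntry]:
    """Insert a pad entry after each file that ends off a piece boundary."""
    out: List[FileEntry] = []
    for i, entry in enumerate(files):
        out.append(entry)
        remainder = entry.length % piece_length
        # The final file needs no padding, and aligned files need none either
        if remainder and i < len(files) - 1:
            pad = piece_length - remainder
            out.append(FileEntry(path=f".pad/{pad}", length=pad, is_pad=True))
    return out
```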

pydantic model HybridHasher[source]

Show JSON schema
{
   "title": "HybridHasher",
   "type": "object",
   "properties": {
      "paths": {
         "items": {
            "format": "path",
            "type": "string"
         },
         "title": "Paths",
         "type": "array"
      },
      "path_root": {
         "format": "path",
         "title": "Path Root",
         "type": "string"
      },
      "piece_length": {
         "title": "Piece Length",
         "type": "integer"
      },
      "n_processes": {
         "default": 1,
         "title": "N Processes",
         "type": "integer"
      },
      "progress": {
         "default": false,
         "title": "Progress",
         "type": "boolean"
      },
      "read_size": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Read Size"
      },
      "memory_limit": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Memory Limit"
      }
   },
   "required": [
      "paths",
      "path_root",
      "piece_length"
   ]
}

Fields:
field piece_length: Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] [Required]
Constraints:
  • func = _power_of_two

field read_size: Annotated[int, AfterValidator(func=_divisible_by_16kib), AfterValidator(func=_power_of_two)] | None = None

How much of a file should be read in a single read call.

If None, set to the piece_length

validator sort_paths  »  paths[source]

v1 torrents have arbitrary file sorting, but we mimic libtorrent/qbittorrent’s sort order for consistency’s sake

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

split_v1_v2(hashes: list[Hash]) tuple[PieceLayers, list[bytes]][source]

Split v1 and v2 hashes, returning sorted v1 pieces and v2 piece layers
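A rough picture of the split, using the type values documented on Hash ('v1_piece' and 'v2_piece'). TypedHash is a hypothetical stand-in for Hash, a plain dict stands in for PieceLayers, and 'block' hashes are ignored here, so this only illustrates the partition-and-sort step rather than the full method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TypedHash:
    """Hypothetical stand-in for the documented Hash model."""
    type: str
    path: str
    idx: int
    hash: bytes


def split_by_version(hashes: List[TypedHash]) -> Tuple[Dict[str, List[bytes]], List[bytes]]:
    """Return (v2 piece hashes per path, v1 piece hashes in piece order)."""
    v1 = [
        h.hash
        for h in sorted((h for h in hashes if h.type == "v1_piece"), key=lambda h: h.idx)
    ]
    v2: Dict[str, List[bytes]] = {}
    v2_sorted = sorted(
        (h for h in hashes if h.type == "v2_piece"), key=lambda h: (h.path, h.idx)
    )
    for h in v2_sorted:
        v2.setdefault(h.path, []).append(h.hash)
    return v2, v1
```

The v1 list is sorted by the absolute piece index, while v2 hashes are ordered by their index within each file, matching the two meanings of idx.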

update(chunk: Chunk, pool: Pool) list[ApplyResult][source]
update(chunk: Chunk, pool: None) list[Hash]

Update the hasher with a new chunk of data. When a pool is given, returns a list of AsyncResults from which the hashes are fetched; when pool is None, returns the hashes directly.

property blocks_per_piece: int[source]
property total_hashes: int[source]

Total hashes that need to be computed