dandi.support.digests

Provides helpers to compute digests (MD5, etc.) of files.

class dandi.support.digests.Digester(digests: list[str] = <factory>, blocksize: int = 65536)

Helper to compute multiple digests of a file in a single pass.

blocksize: int = 65536

Chunk size (in bytes) by which to consume a file.

digest_funcs: list[Callable[[], Hasher]]
digests: list[str]

List of supported algorithm labels to compute, such as md5, sha1, etc.
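To illustrate the one-pass idea, here is a minimal sketch that does not use Digester itself. The function name compute_digests is hypothetical, and the algorithm labels are assumed to be names accepted by Python's hashlib.new. Each chunk of blocksize bytes is read once and fed to every hasher.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def compute_digests(
    filepath: str | Path,
    digests: tuple[str, ...] = ("md5", "sha1"),
    blocksize: int = 65536,
) -> dict[str, str]:
    """Hypothetical helper: hash a file once with several algorithms.

    Mirrors the idea behind Digester (one read pass, many hashers);
    it is not dandi's implementation.
    """
    # One hasher object per requested algorithm label (e.g. "md5", "sha256").
    hashers = {name: hashlib.new(name) for name in digests}
    with open(filepath, "rb") as f:
        # Read the file in blocksize chunks and feed each chunk to every
        # hasher, so the file is consumed only once.
        while chunk := f.read(blocksize):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, compute_digests("sample.nwb", ("md5", "sha256")) would return a dict holding both hex digests after reading the file once; for multi-gigabyte files this avoids rereading the data for each algorithm.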

dandi.support.digests.checksum_zarr_dir(files: dict[str, tuple[str, int]], directories: dict[str, tuple[str, int]]) → str

Calculate the Zarr checksum of a directory only from information about the files and subdirectories immediately within it.

Parameters:
  • files – A mapping from names of files in the directory to pairs of their MD5 digests and sizes

  • directories – A mapping from names of subdirectories in the directory to pairs of their Zarr checksums and the sum of the sizes of all files recursively within them
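The key property is that a directory's checksum depends only on its immediate children, so checksums for a whole tree can be built bottom-up. The sketch below illustrates that shape with a deliberately simplified combiner: toy_checksum_dir is hypothetical, and its canonical-JSON-plus-MD5 encoding is an assumption, not the serialization dandi uses, so its output will not match checksum_zarr_dir. It returns a (digest, total size) pair so the result can be fed straight into a parent directory's directories mapping.

```python
from __future__ import annotations

import hashlib
import json


def toy_checksum_dir(
    files: dict[str, tuple[str, int]],
    directories: dict[str, tuple[str, int]],
) -> tuple[str, int]:
    """Hypothetical illustration, NOT the real Zarr checksum format.

    Combines only the immediate children's (digest, size) pairs into a
    digest for the directory, plus the total size of all files beneath it.
    """
    listing = {
        # Sort by name so the result does not depend on dict order.
        "files": [
            {"name": name, "digest": digest, "size": size}
            for name, (digest, size) in sorted(files.items())
        ],
        "directories": [
            {"name": name, "digest": digest, "size": size}
            for name, (digest, size) in sorted(directories.items())
        ],
    }
    # Canonical JSON (sorted keys, no whitespace) so equal listings hash equally.
    payload = json.dumps(listing, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    # Subdirectory sizes are already recursive totals, so summing the
    # immediate entries yields the total for this directory.
    total_size = sum(s for _, s in files.values()) + sum(
        s for _, s in directories.values()
    )
    return digest, total_size
```

Because the output has the same (checksum, size) shape as each value in directories, a parent can consume it directly, e.g. toy_checksum_dir({"f": (md5_of_f, 1)}, {"sub": toy_checksum_dir(...)}).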

dandi.support.digests.get_dandietag(filepath: str | Path) → DandiETag
dandi.support.digests.get_digest(filepath: str | Path, digest: str = 'sha256') → str
dandi.support.digests.get_zarr_checksum(path: Path, known: dict[str, str] | None = None) → str

Compute the Zarr checksum for a file or directory tree.

If the digests of any files in the Zarr are already known, they can be passed via the known argument, which must be a dict mapping slash-separated paths (relative to the root of the Zarr) to hex digests.
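A bottom-up walk shows where the known argument saves work: a file whose relative path appears in known is not reread. This is a hypothetical sketch (toy_tree_checksum, _md5_file and _combine are illustrative names); unlike get_zarr_checksum it only accepts a directory as the root, and _combine uses a simplified canonical-JSON encoding that is an assumption, not the real Zarr checksum format.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def _md5_file(path: Path, blocksize: int = 65536) -> str:
    # Plain chunked MD5 of one file.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(blocksize):
            h.update(chunk)
    return h.hexdigest()


def _combine(
    files: dict[str, tuple[str, int]], dirs: dict[str, tuple[str, int]]
) -> tuple[str, int]:
    # Simplified combiner (assumption, not the real format): MD5 of a
    # sorted, canonical JSON listing, plus the recursive total size.
    listing = {"files": sorted(files.items()), "directories": sorted(dirs.items())}
    payload = json.dumps(listing, separators=(",", ":"))
    size = sum(s for _, s in files.values()) + sum(s for _, s in dirs.values())
    return hashlib.md5(payload.encode("utf-8")).hexdigest(), size


def toy_tree_checksum(
    root: str | Path, known: dict[str, str] | None = None
) -> tuple[str, int]:
    """Hypothetical: checksum a directory tree, reusing digests from `known`."""
    root = Path(root)
    known = known or {}

    def walk(d: Path) -> tuple[str, int]:
        files: dict[str, tuple[str, int]] = {}
        dirs: dict[str, tuple[str, int]] = {}
        for child in d.iterdir():
            if child.is_dir():
                # Recurse first; a subdirectory contributes (checksum, total size).
                dirs[child.name] = walk(child)
            else:
                # Keys of `known` are slash-separated paths relative to the root.
                rel = child.relative_to(root).as_posix()
                digest = known[rel] if rel in known else _md5_file(child)
                files[child.name] = (digest, child.stat().st_size)
        return _combine(files, dirs)

    return walk(root)
```

In this sketch a wrong digest in known silently changes the result, because the corresponding file is never reread; known should therefore only hold digests that are trusted.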

dandi.support.digests.md5file_nocache(filepath: str | Path) → str

Compute the MD5 digest of a file without caching the result via fscacher. Such caching has been shown to slow things down for the large numbers of files typically present in Zarrs.
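For comparison, an uncached MD5 amounts to a single chunked read through hashlib with no per-file cache lookup or store. The sketch below is a hypothetical stand-in (md5_uncached is not a dandi function); it uses hashlib.file_digest where available (Python 3.11+) and falls back to a manual loop otherwise.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_uncached(filepath: str | Path, blocksize: int = 1 << 16) -> str:
    """Hypothetical stand-in: plain chunked MD5 with no memoization layer."""
    with open(filepath, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: hashlib drives the chunked read itself,
            # so blocksize is unused on this path.
            return hashlib.file_digest(f, "md5").hexdigest()
        # Older Pythons: feed the hasher manually, blocksize bytes at a time.
        h = hashlib.md5()
        while chunk := f.read(blocksize):
            h.update(chunk)
        return h.hexdigest()
```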