Are there any standards for storing checksums of a repository?
I have a repository with many files (mostly binary: images and raw data) and some documentation. The files are stored in a hierarchical folder structure. I want to be able to check the fixity of the files in order to detect data corruption. At the moment I am generating a .json file that represents the structure and contains a checksum for each file, plus some metadata such as the date and the algorithm used to calculate the checksums.
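For illustration, a minimal sketch of the kind of manifest described above might look like the following. The choice of SHA-256 and the `manifest.json` filename are assumptions for the sketch, not part of the original question.

```python
import hashlib
import json
import os
from datetime import datetime, timezone

def build_manifest(root: str, algorithm: str = "sha256") -> dict:
    """Walk `root` and record one checksum per file, plus some metadata."""
    files = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.new(algorithm)
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            # Store paths relative to the repository root.
            files[os.path.relpath(path, root)] = h.hexdigest()
    return {
        "algorithm": algorithm,
        "generated": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }

if __name__ == "__main__":
    manifest = build_manifest(".")
    with open("manifest.json", "w") as out:
        json.dump(manifest, out, indent=2)
```

Fixity checking then amounts to regenerating the checksums and comparing them against the stored manifest.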
How to rebalance data across nodes?
I am implementing a message queue where messages are distributed across the nodes of a cluster. The goal is to design the system so that it can auto-scale without keeping a global map of each message and its location.
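One common way to avoid a global message-to-node map is consistent hashing: each node owns a set of points on a hash ring, and a message's location can be computed from its key alone. The question does not name a specific technique, so the following is only an illustrative sketch.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to nodes without a central lookup table."""

    def __init__(self, nodes, replicas: int = 100):
        self.replicas = replicas
        self._ring = []          # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def _hash(self, value: str) -> int:
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each node gets several virtual points to spread load evenly.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point at or after its hash.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("message-42"))   # any node can compute this locally
```

When a node joins or leaves, only the keys in the affected ring segments move, which is what makes rebalancing during auto-scaling tractable without a global map.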
How should I handle different hashes of identical files in .zip archives with different ‘last changed’ dates?
We store zipped files, which contain certain fields (metadata), in a cloud provider’s storage. These files are derived from other, larger files. Every time we (re)generate them, their ‘last changed’ date is set to the generation time, while the content of the files is identical. When we recreate one of these files that has previously been stored in the online storage, its file hashes (MD5/SHA) differ from the stored copy’s. The reason is that the zip format seems to include the ‘last changed’ information in the .zip file.
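A common workaround, shown here only as an illustration and not stated in the question, is to write the archive with a fixed timestamp so that identical content always produces a byte-identical archive:

```python
import zipfile

# Fixed timestamp (the earliest the zip format allows) so that rebuilding
# an archive from identical content yields byte-identical output.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)

def deterministic_zip(archive_path: str, files: dict) -> None:
    """`files` maps archive member names to their byte content."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(files):                 # stable member order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])

deterministic_zip("out.zip", {"data.bin": b"identical content"})
```

The timestamp is only one source of nondeterminism; member ordering and compression settings can also vary between runs, which is why the sketch sorts the names and pins the compression type.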
Looking for monotonically increasing (integer) hash function
I’m looking for a HashFunction(X, Y: Integer): Integer that is monotonically increasing on X, then Y.
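If Y is known to fit in a fixed width, which is an assumption not stated in the question, a simple construction is to pack X into the high bits and Y into the low bits, so that numeric order matches lexicographic order on (X, Y). A sketch of the idea:

```python
Y_BITS = 32                      # assumed upper bound on Y's width
Y_OFFSET = 1 << (Y_BITS - 1)     # shift signed Y into the non-negative range

def ordered_hash(x: int, y: int) -> int:
    """Numeric order of the result matches ordering by X first, then Y."""
    assert -Y_OFFSET <= y < Y_OFFSET, "Y must fit in the assumed width"
    return x * (1 << Y_BITS) + (y + Y_OFFSET)

assert ordered_hash(1, 5) < ordered_hash(2, -100)   # X dominates
assert ordered_hash(3, -1) < ordered_hash(3, 0)     # then Y breaks ties
```

Strictly speaking the result is an order-preserving encoding rather than a conventional hash, since a strictly monotonic function over unbounded inputs must itself have an unbounded range.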
How deterministic are SessionIDs from SHA’d GUIDs?
Assume I’m using the following code to generate pseudo-random sessionIDs:
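The code referred to above is not reproduced in this excerpt. As a purely hypothetical illustration of the pattern being described (hashing a freshly generated GUID), it might resemble:

```python
import hashlib
import uuid

def new_session_id() -> str:
    # Hypothetical illustration only: hash a random (version 4) GUID
    # with SHA-256. The unpredictability of the result is only as good
    # as the GUID's randomness; hashing does not add entropy.
    guid = uuid.uuid4()
    return hashlib.sha256(guid.bytes).hexdigest()

print(new_session_id())
```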
Optimal way to implement this specific lookup table in C#?
I want to create a lookup table for this data:
Measuring “novelty” of data
I have a heuristic in mind that should allow me to “score” data based on its “novelty”, and I would like it to work in real-ish time.
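The question does not spell out the heuristic itself. As one generic illustration of streaming novelty scoring, and not necessarily what the author has in mind, an item can be scored by how rarely its features have been seen so far:

```python
from collections import Counter

class NoveltyScorer:
    """Generic illustration: score an item by how rarely its features have
    appeared in the stream so far (unseen features => higher novelty)."""

    def __init__(self):
        self.counts = Counter()

    def score(self, features) -> float:
        if not features:
            return 0.0
        # A feature seen n times contributes 1 / (1 + n); an item made of
        # never-seen features therefore scores 1.0, and the score decays
        # towards 0 as its features become familiar.
        s = sum(1.0 / (1 + self.counts[f]) for f in features) / len(features)
        self.counts.update(features)      # learn after scoring
        return s

scorer = NoveltyScorer()
print(scorer.score({"color:red", "shape:square"}))   # 1.0  (all new)
print(scorer.score({"color:red", "shape:square"}))   # 0.5  (seen once)
```

Because each score is a constant-time lookup per feature, this kind of approach can run in near real time on a stream.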
Why is num & sizeMinusOne faster than num & (size - 1)?
I’ve been told that when I have a hash table of size m, and m = 2^k, I can use the & operator as num & (size - 1) instead of num % size to fit the hashCode to my table size.
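The identity behind this is that for a power-of-two size, the low k bits of a non-negative num are exactly the remainder. A quick sketch (in Python here, although the question is about a JVM-style hash table) demonstrates the equivalence, along with the idea of caching size - 1 in a precomputed value such as the sizeMinusOne from the title:

```python
size = 1 << 4                # m = 2^k, here 16
size_minus_one = size - 1    # cached mask, analogous to a sizeMinusOne field

for num in range(1000):
    # For non-negative num and a power-of-two size, masking with size - 1
    # keeps exactly the low k bits, which equals num % size.
    assert num % size == num & size_minus_one
```

Any speed difference between num & sizeMinusOne and num & (size - 1) comes down to whether size - 1 is recomputed on every call or read from a precomputed value; in practice a JIT compiler will often optimize the difference away.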