Reduce Operations on Distributed Databases
I would like to ask you about optimizing reduce operations (e.g. count) on multiple databases.
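To make the question concrete, here is a minimal sketch of a scatter-gather count: each database computes its own COUNT(*) and the caller sums the partial results. In-memory SQLite databases stand in for the real shards, and the `events` table and the helper names `make_shard`, `count_on_shard`, and `distributed_count` are hypothetical, chosen only for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one shard (assumption)."""
    # check_same_thread=False lets a worker thread use a connection created here.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
    conn.executemany("INSERT INTO events (kind) VALUES (?)", [(k,) for k in rows])
    conn.commit()
    return conn


def count_on_shard(conn):
    """Map step: let the shard compute its own partial count."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


def distributed_count(shards):
    """Reduce step: query every shard in parallel and sum the partial counts."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        return sum(pool.map(count_on_shard, shards))


shards = [make_shard(["a", "b"]), make_shard(["c"]), make_shard([])]
total = distributed_count(shards)
```

Pushing the aggregate down to each database means only one number per shard crosses the network. This works directly for count and sum; an average would need each shard to return both a sum and a count.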
Modulo Division of Hash Output
A design book I was reading describes a method for choosing a database shard: take the hash (MD5, SHA-1, whatever) of a user ID (an integer or a UUID) and then, whether or not the digest is encoded, take it modulo the number of shards.
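The method described above can be sketched roughly as follows. This is an illustration under assumptions, not the book's code: the function name `shard_for` is hypothetical, MD5 is picked arbitrarily from the hashes mentioned, and the raw digest bytes are read as one big integer before the modulo is taken.

```python
import hashlib


def shard_for(user_id, num_shards):
    """Map a user ID (integer or UUID string) to a shard index in [0, num_shards)."""
    # Normalise the ID to bytes so integers and UUID strings go through the same path.
    key = str(user_id).encode("utf-8")
    digest = hashlib.md5(key).digest()
    # Interpret the 16-byte digest as an unsigned integer, then take it modulo the shard count.
    return int.from_bytes(digest, "big") % num_shards


index = shard_for("3f2c9a1e-5b7d-4c7e-9a56-0d6f2b1c8e44", 8)
```

Whether the digest is hex-encoded or left as raw bytes does not change the idea, as long as every client converts it to an integer the same way. One known cost of plain modulo is that changing the number of shards remaps most keys.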
Partitioning a Table Across Different Shards Using Two Distinct Partition Keys
I do not have a ton of experience designing databases. Imagine there are three tables: a members table with primary key MemberID, a books table with primary key BookID, and a loans table with primary key LoanID and two foreign keys, BookID and MemberID. I am partitioning the members table across multiple shards by MemberID and the books table by BookID. My problem arises when I want to partition the loans table, because it contains both partition keys. My idea for reducing cross-shard queries is this: if a loan's MemberID and BookID both live on the same shard, say shard ‘A’, I store the loan entry on that shard. If the BookID and MemberID are on different shards, I store duplicates of the same loan entry on both shards. This way the foreign keys on a loan entry will always reference a book or member row within the same database. Do you think duplication is acceptable in this situation, and is this a good design?
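The placement rule in the question can be sketched like this. The hash-modulo shard function, the `NUM_SHARDS` value, and the names `shard_of` and `loan_shards` are assumptions made for illustration; the question does not say how MemberID and BookID are actually mapped to shards.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(key):
    """Assumed partition function: hash the key and take it modulo the shard count."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_SHARDS


def loan_shards(member_id, book_id):
    """Return every shard that should hold a copy of the loan row.

    One shard if the member and the book are co-located, otherwise two.
    """
    return sorted({shard_of(member_id), shard_of(book_id)})


targets = loan_shards(member_id=1001, book_id=77)
```

With this rule every write to a loan must be applied to each shard in `targets`, so the copies can drift if one write fails. Any cross-shard aggregate over loans would also need to deduplicate by LoanID to avoid counting a duplicated row twice.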
Multiple Databases per Microservice
We have a scenario in which all the important, transactional fields of our business entities are highly structured and relational. The data size of these important fields is also very small. However, each entity also has a raw JSON document associated with it that is very rarely updated (only in exceptional cases), and most of our read APIs require all the data, including the raw JSON.