I’m working on designing a DynamoDB schema that needs to fulfill the following use cases efficiently:
Use Cases:
- Get Top X Records Sorted by Timestamp for a User:
  - I need to retrieve the most recent records for a user, sorted by timestamp.
- Delete All Records for a User:
  - I want to delete all records associated with a specific user.
- Delete a Single Record for a User:
  - I need to delete a specific record for a user based on a unique identifier, without the frontend needing to know any DynamoDB-specific details like composite keys.
- Update an Existing Record if the Same Data is Submitted Again:
  - If the same data is submitted again for a user, I want to update the existing record with the new timestamp rather than creating a new record. To achieve this, I’m generating a record_id using a SHA-256 hash of the record’s content, so the same data always produces the same record_id.
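If the hash is computed over JSON.stringify output, key order matters: the same data sent with its keys in a different order would hash to a different record_id. A minimal sketch of an order-independent hash, assuming the payload is plain JSON data (no Date or class instances); canonicalize and canonicalSha256Id are hypothetical helper names, not necessarily how sha256IdFromObj is implemented:

```javascript
const { createHash } = require('node:crypto');

// Recursively rebuild objects with sorted keys so that logically equal
// payloads serialize to the same JSON string regardless of key order.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical helper: hex-encoded SHA-256 of the canonical JSON.
function canonicalSha256Id(obj) {
  return createHash('sha256')
    .update(JSON.stringify(canonicalize(obj)))
    .digest('hex');
}

const first = canonicalSha256Id({ v1: 'test', v2: 'test2', requesterUserId: '1' });
const again = canonicalSha256Id({ requesterUserId: '1', v2: 'test2', v1: 'test' });
console.log(first === again); // true
console.log(first.length); // 64
```

The timestamp has to be added only after hashing (as the controller code does), otherwise every submission would get a new ID.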
Initial Designs Considered:
Design A:
- PK: USER#<user_id>
- SK: RECORD#<timestamp>
- Pros:
  - Efficient for retrieving records sorted by timestamp.
  - All records for a user can be deleted by querying the partition and batch-deleting the returned keys.
- Cons:
  - Deleting a single record is difficult because the SK is based on the timestamp, so the specific record cannot be identified for deletion without knowing both the PK and the SK.
  - Updating a record with the same content creates a new record instead of updating the existing one, because the new timestamp produces a different SK.
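Both Design A pros map onto ordinary DynamoDB calls. A minimal sketch, assuming the Document Client from @aws-sdk/lib-dynamodb and the attribute names pk/sk used in my repository code: the helpers only build request inputs, which would then be passed to new QueryCommand(...) and new BatchWriteCommand(...). The helper names and the 10-digit zero padding are hypothetical choices; the padding matters because sort keys are compared as strings, so the timestamp must be fixed-width for string order to match chronological order.

```javascript
// Design A key helper (hypothetical name).
// Zero-pad epoch seconds so string order equals numeric order.
function timestampSk(epochSeconds) {
  return `RECORD#${String(epochSeconds).padStart(10, '0')}`;
}

// Input for QueryCommand: newest first, at most `limit` items.
function buildTopXQuery(tableName, userId, limit) {
  return {
    TableName: tableName,
    KeyConditionExpression: 'pk = :pk AND begins_with(sk, :prefix)',
    ExpressionAttributeValues: { ':pk': `USER#${userId}`, ':prefix': 'RECORD#' },
    ScanIndexForward: false, // descending sort-key order
    Limit: limit,
  };
}

// Inputs for BatchWriteCommand: BatchWriteItem accepts at most 25
// requests per call, so the keys returned by a query are chunked.
function buildDeleteAllBatches(tableName, keys) {
  const batches = [];
  for (let i = 0; i < keys.length; i += 25) {
    batches.push({
      RequestItems: {
        [tableName]: keys.slice(i, i + 25).map((Key) => ({ DeleteRequest: { Key } })),
      },
    });
  }
  return batches;
}

const query = buildTopXQuery('records', '1', 10);
const keys = Array.from({ length: 60 }, (_, i) => ({ pk: 'USER#1', sk: timestampSk(1700000000 + i) }));
const batches = buildDeleteAllBatches('records', keys);
console.log(query.ScanIndexForward, batches.length); // false 3
```

A real implementation would also follow LastEvaluatedKey when collecting the keys to delete and retry any UnprocessedItems returned by BatchWriteItem.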
Design B:
- PK: USER#<user_id>
- SK: RECORD#<record_id>
- Pros:
  - Easy to delete or update a single record using the unique record_id.
  - Deleting all records for a user is also straightforward.
- Cons:
  - Retrieving records sorted by timestamp is not possible directly, as the record_id does not contain the timestamp.
  - Without a timestamp in the SK, sorting by time requires reading all of the user's records and sorting them in application code, which is not efficient.
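Design B is the key shape used in the repository snippet further down, and it is why resubmitting the same data updates rather than duplicates: PutItem replaces any existing item with the same primary key. A minimal in-memory sketch of that behaviour, assuming a Map keyed by "pk|sk" as a stand-in for the table (the Map and the helper names are hypothetical, not DynamoDB APIs). It also shows the cost listed under Cons: newest-first order has to be produced by sorting every item of the user in application code.

```javascript
// In-memory stand-in for a table: primary key "pk|sk" -> item.
const table = new Map();

// Mirrors PutItem semantics: an item with the same key is replaced.
function putRecord(userId, record) {
  const item = { pk: `USER#${userId}`, sk: `RECORD#${record.id}`, ...record };
  table.set(`${item.pk}|${item.sk}`, item);
}

// Delete by record_id alone: the key is fully derivable, no timestamp needed.
function deleteRecord(userId, recordId) {
  return table.delete(`USER#${userId}|RECORD#${recordId}`);
}

// The Design B cost: every item of the user is read and sorted in code.
function topXByTimestamp(userId, limit) {
  return [...table.values()]
    .filter((item) => item.pk === `USER#${userId}`)
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, limit);
}

putRecord('1', { id: 'abc', timestamp: 100, v1: 'test' });
putRecord('1', { id: 'def', timestamp: 150, v1: 'other' });
putRecord('1', { id: 'abc', timestamp: 200, v1: 'test' }); // resubmission
console.log(table.size); // 2
console.log(topXByTimestamp('1', 1)[0].id); // abc
console.log(deleteRecord('1', 'def'), table.size); // true 1
```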
Hybrid Approach Considered:
- PK: USER#<user_id>
- SK: RECORD#<timestamp>#<record_id>
- Pros:
  - Allows sorting by timestamp while maintaining a unique identifier.
- Cons:
  - Still faces challenges in efficiently deleting or updating a record, as the exact SK must be known (which includes the timestamp).
  - The composite SK complicates querying for specific records when only the record_id is known.
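Both hybrid drawbacks show up in a small sketch of the composite sort key, assuming zero-padded epoch seconds and a hex SHA-256 record_id (which never contains '#'); hybridSk and parseHybridSk are hypothetical helper names:

```javascript
// Hybrid sort key: RECORD#<timestamp>#<record_id> (hypothetical helpers).
function hybridSk(epochSeconds, recordId) {
  return `RECORD#${String(epochSeconds).padStart(10, '0')}#${recordId}`;
}

// Splitting works because a hex SHA-256 record_id never contains '#'.
function parseHybridSk(sk) {
  const [, timestamp, recordId] = sk.split('#');
  return { timestamp: Number(timestamp), recordId };
}

const recordId = 'abc123'; // stand-in for a SHA-256 hex digest
const firstSubmission = hybridSk(1700000000, recordId);
const resubmission = hybridSk(1700000500, recordId);

// Same record_id, different sort keys: a plain put would create a second item,
// and the old item can only be deleted if its timestamp is known.
console.log(firstSubmission === resubmission); // false
console.log(parseHybridSk(resubmission)); // { timestamp: 1700000500, recordId: 'abc123' }
```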
I need a schema design that:
- Allows efficient retrieval of records sorted by timestamp.
- Supports deletion of all records for a user.
- Facilitates deletion and updating of individual records based on a unique identifier (record_id), without requiring the client to be aware of the exact timestamp.
- Ensures that if the same data is submitted again, the existing record is updated with a new timestamp, rather than creating a duplicate record.
How can I design a DynamoDB schema that fulfills all these use cases efficiently? I want to avoid using GSIs (adding one for a single use case feels wasteful unless there is really no other way), and I’m looking for a solution that keeps the frontend logic clean, without exposing DynamoDB-specific details like composite keys (this is not essential, but I think it is cleaner).
What are the best practices or alternative approaches that could help me achieve this? Any guidance or suggestions would be greatly appreciated!
—
NOTE: My API receives data as JSON. I hash it with SHA-256 to create a unique identifier for the data, then create a DTO that I pass to my repository, which saves the data to DynamoDB. The idea is that the ID is generated well before the data layer, which I thought was good practice. But even if I used the PK and SK together as the document's ID, it would still be awkward when I need to update the document with the new timestamp.
Yes, the user may submit the same data multiple times. When that happens, all I want is to save the data again with the new timestamp replacing the old one.
Here is some Node.js code that shows what I am doing.
My controller logic:
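const newRecord = {
  v1: 'test',
  v2: 'test2',
  requesterUserId: '1',
};

// The ID is hashed from the content only (before the timestamp is added),
// so identical submissions produce the same ID. sha256IdFromObj is my own helper.
await this.recordRepository.save({
  id: sha256IdFromObj(newRecord),
  timestamp: Math.floor(Date.now() / 1000), // epoch seconds
  ...newRecord,
}, newRecord.requesterUserId);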
My repository logic:
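import { PutCommand } from '@aws-sdk/lib-dynamodb';

// Inside the repository's save(queryRepoDto, userId) method.
// The SK is derived from the content hash, so putting the same data again
// replaces the existing item (including its timestamp) instead of adding a new one.
const command = new PutCommand({
  TableName: this.tableName,
  Item: {
    pk: `USER#${userId}`,
    sk: `RECORD#${queryRepoDto.id}`,
    ...queryRepoDto,
  },
});
await this.dynamoDBDocumentClient.send(command);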
The above are just snippets to get the idea across.