How to architect an in-app search solution that accounts for access to data?

I have a large amount of data that I would like end users to be able to search. I plan to use Elasticsearch (but am open to other technologies), and the solution should be AWS-native. For simplicity, let’s say each record is a Person that looks like the following:

{
  id: int (unique),
  firstName: text,
  lastName: text
}

And given three pieces of example data:

[
  {id: 1, firstName: "Tom", lastName: "Brady"},
  {id: 2, firstName: "Michael", lastName: "Jordan"},
  {id: 3, firstName: "Brad", lastName: "Pitt"}
]

Then there are two users:

User A: Has access to id 1 (Tom Brady) and id 2 (Michael Jordan)
User B: Has access to id 2 (Michael Jordan) and id 3 (Brad Pitt)

When User A performs a search, only the records among IDs {1, 2} that match their search criteria should be returned. Likewise, when User B performs a search, only the matching records among IDs {2, 3} should be returned. Business logic determines which user has access to which IDs and returns them as a collection (which I refer to as scoped data).
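
To make the expected behavior concrete, here is a minimal in-memory sketch in Python. The names PEOPLE, SCOPES, and search are hypothetical, and the case-insensitive substring match only stands in for whatever matching the real search engine would do.

```python
# Illustrative only: PEOPLE, SCOPES and search() are hypothetical names.
PEOPLE = [
    {"id": 1, "firstName": "Tom", "lastName": "Brady"},
    {"id": 2, "firstName": "Michael", "lastName": "Jordan"},
    {"id": 3, "firstName": "Brad", "lastName": "Pitt"},
]

# Output of the business logic: the IDs each user may see (the "scoped data").
SCOPES = {"A": {1, 2}, "B": {2, 3}}


def search(user: str, term: str) -> list[int]:
    """Return IDs of records matching `term` that `user` is allowed to see."""
    allowed = SCOPES.get(user, set())
    needle = term.lower()
    return [
        p["id"]
        for p in PEOPLE
        if p["id"] in allowed
        and (needle in p["firstName"].lower() or needle in p["lastName"].lower())
    ]


print(search("A", "brad"))  # [1]  (Tom Brady; Brad Pitt is out of scope)
print(search("B", "brad"))  # [3]  (Brad Pitt; Tom Brady is out of scope)
```

The same search term returns different results for each user, and a record outside the user's scope never appears even when it matches.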

What is the recommended architecture to support this behavior in a highly scalable, cloud-native solution? How can the filtering of scoped data be handled in a highly performant way?

One solution I was considering is to load all IDs for a given user into a data store (S3, EFS, DynamoDB, or Redis). User A would have a file or entry containing their JSON array of IDs, and User B would have a separate one. When a user makes a search request, the application would query Elasticsearch and pass those IDs in the filter of the query. My concern is whether this approach scales, especially if a user's JSON array contains hundreds of thousands of IDs.
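
A rough sketch of what that query could look like, written in Python as a plain dictionary so that it does not depend on any client library. The function name build_scoped_query is hypothetical, and the sketch assumes the record's id is indexed as a field named id alongside firstName and lastName. Note that Elasticsearch caps the number of values in a terms query through the index.max_terms_count setting (65,536 by default), which is part of why the size of the ID list matters here.

```python
import json


def build_scoped_query(term: str, allowed_ids: list[int]) -> dict:
    """Build a search body that matches `term` but only within `allowed_ids`."""
    return {
        "query": {
            "bool": {
                # Scored full-text match on the name fields.
                "must": [
                    {
                        "multi_match": {
                            "query": term,
                            "fields": ["firstName", "lastName"],
                        }
                    }
                ],
                # Non-scoring restriction to the caller's scoped IDs
                # (assumes the record ID is indexed as a field named "id").
                "filter": [{"terms": {"id": sorted(allowed_ids)}}],
            }
        }
    }


# Example: User A's scoped IDs, as loaded from wherever they are stored.
body = build_scoped_query("brad", [2, 1])
print(json.dumps(body, indent=2))
```

The filter clause does not affect relevance scoring, so the scoped IDs only narrow the result set while the must clause decides ranking.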

New contributor

alex is a new contributor to this site. Take care in asking for clarification, commenting, and answering.