Webapp/Webserver architecture patterns for shared components

Motivation:

I am building a webserver that is essentially a dashboard app. Users make requests and get analytics data and visualizations back. Multiple users can be in the same “workspace,” where they use the same underlying datasource to power dashboards.

In this sense, N users would make requests that all use the same instance of an underlying database component, while a different set of M users might use another instance of that same component.

Typically, a webserver serves each request from scratch, but here it seems sensible to keep a collection of initialized db instances, one per workspace, so that a request from any user in a given workspace always gets that workspace's instance (creating db components from scratch on every request seems heavyweight when I could instead manage a collection of already-active pools, one pool per workspace). My question is: what kinds of patterns exist for this in webserver backends?

In an attempt to answer my own question, this is what I've tentatively landed on:

# app.py (pseudocode)
from fastapi import FastAPI

import data_service

app = FastAPI()

@app.post('/{workspace_id}/dashboard-plz')
def fetch_dashboard(workspace_id: str, specific_data_request: dict):
    # auth the user, etc. then,
    data = data_service.get_data(workspace_id, specific_data_request)
    return dashboardify(data)


# data_service.py
import threading

from sqlalchemy import create_engine

WorkspaceId = str

_cached_workspace_datasources: dict[WorkspaceId, "Datasource"] = {}
_cache_lock = threading.Lock()

def get_data(workspace_id, specific_data_request):
    sql = turn_into_sql(specific_data_request)
    workspace_datasource = _get_or_init_workspace_datasource(workspace_id)
    data = workspace_datasource.query(sql)
    return data

def _get_or_init_workspace_datasource(workspace_id):
    # the membership check and the insert must happen under one lock
    with _cache_lock:
        if workspace_id not in _cached_workspace_datasources:
            ds = Datasource.from_workspace_id(workspace_id)
            ds.init()
            _cached_workspace_datasources[workspace_id] = ds
    return _cached_workspace_datasources[workspace_id]

class Datasource:
    def __init__(self, db_url, **engine_kwargs):
        self._engine = create_engine(db_url, **engine_kwargs)

    @classmethod
    def from_workspace_id(cls, workspace_id):
        db_url = ...  # get db_url via workspace_id
        return cls(db_url)

    def init(self):
        # e.g. warm/validate the pool; create_engine itself connects lazily
        ...

    def query(self, sql):
        # ...
        ...

Essentially, I'm applying a lazy-load/GetOrAdd-style lookup to a wrapper around workspace-level database engines, so that requests from multiple users in the same workspace don't each recreate such a thing. The lock matters in production setups where multiple threads may be serving requests under the hood, because the `workspace_id in dict` membership check and the subsequent dict assignment are not jointly atomic. I do think this all works, at the cost of process-level duplication of entries in _cached_workspace_datasources when the prod config runs multiple processes and requests for the same workspace land on different workers. However…
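
One refinement I'm aware of (not in the code above) is double-checked locking, so the common case of a cache hit skips the lock entirely and only initialization is serialized. A minimal sketch, reusing the module-level names from data_service.py:

# double-checked locking variant (sketch; same names as above)
def _get_or_init_workspace_datasource(workspace_id):
    ds = _cached_workspace_datasources.get(workspace_id)  # lock-free fast path
    if ds is None:
        with _cache_lock:
            # re-check: another thread may have initialized it meanwhile
            ds = _cached_workspace_datasources.get(workspace_id)
            if ds is None:
                ds = Datasource.from_workspace_id(workspace_id)
                ds.init()
                _cached_workspace_datasources[workspace_id] = ds
    return ds

My understanding is that the lock-free read is safe in CPython because individual dict reads and writes are atomic under the GIL; the lock only has to make the check-then-insert atomic.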

Again, my question is: is this kosher? Is there a standard way to do this kind of thing, where some (typically heavyweight) state is shared across a relevant subset of requests?
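
For concreteness, the closest framework-level idiom I've come across is FastAPI's dependency injection (Depends), which would move the lookup out of the handler while the heavyweight cache stays exactly as above. A sketch, reusing the same hypothetical helpers (turn_into_sql, dashboardify):

# app.py variant using FastAPI's Depends (sketch)
from fastapi import Depends, FastAPI

import data_service
from data_service import Datasource

app = FastAPI()

def workspace_datasource(workspace_id: str) -> Datasource:
    # FastAPI resolves workspace_id from the path, same as in the handler
    return data_service._get_or_init_workspace_datasource(workspace_id)

@app.post('/{workspace_id}/dashboard-plz')
def fetch_dashboard(specific_data_request: dict,
                    ds: Datasource = Depends(workspace_datasource)):
    data = ds.query(turn_into_sql(specific_data_request))
    return dashboardify(data)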
