Making JSON files searchable in Azure Storage

I'm looking for a cost-efficient and easy way to index a large number of JSON files.
There will be several different types of JSON data, each collected in its own folder in Azure Storage containers.

I want to be able to search on some properties; not all properties need to be searchable.

JSON data could look something like:

{
  "schemaVersion": "1.0.0",
  "contentSource": "ABC",
  "id": "SC-366-AA",
  "lastModified": "2024-01-13T19:52:33.000+00:00",
  "eventType": "Sales",
  "eventTime": "2024-01-13T19:52:33.000+00:00",
  "data": {
    "seller": "Benny",
    "sellerId": "3982309009129",
    "title": "sales",
    "start": "2024-01-01",
    "end": "2024-12-01",
    "value": "2919"
  }
}

Note: This is just an example; the real files will have many more properties.

I want to be able to run queries such as fetching all files with lastModified between x and y, or simply finding the files with a specific id.
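To make the two queries concrete, here is a minimal Python sketch of the intended semantics, run over already-parsed documents. No Azure service is involved; the helper names (`modified_between`, `find_by_id`) and the second sample document are made up for illustration. It only shows the filtering that an index would ideally do without opening every file.

```python
from datetime import datetime

def parse_ts(value):
    # The sample timestamps carry an explicit offset, so the result is timezone-aware.
    return datetime.fromisoformat(value)

def modified_between(docs, start, end):
    """Return the documents whose lastModified lies in [start, end]."""
    lo, hi = parse_ts(start), parse_ts(end)
    return [d for d in docs if lo <= parse_ts(d["lastModified"]) <= hi]

def find_by_id(docs, doc_id):
    """Return the documents whose id matches exactly."""
    return [d for d in docs if d["id"] == doc_id]

docs = [
    {"id": "SC-366-AA", "lastModified": "2024-01-13T19:52:33.000+00:00"},
    # Second entry is invented purely to have something outside the range.
    {"id": "SC-367-AB", "lastModified": "2024-03-02T08:00:00.000+00:00"},
]

january = modified_between(
    docs, "2024-01-01T00:00:00+00:00", "2024-01-31T23:59:59+00:00"
)
```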

I know this could be in a database, such as Cosmos DB, and then be queryable. But this is not what I am looking for.

My research has turned up these alternatives:

  • Azure AI Search (previously Azure Cognitive Search) – Allows essentially full search on all properties in the JSON files
  • Blob Index Tags – Allows up to 10 key/value pairs per blob
  • Elasticsearch for Azure – Seems very customizable

And since I am using Azure Storage Gen2, I've also gotten some recommendations to use:

  • Azure Synapse Analytics
  • Azure Databricks with Spark-based big data queries

Personally, I think Azure AI Search fits the task best, or possibly Elasticsearch (which I have used previously on AWS). However, I would like to know what the best option in Azure is, as the platform is somewhat new to me.
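For reference, a rough sketch of how the Azure AI Search route might look, assuming an index that lists only the properties that need to be queried and OData `$filter` expressions for the two queries. The index name `events-index` and the helper functions are hypothetical, and the field attributes reflect my understanding of the index definition format, so they should be checked against the documentation before use.

```python
# Hypothetical index definition: only the queryable properties are declared.
index_definition = {
    "name": "events-index",  # made-up name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "lastModified", "type": "Edm.DateTimeOffset",
         "filterable": True, "sortable": True},
        {"name": "eventType", "type": "Edm.String", "filterable": True},
    ],
}

def last_modified_filter(start, end):
    # OData comparison operators: ge means >=, le means <=.
    # Date-time literals are written without quotes.
    return f"lastModified ge {start} and lastModified le {end}"

def id_filter(doc_id):
    # Single quotes inside an OData string literal are escaped by doubling them.
    escaped = doc_id.replace("'", "''")
    return f"id eq '{escaped}'"

range_query = last_modified_filter("2024-01-01T00:00:00Z", "2024-02-01T00:00:00Z")
id_query = id_filter("SC-366-AA")
```

The resulting strings would be sent as the filter of a search request; properties not declared in the index would simply not be queryable.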

Blob Index Tags seem a bit limited and could quickly be outgrown.
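A small sketch of why the 10-tag limit bites, assuming the searchable properties are copied from each JSON document into tags at upload time. The helpers `tags_from_document` and `tag_range_filter` are hypothetical, not part of any SDK. As far as I understand, tag values are compared as strings, so a range query on lastModified only works if every timestamp uses the same format and offset.

```python
MAX_TAGS_PER_BLOB = 10  # the per-blob limit mentioned above

def tags_from_document(doc, fields):
    """Copy selected properties (dotted paths allowed) into a flat tag dict."""
    if len(fields) > MAX_TAGS_PER_BLOB:
        raise ValueError(
            f"{len(fields)} fields requested, but only {MAX_TAGS_PER_BLOB} tags fit on a blob"
        )
    tags = {}
    for path in fields:
        value = doc
        for part in path.split("."):  # walk into nested objects such as "data"
            value = value[part]
        tags[path.replace(".", "_")] = str(value)  # tags are string key/value pairs
    return tags

def tag_range_filter(key, start, end):
    # Tag filter style: "key" >= 'value' AND "key" <= 'value'
    return f"\"{key}\" >= '{start}' AND \"{key}\" <= '{end}'"

sample = {
    "id": "SC-366-AA",
    "lastModified": "2024-01-13T19:52:33.000+00:00",
    "data": {"sellerId": "3982309009129"},
}
tags = tags_from_document(sample, ["id", "lastModified", "data.sellerId"])
expr = tag_range_filter("lastModified", "2024-01-01", "2024-02-01")
```

With the `azure-storage-blob` package, the dict would go to `BlobClient.set_blob_tags` and the expression to `find_blobs_by_tags`. Once more than ten properties need to be searchable, this approach runs out of room.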

Databricks and Synapse seem better suited to research, where you occasionally want to run calculations or analysis on the data, and both seem powerful. But they are very UI-centric and somewhat cumbersome.
