ElasticSearch query with filters and occurrence number


I have an ES instance that I push logs into, and ES is then used to search those logs. This is not ideal and there are plans to change it, but it is what it is. Sorry for the long description, but bear with me; the question itself is simple.

For now the search goes like this:

  • I have an index with N log lines
  • user enters a phrase to search for
  • I construct the ES query with:
    • this phrase in query
    • size=1 (so I only find one line)
    • track_total_hits=true
    • from=0
    • sort=<something>

This gives me the first occurrence of a line matching the query (because the results are sorted, e.g. by timestamp). I also get the total hits, so I can present the user with:

  • the found line
  • the occurrence number (with initial search it’s always 1)
  • the total hits

So the user knows this is occurrence 1/300 and can prompt the UI to find the next one. The search stays the same, but when the user asks for the next occurrence, I just pass from=1, from=2, etc. The performance of this is pretty okay, since I only have to download one line from ES per request.
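To make the current approach concrete, here is a minimal sketch of the request body and of how the response is turned into what the UI shows. The field names `message` and `@timestamp`, the use of `match_phrase`, and the helper names are assumptions for illustration; the real mapping and query type may differ.

```python
def build_search_body(phrase, occurrence=1):
    """Build the search body for the N-th occurrence (1-based).

    Assumes the log text lives in a field called "message" and that
    results are sorted by "@timestamp" (both names are hypothetical).
    """
    return {
        "query": {"match_phrase": {"message": phrase}},
        "size": 1,                    # only one line is downloaded
        "from": occurrence - 1,       # occurrence 1 -> from=0, 2 -> from=1, ...
        "track_total_hits": True,     # exact total instead of a capped one
        "sort": [{"@timestamp": "asc"}],
    }


def summarize(response, occurrence=1):
    """Extract the line, the occurrence number and the total hits."""
    hits = response["hits"]
    if not hits["hits"]:
        return None                   # nothing matched at this offset
    return {
        "line": hits["hits"][0]["_source"]["message"],
        "occurrence": occurrence,
        "total_hits": hits["total"]["value"],
    }
```

Going to the next or previous occurrence is then just a matter of calling `build_search_body` with `occurrence + 1` or `occurrence - 1`.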

That’s great. However, this is all on a website that shows the logs to the user. What I want is this: when the user does the initial search (before going to the next/previous occurrence), I want to show them the first matching line “after their cursor position”.

For example, the user sees:

58 foo
59 bar
60 baz
[...]

So I want to scroll them down to the first matching line after line 58, not before it.

The problem is, I still want to display the <n>/<total> occurrence counter. In this case the initial search might return, for example, the fifth occurrence, i.e. 5/300, and the user could then go to the previous/next ones.

The obvious solution is to download all the matching lines (without from= and size= in the query), loop over them, find the first line whose line number is higher than the one the user sees (i.e. 58), and return it. By doing that I can also count which occurrence it is, so I know to display, for example, 5/300 in the UI.
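The loop described above would look roughly like this. It is a sketch that assumes the matching lines have already been downloaded as `(line_number, text)` pairs in sorted order; the function name is hypothetical.

```python
def find_occurrence_after(matching_lines, cursor_line):
    """Return the first match after cursor_line plus its position.

    matching_lines: list of (line_number, text) tuples, sorted by
    line_number, containing every line that matched the phrase.
    """
    total = len(matching_lines)
    for index, (line_number, text) in enumerate(matching_lines):
        if line_number > cursor_line:
            return {
                "line": text,
                "total_hits": total,
                "occurrence": index + 1,   # 1-based position among all matches
            }
    return None                            # no match after the cursor
```

This is correct but requires every matching line to be transferred first, which is exactly the cost I want to avoid.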

The problem with that is: I have to download all the lines from ES to do that. In case of indexes that have millions and millions of lines, that could be a huge performance hit. So what I want to know is: is there a way to tell Elastic to:

  • get all the matching lines (matching phrase)
  • apply another filter here (line number > something)
  • get this line, but also return the information on “which occurrence of a matching line is that” (in all the matching lines, without the “line number” filter)

so for lines like:

54 content
55 content
56 content
57 content
58 foo
59 bar
60 baz
61 content
[...]

with the phrase “content” and searching “from line 58”, I’d have a response like:

{
  "line": "61 content",
  "total_hits": 300,
  "occurrence": 5
}
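As far as I know, a single search hit does not carry its rank among all matches, so one possible way to get the same result without downloading everything is to derive the occurrence from a count. The sketch below assumes a numeric field called `line_number` that orders lines the same way as the sort, and a text field called `message`; both names, and the helper functions, are hypothetical. The first body would go to the `_count` API (matches at or before the cursor), the second to `_search` (first match after the cursor).

```python
def _phrase_with_range(phrase, range_clause):
    # Phrase match combined with a non-scoring filter on the line number.
    return {
        "bool": {
            "must": [{"match_phrase": {"message": phrase}}],
            "filter": [{"range": {"line_number": range_clause}}],
        }
    }


def build_count_before_body(phrase, cursor_line):
    """Body for _count: how many matches are at or before the cursor."""
    return {"query": _phrase_with_range(phrase, {"lte": cursor_line})}


def build_first_after_body(phrase, cursor_line):
    """Body for _search: the first match strictly after the cursor."""
    return {
        "query": _phrase_with_range(phrase, {"gt": cursor_line}),
        "size": 1,
        "sort": [{"line_number": "asc"}],
        "track_total_hits": True,
    }


def combine(count_before, after_response):
    """Merge both responses into the shape shown above."""
    hits = after_response["hits"]
    if not hits["hits"]:
        return None                       # no match after the cursor
    return {
        "line": hits["hits"][0]["_source"]["message"],
        # matches before the cursor + matches after it = all matches
        "total_hits": count_before + hits["total"]["value"],
        "occurrence": count_before + 1,
    }
```

With 4 matches at or before line 58 and 296 after it, `combine` would produce occurrence 5 of 300, and next/previous could then continue with the plain from= paging. Whether two small requests are acceptable here is a judgment call, and this only holds if the sort order and `line_number` agree.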
