Find best subcombination from given combination

Each component has multiple possible states and any number of them can be active at a time. Some combinations of states have a value associated with it. At any given point, it wants to be able to get the best value for the currently active states of the component. The problem is when there isn’t a value associated with the exact combination, I can’t think of a decent way to implement finding the best value that isn’t horribly inefficient. Let me demonstrate it since I might not have made it completely clear.

For a given component, there are A-Z possible states and 0-9 possible values.

Current state: [A,B,C]

State-value associations:


The correct value would be 3 since it has the most number of common states and no uncommon states. If two state combinations are of equal similarity, like [A,C] and [B,C], ideally, the first one that is defined would be chosen.

The only solution I could think of is checking every possible combination which would could get really inefficient with higher number of states.

It’s also worth noting that this is similar to pseudo classes in CSS, though I couldn’t find any information on how it’s implemented.


We can take a trick from text search and build a reverse index from the states to the state value associations (SVA). The nice thing about this approach is that when we are looking for the best SVA, we only have to look through our input set once and don’t at all have to look at the contents of the SVAs (we just look them up in the index). By contrast the naive version has to compare the input to the SVA’s values for each SVA, which can be many times more work. Here’s a quick python implementation of the idea:

class SVA:
    def __init__(self, states, value):
        :type states: Iter
        :type value: int
        self.states = frozenset(states)
        self.value = value

    def __len__(self):
        return len(self.states)

    def __iter__(self):
        return iter(self.states)

    def __hash__(self):
        return hash((self.states, self.value))

def index(svas):
    Given a collection of SVAs return an dictionary from each value to the set of SVAs in it.
    index = dict()

    for sva in svas:
        for value in sva:
            index.setdefault(value, set())

    return index

class IndexedSVAs:
    def __init__(self, svas):
        :type svas: iter(SVA)
        self.svas = svas
        self._index = index(svas)
        self._sva_positions = dict((sva, i) for (i, sva) in enumerate(svas))

    def __getitem__(self, item):
        return self._index[item]

    def __iter__(self):
        return iter(self.svas)

    def position(self, sva):
        return self._sva_positions.get(sva, -1)

def find_best_fit(states, index):
    Given the output of index and some values, find the SVA with the most values in common with the given and
    with none not in common.
    sva_counter = dict((sva, 0) for sva in index)

    for state in states:
        for sva in index[state]:
            sva_counter[sva] += 1

    candidates = [sva for (sva, count) in sva_counter.items() if len(sva) <= count]
    return max(*candidates, key=lambda sva: (len(sva), -index.position(sva)))

svas = [SVA(['A', 'E'], 5), SVA(['A'], 1), SVA(['A', 'C'], 3), SVA(['B', 'C'], 9)]
indexed = IndexedSVAs(svas)
print(find_best_fit(['A', 'B', 'C'], indexed).value)

which prints out 3 as expected.

The only solution I could think of is checking every possible combination.

I don’t think there is much else you can do because of this requirement:

there isn’t a value associated with the exact combination

If you were looking for the exact combination, you could build some binary tree which you could search more quickly.

The only optimisation I see for your current code is to return as early as possible. Maybe you can make assumptions like, I have [A,B,C] and I know the exact combination is not there, now I found [A,C] is there a chance I could find a better match? to stop searching further. However, I not sure if you will find many that are often applicable.

which would could get really inefficient with higher number of states

Just a quick thought: If states are binary and independent, you could represent them by bits. Binary operations tend to be fast. Maybe you can find a way by &‘ing the current state with the known associations and then counting the bits of the result or something like that.

Trả lời

Email của bạn sẽ không được hiển thị công khai. Các trường bắt buộc được đánh dấu *