Constructing a Finite State Automaton

I have an exam question whose answer I am unsure of. The question is:

In organisation X valid user names have the following structure. The
user name can be either the employee’s name followed by a colon and
then the department name or a string of one or more alphanumeric
letters consisting of lower case, upper case letters and digits from 0
to 9. All user names end with a period (“.”). The employee’s name is
expressed as first name followed by an underscore and then the surname.
The first name and the surname must begin with an uppercase letter and
is followed by an arbitrary number of lower case letters. All
department names are an arbitrary number of lower case letters.
Express the language for valid user names as a regular expression.

The answer I have is,

[A-Z][a-z]+"_"[A-Z][a-z]+":"[a-z]+"." | [A-Za-z0-9]+"."

The next question is:

Construct a deterministic finite state automaton for recognising the
user names as described in Question 1.

For which I have this:

I am under the impression this is wrong because [A-Z] is a subgroup of [A-Za-z0-9] and a DFA cannot have the same symbol going from one state. Does anybody have an idea how to solve this, or is it correct as it stands?

Thanks !

As pointed out in the comments, "arbitrary" can include 0, so you should use * instead of + at the corresponding places in your regular expression.
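As a quick sanity check, here is a minimal sketch of the adjusted expression in Python. It assumes that the "corresponding places" are the lower-case tails of the first name and surname and the department name; the names USERNAME and is_valid are made up for illustration.

```python
import re

# Assumed reading of the exam question: the tails of the first name and
# surname, and the department name, may each be empty (hence * not +).
USERNAME = re.compile(r"[A-Z][a-z]*_[A-Z][a-z]*:[a-z]*\.|[A-Za-z0-9]+\.")


def is_valid(name: str) -> bool:
    """Return True if the whole string is a valid user name."""
    return USERNAME.fullmatch(name) is not None


print(is_valid("John_Smith:sales."))  # True
print(is_valid("J_S:."))              # True, only possible with * instead of +
print(is_valid("abc123."))            # True, plain alphanumeric form
print(is_valid("John_smith:sales."))  # False, surname must start upper case
```

With the original + version, "J_S:." would be rejected, which is exactly the difference the comment is pointing at.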

From your original regular expression you correctly derived a nondeterministic finite state automaton (NFA). To adapt it to the * mentioned above, you only need to eliminate the states 2 and 5 in order to have a corresponding NFA again.

The interesting part of the exam question, though, is deriving a deterministic finite state automaton (DFA) from your NFA. This can be done mechanically with an algorithm known as the powerset construction. This link should give you enough information to learn how to get from an NFA to a DFA (and you should have covered it in your lessons anyway, given the exam question!).
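To make the powerset construction concrete, here is a rough sketch in Python. The NFA below is my own encoding of the regular expression (with * applied), not the diagram from the question: the state names are hypothetical, the alphabet is reduced to symbol classes (U for [A-Z], l for [a-z], d for [0-9]), and it assumes there are no epsilon transitions.

```python
from collections import deque

# Symbol classes used as the alphabet, plus the literal '_', ':' and '.'.
ALPHABET = ["U", "l", "d", "_", ":", "."]

# Hypothetical NFA: (state, symbol) -> set of successor states.
NFA = {
    ("q0", "U"): {"first", "str"},   # the non-deterministic choice
    ("q0", "l"): {"str"},
    ("q0", "d"): {"str"},
    ("first", "l"): {"first"},
    ("first", "_"): {"sep"},
    ("sep", "U"): {"sur"},
    ("sur", "l"): {"sur"},
    ("sur", ":"): {"dept"},
    ("dept", "l"): {"dept"},
    ("dept", "."): {"end"},
    ("str", "U"): {"str"},
    ("str", "l"): {"str"},
    ("str", "d"): {"str"},
    ("str", "."): {"end"},
}
NFA_START = "q0"
NFA_ACCEPTING = {"end"}


def powerset_construction(nfa, start, accepting, alphabet):
    """Build the reachable part of the DFA whose states are sets of NFA states."""
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for sym in alphabet:
            # Union of all NFA moves on sym from any state in the current set.
            target = frozenset(
                t for s in current for t in nfa.get((s, sym), ())
            )
            if target:  # an empty set is the implicit dead state
                dfa[current][sym] = target
                if target not in dfa:
                    queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in dfa if s & accepting}
    return dfa, start_set, dfa_accepting


dfa, dfa_start, dfa_accepting = powerset_construction(
    NFA, NFA_START, NFA_ACCEPTING, ALPHABET
)
print(len(dfa))  # 7 reachable DFA states
```

Only subsets reachable from the start state are generated, so for this language the result stays small rather than blowing up to the full powerset.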

The resulting DFA is often larger than necessary because its states are drawn from the powerset of the NFA's states, which is why one normally performs a DFA minimization afterwards. To be honest, though, I would skip that, given that the question only asks for a DFA, not a minimal DFA.

I’m no expert but I think this assumption is correct.

I am under the impression this is wrong because [A-Z] is a subgroup of [A-Za-z0-9] and a DFA cannot have the same symbol going from one state.

It sounds like you have a pretty good grasp but I’ll spell it out just to be thorough.

The following are the steps to convert the current NFA to a DFA. The first step is to assess where the non-determinism exists so that the problem can be broken down further.

The initial split needs to happen based on one of these two conditions:

The user name can be either the employee’s name followed by a colon and then the department name

I.e. an ‘employee_name’ is encountered

a string of one or more alphanumeric letters consisting of lower case, upper case letters and digits from 0 to 9

I.e. a ‘string’ is encountered

It looks like you have identified the issue already: there’s no way to determine a split at state 1 because both paths share the common prefix [A-Z][a-z]*. To create a valid DFA you first need to identify the constraints.

To start, I’d look at separating out the cases where the initial input is not an ‘employee_name’.

Such cases include:

the first char is a lower-case alpha char
a numeric char is encountered

In addition, both an ‘employee_name’ and a ‘string’ can begin with the identical prefix [A-Z][a-z]*. To cover that condition there should be a two-way split depending on whether [_] or [.] is encountered. Where [_] is encountered you would transition to what is currently state 4. Where [.] is encountered you’d transition to what is currently state 10.

Now that all of the constraints are identified, it’s just a matter of re-mapping your state diagram to represent the flow. There should be one more state containing a * repeater for processing the rest of the ‘string’ value and 3 more transitions. It’s just a matter of placing them correctly.

Following what @Frank said, you should also eliminate two of the states by using * instead of + for repetition, because the 1-or-more constraint is technically incorrect. For instance, a valid user name could contain only one character from [A-Z] and no additional chars before the [_]. The same pattern applies to the ‘surname’.
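Putting those constraints together, the remapped DFA could look roughly like the sketch below. The state names are descriptive labels I made up, not the numbers from the original diagram, and the table assumes the * version of the expression, including an optionally empty department name. Missing entries stand for the implicit dead state.

```python
def classify(ch: str) -> str:
    """Map a character to its symbol class; anything else is 'other'."""
    if "A" <= ch <= "Z":
        return "upper"
    if "a" <= ch <= "z":
        return "lower"
    if "0" <= ch <= "9":
        return "digit"
    if ch in "_:.":
        return ch
    return "other"


# Hypothetical state names; each (state, symbol class) has at most one target.
DFA = {
    "start": {"upper": "name_or_string", "lower": "string", "digit": "string"},
    # Seen [A-Z][a-z]*: still a possible first name AND a possible string.
    "name_or_string": {
        "lower": "name_or_string",
        "upper": "string",
        "digit": "string",
        "_": "after_underscore",
        ".": "accept",
    },
    # Can only be the plain alphanumeric form from here on.
    "string": {"upper": "string", "lower": "string", "digit": "string",
               ".": "accept"},
    "after_underscore": {"upper": "surname"},
    "surname": {"lower": "surname", ":": "department"},
    "department": {"lower": "department", ".": "accept"},
    "accept": {},
}


def accepts(name: str) -> bool:
    """Run the DFA over the input, one character at a time."""
    state = "start"
    for ch in name:
        state = DFA[state].get(classify(ch))
        if state is None:  # no transition defined: dead state
            return False
    return state == "accept"


print(accepts("John_Smith:sales."))  # True
print(accepts("JohnX9."))            # True, falls through to 'string'
print(accepts("john_Smith:x."))      # False, '_' is not allowed in a string
```

The key design point is the combined name_or_string state: it defers the decision between the two forms until a character arrives that only one of them allows.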

From 1 to 9, remove the [A-Z] transition and add transitions from 2 and 3 to 9 with [A-Z0-9].

