Hi and thanks for looking!
Background
I have a project wherein I need to abstract meaning from a passage of text to determine what the text is seeking and then match that text to a list of search results.
Now, I have some context with this and the following scenario is very close, but doesn’t violate my NDA with the client:
Let’s say I have a software developer job description from a posting on Craig’s List. I want to somehow parse that description and make a guess as to whether the posting is seeking a C# developer, a DBA, an SDET, a C++ dev, etc. Even better, it would be good to distill certain parameters such as years/level of experience and experience with any particular stacks (.NET, Java, etc.).
Once I have abstracted the requisition, I then need to reference a database of resumes (CVs) to find a ranked set of matches, probably using the same algorithm or at least a system of tags.
Question
How best do I attain this goal? Should I be looking into building an index of keywords? Natural Language Processing? Evolutionary Algorithms?
We have to assume that human interaction will be very minimal.
Example Job Posting
Here is an example Job Posting that I have pulled at random from Craig’s List:
We invite you to bring your career to an environment where talent is rewarded and new ideas are encouraged. At Seattle Children’s, the Pacific Northwest’s premier pediatric care center, we offer more than just state-of-the-art facilities and open career growth potential. You will also find a true commitment to our patients and families that reach far beyond the bounds of clinical expertise.
The Applications Development Senior position in the Center for Developmental Therapeutics will develop, test, standardize and implement software for high throughput data analysis. This includes system analysis, design, application support, research into standards, and customer/vendor relations management in support of production operations. For example, supporting the integration of different computational approaches into an end-to-end workflow for high throughput data analysis, including applications for proteomics and metabolomics. Participate in all phases of the project lifecycle, including architecture, design, development, alpha, beta, release and production support. The ideal candidate will have a Bachelor’s degree in the field and a minimum of three years of experience with software development; working knowledge of JAVA, Spring, AJAX, JSF, Hibernate, JQuery, and JUnit; and solid experience with SQL tuning and understanding the execution plan will be key to this role.
Please ensure to complete the job assessment questions right after you submit your application for this opening.
Required:
– Bachelor’s Degree in Computer Science, Math, Business, or related field
– Three (3) years software development experience
– Working knowledge of JAVA, Spring, AJAX, JSF, Hibernate, JQuery, and JUnit
– Solid experience with SQL tuning and understanding the execution plans
– Strong understanding of multi-tiered web applications and scalability challenges
Preferred:
– Five (5) or more years software development experience
– Experience developing large software systems
– Experience with web services (HTTP, JSON, SOAP, XML), Unix/Linux environments, and distributed systems (HADOOP)
– Experience in software and algorithmic development for molecular biology, biotechnology, microarrays or proteomics.
We offer competitive pay, generous paid time off, transportation discounts, and employee reward and recognition programs.
Thanks!
Matt
Job ads generally use plain, formulaic language and repeat the same constructions, so a full NLP pipeline would likely be overkill here.
One possible approach is to assign tags to both job ads and resumes and match on those tags. A rule-based system would be faster to implement for a limited data set but might be harder to maintain (for example, when adding new keywords). A machine learning approach can be more flexible, but it needs a training set, which means additional cost.
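As a rough illustration of the rule-based tagging idea, here is a minimal Python sketch. The keyword patterns, the function names (`extract_tags`, `jaccard`, `rank_resumes`) and the choice of Jaccard overlap as the ranking score are all my own assumptions for the example, not a recommendation of a specific scoring method.

```python
import re

# Hypothetical keyword rules; a real system would need a far larger, maintained list.
TAG_PATTERNS = {
    "java": r"\bjava\b",          # \b keeps "javascript" from matching
    "c#": r"(?<!\w)c#",
    "c++": r"(?<!\w)c\+\+",
    "sql": r"\bsql\b",
    "spring": r"\bspring\b",
    "hibernate": r"\bhibernate\b",
    "jquery": r"\bjquery\b",
}


def extract_tags(text):
    """Return the set of tags whose pattern occurs in the text."""
    lowered = text.lower()
    return {tag for tag, pattern in TAG_PATTERNS.items()
            if re.search(pattern, lowered)}


def jaccard(a, b):
    """Overlap of two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_resumes(job_text, resumes):
    """Score every resume against the job ad and sort best-first."""
    job_tags = extract_tags(job_text)
    scored = [(jaccard(job_tags, extract_tags(text)), name)
              for name, text in resumes.items()]
    return sorted(scored, reverse=True)


job = "Working knowledge of JAVA, Spring, Hibernate and SQL tuning"
resumes = {
    "alice": "Java developer, Spring and Hibernate, SQL",
    "bob": "C# and SQL Server developer",
    "carol": "C++ game developer",
}
for score, name in rank_resumes(job, resumes):
    print(f"{name}: {score:.2f}")
```

The maintenance cost mentioned above shows up in `TAG_PATTERNS`: every new technology needs a new hand-written entry.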
Resumes and job ads usually use slightly different phrasing. Sometimes a candidate doesn’t mention every technology, or mentions a similar one instead. You can enhance your system with a taxonomy of keywords: parent keywords (Web technology -> HTML; CSS; JavaScript) or similar keywords (Python <=> Ruby).
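A small sketch of how such a taxonomy could feed into scoring, again in Python. The `PARENTS` and `SIMILAR` tables, the 0.5 partial-credit weight, and the function names are assumptions made up for this example; only the parent/similar relationships themselves come from the paragraph above.

```python
# Child keyword -> parent keyword (hypothetical, tiny taxonomy).
PARENTS = {
    "html": "web technology",
    "css": "web technology",
    "javascript": "web technology",
}

# Unordered pairs of similar keywords with an assumed partial-credit weight.
SIMILAR = {
    frozenset({"python", "ruby"}): 0.5,
}


def expand(tags):
    """Add the parent keyword of every tag that has one."""
    return set(tags) | {PARENTS[t] for t in tags if t in PARENTS}


def tag_score(required, offered):
    """1.0 for an exact match, partial credit for a similar keyword, else 0.0."""
    if required in offered:
        return 1.0
    return max((SIMILAR.get(frozenset({required, o}), 0.0) for o in offered),
               default=0.0)


def match_score(job_tags, resume_tags):
    """Average credit over the tags the job ad asks for."""
    if not job_tags:
        return 0.0
    offered = expand(resume_tags)
    return sum(tag_score(t, offered) for t in job_tags) / len(job_tags)


# The ad asks for a web technology and Python; the resume lists CSS and Ruby.
print(match_score({"web technology", "python"}, {"css", "ruby"}))
```

Here CSS earns full credit through its parent keyword, while Ruby earns only the assumed partial credit for Python.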
If you have existing data on which resumes matched which job ads, you can leverage it with machine learning. The system learns which phrases go together and which keywords are or aren’t important. In addition, you can define rules for extracting more specific features, such as the required experience level (junior, senior), the experience offered (e.g. based on the details in the experience section of the resume), and so on.
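The machine learning part depends on your data, so I won't sketch it, but the rule-based feature extraction could look something like the following. This is only an illustration: the regular expression, the number-word table, and especially the junior/mid/senior thresholds are assumptions of mine and would need tuning against real postings.

```python
import re

# Number words that commonly appear in job ads (assumed to be enough for a sketch).
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# Matches "3 years", "3+ years", "(5) or more years", "three years", ...
YEARS_RE = re.compile(
    r"\b(\d+|" + "|".join(WORDS) + r")\b\)?\s*(?:\+|or more)?\s*years?"
)


def extract_years(text):
    """Return every 'N years' figure mentioned in the text, in order."""
    found = YEARS_RE.findall(text.lower())
    return [int(f) if f.isdigit() else WORDS[f] for f in found]


def required_years(text):
    """Treat the smallest figure as the minimum requirement; None if absent."""
    years = extract_years(text)
    return min(years) if years else None


def level(years):
    """Map years to a coarse level; the cut-offs are arbitrary assumptions."""
    if years is None:
        return "unknown"
    if years < 2:
        return "junior"
    if years < 5:
        return "mid"
    return "senior"


ad = ("Required: Three (3) years software development experience. "
      "Preferred: Five (5) or more years software development experience.")
print(extract_years(ad), required_years(ad), level(required_years(ad)))
```

The same extractor can be run over the experience section of a resume, so that both sides end up with comparable features.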