Relative Content

Tag Archive for regular-expressions

How should a website validate a user's mailing address?

This is for a site that relies on shipping items via UPS or FedEx. I know there is software out there that does it (http://en.wikipedia.org/wiki/Coding_Accuracy_Support_System), but what if you are trying to build your own solution for a simple website?

String patterns that can be used to filter and group files

One of our applications filters the files in a certain directory, extracts some data from them, and exports a document built from the extracted data. The algorithm for extracting the data depends on the file, and so far we use a regex to select the algorithm: for example, files matching .*.txt are processed by algorithm A, files matching foo[0-5].xml are processed by algorithm B, and so on.
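The selection scheme described above can be sketched in Python as an ordered list of (pattern, handler) pairs. This is only an illustration: the names `algorithm_a`, `algorithm_b`, `RULES`, and `select_algorithm` are hypothetical, and the dots in the patterns are escaped here on the assumption that a literal "." was intended (in the unescaped .*.txt the second dot matches any character).

```python
import re

# Hypothetical stand-ins for the real extraction algorithms.
def algorithm_a(filename):
    return f"A:{filename}"

def algorithm_b(filename):
    return f"B:{filename}"

# Ordered rules: the first pattern that matches the whole file name wins.
RULES = [
    (re.compile(r".*\.txt"), algorithm_a),
    (re.compile(r"foo[0-5]\.xml"), algorithm_b),
]

def select_algorithm(filename):
    """Return the handler for filename, or None if no rule matches."""
    for pattern, handler in RULES:
        if pattern.fullmatch(filename):
            return handler
    return None
```

Using `fullmatch` rather than `search` keeps a pattern such as foo[0-5]\.xml from also matching a name like foo3.xml.bak.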

Can the CSV format be defined by a regex?

A colleague and I recently argued over whether a pure regex can fully capture the CSV format, such that it can parse any file given its escape character, quote character, and separator character.
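As a rough illustration of what is being argued about, here is a minimal Python sketch that builds a field regex from a given separator and quote character and uses it to split one record. It assumes a single-character separator and quote, assumes that a quote inside a quoted field is escaped by doubling it (a separate escape character is not handled), and expects one complete record without its line terminator. The function name `split_record` is hypothetical. It does not settle the question; it only shows the fixed-parameter case.

```python
import re

def split_record(line, sep=",", quote='"'):
    """Split one CSV record into fields using a regex built from sep and quote."""
    s, q = re.escape(sep), re.escape(quote)
    # A field is either quoted (with doubled quotes inside) or a bare run
    # without separators or quotes; it must be followed by sep or end of input.
    field = re.compile(rf"(?:{q}((?:[^{q}]|{q}{q})*){q}|([^{s}{q}]*))({s}|\Z)")
    fields, pos = [], 0
    while True:
        m = field.match(line, pos)
        if m is None:
            raise ValueError(f"malformed record at offset {pos}")
        quoted, bare, end = m.groups()
        # Undo the doubled-quote escaping for quoted fields.
        fields.append(bare if quoted is None else quoted.replace(quote * 2, quote))
        if end == "":  # reached end of input rather than a separator
            return fields
        pos = m.end()
```

Note that the loop around the regex does part of the work here, so this is a regex-assisted splitter rather than a single pure regex for the whole format.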

How to choose a proper parser generator for PHP

Some programmers avoid regexes in some situations (see this popular @nickf comment), perhaps using a parsing framework such as Lex/Yacc. Others prefer to stay within PHP, perhaps using regular expressions, as it avoids the need for another framework.

Domain-specific language for text search/processing?

I work for an organization that does a lot of work with government data. We have a couple of different projects where we’ve abstracted out common text search/manipulation operations into reusable libraries, for things like standardizing the way politicians’ names are displayed (e.g., transforming “MCDONALD, BOB (R-VA)” into “Bob McDonald (R-VA)”), or finding legal citations in text (e.g., finding occurrences of things like “1 U.S.C. 7” in text, determining that it’s a US Code citation, and returning a structure that says it’s referring to section 7 of title 1). These are relatively simple operations, and lots of collaborators in our space would like to use them, but we end up having to pick a language in which to implement each (the former is in Python; the latter, JavaScript), and we freeze out potential consumers/contributors who work in different languages and don’t want to resort to hacks like shelling out to a node process to handle their text. This all seems like a shame because what we’re expressing is so simple, and ought, one would think, to be pretty easy to share.
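To make the two operations concrete, here is a minimal Python sketch of each. It is not the organization's actual library code: the function names, the regexes, and the shape of the returned dictionaries are all assumptions made for illustration. The name pattern assumes input of the form “LAST, FIRST (P-ST)”, and the citation pattern reads “1 U.S.C. 7” in the usual title-then-section order.

```python
import re

# Assumed input shape: "LAST, FIRST (PARTY-STATE)".
NAME_RE = re.compile(
    r"^(?P<last>[^,]+),\s*(?P<first>[^(]+?)\s*(?P<party>\([A-Z]+-[A-Z]{2}\))$"
)

def _cap(word):
    """Capitalize a name part, with a special case for the 'Mc' prefix."""
    word = word.capitalize()
    if word.startswith("Mc") and len(word) > 2:
        word = "Mc" + word[2:].capitalize()
    return word

def standardize_name(raw):
    """Turn 'LAST, FIRST (P-ST)' into 'First Last (P-ST)'; return raw if unmatched."""
    m = NAME_RE.match(raw.strip())
    if m is None:
        return raw
    first = " ".join(_cap(w) for w in m.group("first").split())
    last = " ".join(_cap(w) for w in m.group("last").split())
    return f"{first} {last} {m.group('party')}"

# Assumed citation shape: "<title> U.S.C. <section>", optionally with a section sign.
USC_RE = re.compile(r"\b(?P<title>\d+)\s+U\.S\.C\.\s+(?:§\s*)?(?P<section>\d+[A-Za-z]*)")

def find_usc_citations(text):
    """Return one dict per US Code citation found in text."""
    return [
        {"type": "usc", "title": int(m.group("title")), "section": m.group("section")}
        for m in USC_RE.finditer(text)
    ]
```

Both functions reduce to a regex plus a small amount of post-processing, which is the kind of logic the question suggests should be expressible in a language-neutral form.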