As a pet project, I’m writing a Python CSS-selector library that lets one write expressions like the following in Python. The goal of the library is to represent selectors in a flat, intuitive and interesting way; all valid syntax defined by the Selectors Level 4 Draft must be supported, in one way or another.

# lorem|foo.bar[baz^="qux"]:has(:invalid)::first-line
selector = (Namespace('lorem') | Tag('foo')) 
           .bar 
           # Can also be written as [Attribute('baz').starts_with('qux')]
           [Attribute('baz', '^=', 'qux')] 
           # '>>' is used instead of ' '.
           [:'has', (Selector.SELF >> PseudoClass('invalid'),)] 
           [::'first-line']

Here’s what the hierarchy looks like (/ signifies an alias, () mixin superclasses):

Selector(ABC)  # Enum too?
├── PseudoElement
├── ComplexSelector(Sequence[CompoundSelector | Combinator])
├── CompoundSelector(Sequence[SimpleSelector])
├── SimpleSelector
│   ├── TypeSelector / Tag
│   ├── UniversalSelector
│   ├── AttributeSelector / Attribute
│   ├── ClassSelector / Class
│   ├── IDSelector / ID
│   └── PseudoClass
├── SELF / PseudoClass('scope')
└── ALL / UniversalSelector()

Combinator
├── ChildCombinator: '__gt__' / '>'
├── DescendantCombinator: '__rshift__' / '>>'
├── NamespaceSeparator: '__or__' / '|'
├── NextSiblingCombinator: '__add__' / '+'
├── SubsequentSiblingsCombinator: '__sub__' / '-'
└── ColumnCombinator: '__floordiv__' / '//'

This design has some disadvantages:

  • The replacements of combinators:

    • Descendant combinator ( ) → right shift (>>)
    • Column combinator (||) → floor division (//)
    • Subsequent-siblings combinator (~) → minus/subtract (-)

    >> and // are currently not valid combinators, but may be in the future. The last replacement (-) is much safer, since - is already a valid character in <ident-token>s and so is unlikely ever to become a combinator.

  • Functional pseudo-classes need a comma between their name (a string/non-callable) and their arguments (a tuple):

    • [:'where', (Class('foo'), Class('bar'))]

These disadvantages may need to be weighed when reworking the design around the following limitations:

  • HTML classes with hyphens cannot be added with Python dotted attribute syntax (.foo-bar); not to mention, this also means that any classes that implement this syntax using __getattr__/__getattribute__ won’t be able to have any methods.
  • Currently there is no way to add an ID in the middle of a compound selector. Since Python doesn’t have a # operator I’m at a loss. I have thought about overloading __call__ but Tag('foo').bar('baz') or Tag('foo')[Attribute('qux')]('baz') would look too much like a normal method call.

How should I go about working around these limitations?


Because Python has very different syntax and semantics from CSS selectors, I think these problems will only get worse. You’ll end up with something that doesn’t look like CSS does and something that doesn’t work like Python usually does. Therefore I would like to propose a different way of approaching the syntax.

CSS selectors are mostly a linear sequence of simple selectors and combinators. I would suggest using that, and representing something like ns|p a:link as something like Tag('p', namespace='ns') + Descendant() + Tag('a') + PseudoClass('link').

That is, you only use a single magic method to represent concatenation. Everything else is just regular Python objects, using regular Python constructors.

Your example could be

# lorem|foo.bar[baz^="qux"]:has(:invalid)::first-line
selector = (Tag('foo', namespace='lorem') +
            Class('bar') +
            Attribute('baz', '^=', 'qux') +
            PseudoClass('has', Selector.SELF + Descendant() + PseudoClass('invalid')) +
            PseudoElement('first-line'))

It may not be exactly what you were looking for, but it has the advantage that it is much easier to learn for Python users because it has far fewer rules and exceptions, and you don’t need to worry about new selectors or incompatible syntax.
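To make the idea concrete, here is a minimal sketch of how `+` concatenation could be wired up. The class names follow the example above, but `Part`, `Chain`, `parts()` and `css()` are hypothetical helpers invented for this illustration, and the `:has()` argument is simplified to a bare `PseudoClass('invalid')` so the output matches the comment exactly.

```python
from dataclasses import dataclass


class Part:
    """Anything that can take part in a selector; `+` concatenates parts."""

    def parts(self):
        return (self,)

    def __add__(self, other):
        if not isinstance(other, Part):
            return NotImplemented
        # Always build a flat chain, even when either side is already a chain.
        return Chain(self.parts() + other.parts())

    def css(self):
        raise NotImplementedError

    def __str__(self):
        return self.css()


@dataclass(frozen=True)
class Chain(Part):
    items: tuple

    def parts(self):
        return self.items

    def css(self):
        return "".join(item.css() for item in self.items)


@dataclass(frozen=True)
class Tag(Part):
    name: str
    namespace: str = ""

    def css(self):
        prefix = f"{self.namespace}|" if self.namespace else ""
        return prefix + self.name


@dataclass(frozen=True)
class Class(Part):
    name: str

    def css(self):
        return "." + self.name


@dataclass(frozen=True)
class Attribute(Part):
    name: str
    op: str
    value: str

    def css(self):
        return f'[{self.name}{self.op}"{self.value}"]'


@dataclass(frozen=True)
class PseudoClass(Part):
    name: str
    argument: object = None  # optional nested selector for functional forms

    def css(self):
        if self.argument is None:
            return ":" + self.name
        return f":{self.name}({self.argument.css()})"


@dataclass(frozen=True)
class PseudoElement(Part):
    name: str

    def css(self):
        return "::" + self.name


@dataclass(frozen=True)
class Descendant(Part):
    def css(self):
        return " "


selector = (Tag('foo', namespace='lorem') +
            Class('bar') +
            Attribute('baz', '^=', 'qux') +
            PseudoClass('has', PseudoClass('invalid')) +
            PseudoElement('first-line'))
link = Tag('p', namespace='ns') + Descendant() + Tag('a') + PseudoClass('link')
print(selector)
print(link)
```

The sketch does no validation or escaping; a real library would presumably check that, say, a pseudo-element only appears at the end of a compound selector.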

You can also use & instead of +, in which case you can represent a selector list with |, for example: p.warning, #bigwarning can become Tag('p') & Class('warning') | ID('bigwarning').
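A rough sketch of the `&` / `|` variant, relying on the fact that Python gives `&` higher precedence than `|`, so no parentheses are needed in the example. `Node`, `Simple`, `Compound`, `SelectorList` and `_items` are names assumed for this illustration only.

```python
class Node:
    """Base class: `&` concatenates, `|` builds a selector list."""

    def __and__(self, other):
        return Compound(_items(self, Compound) + _items(other, Compound))

    def __or__(self, other):
        return SelectorList(_items(self, SelectorList) + _items(other, SelectorList))

    def __str__(self):
        return self.css()


def _items(node, kind):
    """Flatten nested nodes of the same kind so chains stay flat."""
    return node.items if isinstance(node, kind) else (node,)


class Compound(Node):
    def __init__(self, items):
        self.items = tuple(items)

    def css(self):
        return "".join(item.css() for item in self.items)


class SelectorList(Node):
    def __init__(self, items):
        self.items = tuple(items)

    def css(self):
        return ", ".join(item.css() for item in self.items)


class Simple(Node):
    prefix = ""

    def __init__(self, name):
        self.name = name

    def css(self):
        return self.prefix + self.name


class Tag(Simple):
    prefix = ""


class Class(Simple):
    prefix = "."


class ID(Simple):
    prefix = "#"


# `&` binds tighter than `|`, so this groups as (p & .warning) | #bigwarning.
selectors = Tag('p') & Class('warning') | ID('bigwarning')
print(selectors)
```

This sketch does not guard against nonsensical combinations such as applying `&` to a whole selector list; that check is left out for brevity.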


An alternative idea is to use no magic at all, and represent compound and/or complex selectors using lists or wrapper objects.

foo.bar > a might be something like Child([Tag('foo'), Class('bar')], [Tag('a')]) (compound selectors are lists, complex selectors are wrapped by combinators) or [Tag('foo'), Class('bar'), Child(), Tag('a')] (complex selectors are lists containing the selectors).
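The flat-list variant could look roughly like this. It is only a sketch: `to_css` is a hypothetical helper name, and the spacing around `>` is a serialization choice made here, not something the selector grammar requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Class:
    name: str

    def __str__(self):
        return "." + self.name


@dataclass(frozen=True)
class Child:
    def __str__(self):
        return " > "


def to_css(parts):
    """Serialize a flat list of simple selectors and combinators."""
    return "".join(str(part) for part in parts)


selector = [Tag('foo'), Class('bar'), Child(), Tag('a')]
print(to_css(selector))

# Because the selector is an ordinary list, slicing and concatenation just work.
compound_only = selector[:2]
extended = selector + [Class('external')]
```

No operators are overloaded here, so users manipulate selectors with the list operations they already know.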

The best option depends on ergonomics, and the ergonomics depend on how users will build and manipulate selectors and for what purpose.

You want to represent CSS concepts
using valid python syntax which “looks like” the CSS source text.

The simplest approach would be to stick with straight CSS source text,
which we can round-trip through deserializers and serializers.
Representing punctuation-heavy CSS as python source
will be inherently lossy, so you’re going to have to
store the details somewhere, perhaps in a global dict
or in various """docstrings""".

It would be worthwhile to explicitly write down your
various goals and tradeoffs.
For example, getting IDE navigation / autocompletion “for free” might be
one of the things you find attractive about your proposed scheme.

Python notation has been exploited for representing
SI units, algebra, vector math, and pathnames.
The notation is already a good fit for these domains,
in some cases because the language strove to be a good fit.
So lossless representation can often be achieved.

There are two mature problem domains that you might
wish to take inspiration from.


sqlalchemy

The SQLAlchemy
community uses at least one python DSL, arguably multiple ones,
to represent SQL operations.

The impedance match is not perfect. Operator precedence
is a bit of a rough edge, with a OR b turning
into (a) | (b) when the two terms are complex.
For some operators, such as IN, we resort to
.in_() method call notation despite the in
keyword seeming to be available.
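Both rough edges can be reproduced in plain Python without installing anything. The `Expr` class below is a toy stand-in written for this illustration; it is not SQLAlchemy's API, only a sketch of why an operator-overloading DSL runs into them.

```python
class Expr:
    """Toy SQL expression; a hypothetical stand-in, not SQLAlchemy's API."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return Expr(f"{self.text} = {other!r}")

    def __or__(self, other):
        return Expr(f"({self.text}) OR ({other.text})")

    def __contains__(self, item):
        # Whatever is returned here, the `in` operator coerces it to a bool.
        return Expr(f"{item!r} IN {self.text}")

    def in_(self, values):
        listed = ", ".join(repr(value) for value in values)
        return Expr(f"{self.text} IN ({listed})")


a, b = Expr("a"), Expr("b")

# Parentheses are required because `|` binds tighter than `==`.
combined = (a == 1) | (b == 2)

# Without them Python evaluates `1 | b` first, which fails for this class.
try:
    a == 1 | b == 2
    unparenthesized_failed = False
except TypeError:
    unparenthesized_failed = True

# `in` cannot build an expression: the result is always a plain bool.
via_keyword = 1 in a
via_method = a.in_([1, 2, 3])

print(combined.text)
print(type(via_keyword).__name__, via_method.text)
```

A CSS DSL built on operators inherits the same constraints: Python fixes the precedence table, and a few operators (`in`, `and`, `or`, `not`) cannot return arbitrary objects.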

Table or column names in principle can incorporate SPACE
and many other characters, especially when “`”
or other quoting mechanisms are used.
But in practice DBAs will often choose to adhere
to a conservative regex such as r"^\w+$".
Your approach might offer enough advantages that
web designers would choose to adhere to conservative
naming conventions, so e.g. “a-b” --> a_b --> “a-b”
could be safely round-tripped.
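Such a round trip is only lossless if the convention rules out names that would collide. A minimal sketch, assuming a convention of lowercase words joined by single hyphens and no underscores in CSS names (the pattern and both function names are made up for this example):

```python
import keyword
import re

# Assumed convention: lowercase alphanumeric words joined by single hyphens.
CONSERVATIVE = re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*")


def to_python(css_name):
    """Map a conservative CSS name to a Python identifier, e.g. a-b -> a_b."""
    if not CONSERVATIVE.fullmatch(css_name):
        raise ValueError(f"{css_name!r} does not follow the naming convention")
    py_name = css_name.replace("-", "_")
    if keyword.iskeyword(py_name):
        raise ValueError(f"{css_name!r} collides with a Python keyword")
    return py_name


def to_css(py_name):
    """Inverse mapping, valid only for names produced by to_python."""
    return py_name.replace("_", "-")


print(to_python("a-b"), to_css(to_python("a-b")))
```

Names containing underscores are rejected outright, because "a_b" and "a-b" would otherwise map to the same Python attribute and the trip back would be ambiguous.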

SQL JOINs are commonly more than a hundred lines long,
and a great many production queries have been recast to fit
within this DSL.


type annotation

Type hinting continues to be something of a moving
target in the python community.
An application’s source code might be read by
an “old” or “new” interpreter, or type checker.

Expressing types in a back-compatible way
for old interpreters or checkers has been
a source of tension, often relieved via
a string annotation escape valve.
Forward references sometimes raise challenges
that are resolved in the same way.
In recent years we’ve had less need for this escape valve.
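For readers who have not met the string escape valve, here is a small sketch of a forward reference written as a string and resolved later with the standard-library `typing.get_type_hints`. The `Node` class is a made-up example.

```python
import typing


class Node:
    def __init__(self, value: int):
        self.value = value
        self.next = None

    # "Node" is not yet bound while the class body runs, so the
    # annotation is written as a string and resolved on demand.
    def append(self, other: "Node") -> "Node":
        self.next = other
        return other


raw = Node.append.__annotations__
resolved = typing.get_type_hints(Node.append)
print(raw["other"], resolved["other"])
```

The raw annotation stays a plain string until a tool chooses to evaluate it, which is the same trade a CSS DSL makes whenever it falls back to a string for syntax Python cannot express.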

We see annotations appearing in the AST,
and also in comment text.

The experiences of the type annotation community
seem most relevant to your CSS goals.


Your goals are still a bit nebulous at this point.
Several developer communities have traveled down this road,
showing what works well, or poorly, or would work better after
adopting some PEPs.
You may be able to draw inspiration, learn from mistakes,
and better predict a path to success
by looking back at this history and
incorporating some elements in your project goals.
