| NEWS | R Documentation |
NEWS file for the selectr package
Changes in version 0.6-0
This release is a major change, resulting from a port to a Rust implementation that uses a robust, browser-quality selector parsing engine. Behavioural comparison tests identified several bugs, missing features, and opportunities for performance improvements, which this release addresses.
NEW FEATURES
Added support for case-sensitivity flags in attribute selectors: [attr="value" i] matches the attribute value ASCII case-insensitively, while [attr="value" s] explicitly requests the default case-sensitive matching.
The :required and :optional pseudo-classes are now supported. With the html or xhtml translator they match form elements (input other than type="hidden", select, and textarea) by the presence or absence of the required attribute. With the generic translator they translate to a never-matching expression, as :checked does.
The :focus-within and :focus-visible pseudo-classes are now accepted. Like the rest of the user-action family (:focus, :hover, :active, ...), they translate to a never-matching expression, since a static document has no such dynamic state. They previously raised "The pseudo-class ... is unknown".
Added support for a leading :scope pseudo-class, which anchors the selector at the queried node: querySelectorAll(node, ":scope > a") returns only the a children of node. css_to_xpath() translates such selectors with the XPath self axis in place of the prefix argument (":scope > a" becomes "self::*/a"). A :scope anywhere else in a selector cannot be expressed in XPath 1.0 and is rejected with a clear error.
:not() may now appear inside the arguments of functional pseudo-classes (e.g. :not(:not(a)), :is(:not(.x)), :nth-child(2 of :not(.foo))).
:has() now supports leading combinators in its arguments (e.g. e:has(> img), e:has(~ p), e:has(+ p)).
Functional pseudo-class arguments may now be complex selectors (selectors containing combinators): :is(a b), :not(a > b), e:has(> a b), and li:nth-child(2 of ol li) previously produced an "Expected an argument" error and are now translated.
MINOR CHANGES
selectr is now roughly twice as fast as before, the combined result of several smaller changes.
The unsupported Selectors 4 column combinator now raises "The column combinator '||' is not supported" instead of a raw tokenizer error. An unknown functional pseudo-class is now reported with the user's hyphenated spelling (:nth-col(), previously :nth_col()).
Generated XPath expressions now include parentheses only when precedence requires them (an or-expression joined with another condition): e[id][title] translates to e[@id and @title] rather than e[(@id) and (@title)]. The expressions are semantically unchanged, but code comparing css_to_xpath() output against stored strings will need updating.
Attribute blocks, functional pseudo-classes, and strings left unclosed at the end of a selector are now auto-closed, as css-syntax requires: [rel=stylesheet and :lang(fr parse and translate exactly as their closed forms, as does a string missing its closing quote.
The stringr and methods dependencies have been dropped. R6 is now the only package selectr imports, shrinking the install footprint.
css_to_xpath() now translates each distinct combination of selector, prefix, and translator only once per call, so duplicates in a vectorized call are not re-translated.
The adjacent sibling combinator no longer emits a tautological [self::*] predicate when the right-hand side does not name an element: h1 + *[rel=up] now translates to h1/following-sibling::*[1][@rel = 'up'].
The minimum required version of R has been increased from 3.0 to 3.3. This matches what the code (which uses startsWith()) has in fact required for some time.
REMOVED FEATURES
The non-standard extensions inherited from the Python 'cssselect' package have been removed and now produce an error. These are the :contains("text") pseudo-class and the [attr!=value] attribute operator. The standard alternatives are :not([attr=value]) in place of [attr!=value], and an XPath contains(., 'text') predicate, applied outside of selectr, in place of :contains().
BUG FIXES
With the html or xhtml translator, :any-link now matches the same elements as :link. :any-link means ":link or :visited", a superset of :link. Previously, however, it translated to a never-matching expression while :link matched, which inverted the subset relation. (In a static document no link is visited, so the two pseudo-classes coincide.)
Escape sequences in identifiers, hashes, and strings are now decoded in a single left-to-right pass, as css-syntax requires. An escaped backslash followed by hex digits is therefore no longer decoded twice: e[foo="x\\79 z"] now matches x\79 z rather than xyz.
A prefixed wildcard inside a pseudo-class argument (e.g. :is(svg|*)) now translates to the node test self::svg:* rather than the never-matching comparison name() = 'svg:*'.
When :lang() is given an invalid argument after valid ones (e.g. :lang(en, 5)), the error now reports the offending argument rather than the first one.
Strings containing a raw newline are now rejected with "Unclosed string", as the CSS grammar requires.
The namespace argument of the querySelector() family of functions now signals an error when given a list containing an element that is not a single string. Previously the prefix-to-URI pairing was silently corrupted.
:only-child and :only-of-type now match the root element. This is consistent with :first-child:last-child, to which Selectors defines :only-child as equivalent.
The HTML translator's :enabled and :disabled pseudo-classes now match input elements that have no type attribute (which default to type=text).
:dir() now enforces its CSS Selectors Level 4 argument grammar of exactly one identifier. It rejects the strings, wildcards, and lists (e.g. :dir(ltr, rtl)) it previously accepted via the shared :lang() grammar.
Prefixed element names inside pseudo-class arguments (e.g. :is(svg|g), :nth-child(2 of svg|g)) are now matched with an XPath name test instead of a name() string comparison. The name test is self::svg:g, or the path step .//svg:g for :has(). As a result, the prefix resolves through the namespace map supplied at evaluation time (URI-based), just as at the top level of a selector.
The no-namespace form |e now retains its namespace constraint when the element name cannot be written as an XPath name test (e.g. a Unicode name such as |é). Previously such names also matched in a default namespace.
The of-type pseudo-classes now work on element names that cannot be written as an XPath name test (e.g. é:only-of-type, *|e:first-of-type). Previously these failed with the misleading error "*:only-of-type is not implemented".
The querySelector() and querySelectorAll() methods for xml2 documents now accept a named list as the ns argument, consistent with the methods for XML documents.
The querySelector() family of functions now signals an error when the selector argument is not a single character string. Previously all but the first selector were silently ignored when querying the document.
css_to_xpath() now signals an error when any of its arguments contain NA values. Previously NAs were removed before recycling, silently shifting the pairing of the remaining values.
An+B expressions are now matched ASCII case-insensitively, as required by CSS Syntax, so :nth-child(2N), :nth-child(ODD), and :nth-child(EVEN) are no longer rejected.
An+B expressions now only permit whitespace around the sign that separates the B value. An invalid selector such as :nth-child(3 7) is therefore rejected rather than silently treated as :nth-child(37).
An+B expressions now reject non-integer values (e.g. :nth-child(1.9), :nth-child(2e1)) rather than silently truncating them.
Computing the specificity of a :has() selector with a single argument (e.g. e:has(img)) no longer fails with "incorrect number of dimensions".
Computing the specificity of a single-argument :is() or :matches() selector no longer fails. The specificity of the compound selector the pseudo-class is attached to is also no longer dropped: div:is(.foo) now reports (0, 1, 1) rather than (0, 1, 0).
Nesting :has() inside :has() (e.g. section:has(article:has(div))) is now rejected, as required by CSS Selectors Level 4 and matching browsers. Stacked uses such as e:has(a):has(b) remain valid.
:lang() and :dir() no longer accept a lone -, which is not a valid CSS identifier.
Attribute selectors with an empty value (e.g. [attr=""]) no longer throw an error for the = and |= operators.
The HTML translator now lowercases only attribute names, not attribute values, so [data-state="Active"] no longer silently misses matches.
The any-namespace selector *|e and the no-namespace selector |e no longer both collapse to the bare name e. *|e now translates to *[local-name() = 'e'] so that it matches e in any namespace, and similarly for [*|attr].
Unicode escapes are now supported in identifiers and ID selectors, not just strings, and are decoded to the characters they represent. #\31 23 (an ID starting with a digit) no longer fails to tokenize, and "\E9" matches é rather than the literal value E9.
The alternatives of :is(), :matches(), and :where() are now grouped as a single condition: div.foo:is(.a, .b) translates to foo and (a or b) rather than foo or a or b. Stacked uses such as e:is(.a):is(.b) now require both conditions rather than either.
The universal selector * is no longer silently dropped when it appears alongside other arguments in a selector list. :is(div, *) now matches every element, :not(div, *) matches nothing, and :nth-child(2 of div, *) counts all siblings.
Wildcard language ranges such as :lang(en-*) now match under the generic (XML) translator. Previously they translated to lang('en-'), which can never match.
Changes in version 0.5-1
BUG FIXES
Fixed handling of CSS unicode escapes in attribute values. The problem was observed when the attribute value contained hexadecimal sequences like (abcdef) where the characters inside the parentheses were not properly escaped. This fix ensures that such sequences are correctly translated to their XPath equivalents. Thanks to André Veríssimo for reporting the issue.
Changes in version 0.5-0
NEW FEATURES
Added support for the CSS Selectors Level 4 pseudo-classes :is(), :where(), and :has().
The :is() pseudo-class matches elements against a list of selectors, taking the maximum specificity from its arguments. The :where() pseudo-class works similarly to :is() but always has zero specificity. The :has() pseudo-class represents an element if any of the relative selectors match when anchored against that element, with specificity calculated from the maximum of its arguments.
Added support for complex selectors in :nth-child() and :nth-last-child() using the of S syntax (e.g., :nth-child(2 of .foo)). This allows matching the nth child that matches a specific selector or selector list.
Extended :not() to accept multiple selectors separated by commas (e.g., :not(.foo, #bar)), following CSS Selectors Level 4. Specificity is now calculated as the maximum specificity among all arguments, rather than the sum.
Added support for additional CSS Selectors Level 4 pseudo-classes: :any-link, :target-within, and :local-link. These pseudo-classes do not match any elements in static XML/HTML documents and translate to XPath expressions that always evaluate to false. For now, most of the new Level 4 pseudo-classes that depend on dynamic document state (e.g. :user-valid and :placeholder-shown) are not implemented. They may be added as non-matching selectors at a future date.
The :lang() and :dir() pseudo-classes now support multiple comma-separated arguments (e.g., :lang(en, fr, de)).
Added :matches() as a backwards-compatible alias for :is().
MINOR CHANGES
Improved sibling selector translation to use a more compact form. For the adjacent sibling combinator a + b, the generated XPath now uses a/following-sibling::*[1][self::b] instead of a/following-sibling::*[(name() = 'b') and (position() = 1)].
The descendant combinator a b now uses a//b instead of a/descendant::b for a more concise XPath. Unfortunately, a similar optimisation cannot be applied in general to replace descendant-or-self:: with .//a, as it would prevent root nodes from being matched correctly.
Improved validation of CSS selector arguments. Better error messages are now provided when pseudo-elements appear inside functional pseudo-classes where they are not permitted (e.g., inside :is(), :matches(), :where(), or :has()).
Enhanced input validation for the :lang() and :dir() pseudo-classes to ensure proper argument formatting and to reject invalid or empty language tags.
Improved handling of edge cases in selector parsing, including better validation of class selector syntax and more robust handling of null or missing element components.
Simplified method registration for XML and xml2 objects. It is no longer necessary to hook into package load/unload events.
Changes in version 0.4-2
MINOR CHANGES
Improve handling of vectors of length > 1 in logical comparison. Contributed by Garrick Aden-Buie.
Minor improvements to error message construction. Contributed by Michael Chirico.
Changes in version 0.4-1
BUG FIXES
When the R.oo package was attached, class selectors no longer worked. R.oo uses the name Class for its base class object, and selectr also used (but did not export) the name Class to represent a class selector. selectr's class has been renamed to avoid the clash. Because it was not exported, this is purely an internal change. Thanks to Francois Lemaire-Sicre for reporting the issue.
Changes in version 0.4-0
MINOR CHANGES
Large rewrite of internals to use the R6 OO system instead of Reference Classes. This change is marked as minor because it does not affect any externally facing code: the results should be identical to those of the previous implementation. Initial and crude performance testing (by running the test suite) indicates that the R6 implementation is approximately twice as fast at generating XPath as the Reference Classes implementation.
The minimum required version of R for selectr has been increased from 2.15.2 to 3.0, as that is the minimum required version of R6.
Minor performance enhancements have been made. In addition to R6 being faster than Reference Classes, string formatting has been replaced with string concatenation. Dynamic method calls via do.call() have also been replaced with direct method calls.
BUG FIXES
The issues in previous releases where methods could sometimes be missing should now be resolved. The bug appeared to lie in core Reference Classes code. By switching to R6, this type of issue should no longer be possible.
Changes in version 0.3-2
MINOR CHANGES
Improved method registration for XML and xml2 objects. Avoids checks on each use and is only performed once per dependent package load/unload.
BUG FIXES
In some environments, reference class methods were missing at runtime. This appears to be due to internal behaviour in the methods package, where methods are registered on an object when the $ operator is used for a field or method. Missing methods are now bound to the object manually instead.
Changes in version 0.3-1
MINOR CHANGES
Enabled partial matching on the translator argument to css_to_xpath(). A non-matching argument now results in an error instead of falling back to the generic translator.
Introduced many more unit tests, guided by the covr package. This enabled dead code to be trimmed and identified areas of code that needed improvement. Minor enhancements include tolerating whitespace within :not(), more consistent results from parser methods, and improvements to argument parsing.
BUG FIXES
The |= attribute matching operator was not being parsed correctly for the generic translator.
Handle the case where a CSS comment is unclosed: everything after the start of the comment is removed (which may or may not result in a valid selector).
Changes in version 0.3-0
MAJOR CHANGES
Added support for documents from the xml2 package.
selectr no longer strictly depends on the XML package. The querySelector methods require either the XML or xml2 package. If either is present, querySelector will work with documents from that package. This also enables selectr to be used for translation only.
BUG FIXES
Improve support for nth-*(an+b) selectors. Ported from cssselect.
Changes in version 0.2-3
MINOR CHANGES
Code cleanup contributed by Kun Ren (#1).
Updated DESCRIPTION to include URL and BugReports fields. Also updated the email address.
BUG FIXES
Fix behaviour for nth-*(an+b) pseudo-class selectors for negative a's. Contributed to cssselect by Paul Tremberth, ported to R.
Escape delimiting characters to support new version of the stringr package. Probably should have been done in the first place. Reported by Hadley Wickham (#5).
Changes in version 0.2-2
MINOR CHANGES
Corrected licence to BSD 3 clause. This was the licence in use previously, but has now been made more explicit.
Removed 'Enhances' field because we import functions from XML. This choice is made because XML is a required package, rather than an optional package that can be worked with. This and the previous change have been made to keep up with recent changes in R-devel.
Changes in version 0.2-1
MINOR FEATURES
Added a 'CITATION' file which cites a technical report on the package.
show() methods are now available on internal objects, making interactive extensibility and bug-fixing easier. These simply wrap the repr() methods (mirroring the Python source) that the same objects have.
BUG FIXES
Use the session character encoding to determine whether to run unicode tests. Tests break in non-unicode sessions otherwise.
Changes in version 0.2-0
NEW FEATURES
Introduced new functions querySelectorNS() and querySelectorAllNS() to ease the use of namespaces within a document. Previously this would have required knowledge of XPath.
BUG FIXES
Fixed the meaning of :empty: whitespace is not empty.
Use lang() for XML documents with the :lang() CSS selector.
|ident no longer produces a parsing error, but is now equivalent to just 'ident'.
Changes in version 0.1-1
BUG FIXES
Unicode tests are now run only on non-Windows platforms during package checks. Output should still be consistent; it just depends on the current charset being Unicode.
Changes in version 0.1-0
NEW FEATURES
Initial port of the Python 'cssselect' package. Code is very literally ported, including the test suite.
Wrapped translation functionality into a single function, css_to_xpath().
Created two convenience functions, querySelector() and querySelectorAll(). These mirror the behaviour of the same functions present in a web browser. querySelector() returns a node, while querySelectorAll() returns a list of nodes.