Discussion:
Text property searching
Lars Ingebrigtsen
2018-04-15 22:56:00 UTC
I've suggested a few times before that it would be nice to have search
functions that are... nicer... than the ones we have now
(`text-property-any' and `next-single-property-change'), and the
maintainer(s) at the time said "sure". I think.

But I never implemented that wonderful function, because I could never
decide what it would look like.

But last night I think I got it: It should be just like search-forward,
only not. (Hm. I'm feeling a slight sense of deja vu while typing
this -- have I had this revelation before but forgotten about it?)

Anyway:

Let's say you have a region in the buffer that has the text property
`shr-url' with the value "http://fsf.org/", then:

(text-property-search-forward 'shr-url "http://fsf.org/" t)

would place point at the end of that region, and `match-beginning' and
`match-end' would point to the start and end.

The `t' there is the predicate: `t' means "equal", `nil' means "not
equal", and then you can write your own predicates for other uses.

So, to collect all urls from text properties, you'd write:

(while (text-property-search-forward 'shr-url nil nil)
  (push (get-text-property (match-beginning 0) 'shr-url) urls))

and that's it. Or to collect all images:

(while (text-property-search-forward 'display 'image
                                     (lambda (elem val)
                                       (and (consp elem)
                                            (eq (car elem) val))))
  (push (plist-get (cdr (get-text-property (match-beginning 0) 'display)) :data)
        images))

Does this look OK to everybody?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
T.V Raman
2018-04-16 00:00:22 UTC
LGTM -- it would make a lot of the code in emacspeak for EWW support a
lot nicer to start with:-)
Dmitry Gutov
2018-04-16 04:40:52 UTC
Post by Lars Ingebrigtsen
Let's say you have a region in the buffer that has the text property
(text-property-search-forward 'shr-url "http://fsf.org/" t)
would place point at the end of that region, and `match-beginning' and
`match-end' would point to the start and end.
Sounds quite nice.
Post by Lars Ingebrigtsen
The `t' there is the predicate: `t' means "equal", `nil' means "not
equal", and then you can write your own predicates for other uses.
"Equals or includes" should be another popular predicate (think faces).
Lars Ingebrigtsen
2018-04-16 12:01:23 UTC
Post by Dmitry Gutov
Post by Lars Ingebrigtsen
The `t' there is the predicate: `t' means "equal", `nil' means "not
equal", and then you can write your own predicates for other uses.
"Equals or includes" should be another popular predicate (think faces).
Yes, that's true... We could have a special symbol for that, or would
it be confusing?
Dmitry Gutov
2018-04-16 13:04:13 UTC
Post by Dmitry Gutov
Post by Lars Ingebrigtsen
The `t' there is the predicate: `t' means "equal", `nil' means "not
equal", and then you can write your own predicates for other uses.
"Equals or includes" should be another popular predicate (think faces).
Yes, that's true... We could have a special symbol for that<...>
I'd like that.
Lars Ingebrigtsen
2018-04-16 15:11:04 UTC
Post by Dmitry Gutov
Post by Dmitry Gutov
"Equals or includes" should be another popular predicate (think faces).
Yes, that's true... We could have a special symbol for that<...>
I'd like that.
On the other hand, perhaps we should just have a general predicate for
this not-uncommon thing? `equal-or-member'? It would literally be

(defun equal-or-member (thing collection)
  (or (equal thing collection)
      (and (consp collection)
           (member thing collection))))

We have a lot of data structures that can be lists or atoms, so I think
it might be generally useful...
Eli Zaretskii
2018-04-16 18:06:24 UTC
Date: Mon, 16 Apr 2018 17:11:04 +0200
(defun equal-or-member (thing collection)
  (or (equal thing collection)
      (and (consp collection)
           (member thing collection))))
This seems to assume a flat one-level list, but some popular
properties are more complex. E.g., see the 'display' properties.
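
A minimal sketch of that limitation, using the proposed `equal-or-member' (repeated here only so the snippet stands alone). The sample values are made up for illustration: a flat list of faces is handled as intended, while a nested value, such as a list of display specifications, is not, because `member' only compares top-level elements.

```elisp
;; Copied from the proposal under discussion so the snippet is self-contained.
(defun equal-or-member (thing collection)
  (or (equal thing collection)
      (and (consp collection)
           (member thing collection))))

;; Flat values, as the `face' property often has.
(equal-or-member 'bold 'bold)            ; non-nil: plain `equal'
(equal-or-member 'bold '(italic bold))   ; non-nil: found by `member'

;; Nested value (illustrative list of display specs): `raise' only
;; appears inside a sub-list, so it is not found.
(equal-or-member 'raise '((height 1.5) (raise 0.2)))   ; nil
```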
Lars Ingebrigtsen
2018-04-16 18:30:01 UTC
Post by Eli Zaretskii
Date: Mon, 16 Apr 2018 17:11:04 +0200
(defun equal-or-member (thing collection)
  (or (equal thing collection)
      (and (consp collection)
           (member thing collection))))
This seems to assume a flat one-level list, but some popular
properties are more complex. E.g., see the 'display' properties.
Hm, yes that's true, so perhaps it wouldn't be all that useful here
anyway...
Drew Adams
2018-04-16 14:30:26 UTC
Post by Lars Ingebrigtsen
Post by Dmitry Gutov
Post by Lars Ingebrigtsen
The `t' there is the predicate: `t' means "equal", `nil' means "not
equal", and then you can write your own predicates for other uses.
"Equals or includes" should be another popular predicate (think faces).
Yes, that's true... We could have a special symbol for that, or would
it be confusing?
FWIW -

My library `isearch-prop.el' has long let you Isearch zones
that have arbitrary text-property or overlay-property values.

I agree that an eq/equal-or-memq/member predicate can be
useful. But it's not really enough when it comes to dealing
with properties, including but not limited to `face' and
similar (whose values can combine for an accumulated effect).

Like what you propose, the code I use lets you use an
arbitrary predicate, but matching allows for matches that
involve overlap of property values, in this sense: If the
PROPERTY value is an atom then it must be a member of the
set of test VALUES, but if the PROPERTY value is a list,
then at least one of its elements must be a member of VALUES.
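
The overlap rule just described can be sketched in a few lines. This is only an illustration of the rule as stated, not the code from `isearch-prop.el'; the name `my-property-overlaps-p' and its argument order are assumptions made for the example.

```elisp
(require 'cl-lib)

;; Hypothetical helper illustrating the overlap rule; not taken from
;; isearch-prop.el.
(defun my-property-overlaps-p (prop-value values &optional match-fn)
  "Return non-nil if PROP-VALUE overlaps with the list VALUES.
An atomic PROP-VALUE must itself be in VALUES.  If PROP-VALUE is a
list, at least one of its elements must be in VALUES.  Elements are
compared with MATCH-FN, which defaults to `equal'."
  (let ((test (or match-fn #'equal)))
    (cl-some (lambda (elt)
               (cl-some (lambda (v) (funcall test v elt)) values))
             ;; Treat an atom as a one-element list.
             (if (consp prop-value) prop-value (list prop-value)))))

(my-property-overlaps-p 'bold '(bold italic))               ; non-nil
(my-property-overlaps-p '(underline italic) '(bold italic)) ; non-nil
(my-property-overlaps-p 'underline '(bold italic))          ; nil
```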

https://www.emacswiki.org/emacs/download/isearch-prop.el

---

This is the crux of the property-matching & predicate code:

(defun isearchp-property-matches-p (type property values
                                    match-fn position)
  "Return non-nil if POSITION has PROPERTY with a value matching VALUES.
TYPE is `overlay', `text', or nil, and specifies the type of property.
TYPE nil means look for both overlay and text properties. Return
non-nil if either matches.

Matching means finding text with a PROPERTY value that overlaps with
VALUES: If the value of PROPERTY is an atom, then it must be a member
of VALUES. If it is a list, then at least one list element must be a
member of VALUES.

MATCH-FN is a binary predicate that is applied to each item of VALUES
and a zone of text with property PROP. If it returns non-nil then the
zone is a search hit."
  (let* ((ov-matches-p nil)
         (txt-matches-p nil)
         (ovchk-p (and (or (not type) (eq type 'overlay))))
         (ovs (and ovchk-p (overlays-at position))))
    (when ovchk-p
      (setq ov-matches-p
            (catch 'i-p-m-p
              (dolist (ov ovs)
                (when (isearchp-some
                       values (overlay-get ov property) match-fn)
                  (throw 'i-p-m-p t)))
              nil)))
    (when (and (or (not type) (eq type 'text)))
      (setq txt-matches-p
            (isearchp-some
             values (get-text-property position property) match-fn)))
    (or ov-matches-p txt-matches-p)))

(defun isearchp-property-filter-pred (type property values)
  "Return a predicate that uses `isearchp-property-matches-p'.
TYPE, PROPERTY, and VALUES are used by that function.
The predicate is suitable as a value of `isearch-filter-predicate'."
  (let ((tag (make-symbol "isearchp-property-filter-pred")))
    `(lambda (beg end)
       (and (or (not (boundp 'isearchp-reg-beg))
                (not isearchp-reg-beg)
                (>= beg isearchp-reg-beg))
            (or (not (boundp 'isearchp-reg-end))
                (not isearchp-reg-end)
                (< end isearchp-reg-end))
            (or (isearch-filter-visible beg end)
                (not (or (eq search-invisible t)
                         (not (isearch-range-invisible beg end)))))
            (catch ',tag
              (while (< beg end)
                (let ((matches-p
                       (isearchp-property-matches-p
                        ',type ',property
                        ',values
                        (isearchp-property-default-match-fn ',property)
                        beg)))
                  (unless (if matches-p
                              (not isearchp-complement-domain-p)
                            isearchp-complement-domain-p)
                    (throw ',tag nil)))
                (setq beg (1+ beg)))
              t)))))
Eli Zaretskii
2018-04-16 17:52:25 UTC
Date: Mon, 16 Apr 2018 14:01:23 +0200
Post by Dmitry Gutov
"Equals or includes" should be another popular predicate (think faces).
Yes, that's true... We could have a special symbol for that, or would
it be confusing?
An alternative would be to define "meta-properties", like
'foreground-color', 'font', 'weight', etc.
Lars Ingebrigtsen
2018-04-16 18:31:03 UTC
Post by Eli Zaretskii
An alternative would be to define "meta-properties", like
'foreground-color', 'font', 'weight', etc.
Yes, that might also be nice...
João Távora
2018-04-16 15:16:01 UTC
Hi,
Post by Lars Ingebrigtsen
(text-property-search-forward 'shr-url "http://fsf.org/" t)
would place point at the end of that region, and `match-beginning' and
`match-end' would point to the start and end.
Great idea, I've wanted this badly in the past, too. Two cents:

1. What should happen if search starts in the region where the
property is already set?

2. Can we generalize this to work for searches for regions where the
property is set to some constant value and also for regions where the
property is just present. What about "not-present"? Or do you envision
this to be handled by the second and third arguments? Perhaps, in
addition to the other type of value, both could also be passed a
function: the second one a function of one arg, the buffer position,
producing a value, and the third one a function of two values
returning a boolean (this is vaguely CL's :key and :test, obviously).

Bye,
João
Lars Ingebrigtsen
2018-04-16 16:09:55 UTC
Post by João Távora
1. What should happen if search starts in the region where the
property is already set?
I think it should give you a match starting at point and ending where
the property ends.
Post by João Távora
2. Can we generalize this to work for searches for regions where the
property is set to some constant value and also for regions where the
property is just present. What about "not-present"?
Well, that's what the two arguments do -- the match and the predicate,
so those are covered...
Post by João Távora
Or do you envision this to be handled by the second and third
arguments? Perhaps, in addition to the other type of value, both could
also be passed a function: the second one a function of one arg, the
buffer position, producing a value, and the third one a function of
two values returning a boolean (this is vaguely CL's :key and :test,
obviously).
Hm... I don't quite see the need for the single-value function (i.e.,
the :key function) because the predicate can do whatever it wants.
João Távora
2018-04-16 16:54:29 UTC
[I missed emacs-devel in my last email, sorry. Should use Gnus :-)]
Just a CL-style convenience (wouldn't your reasoning apply to
the second argument in general?). Perhaps, as an example,
you could clarify a bit better what exactly is passed to the third
function in case the second property [I meant argument] is nil.
The arguments to the predicate will be the same in any case -- the first
argument is the VALUE (in this case nil) and the second is the text
property value.
It'll probably become a bit clearer after I've written
the function and some documentation and added some examples.
I'm typing away as we speak. I mean mail. :-)
OK. In your example tho, I think you will need to distinguish the case where
the property's value is nil from the case where the property isn't set at
all.
Lars Ingebrigtsen
2018-04-16 16:57:21 UTC
Post by João Távora
OK. In your example tho, I think you will need to distinguish the case where
the property's value is nil from the case where the property isn't set at
all.
Hm... is that a distinction that makes a difference anywhere?
João Távora
2018-04-16 17:30:10 UTC
Post by Lars Ingebrigtsen
Post by João Távora
OK. In your example tho, I think you will need to distinguish the case where
the property's value is nil from the case where the property isn't set at
all.
Hm... is that a distinction that makes a difference anywhere?
Perhaps we are misunderstanding each other. I asked you earlier
if the new function can grab regions of the buffer where a particular
property isn't present, which is different from that property being nil.
Lars Ingebrigtsen
2018-04-16 17:35:23 UTC
Post by João Távora
Perhaps we are misunderstanding each other. I asked you earlier
if the new function can grab regions of the buffer where a particular
property isn't present, which is different from that property being nil.
I misunderstood. No, I wasn't aware that there was any difference
between "not being present" and "being nil" for text properties.
Lars Ingebrigtsen
2018-04-16 18:26:59 UTC
Below is a draft of the documentation of this function. Does it all
make sense? :-)

Should we perhaps go for a shorter name for this function? It's a bit
of a mouthful, but I don't really have any ideas for a good, snappy name
here...

-- Function: text-property-search-forward prop value predicate
Search for the next region that has text property PROP set to VALUE
according to PREDICATE.

This function is modelled after ‘search-forward’ and friends in
that it moves point, but it returns a structure that describes the
match instead of returning it in ‘match-beginning’ and friends.

If the text property can’t be found, the function returns ‘nil’.
If it’s found, point is placed at the end of the region that has
this text property match, and a ‘prop-match’ structure is returned.

PREDICATE can either be ‘t’ (which is a synonym for ‘equal’), ‘nil’
(which means “not equal”), or a predicate that will be called with
two parameters: The first is VALUE, and the second is the value of
the text property we’re inspecting.

In the examples below, imagine that you’re in a buffer that looks
like this:

This is a bold and here's bolditalic and this is the end.

That is, the “bold” words are the ‘bold’ face, and the “italic”
word is in the ‘italic’ face.

With point at the start:

(while (setq match (text-property-search-forward 'face 'bold t))
  (push (buffer-substring (prop-match-beginning match) (prop-match-end match))
        words))

This will pick out all the words that use the ‘bold’ face.

(while (setq match (text-property-search-forward 'face nil t))
  (push (buffer-substring (prop-match-beginning match) (prop-match-end match))
        words))

This will pick out all the bits that have no face properties, which
will result in the list ‘("This is a " "and here's " "and this is
the end")’ (only reversed, since we used ‘push’).

(while (setq match (text-property-search-forward 'face nil nil))
  (push (buffer-substring (prop-match-beginning match) (prop-match-end match))
        words))

This will pick out all the regions where ‘face’ is set to
something, but this is split up into where the properties change,
so the result here will be ‘"bold" "bold" "italic"’.

For a more realistic example where you might use this, consider
that you have a buffer where certain sections represent URLs, and
these are tagged with ‘shr-url’.

(while (setq match (text-property-search-forward 'shr-url nil nil))
  (push (prop-match-value match) urls))

This will give you a list of all those URLs.

---

Hm... it strikes me now that the two last parameters should be
optional, since (text-property-search-forward 'shr-url) would then be
even more obvious in its meaning.
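
For anyone who wants to try the first draft example against a concrete buffer, here is a minimal sketch. It assumes Emacs 27.1 or later, where a function of this name is available in text-property-search.el; that released function may differ in details from the draft described here. The helper name `my-collect-face-matches' is made up for the example.

```elisp
(require 'text-property-search)   ; assumption: Emacs 27.1 or later

;; Hypothetical helper wrapping the loop from the draft documentation.
(defun my-collect-face-matches (face)
  "Return, in buffer order, the substrings whose `face' is `equal' to FACE."
  (let (match words)
    (goto-char (point-min))
    (while (setq match (text-property-search-forward 'face face t))
      (push (buffer-substring-no-properties (prop-match-beginning match)
                                            (prop-match-end match))
            words))
    (nreverse words)))

(with-temp-buffer
  ;; Rebuild the sample text from the draft: two runs in the `bold'
  ;; face, the second directly followed by a run in the `italic' face.
  (insert "This is a " (propertize "bold" 'face 'bold)
          " and here's " (propertize "bold" 'face 'bold)
          (propertize "italic" 'face 'italic)
          " and this is the end.")
  (my-collect-face-matches 'bold))
```

Going by the draft, the last form should return the two bold runs.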
Eli Zaretskii
2018-04-16 18:52:23 UTC
Date: Mon, 16 Apr 2018 20:26:59 +0200
-- Function: text-property-search-forward prop value predicate
Search for the next region that has text property PROP set to VALUE
according to PREDICATE.
This function is modelled after ‘search-forward’ and friends in
that it moves point, but it returns a structure that describes the
match instead of returning it in ‘match-beginning’ and friends.
I thought you were designing a command (since search-forward is a
command). But it looks like this is just another API, in which case I
must ask how it differs from the existing primitives.

A command will make a lot more sense to me.

Thanks.
Lars Ingebrigtsen
2018-04-16 19:01:08 UTC
Post by Eli Zaretskii
I thought you were designing a command (since search-forward is a
command). But it looks like this is just another API, in which case I
must ask how it differs from the existing primitives.
The existing primitives are really awkward to work with. Whenever I
have to implement something that picks out data based on text
properties, it's an awful experience. There's so much you have to do by
hand based on whether you're already in a region, or after it, and
getting all the details right with `next-single-property-change' is so
enervating that I usually resort to just looping over all the characters
in the region and examine them one by one.

The new function allows a method of working that's natural if you've
ever worked on Emacs before (i.e., `search-forward').
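
To make the bookkeeping concrete, here is a rough sketch of collecting property values with the existing primitives only. It is an illustration under assumptions, not code from shr or elsewhere: the name `my-collect-property-values' is invented, and it treats a nil value the same as a missing property.

```elisp
;; Hypothetical helper built only on existing primitives.
(defun my-collect-property-values (prop)
  "Return the non-nil values of text property PROP, in buffer order.
Each run of text with a constant value of PROP contributes one entry."
  (let ((pos (point-min))
        (acc '()))
    (while (< pos (point-max))
      (let ((val (get-text-property pos prop))
            ;; `next-single-property-change' returns nil when PROP does
            ;; not change before the end of the buffer, so that case
            ;; has to be handled by hand.
            (next (or (next-single-property-change pos prop)
                      (point-max))))
        (when val
          (push val acc))
        (setq pos next)))
    (nreverse acc)))

(with-temp-buffer
  (insert "aa bb cc")
  (put-text-property 1 3 'shr-url "http://a/")
  (put-text-property 7 9 'shr-url "http://c/")
  (my-collect-property-values 'shr-url))
```

Even in this small case the loop has to track the current position, the end-of-buffer case, and whether it is currently inside or outside a tagged region.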
Eli Zaretskii
2018-04-16 19:48:49 UTC
Date: Mon, 16 Apr 2018 21:01:08 +0200
The existing primitives are really awkward to work with. Whenever I
have to implement something that picks out data based on text
properties, it's an awful experience. There's so much you have to do by
hand based on whether you're already in a region, or after it, and
getting all the details right with `next-single-property-change' is so
enervating that I usually resort to just looping over all the characters
in the region and examine them one by one.
The new function allows a method of working that's natural if you've
ever worked on Emacs before (i.e., `search-forward').
If this is a convenience function, we don't need to discuss it so
much. Just whip up whatever you need and see if it makes things
easier for you. Given enough time, we can see if it's time-proven
enough to be honored into subr.el and/or to be extended.
Lars Ingebrigtsen
2018-04-16 19:53:50 UTC
Post by Eli Zaretskii
If this is a convenience function, we don't need to discuss it so
much. Just whip up whatever you need and see if it makes things
easier for you. Given enough time, we can see if it's time-proven
enough to be honored into subr.el and/or to be extended.
Ok; should I put it in subr-x.el in the meantime?
Eli Zaretskii
2018-04-16 19:59:44 UTC
Date: Mon, 16 Apr 2018 21:53:50 +0200
Post by Eli Zaretskii
If this is a convenience function, we don't need to discuss it so
much. Just whip up whatever you need and see if it makes things
easier for you. Given enough time, we can see if it's time-proven
enough to be honored into subr.el and/or to be extended.
Ok; should I put it in subr-x.el in the meantime?
No objections from me.
Clément Pit-Claudel
2018-04-16 21:56:32 UTC
Post by Eli Zaretskii
Date: Mon, 16 Apr 2018 21:53:50 +0200
Post by Eli Zaretskii
If this is a convenience function, we don't need to discuss it so
much. Just whip up whatever you need and see if it makes things
easier for you. Given enough time, we can see if it's time-proven
enough to be honored into subr.el and/or to be extended.
Ok; should I put it in subr-x.el in the meantime?
No objections from me.
I have one concern with subr-x: it's not particularly clear that it's qualitatively different from subr. If we add something to subr-x, isn't that mostly the same as adding it to subr?

Also, it'd be great if this new feature was distributed separately, maybe in an ELPA package, to be usable in older Emacsen.

Clément.
Lars Ingebrigtsen
2018-04-16 21:58:37 UTC
Post by Clément Pit-Claudel
Also, it'd be great if this new feature was distributed separately,
maybe in an ELPA package, to be usable in older Emacsen.
In addition to being in core? shr is going to start relying on it like
tomorrow. :-)
João Távora
2018-04-16 22:06:49 UTC
Post by Lars Ingebrigtsen
Post by Clément Pit-Claudel
Also, it'd be great if this new feature was distributed separately,
maybe in an ELPA package, to be usable in older Emacsen.
In addition to being in core?
Possibly, provided a version conflict can be avoided later on (by
package.el) when the package is used in Emacs 27. It would also be
useful tomorrow for other programs not in Emacs core.
Clément Pit-Claudel
2018-04-16 22:21:00 UTC
Post by Lars Ingebrigtsen
Post by Clément Pit-Claudel
Also, it'd be great if this new feature was distributed separately,
maybe in an ELPA package, to be usable in older Emacsen.
In addition to being in core? shr is going to start relying on it like
tomorrow. :-)
Yup, just like 'seq'.
Lars Ingebrigtsen
2018-04-16 19:02:22 UTC
But it could be a command, too, of course. I can just slap an
`interactive' spec on it...
Eli Zaretskii
2018-04-16 19:50:18 UTC
Date: Mon, 16 Apr 2018 21:02:22 +0200
But it could be a command, too, of course.
Not with a predicate as the main method of finding "interesting"
properties, it can't. Compare with the Isearch "sub-modes" to see
what's probably needed.
Lars Ingebrigtsen
2018-04-16 19:56:01 UTC
Post by Eli Zaretskii
Not with a predicate as the main method of finding "interesting"
properties, it can't. Compare with the Isearch "sub-modes" to see
what's probably needed.
In the simplest case, like "find the next text portion with a face
property", it can...
Drew Adams
2018-04-16 20:05:28 UTC
Post by Lars Ingebrigtsen
Below is a draft of the documentation of this function. Does it all
make sense? :-)
(I'm going only by your doc/description, not the code, which
I don't have and won't bother to try to access.)

What if someone doesn't want to gather strings but instead
wants the match-zone limits?

E.g., instead of returning buffer substrings for the matches,
return conses (beg . end). This is (should be) mainly about
searching the _buffer_. It is not (should not be) mainly
about gathering a list of matching strings (or a defstruct
holding such a list).

IOW, this sounds wrong, to me:

This function is modelled after ‘search-forward’ and friends in
that it moves point, but it returns a structure that describes the
match instead of returning it in ‘match-beginning’ and friends.

And better than it returning (beg . end) conses is for it
to just provide access, on demand, to the matched text and
its positions using `match-data' - the usual Emacs approach.

IOW, better for it to _really_ be "modeled after
`search-forward'" - to find and return a buffer position.
(`search-forward' does not just "move point" - it returns
it.)

With `search-forward' the side effect of matching lets you
easily do various things with the `match-data' (always
only on demand). Why return a structure here? Why even
build a structure and put the relevant info into it?

Why not let the usual kind of `search-forward'-using code
work just as well with your minor variant: get whatever
info you want, on demand, from the `match-data'?

The current design sounds a bit analogous to tossing out
`match-data' in favor of just `match-string'. Except that
you even _return_ the strings, in a defstruct no less.

That might seem to be convenient for someone who always wants
the strings, but it sounds less useful generally.

Similarly, I'd think we would want all of the same optional
args and behavior as are provided by `search-forward':
limiting the search scope, raising or suppressing an error,
and repeating for a given count. That's a proven and widely
used Emacs interface.

In sum, why isn't `search-forward' a proper model in all
respects?
Lars Ingebrigtsen
2018-04-16 20:11:22 UTC
Post by Drew Adams
What if someone doesn't want to gather strings but instead
wants the match-zone limits?
Then use the match-zone limits that are returned by the function? I
don't quite understand what you're asking here.
Post by Drew Adams
The current design sounds a bit analogous to tossing out
`match-data' in favor of just `match-string'. Except that
you even _return_ the strings, in a defstruct no less.
No, no strings are returned in the structs, only the start/end points
and the text property found.
Post by Drew Adams
Similarly, I'd think we would want all of the same optional
limiting the search scope, raising or suppressing an error,
and repeating for a given count. That's a proven and widely
used Emacs interface.
Yes, and it's rather superfluous since you already have primitives like
`narrow-to-region'. If the search functions didn't have those weird
parameters, the source of Emacs would be reduced by at least 67% by just
eliding all those " nil t"'s from every `{re-,}search-forward' call.

My math may be off by a couple of percent.
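
As a sketch of the two styles being compared: the first helper limits the search with `narrow-to-region', the second uses the BOUND and NOERROR arguments of `search-forward'. Both helper names are invented for the example, and both return the end of the first match or nil while leaving point where it was.

```elisp
;; Hypothetical helper: limit the search by narrowing.
(defun my-search-forward-in-region (string beg end)
  "Search for STRING between BEG and END; return end of match or nil."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (search-forward string nil t))))

;; Hypothetical helper: the same search using the BOUND argument.
(defun my-search-forward-bounded (string beg end)
  "Search for STRING from BEG, not going past END; return end of match or nil."
  (save-excursion
    (goto-char beg)
    (search-forward string end t)))
```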
Lars Ingebrigtsen
2018-04-16 20:40:56 UTC
I've done some testing.

(progn
  (insert "foo bar zot")
  (put-text-property 2 3 'face nil)
  (put-text-property 5 6 'face 'bold)
  (goto-char (point-min))
  (goto-char (next-single-property-change (point) 'face)))

This goes to 5, not to 2, so `next-single-property-change' doesn't seem
to consider a nil value to be different from a missing value, either.

So I think it might make sense to just leave that complication out of
this function, because it's rather obscure (categories and stuff that
sounds way too complicated :-)).
Stefan Monnier
2018-04-16 20:48:08 UTC
Post by Lars Ingebrigtsen
This goes to 5, not to 2, so `next-single-property-change' doesn't seem
to consider a nil value to be different from a missing value, either.
There are circumstances where the absence of a property is distinguished
from a value nil for that property, but we try to keep this unusual.
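
One way to observe the distinction from Lisp is to look at the raw property list rather than the value. This is a small sketch; the helper name `my-text-property-present-p' is made up for the example.

```elisp
;; Hypothetical helper: test for presence rather than for a non-nil value.
(defun my-text-property-present-p (pos prop)
  "Return t if PROP is set at POS, even when its value is nil."
  (and (plist-member (text-properties-at pos) prop) t))

(with-temp-buffer
  (insert "foo bar zot")
  (put-text-property 2 3 'face nil)
  (list (get-text-property 1 'face)            ; `face' never set here
        (get-text-property 2 'face)            ; `face' explicitly set to nil
        (my-text-property-present-p 1 'face)
        (my-text-property-present-p 2 'face)))
```

`get-text-property' returns nil at both positions, so only the `plist-member' check tells the two cases apart.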


Stefan
João Távora
2018-04-16 21:17:06 UTC
Post by Lars Ingebrigtsen
So I think it might make sense to just leave that complication out of
this function, because it's rather obscure (categories and stuff that
sounds way too complicated :-)).
FWIW this goes to 2

(with-temp-buffer
  (progn
    (insert "foo bar zot")
    (put-text-property 2 3 'face nil)
    (put-text-property 5 6 'face 'bold)
    (goto-char (point-min))
    (goto-char (next-property-change (point)))))

but fair enough, it can always be added later with minimum breakage, I
think. Though if it's not super-complicated to implement, we would be
missing a good opportunity to cover a corner case, however obscure:
precisely one that next-single-property-change makes hard to handle.
Lars Ingebrigtsen
2018-04-16 21:21:07 UTC
Though if it's not super-complicated to implement we would be missing
a good opportunity to cover a corner case, however obscure. Precisely
one that next-single-property-change makes hard to handle.
But without relying on that function (which is what
text-property-search-forward does), it'll have to examine every single
character "by hand" to locate the properties. And that'll be slow in
large buffers...
Lars Ingebrigtsen
2018-04-16 21:32:37 UTC
Why slower than next-single-property-change? Because that seems to be
exactly what that function does, albeit in C.
Emacs Lisp is slower than C, and calling a large number of Emacs Lisp
functions is just slow. You avoid that in the C implementation.
Alan Mackenzie
2018-04-17 19:10:24 UTC
Hello, Lars.
Post by Lars Ingebrigtsen
I've done some testing.
(progn
  (insert "foo bar zot")
  (put-text-property 2 3 'face nil)
  (put-text-property 5 6 'face 'bold)
  (goto-char (point-min))
  (goto-char (next-single-property-change (point) 'face)))
This goes to 5, not to 2, so `next-single-property-change' doesn't seem
to consider a nil value to be different from a missing value, either.
So I think it might make sense to just leave that complication out of
this function, because it's rather obscure (categories and stuff that
sounds way too complicated :-)).
"Stuff like that" is part of Emacs. Please take care that the new
functions handle category properties correctly (for whatever value of
"correctly" is appropriate). If the functions are written in Lisp, this
will probably happen automatically (but needs to be checked). If
they're written in C, category properties will need to be handled
explicitly.

If you fail to do this, people will curse you. The functions, instead
of being correct, will merely be functions which sort-of work most of
the time. And the correct advice given to hackers wishing to write
correct programs would be to avoid the new functions altogether and use
the currently existing text property searching functions instead. The
amount of effort required to document the deficiencies of the new and
other functions in the Elisp manual would be non-trivial.

Please do this properly.
--
Alan Mackenzie (Nuremberg, Germany).
Lars Ingebrigtsen
2018-04-17 19:16:37 UTC
Post by Alan Mackenzie
If you fail to do this, people will curse you. The functions, instead
of being correct, will merely be functions which sort-of work most of
the time. And the correct advice given to hackers wishing to write
correct programs would be to avoid the new functions altogether and use
the currently existing text property searching functions instead.
But as the experiment shows, the new function (dis)regards the same
subset of things (i.e., nil and the absence of a value) as the old
functions. So that doesn't seem likely to happen.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Alan Mackenzie
2018-04-17 20:31:14 UTC
Hello, Lars.
Post by Lars Ingebrigtsen
Post by Alan Mackenzie
If you fail to do this, people will curse you. The functions, instead
of being correct, will merely be functions which sort-of work most of
the time. And the correct advice given to hackers wishing to write
correct programs would be to avoid the new functions altogether and use
the currently existing text property searching functions instead.
But as the experiment shows, the new function (dis)regards the same
subset of things (i.e., nil and the absence of a value) as the old
functions. So that doesn't seem likely to happen.
Why did you distort my message by snipping essential context?

The "this" I was talking about was the correct handling of category
properties in the new functions, not the distinction between an absent
property and a value of nil.

Let me put it to you again: you need to deal with category text
properties properly. If you don't, the things I listed in the paragraph
above will surely happen.
--
Alan Mackenzie (Nuremberg, Germany).
Lars Ingebrigtsen
2018-04-17 20:42:55 UTC
Post by Alan Mackenzie
Why did you distort my message by snipping essential context?
Sorry, I did not know that that was the essential context.
Post by Alan Mackenzie
The "this" I was talking about was the correct handling of category
properties in the new functions, not the distinction between an absent
property and a value of nil.
I misunderstood. I thought you were talking about category text
properties, where apparently (according to what someone wrote here)
there is a distinction between nil and absence.
Post by Alan Mackenzie
Let me put it to you again: you need to deal with category text
properties properly. If you don't, the things I listed in the paragraph
above will surely happen.
The new function deals with any text property just the same as
`next-single-property-change' and all its friends do; no more nor less,
so I may be misunderstanding what you're saying once again. :-)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Alan Mackenzie
2018-04-16 19:40:34 UTC
Hello, Lars.
Post by Lars Ingebrigtsen
Post by João Távora
OK. In your example tho, I think will need to distinguish the case where
the property's value is nil from the case where the property isn't set at
all.
Hm... is that a distinction that makes a difference anywhere?
Very much so. If there is a category text property at some point, and
the symbol it uses has a foo property, that will normally get seen by
the text property primitives. However, if you put a foo text property
there with a nil value, that nil value will mask the intent of the
category property.

It might be worth considering whether the new search function will take
account of category properties, and if so, how.
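
A minimal sketch of that masking effect. The symbol `my-category', the
property name `foo' and the function name are all hypothetical and used
only for illustration; the lookup itself relies on the documented
behaviour of `get-text-property' falling back to the category symbol.

```elisp
(defun my-category-masking-demo ()
  "Return (BEFORE AFTER), the value of `foo' before and after masking.
BEFORE is looked up through the `category' text property; AFTER is
what `get-text-property' sees once `foo' is explicitly set to nil."
  (with-temp-buffer
    (insert "a < b")
    ;; Hypothetical category symbol that carries a default `foo' value.
    (put 'my-category 'foo 'from-category)
    (put-text-property 3 4 'category 'my-category)
    (let ((before (get-text-property 3 'foo)))
      ;; An explicit nil on the text itself takes precedence over
      ;; the value stored on the category symbol.
      (put-text-property 3 4 'foo nil)
      (list before (get-text-property 3 'foo)))))
```

Here BEFORE should be the value stored on the category symbol and AFTER
should be nil, which is why a search that treats nil as "not set" cannot
see the difference.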
--
Alan Mackenzie (Nuremberg, Germany).
Lars Ingebrigtsen
2018-04-16 19:49:11 UTC
Post by Alan Mackenzie
Very much so. If there is a category text property at some point, and
the symbol it uses has a foo property, that will normally get seen by
the text property primitives. However, if you put a foo text property
there with a nil value, that nil value will mask the intent of the
category property.
Hm. I am completely unfamiliar with the category stuff -- how is this
used in practice?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Alan Mackenzie
2018-04-16 20:07:45 UTC
Hello, Lars.
Post by Lars Ingebrigtsen
Post by Alan Mackenzie
Very much so. If there is a category text property at some point, and
the symbol it uses has a foo property, that will normally get seen by
the text property primitives. However, if you put a foo text property
there with a nil value, that nil value will mask the intent of the
category property.
Hm. I am completely unfamiliar with the category stuff -- how is this
used in practice?
I don't really know, in general.

In C++ and Java Modes, < and > which are template/generic delimiters are
given category text properties, symbols foo and bar, whose syntax-table
properties give the < and > parenthesis syntax. This parenthesis syntax
is regularly "switched off" on all such characters simply by changing
the value of foo's and bar's syntax-table properties to punctuation.
This enables syntactic searching where template/generic < and > need not
be seen.

Having implemented this, I don't recommend the technique. It has
disadvantages, in that it collides with syntax-ppss. To be fair, Stefan
advised me not to go ahead with it at an early stage. I'll probably
remove it from CC Mode at some stage.
--
Alan Mackenzie (Nuremberg, Germany).
Lars Ingebrigtsen
2018-04-16 20:31:59 UTC
Post by Alan Mackenzie
In C++ and Java Modes, < and > which are template/generic delimiters are
given category text properties, symbols foo and bar, whose syntax-table
properties give the < and > parenthesis syntax. This parenthesis syntax
is regularly "switched off" on all such characters simply by changing
the value of foo's and bar's syntax-table properties to punctuation.
This enables syntactic searching where template/generic < and > need not
be seen.
I see; thanks for the explanation.

So it would be nice to use this function to search for characters that
have a property set to nil (as opposed to not being set at all). But
how to express that to the predicate function? Hm... We need more
forms of `nil'. :-)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Lars Ingebrigtsen
2018-04-16 21:18:54 UTC
Post by João Távora
1. What should happen if search starts in the region where the
property is already set?
I've reconsidered slightly after rewriting a couple of functions to use
this.

A typical use case is the user hitting TAB to go to the next button in a
buffer. That would then be something like

(when (get-text-property (point) 'shr-url)
  (text-property-search-forward 'shr-url nil t))
(text-property-search-forward 'shr-url nil nil)

I mean, it's not awful, but it's a common use case, and that's boring to
have to type.

So I added a flag, and you can now skip matches under point.

This function:

(defun shr-next-link ()
  "Skip to the next link."
  (interactive)
  (let ((current (get-text-property (point) 'shr-url))
        (start (point))
        skip)
    (while (and (not (eobp))
                (equal (get-text-property (point) 'shr-url) current))
      (forward-char 1))
    (cond
     ((and (not (eobp))
           (get-text-property (point) 'shr-url))
      ;; The next link is adjacent.
      (message "%s" (get-text-property (point) 'help-echo)))
     ((or (eobp)
          (not (setq skip (text-property-not-all (point) (point-max)
                                                 'shr-url nil))))
      (goto-char start)
      (message "No next link"))
     (t
      (goto-char skip)
      (message "%s" (get-text-property (point) 'help-echo))))))

Can now be rewritten as:

(defun shr-next-link ()
  "Skip to the next link."
  (interactive)
  (if (not (text-property-search-forward 'shr-url nil nil t))
      (message "No next link")
    (message "%s" (get-text-property (point) 'help-echo))))

:-)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
João Távora
2018-04-16 21:28:56 UTC
Post by Lars Ingebrigtsen
(defun shr-next-link ()
  "Skip to the next link."
  (interactive)
  (if (not (text-property-search-forward 'shr-url nil nil t))
      (message "No next link")
    (message "%s" (get-text-property (point) 'help-echo))))
:-)
Looks super. BTW, how would you go about using this in software that is
not in Emacs core?
Would you consider making this a separate package for (GNU|M)ELPA that
other packages can include as a dependency?

João
Lars Ingebrigtsen
2018-04-16 22:09:00 UTC
Post by João Távora
Would you consider making this a separate package for (GNU|M)ELPA
that other packages can include as a dependency?
I won't, but if others want to, be my guest.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Stefan Monnier
2018-04-17 13:01:27 UTC
Post by Lars Ingebrigtsen
Post by João Távora
Would you consider making this a separate package for (GNU|M)ELPA
that other packages can include as a dependency?
I won't, but if others want to, be my guest.
If you put the code in its own file, then we can distribute it as a GNU
ELPA package (built directly from the emacs.git master branch), like we
do for cl-print, let-alist, ntlm, python, and soap-client.


Stefan
Lars Ingebrigtsen
2018-04-17 13:04:36 UTC
Post by Stefan Monnier
If you put the code in its own file, then we can distribute it as a GNU
ELPA package (built directly from the emacs.git master branch), like we
do for cl-print, let-alist, ntlm, python, and soap-client.
Yeah, the code grew longer than anticipated, so putting it in its own
file is probably a good idea anyway.

I'll do so later today and check it in on master with complete
documentation.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Lars Ingebrigtsen
2018-04-16 16:59:33 UTC
Post by Lars Ingebrigtsen
(while (text-property-search-forward 'shr-url nil nil)
  (push (get-text-property (match-beginning 0) 'shr-url) urls))
Hm, I'm writing the documentation now, and I wonder whether it would be
cleaner and more convenient to just return a data structure here to
avoid messing with the match state... It could be, for instance, a
structure with nice accessors like

(prop-match-start match)

and stuff...
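
A sketch of how the URL-collecting loop could read with a returned
match object instead of the match data. It assumes the accessor names
of the text-property-search library as later shipped with Emacs 27
(`prop-match-beginning', `prop-match-end', `prop-match-value'), which
differ from the tentative `prop-match-start' above; the function name
`my-collect-shr-urls' is hypothetical.

```elisp
(require 'text-property-search)   ; assumed available (Emacs 27 or later)

(defun my-collect-shr-urls ()
  "Return a list of all `shr-url' property values in the current buffer."
  (save-excursion
    (goto-char (point-min))
    (let (urls match)
      ;; VALUE nil with PREDICATE nil matches every region whose
      ;; `shr-url' is not equal to nil, i.e. every link.
      (while (setq match (text-property-search-forward 'shr-url nil nil))
        (push (prop-match-value match) urls))
      (nreverse urls))))
```

Because the result is an ordinary value, the caller's match data stays
untouched, which is the convenience being weighed here.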
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Lars Ingebrigtsen
2018-04-16 18:03:56 UTC
I've now implemented this on the scratch/prop-search branch.

It got a bit more convoluted than I originally thought, but I think it
should do what you'd expect now. The subtleties are between searching
for things that don't match, and searching for nothing that doesn't
match.

The known unknowns and the unknown knowns.

I'm sure.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no