Discussion:
Question about (gui-get-selection nil 'text/html)
Lars Ingebrigtsen
2018-04-13 19:00:52 UTC
So, we can yank HTML that we cut from Firefox like so:

(gui-get-selection nil 'text/html)

... sort of.

I've put the result into a binary file so it'll hopefully survive the
email transport.
Lars Ingebrigtsen
2018-04-13 19:07:27 UTC
Oh, wow. If I just do

(decode-coding-region (point-min) (point-max) 'utf-16-le)

instead of utf-16, I get the HTML I expect instead of a bunch of Chinese
characters. :-)

There's a byte order mark at the start -- isn't utf-16 supposed to use
that to get the byte order?
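
For context, a minimal sketch of why the choice of coding system matters here. It assumes (my reading of Emacs's coding systems, not something stated in this thread) that `utf-16` takes the byte order from a leading BOM, while `utf-16le` always reads little-endian and expects no BOM. The function name `demo-decode-utf-16` is hypothetical, and the bytes are made up for illustration; they are not the actual selection data.

```elisp
;; Hypothetical demo: decode "<p>" stored as UTF-16 little-endian,
;; once with a BOM and once without.
(defun demo-decode-utf-16 ()
  "Return a list of two decoded strings (illustrative bytes only)."
  (let ((bom     (unibyte-string #xFF #xFE))
        (payload (unibyte-string #x3C #x00 #x70 #x00 #x3E #x00)))
    (list
     ;; BOM present: `utf-16' can use it to select little-endian.
     (decode-coding-string (concat bom payload) 'utf-16)
     ;; No BOM: name the byte order explicitly.
     (decode-coding-string payload 'utf-16le))))
```

If the BOM is missing or mangled, `utf-16` has nothing to detect the byte order from. Should it then fall back to big-endian (an assumption on my part), the bytes 3C 00 would be read as U+3C00, a CJK ideograph, which would fit the "Chinese characters" described above.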
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Eli Zaretskii
2018-04-13 20:27:36 UTC
> Date: Fri, 13 Apr 2018 21:07:27 +0200
>
> Oh, wow. If I just do
>
> (decode-coding-region (point-min) (point-max) 'utf-16-le)
>
> instead of utf-16, I get the HTML I expect instead of a bunch of Chinese
> characters. :-)
>
> There's a byte order mark at the start -- isn't utf-16 supposed to use
> that to get the byte order?

The file you attached has no BOM.
Lars Ingebrigtsen
2018-04-13 20:29:41 UTC
Post by Eli Zaretskii
> > Date: Fri, 13 Apr 2018 21:07:27 +0200
> >
> > Oh, wow. If I just do
> >
> > (decode-coding-region (point-min) (point-max) 'utf-16-le)
> >
> > instead of utf-16, I get the HTML I expect instead of a bunch of Chinese
> > characters. :-)
> >
> > There's a byte order mark at the start -- isn't utf-16 supposed to use
> > that to get the byte order?
>
> The file you attached has no BOM.

The first four bytes were

\303\277\303\276

Isn't that the BOM? Or do I misremember?
Andreas Schwab
2018-04-13 22:07:37 UTC
Post by Lars Ingebrigtsen
> The first four bytes were
>
> \303\277\303\276
>
> Isn't that the BOM? Or do I misremember?

It's a BOM encoded as UTF-16 encoded as UTF-8.

Andreas.
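
The double encoding described above can be reproduced in a few lines. This is only a sketch: the helper names are hypothetical, and it assumes the mangling happened by treating each byte of the UTF-16 data as a Latin-1 character and then encoding the result as UTF-8. The thread does not say where in the pipeline that happened.

```elisp
;; Hypothetical helpers; the byte values are the ones quoted in this thread.
(defun demo-double-encode (bytes)
  "Treat each byte of BYTES as a Latin-1 character, then encode as UTF-8."
  (encode-coding-string (decode-coding-string bytes 'iso-8859-1) 'utf-8))

(defun demo-undo-double-encode (bytes)
  "Reverse `demo-double-encode': decode UTF-8, then map characters to bytes."
  (encode-coding-string (decode-coding-string bytes 'utf-8) 'iso-8859-1))

;; The UTF-16 little-endian BOM is the byte pair FF FE.  Feeding it to
;; `demo-double-encode' can be compared with \303\277\303\276 from above.
(demo-double-encode (unibyte-string #xFF #xFE))
```

Under that assumption, running `demo-undo-double-encode` over the whole attachment would restore the original UTF-16 bytes, BOM included, after which `utf-16` could detect the byte order by itself.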
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Lars Ingebrigtsen
2018-04-13 22:18:59 UTC
Post by Andreas Schwab
> Post by Lars Ingebrigtsen
> > The first four bytes were
> >
> > \303\277\303\276
> >
> > Isn't that the BOM? Or do I misremember?
>
> It's a BOM encoded as UTF-16 encoded as UTF-8.

Heh heh. Beautiful.
Eli Zaretskii
2018-04-14 06:42:27 UTC
> Date: Fri, 13 Apr 2018 22:29:41 +0200
>
> Post by Eli Zaretskii
> > Post by Lars Ingebrigtsen
> > > There's a byte order mark at the start -- isn't utf-16 supposed to use
> > > that to get the byte order?
> >
> > The file you attached has no BOM.
>
> The first four bytes were
>
> \303\277\303\276
>
> Isn't that the BOM? Or do I misremember?

Look at it with hexl-find-file or with "od -x", and you will see it's
not a BOM (which should be either FFFE or FEFF).
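
The same check can be done from Lisp on the raw bytes. A minimal sketch follows; the function name `demo-utf-16-bom` is hypothetical, and the argument must be a unibyte string (for example, the contents of a buffer visited with `find-file-literally`).

```elisp
;; Hypothetical helper: classify the first two bytes of a unibyte string.
(defun demo-utf-16-bom (bytes)
  "Return `le', `be', or nil according to the first two bytes of BYTES.
BYTES must be a unibyte string."
  (when (>= (length bytes) 2)
    (let ((b0 (aref bytes 0))
          (b1 (aref bytes 1)))
      (cond ((and (= b0 #xFF) (= b1 #xFE)) 'le)  ; FF FE: little-endian BOM
            ((and (= b0 #xFE) (= b1 #xFF)) 'be)  ; FE FF: big-endian BOM
            (t nil)))))

;; The four bytes quoted in this thread start with C3 BF, so neither
;; branch matches and the result is nil.
(demo-utf-16-bom (unibyte-string #xC3 #xBF #xC3 #xBE))
```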

Stefan Monnier
2018-04-13 21:42:04 UTC
Post by Lars Ingebrigtsen
> (gui-get-selection nil 'text/html)

I've opened a bug report for that: bug#31149


Stefan