Discussion:
using libmagic in Emacs?
j***@verona.se
2009-08-18 18:35:14 UTC
(This is probably a FAQ but my google skills seem to fail me)

Some operations in Emacs try to do the same thing as the "libmagic"
library, which is the core of the "file" utility.

For instance, in "image.el" there is functionality to look at magic
numbers in image files.

Also, I often wish that files would open in Emacs with the correct mode
more often when there is no file extension.

Would there be interest in an Emacs patch for libmagic, or is there some
obvious reason this hasn't been done yet? I envision this as an
interface with 2 implementations: a Lisp fallback like today, and
libmagic if available. I wrote a libmagic wrapper for OCaml using SWIG
before, so I have some familiarity with the API.
--
Joakim Verona
Stefan Monnier
2009-08-18 19:23:06 UTC
Post by j***@verona.se
(This is probably a FAQ but my google skills seem to fail me)
Some operations in Emacs try to do the same thing as the "libmagic"
library, which is the core of the "file" utility.
For instance, in "image.el" there is functionality to look at magic
numbers in image files.
Also, I often wish that files would open in Emacs with the correct mode
more often when there is no file extension.
Would there be interest in an Emacs patch for libmagic, or is there some
obvious reason this hasn't been done yet? I envision this as an
interface with 2 implementations: a Lisp fallback like today, and
libmagic if available. I wrote a libmagic wrapper for OCaml using SWIG
before, so I have some familiarity with the API.
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.


Stefan
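The two-step idea above (obtain a MIME type, then look the major mode up in a table) together with the "two implementations" interface from the original post could be sketched roughly as follows. This is only an illustration in Python, not Emacs code: the signature table, the MIME-to-mode table, and the function names `fallback_mime_type`, `detect_mime_type` and `major_mode_for` are all hypothetical, and the optional `libmagic_detect` callable merely stands in for a real libmagic binding.

```python
from typing import Callable, Optional

# Hypothetical magic-number table used by the pure fallback.
# A real libmagic database is far larger than this.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"#!/bin/sh", "text/x-shellscript"),
]

# Hypothetical table mapping MIME types to major-mode names.
MIME_TO_MODE = {
    "image/png": "image-mode",
    "image/jpeg": "image-mode",
    "image/gif": "image-mode",
    "application/pdf": "doc-view-mode",
    "text/x-shellscript": "sh-mode",
}


def fallback_mime_type(head: bytes) -> Optional[str]:
    """Pure fallback: match the first bytes against known signatures."""
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return None


def detect_mime_type(
    head: bytes,
    libmagic_detect: Optional[Callable[[bytes], Optional[str]]] = None,
) -> Optional[str]:
    """Use the libmagic-backed detector if one is supplied, else the fallback."""
    if libmagic_detect is not None:
        mime = libmagic_detect(head)
        if mime is not None:
            return mime
    return fallback_mime_type(head)


def major_mode_for(
    head: bytes,
    libmagic_detect: Optional[Callable[[bytes], Optional[str]]] = None,
    default: str = "fundamental-mode",
) -> str:
    """Map the detected MIME type to a major-mode name via the table."""
    mime = detect_mime_type(head, libmagic_detect)
    return MIME_TO_MODE.get(mime, default)
```

Keeping detection and the mode table separate means client code only ever sees a MIME type, so it does not need to know which of the two implementations produced it.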
Chong Yidong
2009-08-18 20:01:09 UTC
Post by Stefan Monnier
Post by j***@verona.se
Would there be interest in an Emacs patch for libmagic, or is there some
obvious reason this hasn't been done yet? I envision this as an
interface with 2 implementations: a Lisp fallback like today, and
libmagic if available. I wrote a libmagic wrapper for OCaml using SWIG
before, so I have some familiarity with the API.
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
This development would probably have to take place in a separate branch.
j***@verona.se
2009-08-18 20:35:47 UTC
Post by Chong Yidong
Post by Stefan Monnier
Post by j***@verona.se
Would there be interest in an Emacs patch for libmagic, or is there some
obvious reason this hasn't been done yet? I envision this as an
interface with 2 implementations: a Lisp fallback like today, and
libmagic if available. I wrote a libmagic wrapper for OCaml using SWIG
before, so I have some familiarity with the API.
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
This development would probably have to take place in a separate branch.
I will work in my local git repos, and publish a patch here, much like
the imagemagick patch and the xwidget patch. I can switch to bzr
whenever that works.

The core libmagic Lisp API should, however, be rather stand-alone and
non-intrusive. Client code such as the image type recognition code can
then be ported successively.
--
Joakim Verona
Stefan Monnier
2009-08-18 21:11:02 UTC
Post by j***@verona.se
Post by Chong Yidong
Post by Stefan Monnier
Post by j***@verona.se
Would there be interest in an Emacs patch for libmagic, or is there some
obvious reason this hasn't been done yet? I envision this as an
interface with 2 implementations: a Lisp fallback like today, and
libmagic if available. I wrote a libmagic wrapper for OCaml using SWIG
before, so I have some familiarity with the API.
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
This development would probably have to take place in a separate branch.
I don't expect it to be too intrusive, so I think it can be done on the
trunk, tho of course, each step needs to be planned with care.
Post by j***@verona.se
I will work in my local git repos, and publish a patch here, much like
the imagemagick patch and the xwidget patch. I can switch to bzr
whenever that works.
That's fine as well.
Post by j***@verona.se
The core libmagic Lisp API should, however, be rather stand-alone and
non-intrusive. Client code such as the image type recognition code can
then be ported successively.
I think it's OK to install the code in CVS as soon as the Lisp API to
libmagic is ready. Once that is done, we can decide which next steps
to take.


Stefan
Eli Zaretskii
2009-08-19 02:58:13 UTC
Date: Tue, 18 Aug 2009 17:11:02 -0400
Post by Chong Yidong
Post by Stefan Monnier
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
This development would probably have to take place in a separate branch.
I don't expect it to be too intrusive, so I think it can be done on the
trunk, tho of course, each step needs to be planned with care.
So what is the rule for new features that can be installed on the
trunk at this time? I thought only relatively minor and safe ones,
but this one seems to break that rule, at least in my book. If this
one is okay, then why not something like bidirectional editing, for
example?

Maybe we should simply decide right here and now that Emacs 23.2 will
be delivered from the RC branch, and open the trunk for all changes,
even not-so-safe ones?
Stefan Monnier
2009-08-19 03:21:13 UTC
Post by Eli Zaretskii
Post by Stefan Monnier
Post by Chong Yidong
Post by Stefan Monnier
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
This development would probably have to take place in a separate branch.
I don't expect it to be too intrusive, so I think it can be done on the
trunk, tho of course, each step needs to be planned with care.
So what is the rule for new features that can be installed on the
trunk at this time?
The rule is: anything is possible, but ones that aren't simple and safe
need to get confirmation here first.
Post by Eli Zaretskii
I thought only relatively minor and safe ones,
but this one seems to break that rule, at least in my book.
It looks pretty safe: the first step is to add the Lisp API, which
should not impact any other code (tho it may cause temporary build
failures, I guess). After that, set-auto-mode (and/or image.el, ...)
will need to be tweaked to also take libmagic into account
when available. This should also be fairly simple.
Post by Eli Zaretskii
If this one is okay, then why not something like bidirectional
editing, for example?
I was thinking of bidi for Emacs-24, but if you have code ready for it,
and if it's not too intrusive, I'd be willing to consider it.
Post by Eli Zaretskii
Maybe we should simply decide right here and now that Emacs 23.2 will
be delivered from the RC branch, and open the trunk for all changes,
even not-so-safe ones?
Yes, that's pretty much where we're at, I think, yes.


Stefan
Chong Yidong
2009-08-19 13:47:45 UTC
Post by Stefan Monnier
Post by Eli Zaretskii
I thought only relatively minor and safe ones,
but this one seems to break that rule, at least in my book.
It looks pretty safe: the first step is to add the Lisp API, which
should not impact any other code (tho it may cause temporary build
failures, I guess). After that, set-auto-mode (and/or image.el, ...)
will need to be tweaked to also take libmagic into account
when available. This should also be fairly simple.
I don't think that sounds simple or safe; however, there's not enough
information to know for sure, until Joakim posts the patch.
Post by Stefan Monnier
Post by Eli Zaretskii
Maybe we should simply decide right here and now that Emacs 23.2 will
be delivered from the RC branch, and open the trunk for all changes,
even not-so-safe ones?
Yes, that's pretty much where we're at, I think, yes.
Actually, if people want to start including more intrusive changes, I
think we should cut a new branch from the current trunk. This would
postpone the CEDET merge to 23.3.
j***@verona.se
2009-08-19 15:57:50 UTC
Post by Chong Yidong
Post by Stefan Monnier
Post by Eli Zaretskii
I thought only relatively minor and safe ones,
but this one seems to break that rule, at least in my book.
It looks pretty safe: the first step is to add the Lisp API, which
should not impact any other code (tho it may cause temporary build
failures, I guess). After that, set-auto-mode (and/or image.el, ...)
will need to be tweaked to also take libmagic into account
when available. This should also be fairly simple.
I don't think that sounds simple or safe; however, there's not enough
information to know for sure, until Joakim posts the patch.
Post by Stefan Monnier
Post by Eli Zaretskii
Maybe we should simply decide right here and now that Emacs 23.2 will
be delivered from the RC branch, and open the trunk for all changes,
even not-so-safe ones?
Yes, that's pretty much where we're at, I think, yes.
Actually, if people want to start including more intrusive changes, I
think we should cut a new branch from the current trunk. This would
postpone the CEDET merge to 23.3.
Please don't postpone CEDET on my behalf! That would feel terrible.
--
Joakim Verona
Dan Nicolaescu
2009-08-19 19:46:51 UTC
Post by Chong Yidong
Post by Stefan Monnier
Post by Eli Zaretskii
I thought only relatively minor and safe ones,
but this one seems to break that rule, at least in my book.
It looks pretty safe: the first step is to add the Lisp API, which
should not impact any other code (tho it may cause temporary build
failures, I guess). After that, set-auto-mode (and/or image.el, ...)
will need to be tweaked to also take libmagic into account
when available. This should also be fairly simple.
I don't think that sounds simple or safe; however, there's not enough
information to know for sure, until Joakim posts the patch.
Post by Stefan Monnier
Post by Eli Zaretskii
Maybe we should simply decide right here and now that Emacs 23.2 will
be delivered from the RC branch, and open the trunk for all changes,
even not-so-safe ones?
Yes, that's pretty much where we're at, I think, yes.
Actually, if people want to start including more intrusive changes, I
think we should cut a new branch from the current trunk.
Are there any plans for the next release?

IMO we need to make a bug fix release ASAP, this bug:

http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=4146

warrants it.

Not being able to set the C style in a file (or directory) local
variable is a major annoyance for users.
Chong Yidong
2009-08-19 21:06:29 UTC
Post by Dan Nicolaescu
Are there any plans for the next release?
The plan (and Stefan agrees) is to spend about the same amount of time
as we did between 22.1 and 22.2. This would put 23.2 around April next
year.
Post by Dan Nicolaescu
http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=4146
warrants it. Not being able to set the C style in a file (or
directory) local variable is a major annoyance for users.
This bug is pretty serious. Happily, it only affects the trunk; the
23.1 release is unaffected.

The most likely culprit is the 2009-07-18 change to cc-mode.el, which we
did not apply to the release branch.
Dan Nicolaescu
2009-08-19 21:53:53 UTC
Post by Chong Yidong
Post by Dan Nicolaescu
http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=4146
warrants it. Not being able to set the C style in a file (or
directory) local variable is a major annoyance for users.
This bug is pretty serious. Happily, it only affects the trunk; the
23.1 release is unaffected.
Good to hear that! (I don't have a 23.1 handy, just the CVS build...) I'll
make a note of this in the bug.
Alan Mackenzie
2009-08-19 22:56:45 UTC
Hi, Everybody!
Post by Chong Yidong
Post by Dan Nicolaescu
Are there any plans for the next release?
The plan (and Stefan agrees) is to spend about the same amount of time
as we did between 22.1 and 22.2. This would put 23.2 around April next
year.
Post by Dan Nicolaescu
http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=4146
warrants it. Not being able to set the C style in a file (or
directory) local variable is a major annoyance for users.
OK. This is for me to fix, so I'm acknowledging having seen it.
Post by Chong Yidong
This bug is pretty serious. Happily, it only affects the trunk; the
23.1 release is unaffected.
Phew! Thanks for that!
Post by Chong Yidong
The most likely culprit is the 2009-07-18 change to cc-mode.el, which
we did not apply to the release branch.
Yes. It's one of these @dfn{wallpaper paste} bugs, where when you press
an area of wallpaper firmly to the wall, the paste under it pops up
another bit of wallpaper somewhere else. It's going to be a horrible bug
to fix. Indeed, it may not be fixable, in the sense of doing the Right
Thing under every reasonable circumstance.
--
Alan Mackenzie (Nuremberg, Germany).
Nick Roberts
2009-08-19 23:16:18 UTC
Post by Alan Mackenzie
an area of wallpaper firmly to the wall, the paste under it pops up
another bit of wallpaper somewhere else. It's going to be a horrible bug
to fix. Indeed, it may not be fixable, in the sense of doing the Right
Thing under every reasonable circumstance.
Such 'bubbles', usually called regressions, might be less likely to appear
if, as Cyd suggested, there was a CC mode equivalent of compilation.txt.
I can't find this post now, so apologies if it reached a logical conclusion.
It might also be harder to implement than compilation.txt, as expressions
are probably not as self-contained, but some kind of test suite seems essential
to prevent these issues from recurring.
--
Nick http://www.inet.net.nz/~nickrob
Lennart Borgman
2009-08-20 09:02:41 UTC
Post by Nick Roberts
Post by Alan Mackenzie
an area of wallpaper firmly to the wall, the paste under it pops up
another bit of wallpaper somewhere else. It's going to be a horrible bug
to fix. Indeed, it may not be fixable, in the sense of doing the Right
Thing under every reasonable circumstance.
Such 'bubbles', usually called regressions, might be less likely to appear
if, as Cyd suggested, there was a CC mode equivalent of compilation.txt.
I can't find this post now, so apologies if it reached a logical conclusion.
It might also be harder to implement than compilation.txt, as expressions
are probably not as self-contained, but some kind of test suite seems essential
to prevent these issues from recurring.
Isn't some kind of unit test what we want here?

Adding a unit test for at least every serious and difficult bug seems
to me the right thing to do.


There are some unit test frameworks on EmacsWiki. I have a modified
version of one of these in nXhtml. And CEDET has some unit tests
(which I have not looked at, though I have run the test suite).
Eric M. Ludlam
2009-08-20 11:19:58 UTC
Post by Lennart Borgman
Post by Nick Roberts
Post by Alan Mackenzie
an area of wallpaper firmly to the wall, the paste under it pops up
another bit of wallpaper somewhere else. It's going to be a horrible bug
to fix. Indeed, it may not be fixable, in the sense of doing the Right
Thing under every reasonable circumstance.
Such 'bubbles', usually called regressions, might be less likely to appear
if, as Cyd suggested, there was a CC mode equivalent of compilation.txt.
I can't find this post now, so apologies if it reached a logical conclusion.
It might also be harder to implement than compilation.txt, as expressions
are probably not as self-contained, but some kind of test suite seems essential
to prevent these issues from recurring.
Isn't some kind of unit test what we want here?
Adding a unit test for at least every serious and difficult bug seems
to me the right thing to do.
There are some unit test frameworks on EmacsWiki. I have a modified
version of one of these in nXhtml. And CEDET has some unit tests
(which I have not looked at, though I have run the test suite).
Just as another vote for testing, CEDET floundered for a long time with
performance and accuracy issues (say from 1996 through 2007 or so) until
I started adding test suites. Ever since, I've been able to do sweeping
changes to the underpinnings for performance, bugs or new features, and
know that everything still works by running a single "make" command.
Every user reported bug I fix gets a new test in one of the pre-existing
test suites as I work on the bug, and several folks on the mailing list
now are very good at providing test snippets for me, keeping maintenance
and overhead low.

I highly recommend doing the same for any complex task in Emacs.

Eric
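The practice described above, where every fixed bug gets its own test, might look something like the following sketch. It is written in Python purely for illustration and is not CC Mode or CEDET code: the function `effective_style`, the `PRECEDENCE` tuple and the source names are hypothetical, and the precedence order is an assumption made for the example, not a statement of how CC Mode resolves style settings.

```python
import unittest

# Hypothetical precedence, highest first. This ordering is an assumption
# made for the sketch, not CC Mode's actual rules.
PRECEDENCE = ("file-local", "dir-local", "user-config", "default")


def effective_style(settings):
    """Return the style from the highest-precedence source that is present.

    The result depends only on PRECEDENCE, not on the order in which the
    settings were made.
    """
    for source in PRECEDENCE:
        if source in settings:
            return settings[source]
    raise KeyError("no style configured")


class StyleRegressionTests(unittest.TestCase):
    """One test per reported bug, added while the bug is being fixed."""

    def test_file_local_style_is_honoured(self):
        settings = {"default": "gnu", "user-config": "gnu", "file-local": "linux"}
        self.assertEqual(effective_style(settings), "linux")

    def test_last_setting_does_not_automatically_win(self):
        # "user-config" was inserted last here, but "dir-local" outranks it.
        settings = {"default": "gnu", "dir-local": "bsd", "user-config": "k&r"}
        self.assertEqual(effective_style(settings), "bsd")
```

Saved to a file, the tests can be run with `python -m unittest <file>`; the point is only that each test pins down one previously broken behaviour so a later fix elsewhere cannot silently undo it.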
Alan Mackenzie
2009-08-20 15:13:55 UTC
Hi, Nick,
press an area of wallpaper firmly to the wall, the paste under it
pops up another bit of wallpaper somewhere else. It's going to be
a horrible bug to fix. Indeed, it may not be fixable, in the sense
of doing the Right Thing under every reasonable circumstance.
Such 'bubbles', usually called regressions, ....
No, that's not what I meant (although it is also a regression). I was
talking about the high complexity caused by the number of things that
all have to work right at the same time. The "fix" I made to cc-mode.el
in July fixed one problem but created another. Getting them all working
simultaneously is going to be hard.

This complexity has increased recently due to the new feature "directory
locals". I didn't become aware of this when it was introduced (my bad),
and the person who wrote it wasn't aware of the trouble it would cause
CC Mode (why should he be?).

The trouble is, there are too many ways of setting a CC Mode "style
variable" (such as c-basic-offset), @xref{Config Basics,,, ccmode}. It
is not always the last setting which should prevail over previous ones.
It is a complexity which nobody would design; it has emerged as such
over CC Mode's lifetime, and is now a mess. The .dir-locals feature may
have pushed the complexity over the edge of what is manageable.
...., might be less likely to appear if, as Cyd suggested, there was a
CC mode equivalent of compilation.txt.
Er, what's .../etc/compilation.txt about? It has an alleged explanation
at the top, but that only makes sense if you already have some context.
For example, you need to know why you'd "need matchers", and what sort
of "matchers" they are.
I can't find this post now, so apologies if it reached a logical
conclusion. It might also be harder to implement than
compilation.txt, as expressions are probably not as self contained,
but some kind of test suite seems essential to prevent these issues from
recurring.
CC Mode has an extensive test suite for (static) indentation and
fontification. It doesn't have any such tests for things like
initialisation of the mode, execution of CC Mode commands, or for
indentation/fontification after buffer changes. I would very much
welcome anybody stepping forward who had the time and energy to write
these tests.
Nick
--
Alan Mackenzie (Nuremberg, Germany).
Lennart Borgman
2009-08-20 15:47:19 UTC
Post by Alan Mackenzie
CC Mode has an extensive test suite for (static) indentation and
fontification.  It doesn't have any such tests for things like
initialisation of the mode, execution of CC Mode commands, or for
indentation/fontification after buffer changes.  I would very much
welcome anybody stepping forward who had the time and energy to write
these tests.
The tests I use for nXhtml try to emulate commands (taking
pre/post etc. into account), which might help for doing this. They also try to
fontify by explicitly calling the timers. Maybe this can help. If
someone wants to try, I can explain more.
Eli Zaretskii
2009-08-19 19:05:33 UTC
Date: Tue, 18 Aug 2009 23:21:13 -0400
Post by Eli Zaretskii
I thought only relatively minor and safe ones,
but this one seems to break that rule, at least in my book.
It looks pretty safe
As you see, even Yidong is not sure he agrees, and neither am I.
I was thinking of bidi for Emacs-24
If history is of any significance, I may not live until Emacs 24. And
for some strange reason, the burden of adding this feature seems to be
on my shoulders and no one else's: no development happened in this
direction for the last several years, even though most of the
low-level code was sitting on a branch (courtesy of Handa-san) for the
last 4 years.

So I'd prefer it to happen sooner rather than later, at least to the
point where the foundations are in place and others can contribute the
rest.
but if you have code ready for it
and if it's not too intrusive, I'd be willing to consider it.
It is not ``ready'' in the sense that it is not yet production
quality. It does not yet support all the features of the Emacs
display engine. But it can already display bidirectional text, for
now only in a left-to-right paragraph and only if the text has no
faces and overlays. The code that reorders characters for display
isn't activated until you flip a buffer-local variable, and then only
in that buffer. Is that ``not too intrusive'' enough?
Stefan Monnier
2009-08-21 18:59:24 UTC
Post by Eli Zaretskii
Post by Stefan Monnier
I was thinking of bidi for Emacs-24
If history is of any significance, I may not live until Emacs 24. And
for some strange reason, the burden of adding this feature seems to be
on my shoulders and no one else's: no development happened in this
direction for the last several years, even though most of the
low-level code was sitting on a branch (courtesy of Handa-san) for the
last 4 years.
Has this branch been kept up-to-date w.r.t the trunk?
I'd guess not. In that case, someone should do it.

I just took a look at that branch, and it doesn't look too terrible.
It made me discover the variable direction-reversed which I didn't
even know existed (and it seems that it currently has no effect :-(
Post by Eli Zaretskii
So I'd prefer it to happen sooner rather than later, at least to the
point where the foundations are in place and others can contribute
the rest.
Agreed. The more I think about it, the more I think we need to open
a new branch for "what will become emacs-24". Kind of like what we did
with the emacs-unicode branch. I think bidi should be one of the first
features to install on that branch.
Post by Eli Zaretskii
Post by Stefan Monnier
but if you have code ready for it
and if it's not too intrusive, I'd be willing to consider it.
It is not ``ready'' in the sense that it is not yet production
quality. It does not yet support all the features of the Emacs
display engine. But it can already display bidirectional text, for
now only in a left-to-right paragraph and only if the text has no
faces and overlays. The code that reorders characters for display
isn't activated until you flip a buffer-local variable, and then only
in that buffer. Is that ``not too intrusive'' enough?
I think it will stay unstable for too long, so it's not good enough for
the current trunk (which I'd like to keep for shorter-term changes).


Stefan
Eli Zaretskii
2009-08-21 20:44:11 UTC
Date: Fri, 21 Aug 2009 14:59:24 -0400
Has this branch been kept up-to-date w.r.t the trunk?
No.
I'd guess not. In that case, someone should do it.
I already did. I have a source tree where the bidi code is merged
with the current trunk, and I merge them as needed all the time.
That's where I do all the development beyond what's on the bidi branch
(which is dead, as far as I'm concerned). I can keep it that way
forever, but I'd prefer to have it in the repository soon, because
others might wish to work on it.
Agreed. The more I think about it, the more I think we need to open
a new branch for "what will become emacs-24". Kind of like what we did
with the emacs-unicode branch. I think bidi should be one of the first
features to install on that branch.
Why not the other way around: make a branch for Emacs 23.x, and leave
Emacs 24 on the trunk? I think Yidong suggested that, and I think
it's a better idea. We never left the mainline of our development on
a branch before.
I think it will stay unstable for too long, so it's not good enough for
the current trunk (which I'd like to keep for shorter-term changes).
I guess that's a NO, but please note that this code, however unstable,
is never executed unless the user flips a variable. So I don't see
how it can destabilize the default configuration by being dead
ballast.
Stefan Monnier
2009-08-22 03:39:02 UTC
Post by Eli Zaretskii
I already did. I have a source tree where the bidi code is merged
with the current trunk, and I merge them as needed all the time.
Good, thank you.
Post by Eli Zaretskii
Post by Stefan Monnier
Agreed. The more I think about it, the more I think we need to open
a new branch for "what will become emacs-24". Kind of like what we did
with the emacs-unicode branch. I think bidi should be one of the first
features to install on that branch.
Why not the other way around: make a branch for Emacs 23.x, and leave
Emacs 24 on the trunk? I think Yidong suggested that, and I think
it's a better idea.
Sure. I tend to forget about CVS's idea that one of the branches is
deemed special. So yes, "open a new branch for emacs-24" here would
mean "create a new CVS branch for Emacs-23.2 and use the trunk for
Emacs-24".
[ I'm eagerly waiting to switch over to a system where
branches are easier to use. ]
Post by Eli Zaretskii
We never left the mainline of our development on a branch before.
Actually we did for emacs-unicode ;-)
Not that it matters, tho.
Post by Eli Zaretskii
Post by Stefan Monnier
I think it will stay unstable for too long, so it's not good enough for
the current trunk (which I'd like to keep for shorter-term changes).
I guess that's a NO, but please note that this code, however unstable,
is never executed unless the user flips a variable. So I don't see
how it can destabilize the default configuration by being
dead ballast.
I know, but somehow having such experimental code there makes me uneasy.


Stefan
Jason Rumney
2009-08-22 08:18:21 UTC
Post by Eli Zaretskii
Why not the other way around: make a branch for Emacs 23.x, and leave
Emacs 24 on the trunk? I think Yidong suggested that, and I think
it's a better idea. We never left the mainline of our development on
a branch before.
We did for both unicode, and for multi-tty. Once the code was usable on
GNU/Linux and at least compilable and didn't break on all other
platforms (which involved removing Carbon support, because no one wanted
to work on it), it was merged back to the trunk.

We should probably aim to keep the branch period short, but it would be
useful to check what you have now into a new branch so others can try it
before it goes on the trunk.
Stephen J. Turnbull
2009-08-22 05:39:59 UTC
Post by Stefan Monnier
Agreed. The more I think about it, the more I think we need to open
a new branch for "what will become emacs-24".
That's what I thought, too, about XEmacs 21.5. I was wrong. We ended
up having to uproot the trunk and move it to a branch, and graft the
21.5 branch back as the trunk. Long-term development belongs either
on the trunk, or in feature branches. Not on a long-term development
branch which collects several features.

Once you move to bzr, this will become less costly, but remember that
in bzr (like CVS, but not so disastrously as CVS) some branches are
more equal than others in the way they are presented to the user.
It's not like git where you just do

    git branch emacs-23 master
    git branch -f master emacs-24
    git branch -D emacs-24    # optional

and everything's alright.
Post by Stefan Monnier
I think it will stay unstable for too long, so it's not good enough for
the current trunk (which I'd like to keep for shorter-term changes).
I know how you feel, but doing things this way is either going to be a
lot of work (synching "short-term changes" from the trunk to the
long-term branch -- in my experience, people work on features like
unicode and bidi in spurts, and they're pervasive changes so conflicts
are frequent if you come back every month or so), or discourage work
on the long-term branch (conflict resolution is enthusiasm-draining).

It also fails to encourage work on the instabilities by third parties.

I may be all wet; ask Ken'ichi and Miles how they feel about the long
stalls on the unicode and lexbind branches. And maybe bzr will be
flexible enough to handle it, but you won't know that until Christmas,
I would guess. I'm still finding new things I dislike about Mercurial
20 months after our conversion....
Eli Zaretskii
2009-08-22 07:31:15 UTC
Date: Sat, 22 Aug 2009 14:39:59 +0900
I know how you feel, but doing things this way is either going to be a
lot of work (synching "short-term changes" from the trunk to the
long-term branch -- in my experience, people work on features like
unicode and bidi in spurts, and they're pervasive changes so conflicts
are frequent if you come back every month or so), or discourage work
on the long-term branch (conflict resolution is enthusiasm-draining).
I merge once a week, for this very reason. So far, no conflicts,
since the changes are limited to the display engine, where no active
development happened for quite some time.

Also, all the changes for now have the form

    if (!bidi)
      {
        old code
      }
    else
      {
        new code
      }

and changes only happen in the `else' branch. This decreases the
probability of conflicts even more (and also avoids destabilizing the
stable code of yore).

Once the infrastructure part is over, and people start changing Lisp
packages, then yes, I guess the probability of conflict will soar.

And as I wrote, I also think the trunk should be Emacs 24, while 23.x
should be on a branch.
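The gating pattern described above (old code path untouched unless a per-buffer flag is flipped) could be sketched as follows. This is a toy illustration in Python, not the actual display-engine code: the `Buffer` class, the `bidi_enabled` flag and `reorder_for_display` are hypothetical names, and the reordering simply reverses runs of Hebrew characters rather than implementing the Unicode Bidirectional Algorithm.

```python
from dataclasses import dataclass


def is_rtl(ch: str) -> bool:
    """Toy check: treat only the Hebrew block (U+0590..U+05FF) as right-to-left."""
    return "\u0590" <= ch <= "\u05ff"


def reorder_for_display(text: str) -> str:
    """Toy reordering: reverse each maximal run of RTL characters.

    This is NOT the Unicode Bidirectional Algorithm, just enough to make
    the new code path visibly different from the old one.
    """
    out = []
    run = []
    for ch in text:
        if is_rtl(ch):
            run.append(ch)
        else:
            out.extend(reversed(run))  # flush the pending RTL run, reversed
            run = []
            out.append(ch)
    out.extend(reversed(run))  # flush a run that ends the text
    return "".join(out)


@dataclass
class Buffer:
    text: str
    bidi_enabled: bool = False  # hypothetical per-buffer flag, off by default

    def display_text(self) -> str:
        if not self.bidi_enabled:
            return self.text  # old code path, unchanged
        return reorder_for_display(self.text)  # new code path
```

With the flag off, `display_text` returns exactly what it returned before the new code existed, which is what makes it possible to rule the new code in or out when chasing a bug.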
Kenichi Handa
2009-08-24 01:45:05 UTC
Post by Stefan Monnier
Agreed. The more I think about it, the more I think we need to open
a new branch for "what will become emacs-24".
That's what I thought, too, about XEmacs 21.5. I was wrong. We ended
up having to uproot the trunk and move it to a branch, and graft the
21.5 branch back as the trunk. Long-term development belongs either
on the trunk, or in feature branches. Not on a long-term development
branch which collects several features.
Having a separate branch has at least one merit. As long as
it is branched from a fairly stable version, a person
working on the branch can be sure that any problem in that
branch is caused by his change.

For the case of bidi, if there's a plan for another big
change in the display engine, the above merit is big. With
a separate branch, people working on bidi code don't have to
be annoyed by bugs from that other change. Otherwise, I
think having bidi code in the trunk is better.

By the way, the case of emacs-unicode is very special. It
simply can't be in the trunk while developing because the
new unicode feature can't be toggled.

---
Kenichi Handa
***@m17n.org
Eli Zaretskii
2009-08-24 03:12:44 UTC
Permalink
Date: Mon, 24 Aug 2009 10:45:05 +0900
By the way, the case of emacs-unicode is very special. It
simply can't be in the trunk while developing because the
new unicode feature can't be toggled.
Exactly, and that's precisely why the bidi development _can_ be done
on the trunk without the burden of distinguishing its bugs from the
others: toggle the feature off, and if the bug persists, it's not from
bidi.

OTOH, the HUGE advantage of working on the trunk is that you don't
need to merge all the time, or risk falling out of sync. You also
don't inherit random bugs that existed at the time of the branch
creation.
Kenichi Handa
2009-08-24 07:17:52 UTC
Permalink
Post by Eli Zaretskii
Post by Kenichi Handa
By the way, the case of emacs-unicode is very special. It
simply can't be in the trunk while developing because the
new unicode feature can't be toggled.
Exactly, and that's precisely why the bidi development _can_ be done
on the trunk without the burden of distinguishing its bugs from the
others: toggle the feature off, and if the bug persists, it's not from
bidi.
Even if that bug is not from bidi, as long as bidi support
suffers from it, work on bidi support must be suspended
until that bug is fixed. For instance, consider a situation
where the face-handling code becomes unstable at some
point. If we are working on a bug where bidi text can't be
displayed with the correct face (just a hypothetical bug), that
work must be suspended.
Post by Eli Zaretskii
OTOH, the HUGE advantage of working on the trunk is that you don't
need to merge all the time, or risk falling out of sync. You also
don't inherit random bugs that existed at the time of the branch
creation.
I understand that merit too. I even tend to agree with having
the bidi code in the trunk. I just wanted to point out the
possible drawback described above.

---
Kenichi Handa
***@m17n.org
Stephen J. Turnbull
2009-08-24 03:25:18 UTC
Permalink
Post by Kenichi Handa
Post by Stefan Monnier
Agreed. The more I think about it, the more I think we need to open
a new branch for "what will become emacs-24".
That's what I thought, too, about XEmacs 21.5. I was wrong. We ended
up having to uproot the trunk and move it to a branch, and graft the
21.5 branch back as the trunk. Long-term development belongs either
on the trunk, or in feature branches. Not on a long-term development
branch which collects several features.
Having a separate branch has at least one merit. As long as
it is branched from a fairly stable version,
What you describe is what I mean by "feature branch". Feature
branches are a tried and true way to work; I do not mean to say "don't
use feature branches".
Post by Kenichi Handa
For the case of bidi, if another big change to the display
engine is planned, that merit matters a lot.
bidi already has a branch or a repository or something. It only needs
to be canonized as "accepted in principle" for v24, and given an
official URL. My understanding of what Stefan proposed is something
different: that as the maintainers decide that some features are
important to add, they be merged to the "for v24" branch: bidi with
lexbind with ....
Post by Kenichi Handa
By the way, the case of emacs-unicode is very special. It
simply can't be in the trunk while developing because the
new unicode feature can't be toggled.
By "can't", I guess you mean you didn't design it to be toggled? Ben
Wing worked out how to have toggle-able buffer formats in 2002 (then
disappeared from XEmacs, unfortunately, but the infrastructure is
present). I'm not saying it's a good idea, but it's possible.
Juri Linkov
2009-08-19 00:57:21 UTC
Permalink
Post by Stefan Monnier
Post by j***@verona.se
There are some operations in Emacs which try to do the same thing as
the "libmagic" library, which is the core of the "file" utility, does.
For instance, in "image.el" there is functionality to look at magic
numbers in image files.
image.el doesn't recognize some rare JPEG formats, so libmagic will be
useful here.
Post by Stefan Monnier
Post by j***@verona.se
Also, I often wish that files would open in Emacs with correct mode
more often when there is no file extension.
libmagic could supplement `magic-mode-alist' and `magic-fallback-mode-alist'.
Post by Stefan Monnier
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
gnus/mailcap.el contains a table mapping MIME-types to major modes.
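
As a rough illustration of the kind of table discussed here, a MIME-type
to major-mode lookup could be sketched as follows. The names
`my-mime-mode-alist' and `my-mode-for-mime-type' and the table entries
are hypothetical, invented for this example; this is not the data
structure mailcap.el actually uses.

```elisp
;; Hypothetical table; the entries are illustrative only.
(defvar my-mime-mode-alist
  '(("application/zip" . archive-mode)
    ("image/jpeg"      . image-mode)
    ("text/x-c"        . c-mode))
  "Alist mapping MIME type strings to major mode symbols.")

(defun my-mode-for-mime-type (mime-type)
  "Return the major mode registered for MIME-TYPE, or nil.
MIME types are compared case-insensitively."
  (cdr (assoc-string mime-type my-mime-mode-alist t)))
```

A caller would only switch modes when the lookup returns non-nil, so
unknown types such as application/octet-stream fall through to the
existing mechanisms.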
--
Juri Linkov
http://www.jurta.org/emacs/
Richard Stallman
2009-08-20 03:42:07 UTC
Permalink
Post by j***@verona.se
For instance, in "image.el" there is functionality to look at magic
numbers in image files.
image.el doesn't recognize some rare JPEG formats, so libmagic will be
useful here.

Extending image.el might be a lot easier than other solutions.
Juri Linkov
2009-08-22 23:36:22 UTC
Permalink
Post by Juri Linkov
Post by j***@verona.se
For instance, in "image.el" there is functionality to look at
magic numbers in image files.
image.el doesn't recognize some rare JPEG formats, so libmagic
will be useful here.
Extending image.el might be a lot easier than other solutions.
The problem is that `image-jpeg-p' in image.el refuses to accept
non-JFIF JPEG image files whereas Emacs can correctly display them
when tests in `image-jpeg-p' are ignored.

Using libmagic means looking only for 2 bytes 0xffd8 (a magic number
of JPEG files) as described by the magic number file:

0 beshort 0xffd8 JPEG image data

It seems this is enough to determine JPEG files. But I'm not confident
about removing additional tests from `image-jpeg-p'. We could keep the
current rules in image.el as a fall-back when libmagic is not available.
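
The two-byte test described by that magic rule can be sketched in Lisp
as below. This is only an illustration, not the code in image.el; the
function name `my-jpeg-magic-p' is hypothetical, and it assumes the
caller passes a unibyte string holding the first bytes of the file.

```elisp
(defun my-jpeg-magic-p (data)
  "Return non-nil if DATA starts with the JPEG magic number 0xffd8.
DATA is assumed to be a unibyte string holding the first bytes of a file."
  (and (>= (length data) 2)
       (= (aref data 0) #xff)     ; first byte of the big-endian short
       (= (aref data 1) #xd8)))   ; second byte
```

For example, (my-jpeg-magic-p (unibyte-string #xff #xd8 #xff #xe1))
returns non-nil, with no further check on what follows the marker.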
--
Juri Linkov
http://www.jurta.org/emacs/
Richard Stallman
2009-08-24 00:07:18 UTC
Permalink
The problem is that `image-jpeg-p' in image.el refuses to accept
non-JFIF JPEG image files whereas Emacs can correctly display them
when tests in `image-jpeg-p' are ignored.

Using libmagic means looking only for 2 bytes 0xffd8 (a magic number
of JPEG files) as described by the magic number file:

0 beshort 0xffd8 JPEG image data

It seems this is enough to determine JPEG files. But I'm not confident
about removing additional tests from `image-jpeg-p'. We could keep the
current rules in image.el as a fall-back when libmagic is not available.

Whatever we do with the function `image-jpeg-p', we could easily make
Emacs test these two bytes. It makes no sense to install code to link
with libmagic just to handle that and a few other similar things.

Meanwhile, for operations less common and important than visiting a file,
running `file' is easy to do.

Combining those two approaches seems much better than adding code to
link with libmagic.
Juri Linkov
2009-08-24 00:17:18 UTC
Permalink
Post by Richard Stallman
Whatever we do with the function `image-jpeg-p', we could easily make
Emacs test these two bytes. It makes no sense to install code to link
with libmagic just to handle that and a few other similar things.
Meanwhile, for operations less common and important than visiting a file,
running `file' is easy to do.
Combining those two approaches seems much better than adding code to
link with libmagic.
Of course, before adding code to link with libmagic we should analyze
how useful it would be. I see its usefulness at least in the following
areas:

1. Archive file types

A popular way to create new archive file types nowadays is to register
a new file extension with the old data compression and archive format.

For instance, Java archive files have the .jar extension but build on
the ZIP file format, so they can be visited in Emacs with the help of
`archive-mode'. Enterprise Java archives with the .ear extension
and Web application Java archives with the .war extension all are
based on the ZIP file format as well as OpenDocument files with
extensions .odt .ods .odb .odp .odg .odf, Firefox add-ons (.xpi),
Keyhole Markup (.kmz), and many other file types that could potentially
be opened in Emacs if they were identified as archive files by libmagic.

We can't track and add all new formats. This is the main task of libmagic.

2. Image file types

Using ImageMagick, Emacs can support over 100 image file formats.
It won't be possible to recognize all of them without libmagic.

3. MIME-types handling

Emacs can process the different MIME types detected by libmagic.
Even when Emacs has no special handling for a file type, it is
still useful to let Emacs run an external program associated
with its MIME-type for users who prefer running programs
(including GUI programs) from Emacs instead of using a window
manager's application menu.
--
Juri Linkov
http://www.jurta.org/emacs/
j***@verona.se
2009-08-24 07:33:33 UTC
Permalink
Post by Juri Linkov
Of course, before adding code to link with libmagic we should analyze
how useful it would be. I see its usefulness at least in the following
1. Archive file types
...
Post by Juri Linkov
We can't track and add all new formats. This is the main task of libmagic.
2. Image file types
Using ImageMagick, Emacs can support over 100 image file formats.
It won't be possible to recognize all of them without libmagic.
3. MIME-types handling
Emacs can process the different MIME types detected by libmagic.
Even when Emacs has no special handling for a file type, it is
still useful to let Emacs run an external program associated
with its MIME-type for users who prefer running programs
(including GUI programs) from Emacs instead of using a window
manager's application menu.
I agree completely with Juri; it is these two cases that motivate me to
work on the libmagic support.

There are a great deal of formats Emacs is able to open, but not to
recognize, such as all the many different Java archive formats Juri
mentions. There are compressed image formats like SVG.

Also, if we decide to merge the imagemagick patch, hundreds of new
image file formats will be supported, that libmagic will help identify.

Notice that I do not propose to replace Emacs's current file recognition,
only to expand it when libmagic is available. We can also expand a bit
of the current handling to take care of a few well known cases, such as
the jpeg one. If no libmagic, then Emacs will behave as today, or
slightly better. If libmagic, many new file formats will open
correctly.

I honestly can't see any drawbacks with this approach, other than me
consuming some list resources while developing the patch. I hope to be
able to reciprocate with improved documentation for developing Emacs
primitives.

Let's also not forget the friendly competition from other free text
editors. Emacs IMHO does the right thing with files more often than
other editors; let's improve on this strength!


--
Joakim Verona
Richard Stallman
2009-08-25 02:08:02 UTC
Permalink
For instance, Java archive files have the .jar extension but build on
the ZIP file format, so they can be visited in Emacs with the help of
`archive-mode'. Enterprise Java archives with the .ear extension
and Web application Java archives with the .war extension all are
based on the ZIP file format as well as OpenDocument files with
extensions .odt .ods .odb .odp .odg .odf, Firefox add-ons (.xpi),
Keyhole Markup (.kmz), and many other file types that could potentially
be opened in Emacs if they were identified as archive files by libmagic.

How hard would it be to change the code in Emacs to recognize these
using the existing mechanism?

Using ImageMagick, Emacs can support over 100 image file formats.
It won't be possible to recognize all of them without libmagic.

Maybe this is useful. Is there an easy way to recognize files that
could be passed to ImageMagick?

Emacs can process the different MIME types detected by libmagic.

That is not useful for visiting files in Emacs, since Emacs has no
special handling for many of these mime types.

If some other Lisp code is interested in the mime type of a file,
there is a much easier way to find it out: run `file'.
To complicate Emacs with another library just to make that operation
a little faster is a step in the wrong direction.
Miles Bader
2009-08-25 02:19:37 UTC
Permalink
Post by Richard Stallman
If some other Lisp code is interested in the mime type of a file,
there is a much easier way to find it out: run `file'.
Well, not entirely trivial of course, since Emacs then needs to
interpret the results from the file command.
Post by Richard Stallman
To complicate Emacs with another library just to make that operation
a little faster is a step in the wrong direction.
I wonder how hard it would be to have some elisp that understands the
"magic" rules that the file command uses... at first glance, they don't
seem particularly complex; e.g., this is the first entry from
"/usr/share/file/magic":

0 lelong 0xc3cbc6c5 RISC OS Chunk data
>12 string OBJ_ \b, AOF object
>12 string LIB_ \b, ALF library
If there were such elisp code, Emacs could use any magic rules file on
the system, and in addition, could distribute its own (perhaps smaller)
set.
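
To give a feel for how little is needed for the simplest rules, here is
a minimal sketch that splits one top-level rule line into its four
fields. The function name `my-parse-magic-rule' is hypothetical. The
sketch assumes a decimal offset, a plain lowercase type name, and a test
value without embedded whitespace; it deliberately returns nil for
comments, continuation lines, and anything more elaborate.

```elisp
(defun my-parse-magic-rule (line)
  "Parse LINE, a simple top-level magic rule, into a property list.
Return nil when LINE is not of the form OFFSET TYPE TEST DESCRIPTION."
  (when (string-match
         (concat "\\`\\([0-9]+\\)[ \t]+"   ; offset into the file
                 "\\([a-z]+\\)[ \t]+"      ; data type, e.g. beshort
                 "\\([^ \t]+\\)[ \t]+"     ; value to compare against
                 "\\(.*\\)\\'")            ; message to print on a match
         line)
    (list :offset (string-to-number (match-string 1 line))
          :type (intern (match-string 2 line))
          :test (match-string 3 line)
          :description (match-string 4 line))))
```

Parsing is the easy half; applying each type (byte order, string
matching, continuation levels) to file contents is where most of the
work would be.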

-Miles
--
自らを空にして、心を開く時、道は開かれる
j***@verona.se
2009-08-25 05:09:03 UTC
Permalink
Post by Miles Bader
Post by Richard Stallman
If some other Lisp code is interested in the mime type of a file,
there is a much easier way to find it out: run `file'.
Well, not entirely trivial of course, since Emacs then needs to
interpret the results from the file command.
Post by Richard Stallman
To complicate Emacs with another library just to make that operation
a little faster is a step in the wrong direction.
I wonder how hard it would be to have some elisp that understands the
"magic" rules that the file command uses... at first glance, they don't
seem particularly complex; e.g., this is the first entry from
0 lelong 0xc3cbc6c5 RISC OS Chunk data
>12 string OBJ_ \b, AOF object
>12 string LIB_ \b, ALF library
If there were such elisp code, Emacs could use any magic rules file on
the system, and in addition, could distribute its own (perhaps smaller)
set.
I agree that the main benefit of using libmagic is getting access to the
libmagic database.
Post by Miles Bader
-Miles
--
Joakim Verona
James Cloos
2009-08-25 13:27:06 UTC
Permalink
Miles> I wonder how hard it would be to have some elisp that understands
Miles> the "magic" rules that the file command uses...

It shouldn't be too hard, but note that the current versions of file do
not install the text-format magic file, but rather a compiled .mgc file
instead.

The format of the text file is documented in the magic(4) man page.
(Which ought to be in section 5....) I don't see any documentation
of the .mgc files, but there may be in the src.

-JimC
--
James Cloos <***@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
Thien-Thi Nguyen
2009-08-25 21:41:24 UTC
Permalink
() Miles Bader <***@gnu.org>
() Tue, 25 Aug 2009 11:19:37 +0900

If there were such elisp code, Emacs could use any magic rules
file on the system, and in addition, could distribute its own
(perhaps smaller) set.

Some proof-of-concept grade Scheme code that (re)processes the
magic rules format into sexps, and also does the non-char-encoding
side of file(1) is at [0]. Converted rules (v1, v2) are at [1].

(Aside) I think the best candidate for libfoo integration with
Emacs would be libguile. In contrast, when i read of other
libfoo proposals, i can't help but feel somewhat deflated.

thi

_______________________________________________
[0] http://www.gnuvola.org/software/ttn-do/
(see file magic.scm in the tarball)
[1] http://www.gnuvola.org/data/index.html
(see entry "de-uglified magic file")
Stefan Monnier
2009-08-25 17:36:27 UTC
Permalink
Post by Richard Stallman
If some other Lisp code is interested in the mime type of a file,
there is a much easier way to find it out: run `file'.
Actually, running `file' might be a useful fallback (especially if run
via process-file so it also works on remote files, contrary to
libmagic).
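
A minimal sketch of such a fallback, under some assumptions: the
function name `my-file-mime-type' is hypothetical, the installed file(1)
is assumed to accept the --brief and --mime-type options, and
`file-local-name' is assumed to be available (it is in recent Emacs
versions). Running the command through `process-file' with
`default-directory' set to the file's directory is what lets a remote
file name handler take over.

```elisp
(defun my-file-mime-type (filename)
  "Return the MIME type that file(1) reports for FILENAME, or nil.
Return nil when the program is missing, fails, or prints something
that does not look like a MIME type."
  (let* ((file (expand-file-name filename))
         (dir (file-name-directory file)))
    (with-temp-buffer
      ;; Run in the file's own directory so that a remote file is
      ;; examined on the remote host.
      (setq default-directory dir)
      (let ((status (condition-case nil
                        (process-file "file" nil t nil
                                      "--brief" "--mime-type"
                                      (file-local-name file))
                      ;; Signaled when the program cannot be started.
                      (file-error nil))))
        (when (eql status 0)
          (goto-char (point-min))
          (when (looking-at "[^ \t\n/]+/[^ \t\n;]+")
            (match-string 0)))))))
```

Because every failure path yields nil, a caller can chain this after
other detection methods without checking first whether `file' exists.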

Also, looking at the way the magic database works, it seems that we
might want to run `file-type-by-magic' on a buffer's content as well.

As for Miles's suggestion to interpret the magic file directly, I think
it's a bad idea: let's not reinvent the wheel, the code that parses this
database and uses it already exists, it's in libmagic, so we should just
use it.


Stefan
Juri Linkov
2009-08-25 20:37:19 UTC
Permalink
Post by Juri Linkov
For instance, Java archive files have the .jar extension but build on
the ZIP file format, so they can be visited in Emacs with the help of
`archive-mode'. Enterprise Java archives with the .ear extension
and Web application Java archives with the .war extension all are
based on the ZIP file format as well as OpenDocument files with
extensions .odt .ods .odb .odp .odg .odf, Firefox add-ons (.xpi),
Keyhole Markup (.kmz), and many other file types that could potentially
be opened in Emacs if they were identified as archive files by libmagic.
How hard would it be to change the code in Emacs to recognize these
using the existing mechanism?
Not hard at all. For the ZIP file format it's just one line
in `magic-fallback-mode-alist'. Unlike `image-type-auto-detected-p'
from the preloaded image.el, we can't use `archive-find-type' in
`magic-fallback-mode-alist' because arc-mode.el is not preloaded.
So we have to copy archive magic numbers manually from arc-mode.el
to `magic-fallback-mode-alist'.

Index: lisp/files.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/files.el,v
retrieving revision 1.1069
diff -u -r1.1069 files.el
--- lisp/files.el 17 Aug 2009 23:40:22 -0000 1.1069
+++ lisp/files.el 25 Aug 2009 20:36:28 -0000
@@ -2399,6 +2399,7 @@

(defvar magic-fallback-mode-alist
`((image-type-auto-detected-p . image-mode)
+ ("\\(PK00\\)?[P]K\003\004" . archive-mode) ; zip
;; The < comes before the groups (but the first) to reduce backtracking.
;; TODO: UTF-16 <?xml may be preceded by a BOM 0xff 0xfe or 0xfe 0xff.
;; We use [ \t\r\n] instead of `\\s ' to make regex overflow less likely.
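
To try the regexp from the patch above outside of `set-auto-mode', it
can be wrapped in a small predicate over a string. This is only a
sketch; the name `my-zip-magic-p' is hypothetical, and the regexp is
anchored with \` because here it is matched against a string rather
than at the start of a buffer.

```elisp
(defun my-zip-magic-p (data)
  "Return non-nil if the string DATA starts with a ZIP local file header.
The regexp is the one from the patch, anchored at the string start."
  (let ((case-fold-search nil))         ; the signature is case-sensitive
    (string-match-p "\\`\\(PK00\\)?[P]K\003\004" data)))
```

The same test accepts .jar, .war, .odt and similar files, since they
all begin with the ZIP signature.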
--
Juri Linkov
http://www.jurta.org/emacs/
Juri Linkov
2009-08-29 23:19:12 UTC
Permalink
Post by Richard Stallman
If some other Lisp code is interested in the mime type of a file,
there is a much easier way to find it out: run `file'.
I agree that running `file' is a simpler solution. So I propose
at least the following patch to add content-based MIME-type identification
to one particular function `mailcap-file-default-commands' that needs
this when filename extension-based identification fails:

Index: lisp/gnus/mailcap.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/gnus/mailcap.el,v
retrieving revision 1.26
diff -c -r1.26 mailcap.el
*** lisp/gnus/mailcap.el 5 Jan 2009 03:22:07 -0000 1.26
--- lisp/gnus/mailcap.el 29 Aug 2009 23:18:45 -0000
***************
*** 1030,1037 ****
;; All unique MIME types from file extensions
(mailcap-delete-duplicates
(mapcar (lambda (file)
! (mailcap-extension-to-mime
! (file-name-extension file t)))
files)))
(all-mime-info
;; All MIME info lists
--- 1030,1042 ----
;; All unique MIME types from file extensions
(mailcap-delete-duplicates
(mapcar (lambda (file)
! (or (mailcap-extension-to-mime
! (file-name-extension file t))
! (replace-regexp-in-string
! ".*:\\s-*\\(.*\\)\\s-*" "\\1"
! (shell-command-to-string
! (concat "file --mime "
! (shell-quote-argument file))))))
files)))
(all-mime-info
;; All MIME info lists
--
Juri Linkov
http://www.jurta.org/emacs/
Eli Zaretskii
2009-08-30 03:09:23 UTC
Permalink
Date: Sun, 30 Aug 2009 02:19:12 +0300
I agree that running `file' is a simpler solution.
PLEASE do not base Emacs infrastructure on external programs, unless
they come with Emacs. `file' is not available on every platform, and
even on those where it is, the quality and extent of its database is unclear
and so cannot be relied upon.

I really don't understand why linking against a simple free library is
an issue, but if it is, we should find a different solution using some
database internal to Emacs, as we did until now.

In any case, invoking external programs without being smart about
their non-existence is not something we should have in Emacs.
Juri Linkov
2009-08-30 20:54:34 UTC
Permalink
Post by Eli Zaretskii
Post by Juri Linkov
I agree that running `file' is a simpler solution.
PLEASE do not base Emacs infrastructure on external programs, unless
they come with Emacs.
There are many features in Emacs that depend on external programs,
e.g. `ls' for dired, `find' and `grep', `man', `ispell', `shell',
`gdb', `diff', VCS tools, etc.
Post by Eli Zaretskii
`file' is not available on every platform, and even on those where it is,
When `libmagic' is available, then usually `file' is available as well
on the same platform.
Post by Eli Zaretskii
the quality and extent of its database is unclear and so cannot be
relied upon.
I really don't understand why linking against a simple free library is
an issue, but if it is, we should find a different solution using some
database internal to Emacs, as we did until now.
If someone thinks adding the magic number database to Emacs is important,
then fine, let's do it. But this doesn't preclude using `file'
since we already have many such pairs of external programs and their
emulations for the case when an external program is not available, e.g.
`ls' and ls-lisp, `grep' and multi-occur, `shell' and eshell, etc.
Post by Eli Zaretskii
In any case, invoking external programs without being smart about
their non-existence is not something we should have in Emacs.
My patch fails gracefully when `file' is not available, I tried
removing `file' without any problem. The function just returns nil.
--
Juri Linkov
http://www.jurta.org/emacs/
Juri Linkov
2009-08-25 20:36:04 UTC
Permalink
Post by Juri Linkov
The problem is that `image-jpeg-p' in image.el refuses to accept
non-JFIF JPEG image files whereas Emacs can correctly display them
when tests in `image-jpeg-p' are ignored.
Using libmagic means looking only for 2 bytes 0xffd8 (a magic number
0 beshort 0xffd8 JPEG image data
It seems this is enough to determine JPEG files. But I'm not confident
about removing additional tests from `image-jpeg-p'. We could keep the
current rules in image.el as a fall-back when libmagic is not available.
Whatever we do with the function `image-jpeg-p', we could easily make
Emacs test these two bytes. It makes no sense to install code to link
with libmagic just to handle that and a few other similar things.
The following patch changes `image-type-header-regexps' to test only
two bytes of the JPEG magic number:

Index: lisp/image.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/image.el,v
retrieving revision 1.87
diff -u -r1.87 image.el
--- lisp/image.el 24 Feb 2009 10:29:00 -0000 1.87
+++ lisp/image.el 25 Aug 2009 20:33:16 -0000
@@ -43,7 +43,7 @@
static \\(unsigned \\)?char \\1_bits" . xbm)
("\\`\\(?:MM\0\\*\\|II\\*\0\\)" . tiff)
("\\`[\t\n\r ]*%!PS" . postscript)
- ("\\`\xff\xd8" . (image-jpeg-p . jpeg))
+ ("\\`\xff\xd8" . jpeg)
(,(let* ((incomment-re "\\(?:[^-]\\|-[^-]\\)")
(comment-re (concat "\\(?:!--" incomment-re "*-->[ \t\r\n]*<\\)")))
(concat "\\(?:<\\?xml[ \t\r\n]+[^>]*>\\)?[ \t\r\n]*<"
--
Juri Linkov
http://www.jurta.org/emacs/
j***@verona.se
2009-08-19 22:49:53 UTC
Permalink
Post by Stefan Monnier
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
Stefan
I attach an early draft filemagic patch.

Some notes:

- The mime type info usually is less granular than the free
text info:

file --mime /tmp/tst.xcf
/tmp/tst.xcf: application/octet-stream; charset=binary

file /tmp/tst.xcf
/tmp/tst.xcf: GIMP XCF image data, version 0, 640 x 480, RGB Color

This is dependent on the file magic info file used.

- We can probably have much fun debating what the interface should look
like at the lisp level. Any ideas?
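
As one possible starting point for that debate, the `file --mime' output
shown in the notes above can be turned into a Lisp value with a small
parser. This is a sketch only: the name `my-parse-file-mime-output' and
the (TYPE . CHARSET) return shape are assumptions made for this example,
not part of the draft patch, and a file name that itself contains
": type/subtype" would confuse it.

```elisp
(defun my-parse-file-mime-output (line)
  "Parse LINE, one line of `file --mime' output, into (TYPE . CHARSET).
CHARSET is nil when the line carries no charset parameter.
Return nil if LINE does not contain something shaped like a MIME type."
  (when (string-match
         (concat ": *\\([^ ;/\n]+/[^ ;\n]+\\)"          ; type/subtype
                 "\\(?:; *charset=\\([^ \n]+\\)\\)?")   ; optional charset
         line)
    (cons (match-string 1 line) (match-string 2 line))))
```

For the XCF example above this yields
("application/octet-stream" . "binary"), which shows how little the
MIME view says compared to the free text description.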
Dan Nicolaescu
2009-08-19 23:20:21 UTC
Permalink
Post by j***@verona.se
Post by Stefan Monnier
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
Stefan
I attach an early draft filemagic patch.
- The mime type info usually is less granular than the free
file --mime /tmp/tst.xcf
/tmp/tst.xcf: application/octet-stream; charset=binary
file /tmp/tst.xcf
/tmp/tst.xcf: GIMP XCF image data, version 0, 640 x 480, RGB Color
This is dependent on the file magic info file used.
- We can probably have much fun debating what the interface should look
like at the lisp level. Any ideas?
diff --git a/configure.in b/configure.in
index f4096db..cb74523 100644
--- a/configure.in
+++ b/configure.in
@@ -137,6 +137,8 @@ OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
OPTION_DEFAULT_ON([m17n-flt],[don't use m17n-flt for text shaping])
+OPTION_DEFAULT_ON([filemagic],[don't compile with filemagic support])
+
OPTION_DEFAULT_ON([toolkit-scroll-bars],[don't use Motif or Xaw3d scroll bars])
OPTION_DEFAULT_ON([xaw3d],[don't use Xaw3d])
OPTION_DEFAULT_ON([xim],[don't use X11 XIM])
@@ -2223,6 +2225,19 @@ if test x"$ac_cv_func_alloca_works" != xyes; then
AC_MSG_ERROR( [a system implementation of alloca is required] )
fi
+
+HAVE_LIBMAGIC=no
+if test "${with_filemagic}" != "no"; then
+ #libmagic support
+ AC_CHECK_HEADERS(magic.h, [ AC_CHECK_LIB(magic,magic_open,HAVE_LIBMAGIC=yes) ])
+fi
+
+if test "${HAVE_LIBMAGIC}" = "yes"; then
+ LIBMAGIC=-lmagic
+ AC_SUBST(LIBMAGIC)
+ AC_DEFINE(HAVE_LIBMAGIC, 1, [Define to 1 if using libmagic.])
+fi
+
# fmod, logb, and frexp are found in -lm on most systems.
# On HPUX 9.01, -lm does not contain logb, so check for sqrt.
AC_CHECK_LIB(m, sqrt)
@@ -2954,6 +2969,7 @@ echo " Does Emacs use -lpng? ${HAVE_PNG}"
echo " Does Emacs use -lrsvg-2? ${HAVE_RSVG}"
echo " Does Emacs use -lgpm? ${HAVE_GPM}"
echo " Does Emacs use -ldbus? ${HAVE_DBUS}"
+echo " Does Emacs use -lmagic? ${HAVE_LIBMAGIC}"
echo " Does Emacs use -lfreetype? ${HAVE_FREETYPE}"
echo " Does Emacs use -lm17n-flt? ${HAVE_M17N_FLT}"
diff --git a/src/Makefile.in b/src/Makefile.in
index 425cf98..b80255a 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -420,6 +420,7 @@ LIBX= $(LIBXMENU) LD_SWITCH_X_SITE
#endif /* not HAVE_LIBRESOLV */
@@ -511,6 +512,12 @@ MSDOS_OBJ = dosfns.o msdos.o w16select.o xmenu.o
#endif
#endif
+#ifdef HAVE_LIBMAGIC
+FILEMAGIC_OBJ = filemagic.o
+#else
+FILEMAGIC_OBJ =
+#endif
Can you please avoid adding new #ifdefs here? (we are trying to get rid
of them).
Maybe use @FILEMAGIC_OBJ@ ?

Or even use the file unconditionally, just add the proper #ifdefs to
make it be empty if not HAVE_LIBMAGIC ?
Stephen J. Turnbull
2009-08-20 01:03:24 UTC
Permalink
Post by j***@verona.se
- We can probably have much fun debating what the interface should look
like at the lisp level. Any ideas?
I think the main interface should be just "file-magic", even if it's
not in file.el. It's analogous to "file-attributes", etc.
"file-magic-file" sounds like it returns the magic file (file*s*
possible?) used.
Eli Zaretskii
2009-08-20 03:12:23 UTC
Permalink
Date: Thu, 20 Aug 2009 10:03:24 +0900
Post by j***@verona.se
- We can probably have much fun debating what the interface should look
like at the lisp level. Any ideas?
I think the main interface should be just "file-magic", even if it's
not in file.el. It's analogous to "file-attributes", etc.
Actually, I think the interface should be `file-type' or some such.
Like `file-attributes' that is a wrapper for `stat', the API name
should have a good semantic value, instead of just inheriting the name
of the low-level C functions it uses to do the job.
Stephen J. Turnbull
2009-08-20 04:50:13 UTC
Permalink
Post by Eli Zaretskii
Post by Stephen J. Turnbull
I think the main interface should be just "file-magic", even if it's
not in file.el. It's analogous to "file-attributes", etc.
Actually, I think the interface should be `file-type' or some such.
Seriously, "file-type" is a terrible, ambiguous name. DOS vs. Unix,
the extension of the file's name (!!), what program uses it, text
vs. binary, MIME type, endianness of the platform, Unicode vs. legacy
coding, copyleft vs. permissive vs. proprietary vs. public domain, I
could go on. Sure, it's about "files", but the "type" is what file(1)
infers from magic numbers in the file, no more and no less ... and
exactly the people you expect to say "huh?" will proceed to guess
anything but the truth about the semantics of `file-type'.
Post by Eli Zaretskii
Like `file-attributes' that is a wrapper for `stat', the API name
should have a good semantic value, instead of just inheriting the name
of the low-level C functions it uses to do the job.
`file-magic' does have a good semantic value. It means "look at the
first few bytes of a file and infer various metadata about it, based
on a published database of 'magic numbers'." It tells not only the
purpose but the exact semantics.

I suppose the more explicit `file-type-by-magic' might be better.
Eli Zaretskii
2009-08-20 18:20:13 UTC
Permalink
Date: Thu, 20 Aug 2009 13:50:13 +0900
Post by Eli Zaretskii
Post by Stephen J. Turnbull
I think the main interface should be just "file-magic", even if it's
not in file.el. It's analogous to "file-attributes", etc.
Actually, I think the interface should be `file-type' or some such.
Seriously, "file-type" is a terrible, ambiguous name. DOS vs. Unix,
the extension of the file's name (!!), what program uses it, text
vs. binary, MIME type, endianness of the platform, Unicode vs. legacy
coding, copyleft vs. permissive vs. proprietary vs. public domain, I
could go on. Sure, it's about "files", but the "type" is what file(1)
infers from magic numbers in the file, no more and no less ... and
exactly the people you expect to say "huh?" will proceed to guess
anything but the truth about the semantics of `file-type'.
Maybe so, but still "man file" shows this at the very first line:

file - determine file type
I suppose the more explicit `file-type-by-magic' might be better.
I'm okay with that as well, but maybe `file-type-by-magic-signature'
is even better (if we indeed want to advertise its inner workings).
But if this function will fall back on something else if libmagic is
not available, then I think this name is not a good one.
Stephen J. Turnbull
2009-08-21 00:19:48 UTC
Permalink
Post by Eli Zaretskii
file - determine file type
Sure. file does more than invoke libmagic, though, and in the Unix
context the ambiguous "type" is not so problematic; it is basically
defined by "what file(1) does." For the broader audience, if "magic"
is unacceptable, they probably don't know what file(1) does.
Post by Eli Zaretskii
But if this function will fall back on something else if libmagic is
not available, then I think this name is not a good one.
This function should be available in a form which only uses libmagic.
Whatever other tests are needed should be done in Lisp, and I see no
reason to hide the libmagic functionality in an "-internal" or
otherwise "not for regular use" form.
Richard Stallman
2009-08-20 18:32:08 UTC
Permalink
Actually, I think the interface should be `file-type' or some such.

If it will return the MIME type, `file-mime-type' seems like a good
name.
Stefan Monnier
2009-08-21 19:10:18 UTC
Permalink
Post by Eli Zaretskii
Actually, I think the interface should be `file-type' or some such.
If it will return the MIME type, `file-mime-type' seems like a good
name.
We need two functions:
- one implemented in C which provides the info from libmagic, no more no less.
- one implemented in Elisp which uses the previous one as well as
other techniques (maybe even auto-mode-alist).

So yes, the second one can be called `file-mime-type', but the first
would better be called `file-magic' or somesuch.
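
The two-layer split proposed above can be modelled outside Emacs. This is only a sketch: the thread puts the upper layer in Elisp, and `magic_probe`, `mime_from_extension`, `guess_mime_type` and `fake_probe` are hypothetical names invented for the illustration, not part of any patch.

```c
#include <string.h>

/* Type of the low-level probe: returns a MIME type string or NULL.
   In the thread's design this role is played by the libmagic primitive.  */
typedef const char *(*magic_probe) (const char *path);

/* Hypothetical fallback keyed on the file-name extension.  */
static const char *
mime_from_extension (const char *path)
{
  const char *dot = strrchr (path, '.');

  if (dot == NULL)
    return NULL;
  if (strcmp (dot, ".png") == 0)
    return "image/png";
  if (strcmp (dot, ".txt") == 0)
    return "text/plain";
  return NULL;
}

/* Higher-level lookup: ask the probe first (if one is available), then
   fall back on other techniques.  */
const char *
guess_mime_type (const char *path, magic_probe probe)
{
  const char *type = probe ? probe (path) : NULL;

  return type ? type : mime_from_extension (path);
}

/* Stand-in probe for experimentation: pretends every file is a PDF.  */
const char *
fake_probe (const char *path)
{
  (void) path;
  return "application/pdf";
}
```

Passing NULL as the probe models a build without libmagic: the caller still gets an answer whenever the fallback knows the extension.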


Stefan
Stephen J. Turnbull
2009-08-22 05:03:42 UTC
Permalink
Post by Stefan Monnier
- one implemented in C which provides the info from libmagic, no more no less.
+1
Post by Stefan Monnier
So yes, the second one can be called `file-mime-type', but the first
would better be called `file-magic' or somesuch.
Eli doesn't like that, and he's convinced me. `file-type-by-magic'
or some such, I think. Even if the reader hasn't had the benefit of a
classical education, they will have some idea of what's going on.
Stefan Monnier
2009-08-23 01:03:15 UTC
Permalink
Post by Stephen J. Turnbull
Post by Stefan Monnier
So yes, the second one can be called `file-mime-type', but the first
would better be called `file-magic' or somesuch.
Eli doesn't like that, and he's convinced me. `file-type-by-magic'
or some such, I think. Even if the reader hasn't had the benefit of a
classical education, they will have some idea of what's going on.
As long as `magic' is in the name, that's OK.


Stefan
Stefan Monnier
2009-08-20 13:57:41 UTC
Permalink
Post by j***@verona.se
Post by Stefan Monnier
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
I attach an early draft filemagic patch.
- The mime type info usually is less granular than the free
We can provide both. I think we'd want 2 functions: one to get the free
text info, which just returns a string (or nil), and another to get the
MIME info, which returns a cons, whose car is a symbol such as
application/octet-stream, and whose cdr is an alist representing
the additional optional info.
Post by j***@verona.se
file --mime /tmp/tst.xcf
/tmp/tst.xcf: application/octet-stream; charset=binary
This would look like (application/octet-stream (charset . "binary"))
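
As a rough illustration of that shape, the `file --mime` output can be split into a type and a list of key/value parameters. This is a plain C sketch, not Emacs code: `struct mime_info`, `copy_trimmed` and `parse_mime` are hypothetical names, and the fixed buffer sizes are arbitrary assumptions.

```c
#include <string.h>

#define MAX_PARAMS 4

/* Hypothetical container: one MIME type plus its "key=value" parameters.  */
struct mime_info
{
  char type[64];
  char keys[MAX_PARAMS][32];
  char values[MAX_PARAMS][64];
  int nparams;
};

/* Copy at most DSTSIZE-1 bytes of [SRC, SRC+LEN) into DST, trimming
   leading and trailing spaces.  */
static void
copy_trimmed (char *dst, size_t dstsize, const char *src, size_t len)
{
  while (len > 0 && *src == ' ')
    {
      src++;
      len--;
    }
  while (len > 0 && src[len - 1] == ' ')
    len--;
  if (len >= dstsize)
    len = dstsize - 1;
  memcpy (dst, src, len);
  dst[len] = '\0';
}

/* Split "type/subtype; key=value; ..." into INFO.  Return 0 on success,
   -1 if the type part is empty.  Segments without '=' are skipped.  */
int
parse_mime (const char *s, struct mime_info *info)
{
  const char *end = strchr (s, ';');
  size_t typelen = end ? (size_t) (end - s) : strlen (s);

  info->nparams = 0;
  copy_trimmed (info->type, sizeof info->type, s, typelen);
  if (info->type[0] == '\0')
    return -1;

  while (end && info->nparams < MAX_PARAMS)
    {
      const char *start = end + 1;
      const char *eq;
      size_t seglen;

      end = strchr (start, ';');
      seglen = end ? (size_t) (end - start) : strlen (start);
      eq = memchr (start, '=', seglen);
      if (eq)
        {
          int i = info->nparams++;

          copy_trimmed (info->keys[i], sizeof info->keys[i],
                        start, (size_t) (eq - start));
          copy_trimmed (info->values[i], sizeof info->values[i],
                        eq + 1, seglen - (size_t) (eq + 1 - start));
        }
    }
  return 0;
}
```

For the xcf example above, the type would be "application/octet-stream" with a single parameter, charset, whose value is "binary".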

A few more comments:
- please follow the GNU coding convention. I.e. put spaces where
they need to be (e.g. around parens and operators).
- don't bother with a new file. Just put it into fileio.c.
- as someone else mentioned, CPP macros in the Makefile.in are things
we'd like to get rid of, so please don't put more of them there.
Use autoconf's m4 macros instead, thank you.


Stefan
j***@verona.se
2009-08-20 19:19:28 UTC
Permalink
Post by Stefan Monnier
Post by j***@verona.se
Post by Stefan Monnier
I think it's a good idea. It may require some non-trivial changes on
the Lisp side, since libmagic's information is not quite the same as
what Emacs currently uses: we'll probably want to use libmagic to get
a MIME-type and then have a table mapping mime-types to major modes or
some such.
I attach an early draft filemagic patch.
- The mime type info usually is less granular than the free
We can provide both. I think we'd want 2 functions: one to get the free
text info, which just returns a string (or nil), and another to get the
MIME info, which returns a cons, whose car is a symbol such as
application/octet-stream, and whose cdr is an alist representing
the additional optional info.
Post by j***@verona.se
file --mime /tmp/tst.xcf
/tmp/tst.xcf: application/octet-stream; charset=binary
This would look like (application/octet-stream (charset . "binary"))
- please follow the GNU coding convention. I.e. put spaces where
they need to be (e.g. around parens and operators).
- don't bother with a new file. Just put it into fileio.c.
- as someone else mentioned, CPP macros in the Makefile.in are things
we'd like to get rid of, so please don't put more of them there.
Use autoconf's m4 macros instead, thank you.
Stefan
Find attached a new slightly improved patch according to the suggestions
of the list.

However, I see now that I didn't properly read your text above: the
function I wrote now returns a list with 3 elements, (MIME_TYPE
MIME_ENCODING DESCRIPTION). Do we still need 2 functions, as you write
above?
Andreas Schwab
2009-08-20 22:08:21 UTC
Permalink
Post by j***@verona.se
diff --git a/configure.in b/configure.in
index f4096db..cb74523 100644
--- a/configure.in
+++ b/configure.in
@@ -137,6 +137,8 @@ OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
OPTION_DEFAULT_ON([m17n-flt],[don't use m17n-flt for text shaping])
+OPTION_DEFAULT_ON([filemagic],[don't compile with filemagic support])
IMHO the option should be named libmagic, since that's how the library
is named.
Post by j***@verona.se
diff --git a/src/config.in b/src/config.in
index 404e00b..c966a09 100644
--- a/src/config.in
+++ b/src/config.in
@@ -262,6 +262,9 @@ along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. */
/* Define to 1 if you have the gpm library (-lgpm). */
#undef HAVE_GPM
+/* Define to 1 if you have the filemagic library (-lmagic). */
+#undef HAVE_LIBMAGIC
+
/* Define to 1 if you have the `grantpt' function. */
#undef HAVE_GRANTPT
This is generated by autoheader.
Post by j***@verona.se
diff --git a/src/fileio.c b/src/fileio.c
index 3702d4c..375502e 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -205,6 +205,10 @@ Lisp_Object Vdirectory_sep_char;
int write_region_inhibit_fsync;
#endif
+#ifdef HAVE_LIBMAGIC
+#include <magic.h>
+#endif
+
/* Non-zero means call move-file-to-trash in Fdelete_file or
Fdelete_directory. */
int delete_by_moving_to_trash;
@@ -2997,6 +3001,45 @@ DEFUN ("unix-sync", Funix_sync, Sunix_sync, 0, 0, "",
#endif /* HAVE_SYNC */
+#ifdef HAVE_LIBMAGIC
+DEFUN ("file-magic-file", Ffile_magic_file, Sfile_magic_file, 1,1,0,
+ doc: /* Return (MIME_TYPE MIME_ENCODING DESCRIPTION) for FILENAME.
+Return nil on error. */)
+ (filename)
+ Lisp_Object filename;
+{
+ magic_t cookie=NULL;
+ if (!STRINGP (filename)) goto libmagic_error;
Just use CHECK_STRING.
Post by j***@verona.se
+ char* f = SDATA (filename);
+ char* rvs;
No C99 features yet. Be careful with raw string pointers and GC.
Post by j***@verona.se
+ cookie = magic_open (MAGIC_NONE);
+ magic_load (cookie,NULL); //load default database
if (cookie == NULL) ?
Post by j***@verona.se
+
+ magic_setflags (cookie, MAGIC_MIME_TYPE);
+ rvs = magic_file (cookie, f);
+ if (rvs == NULL) goto libmagic_error;
Use report_file_error, provided that magic_file sets errno appropriately.
Post by j***@verona.se
+ Lisp_Object file_freetext = make_specified_string (rvs, strlen(rvs), strlen(rvs), NULL);
Use build_string.

Andreas.
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
j***@verona.se
2009-08-21 09:55:45 UTC
Permalink
New libmagic patch, mostly fixing Andreas's concerns, and with some more
error handling.

I don't understand the autoheader comment below, though. When I originally
compiled, config.h wasn't generated with the libmagic info included.
Did I do something wrong? Is autoheader supposed to generate config.in?
When does that happen?

/Joakim
Post by Andreas Schwab
Post by j***@verona.se
diff --git a/configure.in b/configure.in
index f4096db..cb74523 100644
--- a/configure.in
+++ b/configure.in
@@ -137,6 +137,8 @@ OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
OPTION_DEFAULT_ON([m17n-flt],[don't use m17n-flt for text shaping])
+OPTION_DEFAULT_ON([filemagic],[don't compile with filemagic support])
IMHO the option should be named libmagic, since that's how the library
is named.
Post by j***@verona.se
diff --git a/src/config.in b/src/config.in
index 404e00b..c966a09 100644
--- a/src/config.in
+++ b/src/config.in
@@ -262,6 +262,9 @@ along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. */
/* Define to 1 if you have the gpm library (-lgpm). */
#undef HAVE_GPM
+/* Define to 1 if you have the filemagic library (-lmagic). */
+#undef HAVE_LIBMAGIC
+
/* Define to 1 if you have the `grantpt' function. */
#undef HAVE_GRANTPT
This is generated by autoheader.
Post by j***@verona.se
diff --git a/src/fileio.c b/src/fileio.c
index 3702d4c..375502e 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -205,6 +205,10 @@ Lisp_Object Vdirectory_sep_char;
int write_region_inhibit_fsync;
#endif
+#ifdef HAVE_LIBMAGIC
+#include <magic.h>
+#endif
+
/* Non-zero means call move-file-to-trash in Fdelete_file or
Fdelete_directory. */
int delete_by_moving_to_trash;
@@ -2997,6 +3001,45 @@ DEFUN ("unix-sync", Funix_sync, Sunix_sync, 0, 0, "",
#endif /* HAVE_SYNC */
+#ifdef HAVE_LIBMAGIC
+DEFUN ("file-magic-file", Ffile_magic_file, Sfile_magic_file, 1,1,0,
+ doc: /* Return (MIME_TYPE MIME_ENCODING DESCRIPTION) for FILENAME.
+Return nil on error. */)
+ (filename)
+ Lisp_Object filename;
+{
+ magic_t cookie=NULL;
+ if (!STRINGP (filename)) goto libmagic_error;
Just use CHECK_STRING.
Post by j***@verona.se
+ char* f = SDATA (filename);
+ char* rvs;
No C99 features yet. Be careful with raw string pointers and GC.
Post by j***@verona.se
+ cookie = magic_open (MAGIC_NONE);
+ magic_load (cookie,NULL); //load default database
if (cookie == NULL) ?
Post by j***@verona.se
+
+ magic_setflags (cookie, MAGIC_MIME_TYPE);
+ rvs = magic_file (cookie, f);
+ if (rvs == NULL) goto libmagic_error;
Use report_file_error, provided that magic_file sets errno appropriately.
Post by j***@verona.se
+ Lisp_Object file_freetext = make_specified_string (rvs, strlen(rvs), strlen(rvs), NULL);
Use build_string.
Andreas.
Eli Zaretskii
2009-08-21 11:01:35 UTC
Permalink
Date: Fri, 21 Aug 2009 11:55:45 +0200
Is autoheader supposed to generate config.in?
Yes.
When does that happen?
Either run autoheader by hand, or run autoreconf (which AFAIK is
supposed to run autoheader as well).
+DEFUN ("libmagic-file-internal", Flibmagic_file_internal, Slibmagic_file_internal, 1,1,0,
+ doc: /* Return (MIME_TYPE MIME_ENCODING DESCRIPTION) for FILENAME_OR_BUFFER.
+Return nil on error. */)
This doc string "needs work"(TM). Please use the doc string of
visited-file-name as an example.
+ (filename_or_buffer)
+ Lisp_Object filename_or_buffer;
Using a `_' in an argument is un-Lisp'y (IMO).
+{
+ CHECK_STRING_OR_BUFFER (filename_or_buffer);
+ magic_t cookie=NULL;
+ char* f = NULL;
+ const char* rvs;
+
+ if (STRINGP (filename_or_buffer))
+ f = SDATA (filename_or_buffer);
+ if (BUFFERP (filename_or_buffer))
+ f = SDATA (XBUFFER (filename_or_buffer)->filename);
+ cookie = magic_open (MAGIC_ERROR);
+ if (cookie == NULL) goto libmagic_error;
+ magic_load (cookie, NULL); //load default database
+
+ magic_setflags (cookie, MAGIC_MIME_TYPE | MAGIC_ERROR);
+ rvs = magic_file (cookie, f);
You need to encode file names before you pass them to C APIs. Use
ENCODE_FILE to do that; see file-attributes for an example of how this
is done.
+ if (rvs == NULL) goto libmagic_error;
+ Lisp_Object file_mime = intern (rvs);
You cannot declare variables in the middle of a block: Emacs does not
require a C99 compiler yet and needs to support C90 or even older
compilers, which will reject this code.
+ magic_setflags (cookie, MAGIC_MIME_ENCODING | MAGIC_ERROR);
+ rvs=magic_file (cookie, f);
+ if (rvs == NULL) goto libmagic_error;
+ Lisp_Object file_encoding = intern(rvs);
Is file_encoding supposed to be a valid encoding, one of those for
which Emacs has a coding-system? If so, perhaps you should make sure
you indeed return a valid coding-system or its alias, or otherwise
tell in the doc string that it's not guaranteed to be valid (so that
the caller should validate it before using).
j***@verona.se
2009-08-21 17:38:15 UTC
Permalink
New filemagic patch, mostly fixing Eli's concerns.
Post by Eli Zaretskii
+DEFUN ("libmagic-file-internal", Flibmagic_file_internal, Slibmagic_file_internal, 1,1,0,
+ doc: /* Return (MIME_TYPE MIME_ENCODING DESCRIPTION) for FILENAME_OR_BUFFER.
+Return nil on error. */)
Renamed the entry point to libmagic-file-internal, since it's meant for
internal use by a Lisp wrapper that is yet to be written. Should that
wrapper go in a new file, BTW?
Post by Eli Zaretskii
This doc string "needs work"(TM). Please use the doc string of
visited-file-name as an example.
I worked on this
Post by Eli Zaretskii
+ (filename_or_buffer)
+ Lisp_Object filename_or_buffer;
Using a `_' in an argument is un-Lisp'y (IMO).
Ok.
Post by Eli Zaretskii
You need to encode file names before you pass them to C APIs. Use
ENCODE_FILE to do that; see file-attributes for an example of how this
is done.
Ok.
Post by Eli Zaretskii
+ if (rvs == NULL) goto libmagic_error;
+ Lisp_Object file_mime = intern (rvs);
You cannot declare variables in the middle of a block: Emacs does not
require a C99 compiler yet and needs to support C90 or even older
compilers, which will reject this code.
I'm having trouble remembering not to use C99. Is there some convenient
compiler flag to force lower versions? Fixed the errors I saw.
Post by Eli Zaretskii
Is file_encoding supposed to be a valid encoding, one of those for
which Emacs has a coding-system? If so, perhaps you should make sure
you indeed return a valid coding-system or its alias, or otherwise
tell in the doc string that it's not guaranteed to be valid (so that
the caller should validate it before using).
I described a bit more in the doc string. Ok?
Rupert Swarbrick
2009-08-21 17:46:34 UTC
Permalink
Post by j***@verona.se
I'm having trouble remembering not to use C99. Is there some convenient
compiler flag to force lower versions? Fixed the errors I saw.
Maybe -ansi combined with -pedantic, for gcc.

***@hake:~/tmp gcc -ansi -pedantic -o test test.c
test.c: In function ‘main’:
test.c:10: warning: ISO C90 forbids mixed declarations and code
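
For reference, here is a small sketch of the C90-friendly layout that this warning pushes toward. The function names are made up for the example; the point is only where the declarations sit.

```c
#include <string.h>

/* C99 style, which -ansi -pedantic warns about:
       if (s == NULL) return 0;
       size_t n = 0;            <- declaration after a statement  */

/* C90 style: every declaration opens the block, statements follow.  */
size_t
count_slashes (const char *s)
{
  size_t n = 0;
  const char *p;

  if (s == NULL)
    return 0;
  for (p = s; *p != '\0'; p++)
    if (*p == '/')
      n++;
  return n;
}

/* When a variable is only needed late, a nested block can introduce it
   without violating C90.  */
size_t
length_after_last_slash (const char *s)
{
  const char *slash = strrchr (s, '/');

  if (slash == NULL)
    return strlen (s);
  {
    size_t rest = strlen (slash + 1);

    return rest;
  }
}
```

Note that `//` comments are also a C99 feature, so the sketch sticks to `/* ... */`.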

Rupert
Andreas Schwab
2009-08-21 18:31:50 UTC
Permalink
Post by j***@verona.se
+#ifdef HAVE_LIBMAGIC
+DEFUN ("libmagic-file-internal", Flibmagic_file_internal, Slibmagic_file_internal, 1,1,0,
+ doc: /* Return (MIME-TYPE MIME-ENCODING DESCRIPTION) for
The first doc line needs to be a complete sentence fitting in about 75
columns. You should only mention the argument here, and explain the
structure of the return value in the second sentence.
Post by j***@verona.se
+ char* f = NULL;
+ const char* rvs;
+ Lisp_Object file_freetext;
+ Lisp_Object rv;
+ Lisp_Object file_mime;
+ Lisp_Object file_encoding;
+
+ Lisp_Object filename, absname, encoded_absname;
+ struct gcpro gcpro1;
+
+ GCPRO1 (f);
You cannot GCPRO pointers, only Lisp_Objects.
Post by j***@verona.se
+ if (STRINGP (filename_or_buffer))
+ filename = filename_or_buffer;
+ if (BUFFERP (filename_or_buffer))
+ filename = XBUFFER (filename_or_buffer)->filename;
+ absname = Fexpand_file_name (filename, current_buffer->directory);
+ f = SDATA(ENCODE_FILE (absname));
Since ENCODE_FILE can GC you need to protect every Lisp_Object variable
used around the call, especially all that are Lisp_Strings.
Post by j***@verona.se
+ if (cookie != NULL) magic_close (cookie);
+ report_file_error("Libmagic error",Qnil);
You need to make sure that errno is preserved from the failed operation
and that you get a meaningful errno in the first place.

Andreas.
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Drew Adams
2009-08-21 19:13:09 UTC
Permalink
Post by Andreas Schwab
The first doc line needs to be a complete sentence
fitting in about 75 columns.
No, <= 67 chars, not 75 chars.

This is especially important for resizing of windows or frames to fit their
widest line.
Eli Zaretskii
2009-08-21 18:42:35 UTC
Permalink
Date: Fri, 21 Aug 2009 19:38:15 +0200
Renamed entry point to libmagic-file-internal since its meant to be
of internal usage for a lisp wrapper, yet to be written. Should that be
a new file BTW?
files.el sounds good enough to me.
I'm having trouble remembering not to use C99. Is there some convenient
compiler flag to force lower versions?
I think -std=c89 should do what you want (assuming you use GCC).
I described a bit more in the doc string. Ok?
I suggest the following variation of it:

doc: /* Return a list describing the argument FILE-OR-BUFFER.

If FILE-OR-BUFFER is a file name, return information about that file.
If FILE-OR-BUFFER is a buffer, return information about the buffer's file.

The return value is a list of the form

(MIME-TYPE MIME-ENCODING DESCRIPTION)

MIME-TYPE and MIME-ENCODING are the MIME type and encoding suitable
for the file's contents, as determined by libmagic.
DESCRIPTION is the human-readable description of the file type offered by
libmagic.

The function throws a file-error if libmagic cannot determine one of
the elements of the above list.

The default libmagic database is used, and the quality of information
given depends on your version of that database. Often the MIME type is
less exact than the description. */)

Two more comments:

. I am not sure you need to push the file-or-buffer dichotomy to the
C level. It is easy enough to do that in Lisp, or even in the
application, for that matter. Why complicate a primitive to do
such a simple job?

. You didn't say in the doc string whether MIME-ENCODING is
guaranteed to be a valid Emacs coding-system. I think a user will
be desperate to know that.

Thanks for working on this.
j***@verona.se
2009-08-21 21:48:30 UTC
Permalink
Post by Eli Zaretskii
doc: /* Return a list describing the argument FILE-OR-BUFFER.
If FILE-OR-BUFFER is a file name, return information about that file.
...
Post by Eli Zaretskii
given depends on your version of that database. Often the MIME type is
less exact than the description. */)
I used this description.
Post by Eli Zaretskii
. I am not sure you need to push the file-or-buffer dichotomy to the
C level. It is easy enough to do that in Lisp, or even in the
application, for that matter. Why complicate a primitive to do
such a simple job?
Ok, removed it.
Post by Eli Zaretskii
. You didn't say in the doc string whether MIME-ENCODING is
guaranteed to be a valid Emacs coding-system. I think a user will
be desperate to know that.
Did as Stefan suggested: it now returns a string.

Also improved GCPRO stuff, like Andreas suggested.
--
Joakim Verona
Andreas Schwab
2009-08-21 22:46:59 UTC
Permalink
+ GCPRO6 (file_description, file_mime, file_encoding, rv, absname, encoded_absname);
That's too much. You only need to protect variables used around calls
that can GC. Arguments to lisp functions are implicitly protected. For
example, there are no function calls during the lifetime of absname.
And encoded_absname is completely unused.
+ report_file_error("Libmagic error",Qnil);
+ if (cookie != NULL) magic_close (cookie);
report_file_error throws, so you leak a resource.

Andreas.
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
j***@verona.se
2009-08-22 20:18:26 UTC
Permalink
Post by Andreas Schwab
+ GCPRO6 (file_description, file_mime, file_encoding, rv, absname, encoded_absname);
That's too much. You only need to protect variables used around calls
that can GC. Arguments to lisp functions are implicitly protected. For
example, there are no function calls during the lifetime of absname.
And encoded_absname is completely unused.
It seems to me I only need to protect f, which I would do by GCPRO:ing
absname. Since this is apparently wrong, I will leave it like it is,
since it doesn't hurt to GCPRO too much. (?)
Post by Andreas Schwab
+ report_file_error("Libmagic error",Qnil);
+ if (cookie != NULL) magic_close (cookie);
report_file_error throws, so you leak a resource.
Fixed I think.
Post by Andreas Schwab
Andreas.
--
Joakim Verona
Ken Raeburn
2009-08-22 23:13:47 UTC
Permalink
Post by j***@verona.se
Post by Andreas Schwab
+ GCPRO6 (file_description, file_mime, file_encoding, rv, absname, encoded_absname);
That's too much. You only need to protect variables used around calls
that can GC. Arguments to lisp functions are implicitly protected. For
example, there are no function calls during the lifetime of absname.
And encoded_absname is completely unused.
It seems to me I only need to protect f, which I would do by GCPRO:ing
absname. Since this is apparently wrong, I will leave it like it is,
since it doesn't hurt to GCPRO too much. (?)
If ENCODE_FILE returns a new lisp string object, you need to GCPRO
that object, not absname. After the call to ENCODE_FILE, absname is
unused, so it won't need protection. In fact, it looks to me like
after that point, GC isn't possible, so I'm not sure anything needs
GCPRO'tection here.
Post by j***@verona.se
Post by Andreas Schwab
+ report_file_error("Libmagic error",Qnil);
+ if (cookie != NULL) magic_close (cookie);
report_file_error throws, so you leak a resource.
Fixed I think.
No, you're still trying to call magic_close after report_file_error
returns, which it won't.
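
The ordering problem can be reproduced in a few lines of plain C. In this sketch `longjmp` stands in for the non-local exit taken by report_file_error, and a counter stands in for the libmagic cookie. All names here are hypothetical and nothing below is Emacs code.

```c
#include <setjmp.h>

static jmp_buf error_handler;  /* stand-in for the Lisp error handler */
static int open_handles;       /* resources acquired but not yet released */

/* Stand-in for report_file_error: control never returns to the caller.  */
static void
signal_error (void)
{
  longjmp (error_handler, 1);
}

static void acquire (void) { open_handles++; }
static void release (void) { open_handles--; }

/* Leaky ordering: the cleanup sits after a call that does not return.  */
void
lookup_leaky (void)
{
  acquire ();
  signal_error ();
  release ();                  /* never reached */
}

/* Fixed ordering: release first, then signal the error.  */
void
lookup_fixed (void)
{
  acquire ();
  release ();
  signal_error ();
}

/* Run FN under a handler and report how many handles it left open.  */
int
leaked_by (void (*fn) (void))
{
  open_handles = 0;
  if (setjmp (error_handler) == 0)
    fn ();
  return open_handles;
}
```

With these definitions, `leaked_by (lookup_leaky)` reports one handle still open, while `leaked_by (lookup_fixed)` reports none.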
Post by j***@verona.se
+{
+ CHECK_STRING (filename);
+ magic_t cookie=NULL;
+ char* f = NULL;
CHECK_STRING is executable code, and should be moved down after the
variable declarations.
Post by j***@verona.se
+ const char* rvs;
+ Lisp_Object file_description;
+ Lisp_Object file_mime;
+ Lisp_Object file_encoding;
+ Lisp_Object rv;
+
+ Lisp_Object absname, encoded_absname;
+ struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5, gcpro6;
+
+ GCPRO6 (file_description, file_mime, file_encoding, rv, absname, encoded_absname);
It seems to be common -- I'm not sure if it's required, but I would
conservatively assume so -- for any local variable GCPRO'd to get
initialized before any possible call to garbage collection, so the
precise(?) garbage collector won't be scanning random stack values and
thinking they're lisp objects; if the initialization is unclear, often
that means simply assigning Qnil just before or after the GCPRO call.

Different GC marking strategies are used on different platforms, so
the lack of an obvious problem on one platform doesn't mean the code
will work okay on another.

Ken
j***@verona.se
2009-08-23 23:38:26 UTC
Permalink
Post by Ken Raeburn
Post by j***@verona.se
Post by Andreas Schwab
+ GCPRO6 (file_description, file_mime, file_encoding, rv, absname, encoded_absname);
That's too much. You only need to protect variables used around calls
that can GC. Arguments to lisp functions are implicitly protected. For
example, there are no function calls during the lifetime of absname.
And encoded_absname is completely unused.
It seems to me I only need to protect f, which I would do by GCPRO:ing
absname. Since this is apparently wrong, I will leave it like it is,
since it doesn't hurt to GCPRO too much. (?)
If ENCODE_FILE returns a new lisp string object, you need to GCPRO
that object, not absname. After the call to ENCODE_FILE, absname is
unused, so it won't need protection. In fact, it looks to me like
after that point, GC isn't possible, so I'm not sure anything needs
GCPRO'tection here.
I seem to be having trouble with GCPRO. Now I've marked the places I
believe might gc in the code.
Post by Ken Raeburn
Post by j***@verona.se
Post by Andreas Schwab
report_file_error throws, so you leak a resource.
Fixed I think.
No, you're still trying to call magic_close after report_file_error
returns, which it won't.
Maybe I sent the wrong patch revision last time; is it better now?
Post by Ken Raeburn
CHECK_STRING is executable code, and should be moved down after the
variable declarations.
Same here.
Post by Ken Raeburn
It seems to be common -- I'm not sure if it's required, but I would
conservatively assume so -- for any local variable GCPRO'd to get
initialized before any possible call to garbage collection, so the
precise(?) garbage collector won't be scanning random stack values and
thinking they're lisp objects; if the initialization is unclear, often
that means simply assigning Qnil just before or after the GCPRO call.
Different GC marking strategies are used on different platforms, so
the lack of an obvious problem on one platform doesn't mean the code
will work okay on another.
Ken
--
Joakim Verona
Eli Zaretskii
2009-08-24 03:05:15 UTC
Permalink
Date: Mon, 24 Aug 2009 01:38:26 +0200
+ f = SDATA(ENCODE_FILE (absname));//might gc
No C99 comments, please.
j***@verona.se
2009-08-24 12:30:33 UTC
Permalink
Post by Eli Zaretskii
Date: Mon, 24 Aug 2009 01:38:26 +0200
+ f = SDATA(ENCODE_FILE (absname));//might gc
No C99 comments, please.
This time I compiled with -std=c89.
Eli Zaretskii
2009-08-23 03:24:16 UTC
Permalink
Date: Sat, 22 Aug 2009 22:18:26 +0200
+ magic_setflags (cookie, MAGIC_MIME_TYPE | MAGIC_ERROR);
+ rvs = magic_file (cookie, f);
+ if (rvs == NULL) goto libmagic_error;
+ file_mime = intern (rvs);
+
+ magic_setflags (cookie, MAGIC_MIME_ENCODING | MAGIC_ERROR);
+ rvs=magic_file (cookie, f);
+ if (rvs == NULL) goto libmagic_error;
+ file_encoding = build_string(rvs);
Since you are returning strings for MIME type and MIME encoding, not
symbols, I suggest to state that in the doc string. Normally, when we
say "SOMETHING is an encoding", we mean it's a symbol.

Alternatively, use MIME-ENCODING-NAME etc., to indicate that it's just
a name of the thing, not the thing itself.
Stefan Monnier
2009-08-21 19:18:33 UTC
Permalink
Post by Eli Zaretskii
+ magic_setflags (cookie, MAGIC_MIME_ENCODING | MAGIC_ERROR);
+ rvs=magic_file (cookie, f);
+ if (rvs == NULL) goto libmagic_error;
+ Lisp_Object file_encoding = intern(rvs);
Is file_encoding supposed to be a valid encoding, one of those for
which Emacs has a coding-system? If so, perhaps you should make sure
you indeed return a valid coding-system or its alias, or otherwise
tell in the doc string that it's not guaranteed to be valid (so that
the caller should validate it before using).
The simplest route is to return a string rather than a symbol.
That should clearly convey the idea that this may or may not be a valid
coding-system.


Stefan
Andreas Schwab
2009-08-21 13:19:10 UTC
Permalink
Post by j***@verona.se
I don't understand the autoheader comment below, though. When I originally
compiled, config.h wasn't generated with the libmagic info included.
did I do something wrong? Is autoheader supposed to generate config.in?
When does that happen?
Configure with --enable-maintainer-mode.

Andreas.
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Richard Stallman
2009-08-20 18:32:17 UTC
Permalink
If we go this route, we should not load gnus/mailcap.el.
It contains lots of other stuff. So we ought to separate
out and preload the right part of it, such as the variable
`mailcap-mime-data'.
Reiner Steib
2009-08-20 20:27:50 UTC
Permalink
Post by Richard Stallman
If we go this route, we should not load gnus/mailcap.el.
(mailcap.el doesn't load anything else, AFAICS.)
Post by Richard Stallman
It contains lots of other stuff.
Ironically last year some stuff from dired-aux.el was moved to
mailcap.el and is used in `minibuffer-default-add-shell-commands' from
simple.el.
Post by Richard Stallman
So we ought to separate out and preload the right part of it, such
as the variable `mailcap-mime-data'.
The initial value of `mailcap-mime-data' is a fall-back (for systems
without proper mailcap files). To make it useful, probably
`mailcap-parse-mailcaps' and related functions are necessary:

$ emacs-23-1 -Q -f ielm -l mailcap
...
ELISP> (with-temp-buffer
(insert
(pp-to-string mailcap-mime-data))
(point-max))
4993
ELISP> (mailcap-parse-mailcaps)
t
ELISP> (with-temp-buffer
(insert
(pp-to-string mailcap-mime-data))
(point-max))
40322
ELISP> system-type
gnu/linux

Bye, Reiner.
--
,,,
(o o)
---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/
Richard Stallman
2009-08-21 14:08:57 UTC
Permalink
The initial value of `mailcap-mime-data' is a fall-back (for systems
without proper mailcap files). To make it useful, probably
`mailcap-parse-mailcaps' and related functions are necessary:

There is no need for all that to be standardly loaded/used in Emacs.

All that Emacs needs is to recognize the file types that Emacs
has special handling for.
Stefan Monnier
2009-08-21 19:16:57 UTC
Permalink
Post by Richard Stallman
If we go this route, we should not load gnus/mailcap.el.
Nobody suggested otherwise until now (actually no one mentioned anything
close to mailcap in this thread, AFAICT). Could you expand on what use
you were thinking of making of mailcap in this context?


Stefan
Juri Linkov
2009-08-28 00:27:35 UTC
Permalink
I often wish that files would open in Emacs with correct mode
more often when there is no file extension.
In `auto-mode-alist' you can see that with the exception of
`archive-mode', `doc-view-mode' and `image-mode', all remaining
modes are programming text modes. It would be more useful
to identify file types for these modes that libmagic can't do.
Do you know a library that identifies programming languages?
Such a library might be implemented using a Bayesian classifier
trained on a sufficiently large corpus of different programming
languages.
--
Juri Linkov
http://www.jurta.org/emacs/
Stefan Monnier
2009-08-28 04:58:42 UTC
Permalink
Post by Juri Linkov
I often wish that files would open in Emacs with correct mode
more often when there is no file extension.
In `auto-mode-alist' you can see that with the exception of
`archive-mode', `doc-view-mode' and `image-mode', all remaining
modes are programming text modes. It would be more useful
to identify file types for these modes that libmagic can't do.
Do you know a library that identifies programming languages?
Such a library might be implemented using a Bayesian classifier
trained on a sufficiently large corpus of different programming
languages.
OTOH, how often do you see a file containing programming language code and
yet without any extension?


Stefan
Stephen J. Turnbull
2009-08-28 09:00:25 UTC
Permalink
Post by Stefan Monnier
OTOH, how often do you see a file containing programming language code and
yet without any extension?
Extremely frequently. The great majority that I see are correctly
identified by file(1) (I believe using libmagic), however, by parsing
the shebang.

There are also cases of multiple extensions, where I've seen (for
example) foo.c.inc used for C implementation code that is used in
multiple contexts (perhaps with different behavior according to
#ifdefs). This would not be recognized by typical Emacs extension
parsing since although it matches something like "\.c\>", it doesn't
match the more usual idioms of "\.c$" or "\.c\'".
Stefan Monnier
2009-08-28 14:56:14 UTC
Permalink
Post by Stephen J. Turnbull
Post by Stefan Monnier
OTOH, how often do you see a file containing programming language code and
yet without any extension?
Extremely frequently.
In what kind of circumstance?
Post by Stephen J. Turnbull
The great majority that I see are correctly identified by file(1) (I
believe using libmagic), however, by parsing the shebang.
Oh, so they're executables with a shebang. That's OK we don't need
`file' for that since we have interpreter-mode-alist. Emacs should
already DTRT for them.
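
To make the shebang case concrete, here is a rough C sketch of pulling the interpreter's base name out of a "#!" line, which is the key that a table such as interpreter-mode-alist is then searched with. `shebang_interpreter` is a hypothetical helper; it only understands the plain "env NAME" form and ignores env options.

```c
#include <string.h>

/* Extract the interpreter's base name from a "#!" line into OUT.
   Handles "#!/bin/sh" and "#!/usr/bin/env python".  Return 0 on
   success, -1 if LINE is not a usable shebang line.  */
int
shebang_interpreter (const char *line, char *out, size_t outsize)
{
  const char *p, *end, *base;
  size_t len;

  if (outsize == 0 || strncmp (line, "#!", 2) != 0)
    return -1;
  p = line + 2;
  for (;;)
    {
      p += strspn (p, " \t");
      len = strcspn (p, " \t\r\n");
      if (len == 0)
        return -1;
      end = p + len;
      /* Base name: the text after the last '/' within this word.  */
      base = end;
      while (base > p && base[-1] != '/')
        base--;
      len = (size_t) (end - base);
      /* "env" only launches the next word, so inspect that instead.  */
      if (len == 3 && strncmp (base, "env", 3) == 0)
        {
          p = end;
          continue;
        }
      break;
    }
  if (len == 0)
    return -1;                  /* e.g. "#!/bin/" */
  if (len >= outsize)
    len = outsize - 1;
  memcpy (out, base, len);
  out[len] = '\0';
  return 0;
}
```

A line that does not start with "#!" is rejected, which is where extension-based or magic-based detection would have to take over.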
Post by Stephen J. Turnbull
There are also cases of multiple extensions, where I've seen (for
example) foo.c.inc used for C implementation code that is used in
multiple contexts (perhaps with different behavior according to
#ifdefs). This would not be recognized by typical Emacs extension
parsing since although it matches something like "\.c\>", it doesn't
match the more usual idioms of "\.c$" or "\.c\'".
I've had

(setq auto-mode-alist (append auto-mode-alist '(("\\.[^/.]+\\'" ignore t))))

in my .emacs for eons to cover such cases.
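
As far as I understand `auto-mode-alist', an entry of the form (REGEXP FUNCTION NON-NIL) makes Emacs strip the matched suffix and search the list again, so the entry above drops one trailing extension per round. The same loop, sketched in C with a made-up suffix table (`suffix_modes`, `mode_for_suffix` and `mode_for_file` are hypothetical names and the table entries are illustrative only):

```c
#include <string.h>

/* Hypothetical suffix table; the entries are illustrative only.  */
static const struct
{
  const char *suffix;
  const char *mode;
} suffix_modes[] = {
  { ".c", "c-mode" },
  { ".el", "emacs-lisp-mode" },
  { ".py", "python-mode" }
};

/* Return the mode whose suffix ends the first LEN bytes of NAME.  */
static const char *
mode_for_suffix (const char *name, size_t len)
{
  size_t i;

  for (i = 0; i < sizeof suffix_modes / sizeof suffix_modes[0]; i++)
    {
      size_t slen = strlen (suffix_modes[i].suffix);

      if (len > slen
          && memcmp (name + len - slen, suffix_modes[i].suffix, slen) == 0)
        return suffix_modes[i].mode;
    }
  return NULL;
}

/* Look NAME up by its final extension; if that fails, drop the final
   extension and try again, like the "\\.[^/.]+\\'" entry does.  */
const char *
mode_for_file (const char *name)
{
  size_t len = strlen (name);

  while (len > 0)
    {
      const char *mode = mode_for_suffix (name, len);
      size_t dot = len;

      if (mode)
        return mode;
      /* Scan back to the last '.', stopping at a directory separator.  */
      while (dot > 0 && name[dot - 1] != '.' && name[dot - 1] != '/')
        dot--;
      if (dot == 0 || name[dot - 1] == '/' || dot == len)
        return NULL;            /* no extension left to strip */
      len = dot - 1;            /* cut at the dot and retry */
    }
  return NULL;
}
```

With this table, "foo.c.inc" fails on ".inc", is shortened to "foo.c", and then resolves to c-mode, which is the multiple-extension case discussed above.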


Stefan
Stephen J. Turnbull
2009-08-29 04:11:27 UTC
Permalink
Post by Stefan Monnier
Post by Stephen J. Turnbull
The great majority that I see are correctly identified by file(1) (I
believe using libmagic), however, by parsing the shebang.
Oh, so they're executables with a shebang. That's OK: we don't need
`file' for that since we have interpreter-mode-alist. Emacs should
already DTRT for them.
Sure. Maybe there's a better way. Maybe libmagic is it. Maybe not.

However, you asked "how often do you see files containing programming
languages without an extension?" The answer is, it's very common, but
the most common case is Unix command scripts with shebangs, which file
handles just as well as Emacs does.
Post by Stefan Monnier
I've had
(setq auto-mode-alist (append auto-mode-alist '(("\\.[^/.]+\\'" ignore t))))
in my .emacs for eons to cover such cases.
Well, maybe it's time to move it from your .emacs to core emacs?
Chong Yidong
2009-08-29 14:21:28 UTC
Permalink
Post by Stephen J. Turnbull
However, you asked "how often do you see files containing programming
languages without an extension?" The answer is, it's very common, but
the most common case is Unix command scripts with shebangs, which file
handles just as well as Emacs does.
I guess the question should be, how often do you see such files that
Emacs can't handle as well as libmagic?

Even in such situations, I think the response should be to improve
Emacs' file handling anyway, since libmagic is not available on all
platforms.
Richard Stallman
2009-08-29 00:46:36 UTC
Permalink
Post by Stefan Monnier
OTOH, how often do you see a file containing programming language code and
yet without any extension?
Extremely frequently.

Why do these files not identify the language explicitly? It is easy
to do.

The great majority that I see are correctly
identified by file(1) (I believe using libmagic), however, by parsing
the shebang.

That statement suggests that there are exceptions. I would expect
there are, because guessing the programming language is an unreliable
solution.

Emacs uses a reliable solution: users should identify the language
either with the file name, or inside the file with a -*- line or a
local variables list. It takes very little work to make a file say
what its language is, and the result is to identify the language
reliably from then on.

I don't think we should switch from our reliable solution to
guessing.

Is there a reason why users don't use the existing reliable mechanism?
Is there a real difficulty with using it?
Stephen J. Turnbull
2009-08-29 04:13:13 UTC
Permalink
Post by Richard Stallman
Is there a reason why users don't use the existing reliable mechanism?
Many of the authors don't use Emacs, is my guess.
Stefan Monnier
2009-08-29 15:28:10 UTC
Permalink
Post by Stephen J. Turnbull
Post by Richard Stallman
Is there a reason why users don't use the existing reliable mechanism?
Many of the authors don't use Emacs, is my guess.
You mean they're still using `ed'? Boggles the mind!


Stefan
Stephen J. Turnbull
2009-08-29 16:27:54 UTC
Permalink
Post by Stefan Monnier
Post by Stephen J. Turnbull
Post by Richard Stallman
Is there a reason why users don't use the existing reliable mechanism?
Many of the authors don't use Emacs, is my guess.
You mean they're still using `ed'? Boggles the mind!
Sure, you could have too, and that probably would have been for the
best. But freedom to choose is what GNU's all about, and I'm glad you
had the choice (even though you failed to choose ed(1), the Godfather
of editors).

If-my-tongue-were-any-further-in-cheek-it-would-be-in-Mozambique-ly y'rs
Richard Stallman
2009-08-30 16:01:31 UTC
Permalink
Post by Richard Stallman
Is there a reason why users don't use the existing reliable mechanism?
Many of the authors don't use Emacs, is my guess.

We can't help that, except in the long term. But when an Emacs user
works on these files he can add a -*- line to them.
Juri Linkov
2009-08-28 19:16:30 UTC
Permalink
Post by Stefan Monnier
Post by Juri Linkov
I often wish that files would open in Emacs with correct mode
more often when there is no file extension.
In `auto-mode-alist' you can see that with the exception of
`archive-mode', `doc-view-mode' and `image-mode', all remaining
modes are programming text modes. It would be more useful
to identify file types for these modes that libmagic can't do.
Do you know a library that identifies programming languages?
Such a library might be implemented using a Bayesian classifier
trained on a sufficiently large corpus of different programming
languages.
OTOH, how often do you see a file containing programming language code and
yet without any extension?
More often with a non-standard extension than without any extension.

Also there are conflicting extensions like e.g. ".pl" for both
Perl and Prolog (esp. SWI-Prolog).
--
Juri Linkov
http://www.jurta.org/emacs/
Stefan Monnier
2009-08-29 01:12:41 UTC
Permalink
Post by Juri Linkov
Also there are conflicting extensions like e.g. ".pl" for both
Perl and Prolog (esp. SWI-Prolog).
That's indeed an interesting case, where content-based mode choice might
make sense. Thanks for reminding me of it.


Stefan
Richard Stallman
2009-08-30 16:01:38 UTC
Permalink
Post by Juri Linkov
Also there are conflicting extensions like e.g. ".pl" for both
Perl and Prolog (esp. SWI-Prolog).
That's indeed an interesting case, where content-based mode choice might
make sense. Thanks for reminding me of it.

I'd prefer a very specific feature for choosing between these two
languages to use of a general mechanism. The specific feature
would probably be more reliable, and people would not be tempted
to use it for other issues where it is not the right approach.

But it would be even better to convince people to use distinct
extensions.
Richard Stallman
2009-08-29 20:20:39 UTC
Permalink
Post by Stefan Monnier
OTOH, how often do you see a file containing programming language code and
yet without any extension?
More often with a non-standard extension than without any extension.

So why not rename the files, or put in -*- lines?

Also there are conflicting extensions like e.g. ".pl" for both
Perl and Prolog (esp. SWI-Prolog).

Perhaps we should promote .plg for Prolog.
Juri Linkov
2009-08-29 22:48:35 UTC
Permalink
Post by Juri Linkov
Post by Stefan Monnier
OTOH, how often do you see a file containing programming language code and
yet without any extension?
More often with a non-standard extension than without any extension.
So why not rename the files, or put in -*- lines?
Often this is not possible when files are not under my control.
Post by Juri Linkov
Also there are conflicting extensions like e.g. ".pl" for both
Perl and Prolog (esp. SWI-Prolog).
Perhaps we should promote .plg for Prolog.
I'd rather promote changing the Perl file extension, since
Prolog is older than Perl :) But I think neither is realistic.

Currently I use this hack in .emacs to distinguish between Perl and Prolog:

(add-hook 'find-file-hook
          (lambda ()
            ;; If prolog-mode was chosen but the file starts with "#",
            ;; assume it is really Perl.
            (when (and (looking-at "#") (string-match "Prolog" mode-name))
              (perl-mode))))

since almost all Perl files begin with a comment, even library files
that have no shebangs.

But I agree such guessing is unreliable.
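
A variant of the same idea, sketched under stated assumptions: magic-mode-alist accepts (MATCH-FUNCTION . MODE) entries and is consulted before auto-mode-alist, so a predicate there can override the ".pl" choice. The scoring rule below (count ":-" against a few Perl-looking tokens) is an invented heuristic, and my/prolog-content-p and my/prolog-pl-file-p are hypothetical names. It is still guessing and will misfire on some files.

```elisp
(defun my/prolog-content-p ()
  "Return non-nil if the accessible buffer text looks more like Prolog than Perl.
Illustrative heuristic only: compares the number of \":-\" operators
with the number of Perl-looking keywords and \"$\" variables."
  (let ((prolog (how-many ":-" (point-min) (point-max)))
        (perl (how-many
               "^[ \t]*\\(?:use\\|my\\|sub\\|package\\)[ \t]\\|\\$[A-Za-z_]"
               (point-min) (point-max))))
    (> prolog perl)))

(defun my/prolog-pl-file-p ()
  "Match function for `magic-mode-alist': a \".pl\" file with Prolog-like text."
  (and buffer-file-name
       (string-match-p "\\.pl\\'" buffer-file-name)
       (my/prolog-content-p)))

;; Only ".pl" files are affected; everything else falls through to the
;; usual auto-mode-alist lookup.
(add-to-list 'magic-mode-alist '(my/prolog-pl-file-p . prolog-mode))
```

Restricting the predicate to ".pl" files keeps it close to a specific Perl-versus-Prolog switch rather than a general content sniffer.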
--
Juri Linkov
http://www.jurta.org/emacs/
Alex Ott
2009-08-28 06:45:05 UTC
Permalink
Hello

N-gram algorithms could be used to identify languages. They are simpler than
Bayes and require a smaller database.
I often wish that files would open in Emacs with correct mode
more often when there is no file extension.
JL> In `auto-mode-alist' you can see that with the exception of
JL> `archive-mode', `doc-view-mode' and `image-mode', all remaining
JL> modes are programming text modes. It would be more useful
JL> to identify file types for these modes that libmagic can't do.
JL> Do you know a library that identifies programming languages?
JL> Such a library might be implemented using a Bayesian classifier
JL> trained on a sufficiently large corpus of different programming
JL> languages.
--
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/ http://xtalk.msk.su/~ott/
http://alexott-ru.blogspot.com/
Alex Ott
2009-08-28 06:46:06 UTC
Permalink
Sorry, I missed that this was about programming languages, not natural
languages.
I often wish that files would open in Emacs with correct mode
more often when there is no file extension.
JL> In `auto-mode-alist' you can see that with the exception of
JL> `archive-mode', `doc-view-mode' and `image-mode', all remaining
JL> modes are programming text modes. It would be more useful
JL> to identify file types for these modes that libmagic can't do.
JL> Do you know a library that identifies programming languages?
JL> Such a library might be implemented using a Bayesian classifier
JL> trained on a sufficiently large corpus of different programming
JL> languages.
--
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/ http://xtalk.msk.su/~ott/
http://alexott-ru.blogspot.com/
Juri Linkov
2009-08-28 19:08:28 UTC
Permalink
Post by Alex Ott
Post by Alex Ott
Post by Juri Linkov
In `auto-mode-alist' you can see that with the exception of
`archive-mode', `doc-view-mode' and `image-mode', all remaining
modes are programming text modes. It would be more useful
to identify file types for these modes that libmagic can't do.
Do you know a library that identifies programming languages?
Such a library might be implemented using a Bayesian classifier
trained on a sufficiently large corpus of different programming
languages.
N-gram algorithms could be used to identify languages. They are simpler
than Bayes and require a smaller database.
Sorry, I missed that this was about programming languages, not natural
languages.
It would be interesting to try using N-Gram algorithms for programming
languages and see how well they perform. For example, most frequently
used bigram "/*" belongs to C, most frequently used trigram ";;;" belongs
to Lisp, etc.
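
A toy sketch of what such an experiment might start from, assuming hand-written profiles rather than ones learned from a corpus. The function names and the two profiles (the "/*" and "*/" bigrams for C, the ";;;" trigram for Lisp) are hypothetical. A real test would derive the N-gram lists and their weights from a large body of sample code.

```elisp
(defun my/ngram-counts (text n)
  "Return a hash table mapping each length-N substring of TEXT to its count."
  (let ((counts (make-hash-table :test #'equal))
        (i 0)
        (limit (- (length text) n)))
    (while (<= i limit)
      (let ((gram (substring text i (+ i n))))
        (puthash gram (1+ (gethash gram counts 0)) counts))
      (setq i (1+ i)))
    counts))

(defun my/ngram-score (text n grams)
  "Return how many times the length-N strings in GRAMS occur in TEXT."
  (let ((counts (my/ngram-counts text n))
        (score 0))
    (dolist (gram grams)
      (setq score (+ score (gethash gram counts 0))))
    score))

(defun my/guess-language (text profiles)
  "Return the language in PROFILES whose N-grams occur most often in TEXT.
PROFILES is a list of (LANGUAGE N GRAM...).  Return nil if nothing matches."
  (let ((best nil) (best-score 0))
    (dolist (p profiles)
      (let ((score (my/ngram-score text (nth 1 p) (nthcdr 2 p))))
        (when (> score best-score)
          (setq best (car p) best-score score))))
    best))

;; Hand-picked profiles, for illustration only.
(defvar my/toy-profiles
  '((c 2 "/*" "*/")
    (lisp 3 ";;;")))
```

Raw counts like these favour long files and common tokens, so any serious comparison would need normalised frequencies.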
--
Juri Linkov
http://www.jurta.org/emacs/