
These are notes toward a proposal to include a simple markup for formulas in the Creole 2.0 (or later) specification. The proposal is not official in any way and has not even been proposed yet -- it is just a work in progress. Feel free to comment or add to it if you think that will make it better.

## Rationale

Wikis are typically used for collecting, exchanging and storing information of all sorts, much of it scientific or technical. Such texts are often rich in symbols, equations and formulas. There is, however, no standard way to represent them. Some wiki engines offer custom plugins that render math formulas as images or embedded MathML; others give you limited control over the text, so that you can insert symbols and use superscript and subscript. None of them share a common markup for this. It would be nice to have one.

## Separate markup

There are already proposals to extend Creole with markup that gives users more control over the size and positioning of glyphs, such as markup for superscript and subscript text. One of the rationales is that users can then construct crude formulas in a portable way. It's possible to add even more markup, letting them stack glyphs on top of each other or control their size more precisely. Such additions, however, make the language more complicated and harder to parse and learn, and they increase the chance of unexpected results when users type something that wasn't meant to be interpreted as markup. They also make converting from and to Creole harder. The markup itself must therefore be distinctive enough not to conflict with normal text and other parts of Creole markup -- which often makes it elaborate and hard to read and write.

One way to limit the effect of additional markup rules on the whole of Creole is to make them work only within a limited scope, where the Creole markup itself doesn't work. This also lets people who don't feel like learning everything about formulas safely recognize them in text and avoid modifying them. Formulas can still be moved around, copy-pasted, etc.

Creole 1.0 already reserves double dollar signs `$$ ... $$` for formulas. What is missing is a specification on what goes inside and how it is processed.
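To make the scoping idea concrete, here is a minimal Python sketch of how an engine might separate formulas from ordinary Creole text before parsing either. It assumes that a formula is everything between a pair of `$$` delimiters and cannot itself contain `$$`; escaping rules are not specified anywhere yet, and the function name `split_formulas` is made up for illustration.

```python
import re

# Assumption: a formula is whatever sits between two "$$" delimiters and may
# not contain "$$" itself. No escaping rules are defined in these notes.
FORMULA_RE = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def split_formulas(text):
    """Split wiki text into ("text", s) and ("formula", s) segments."""
    segments = []
    pos = 0
    for match in FORMULA_RE.finditer(text):
        # Everything before the formula is ordinary Creole text.
        if match.start() > pos:
            segments.append(("text", text[pos:match.start()]))
        # The formula body is kept raw; Creole markup is not applied to it.
        segments.append(("formula", match.group(1).strip()))
        pos = match.end()
    # Trailing text after the last formula, if any.
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments
```

Only the `"text"` segments would then go through the regular Creole parser, while the `"formula"` segments are handed to whatever formula renderer the engine provides.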

## Technical limitations of presentation

Displaying formulas in web browsers is a difficult task. If we want Creole to be a common language for wikis large and small, we can't burden them with required dependencies such as a MathML or LaTeX renderer. That's why there should be several ways to display a formula:

- Just display the raw markup inside a `<code>` tag (?). The markup must be simple enough for users to translate in their heads anyway, if you want them to be able to edit it without a WYSIWYG editor. This requires almost no overhead and is suitable for the simplest wiki engines.
- Substitute the symbol literals with the appropriate Unicode characters or (X)HTML entities, use the `<sub>` and `<sup>` tags to render the superscript and subscript parts of the formulas, and highlight (colorize) all the markup that cannot be easily rendered.
- Render the formula as an image, with the raw markup as the ALT text.
- Render the formula as embedded MathML.
- Render the formula with CSS and/or JavaScript.
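The second option (Unicode substitution plus `<sub>`/`<sup>`) could be sketched roughly as follows. This is only an illustration under several assumptions: the symbol table is a tiny made-up sample, superscript and subscript bodies are not nested, the CSS class name `formula-raw` is invented, and TeX's rule of swallowing the space after a command is ignored.

```python
import html
import re

# Assumption: a tiny sample table; a real one would be far larger.
SYMBOLS = {
    "alpha": "\u03b1",
    "Gamma": "\u0393",
    "times": "\u00d7",
    "rightarrow": "\u2192",
}

TOKEN_RE = re.compile(
    r"\\(?P<cmd>[A-Za-z]+)"                    # \command
    r"|(?P<script>[\^_])\{(?P<body>[^{}]*)\}"  # ^{...} or _{...}, not nested
    r"|(?P<char>.)",                           # any other single character
    re.DOTALL,
)


def render_html(formula):
    """Render a formula as an HTML fragment using Unicode and sub/sup tags."""
    out = []
    for m in TOKEN_RE.finditer(formula):
        if m.group("cmd"):
            name = m.group("cmd")
            if name in SYMBOLS:
                out.append(SYMBOLS[name])
            else:
                # Unknown markup is left raw, wrapped so it can be highlighted.
                out.append('<span class="formula-raw">\\%s</span>' % html.escape(name))
        elif m.group("script"):
            tag = "sup" if m.group("script") == "^" else "sub"
            out.append("<%s>%s</%s>" % (tag, render_html(m.group("body")), tag))
        else:
            ch = m.group("char")
            # Hyphen-minus becomes the Unicode minus sign; the rest is escaped.
            out.append("\u2212" if ch == "-" else html.escape(ch))
    return "".join(out)
```

With these assumptions, `render_html(r"\alpha^{2} - x_{i}")` produces `α<sup>2</sup> − x<sub>i</sub>`, while a command missing from the table, such as `\oint`, comes out wrapped in a `<span>` so that a stylesheet can colorize it.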

The developers might also choose to use different renderings depending on what the formula actually contains and/or where it is placed.

## The markup

There is one common standard for representing formulas with plain text that is widespread and known, at least in part, to many scientists. It's the LaTeX typesetting language. It's commonly used by the wiki engines that do support formulas, and by external services such as MathTran. Therefore it seems wise to use some small subset of the LaTeX markup, picking the parts that are commonly used, easy to parse and possible to render.

In particular, the following markup should be supported:

- Greek letters, in the form of `\alpha` or `\Gamma`
- Operators and math symbols such as `\times` and `\rightarrow`
- `^{Superscript}` and `_{subscript}`
- Common symbols such as `+` and `-` would be substituted with the appropriate Unicode characters and get the correct spacing for operators (for example, hyphen-minus would be converted to the Unicode minus sign).
- Accents, such as `\ddot` or `\hat`
- more (?)
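One practical consequence of choosing a fixed subset is that an engine can cheaply check whether a formula stays inside it. The Python sketch below shows the idea; the whitelist contains only the commands used as examples in the list above and is purely illustrative, since the actual subset is still undecided, and the function name `unsupported_commands` is made up.

```python
import re

# Assumption: an illustrative whitelist built from the examples in these
# notes; the real subset is still an open question.
SUPPORTED_COMMANDS = {
    "alpha", "Gamma",        # Greek letters
    "times", "rightarrow",   # operators and math symbols
    "ddot", "hat",           # accents
}

COMMAND_RE = re.compile(r"\\([A-Za-z]+)")


def unsupported_commands(formula):
    """Return a sorted list of the commands in formula outside the subset."""
    found = set(COMMAND_RE.findall(formula))
    return sorted(found - SUPPORTED_COMMANDS)
```

An engine could use the result to decide between renderings, for example falling back to raw markup when the list is not empty.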

I have to admit I don't know much about TeX. At the moment, I'm still trying to read up on it. So, first question: Are we talking about TeX or LaTeX? You're talking about LaTeX, but your example (MathTran) seems to use Plain TeX. Btw: Plain TeX Reference Card might be helpful. -- MarioLenz 2009-04-17 19:11:15

The language is called TeX, and it's a very flexible, very complicated, Turing-complete programming language for defining how the page should be laid out. The language mostly consists of different macros. LaTeX, on the other hand, is a library of ready-made macros designed for typesetting documents such as letters, articles, books, etc.

At the level of formulas you are mostly using the basic TeX syntax -- there are few LaTeX macros that are useful, so it makes little difference. Of course, the problem with supporting full TeX is that only TeX can parse TeX -- in other words, you need a working TeX interpreter to be able to interpret arbitrary TeX documents. This is a little inconvenient, especially considering that an average install of a TeX suite is around 200-300 MB, including all the fonts, symbols, libraries, internationalization packages and such. It is, however, possible to select a small subset of TeX that is useful for formatting formulas, and use that -- without the need to allow macro definitions and Turing-completeness. MimeTeX is one such project, for example. Its output quality is lower than that of TeX (which is about the best quality possible nowadays, only recently being chased by DTP software like InDesign), but it is useful and easier to grasp.

A great advantage of using a subset of TeX is that there are many ready-made solutions, from picture renderers like MathTran and MimeTeX, through MathML converters, to client-side solutions like jsMath.

-- RadomirDopieralski 2009-04-17 20:01:01

My three cents:

- You don't need a "full installation". There are "minimal" installations of TeX; if anyone is interested, I'll find the link.
- Even with formulas you usually use some LaTeX constructs, most notably \frac.
- Apart from LaTeX, there are other packages built on top of TeX, for example ConTeXt. Currently it uses LuaTeX, and it supports (among others) so-called "calculator math", which is another way of math markup (you can read about it here: http://www.pragma-ade.nl/general/manuals/mk.pdf, chapter IV).

I don't think wiki engine developers should be forced to use (La)TeX. What if they don't want to, but intend to convert to MathML? Defining a TeX subset to be supported would probably make this easier. Btw: \frac seems better than the standard TeX way, so why not use it? I mean: we're talking about standardizing this stuff. No one forces us to use only TeX. -- MarioLenz 2009-04-19 10:01:29

You wouldn't install the mathtran parser or something similar, would you? I think it might be helpful to see what we're talking about and not only the TeX markup, e.g. `\oint`. -- MarioLenz 2009-04-18 12:58:26

I think a good approach to standardizing is finding out what is necessary, finding out what is already being used, and picking the common part of it all. We could start with a /list of existing formula solutions, then a list of what they support -- the intersection should be a fine starting point... Of course that's a lot of work, but fortunately it doesn't have to be done all at once and we have lots of time. -- RadomirDopieralski 2009-04-19 10:37:03

Have you had a closer look at the /list of existing formula solutions? I've started, but then there was just too much other stuff I had to deal with :-/ Besides, I'm not really an expert when it comes to markup for formulas. I think MathML is too complex, although it would have the advantage of being standardized. All other solutions seem to boil down to "some" subset of (AMS)TeX/(AMS)LaTeX. I'd prefer some standard... ASCIIMathML syntax "correspond[s] to a wellspecified subset of Presentation MathML and behave[s] in a predictable way", but wellspecified isn't a standard. However, at the moment I think this is the most promising solution. -- MarioLenz 2009-06-17 19:40:19

Done :)

## TeX subset to be supported

Please comment: