SPOC-Web, semantic Knowledge Management

Individuals: Things, Properties and Connections

Extending or creating Programming/Markup Languages

Spoc-Text already supports Highlighting of more than 20 Programming and Markup Languages, but You can add Your own Language.

Languages are defined by *.lang Files in the Sub-Folder _Resources/Syntax. These are XML Files with a defined Schema that allows You to define Parsing Grammars based on Regular Expression Text Matching.
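To give a rough Idea of the overall Layout, a minimal *.lang File for a hypothetical Language might look like the Sketch below. It only uses Elements and Attributes that appear in the XML.lang Example of this Article. The Name "MyLang", the Extension ".mylang" and the Style Names are invented, the folding and indenting Attributes are left out, and it is an Assumption that the Schema accepts the File in this reduced Form.

```xml
<!-- Hypothetical _Resources/Syntax/MyLang.lang: name, extension and style names are illustrative only -->
<Grammars name="MyLang" extensions=".mylang" xmlns="http://spoc-web.com/xtext/styles/2015">
  <!-- Default Grammar: initially active, named like the Language -->
  <Grammar name="MyLang">
    <!-- Range: block comments between a start and a stop Expression -->
    <Range style=".Comment" multiLine="1" start="/\*" stop="\*/" />
    <!-- Match: one Expression covering the whole styled Text, here integer and decimal numbers -->
    <Match style=".Number">\b\d+(\.\d+)?\b</Match>
  </Grammar>
</Grammars>
```

The Styles .Comment and .Number would then be defined either in <Style> Elements or in the associated CSS File.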

Lang Files can also define Styles, but because Styles are volatile and edited by more People than the Grammar, it is better to put them into a separate, associated *.CSS File, whose Syntax and Structure are much simpler. This also clearly separates the Grammar from the Highlighting Styles. Mind, though, that Styles and Keywords from CSS Files are defined globally and cannot be restricted to a certain Text Section. For that Purpose You need to use <Style> Elements.

Structure of *.lang Files

Lang Files contain multiple <Grammar> Elements that define Sets of <Match> and <Range> Elements.

When parsing a Text, only a single <Grammar> is active at a time, but a <Range> can contain or refer to a different <Grammar> for its Section. This is done either by nesting a <Grammar> in the <Range> or by referencing it by Name.
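The two Ways of giving a <Range> its own <Grammar> can be sketched as follows for a hypothetical Language; the Fragment belongs inside the <Grammars> Root Element. The nested Form follows the Tag Example later in this Article and the grammarRef Attribute follows its Attribute Values. The Grammar Names "DoubleQuoted" and "Escapes" and the Style Names are invented for illustration. Escaped Quotes are left out on purpose, since it is not documented here whether inner Matches can keep the stop Expression from firing.

```xml
<Grammar name="MyLang">
  <!-- Nested: this Grammar is defined inline and applies only inside this Range -->
  <Range style=".String" start='"' stop='"'>
    <Grammar name="DoubleQuoted">
      <!-- Escape sequences \n, \t and \\ -->
      <Match style=".Escape">\\[nt\\]</Match>
    </Grammar>
  </Range>
  <!-- Referenced: this Range reuses the Grammar named "Escapes" below -->
  <Range style=".String" start="'" stop="'" grammarRef="Escapes" />
</Grammar>

<!-- Named Grammar: only active inside Ranges that reference it -->
<Grammar name="Escapes">
  <Match style=".Escape">\\[nt\\]</Match>
</Grammar>
```

Nesting keeps a small Rule-Set next to the only <Range> that uses it; referencing by Name avoids duplicating Rules that several <Range>s share.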

<Grammar>s use so-called Regular Expressions for Text-Matching. They may contain these Sub-Elements:

  • <Match>es are single regular Expressions that match the whole Text Content to be styled.
  • <Range>s select a Text Section between two regular Expressions, the start and the stop Expression. These can be specified using either Attributes or Sub-Elements. Expressions for <Range>s are considerably harder to design, because they must be very selective to avoid accidental Matches in the Content. Often that Content needs to be parsed by a different, nested <Grammar>.
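The Difference between the two Sub-Elements can be illustrated with String Literals of a hypothetical Language that has both one-line Strings and triple-quoted Blocks. The Language and the .String Style Name are assumptions. A one-line String fits into a single <Match>, while the Block needs a <Range> whose start and stop Expressions are selective enough not to be triggered by ordinary Content.

```xml
<Grammar name="MyLang">
  <!-- Match: a single Expression covers the whole styled Text, here a one-line string -->
  <Match style=".String">"[^"\r\n]*"</Match>
  <!-- Range: separate start and stop Expressions; three quotes in a row do not occur
       in ordinary one-line strings, and the Section may span several Lines -->
  <Range style=".String" multiLine="1" start='"""' stop='"""' />
</Grammar>
```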

<Match>es and <Range>s compete when parsing Text. <Range>s are determined first, because they can define Structures that may span the whole File. Matches that occur 'earlier' (further left or further up) in the Text take Precedence. When two <Range>s start at the same Text Position, the one defined earlier in the *.lang File wins.

Similarly, <Match>es are determined within these Sections: by Text Position first and then, when two <Match>es start at the same Text Position, by their Position in the *.lang File.
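Applied to the XML Grammar below, these Rules explain why the Order of its <Range>s matters. For the Text <!-- note --> both the .Comment Range and the .XmlTag Range (whose start Expressions match "<!--" and "<") match at the same Text Position, so the Comment is recognized only because its Range is defined first. The abridged Fragment repeats these two <Range>s from XML.lang with explanatory Comments added; if their Order were swapped, such Comments would presumably be styled as Tags.

```xml
<Grammar name="XML">
  <!-- Listed first: where a comment opener and a plain tag start at the same position, this Range wins -->
  <Range style=".Comment" multiLine="1" start="&lt;!--" stop="--&gt;"/>
  <!-- Listed later: loses to .Comment at the same position, but wins wherever it starts earlier in the Text -->
  <Range style=".XmlTag" multiLine="1" start="&lt;" stop="&gt;" />
</Grammar>
```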

Example: XML.lang

This Section explains the actual XML Grammar used for Syntax-Highlighting all Kinds of XML Files. Of course You can (and should) use Spoc-Text to edit them, because it provides a Highlighting Scheme that makes them easier to read and Autocomplete Keywords that help in writing XML. The Sample XML Code You see in this Article was simply copied and pasted into this Web-Page.


<Grammars name="XML" extensions=".xml .xsl .xslt .xsd ... .wsdl .disco .ps1xml .nuspec" xmlns="http://spoc-web.com/xtext/styles/2015"
      folding="XmlFolding" indenting="DefaultIndentation" keep_left_markup="1" keep_right_markup="1">

This Start Tag defines the Language Name, the applicable File Extensions, and the Strategies used to identify Folding Sections and to indent the following Line. The keep_left_markup and keep_right_markup Attributes are usually set to 1/true for Programming Languages and to 0/false for Markup Languages; with 0/false the Markup is removed when copying or printing.

<Grammar> Elements

The first <Grammar> Sub-Element defines the Default Rule-Set. It is the initially active <Grammar> and must either be unnamed or have a Name that matches the Language Name.

<Grammar name="XML">
It contains several <Range> Definitions for different Sections of an XML File. Each <Range> selects a Style and has a start and a stop Attribute containing regular Expressions that detect where this Range starts and ends. Styles are defined either in this *.lang File or, better, in separate CSS Files.

  <Range style=".Comment" multiLine="1" start="&lt;!--" stop="--&gt;"/>
  <Range style=".CData" multiLine="1" start="&lt;!\[CDATA\[" stop="]]&gt;" />
  <Range style=".DocType" multiLine="1" start="&lt;!DOCTYPE" stop="&gt;" />
  <Range style=".XmlDeclaration" multiLine="1" start="&lt;\?" stop="\?&gt;" />

Imported Grammars

You can also import Grammars from different Files to reuse them or to form a unified Language:

  <Import grammarRef="EntitySet"/>

This adds the Rules for XML/HTML Entities to the Base XML Grammar.


Nested <Grammar> Elements

The next <Range> is especially interesting, because it contains a new <Grammar> to parse XML Attributes. This means that, for the Text between the start and the stop Characters, the inner Rules and Ranges are used for parsing and their Styles are applied:

  <Range style=".XmlTag" multiLine="1" start="&lt;" stop="&gt;" >

   <Grammar name="Tag">
    <Range style=".AttributeValue" multiLine="1" grammarRef="EntitySet" start='"' stop='"|(?=&lt;)' />
    <Range style=".AttributeValue" multiLine="1" grammarRef="EntitySet" start="'" stop="'|(?=&lt;)" />
    <Match style=".AttributeValue">=</Match>
    <Match style=".AttributeName">[\d\w_\-\.]+(?=\s*=)</Match>
   </Grammar>
  </Range>

Keywords in <Grammar> Elements

Programming Language Grammars often define a Set of built-in Literals, so-called Keywords. These are highlighted with the given Style and presented at the Top of the Completion List when it is activated. Since they are used literally, they don't need to be Regex-escaped. Their Syntax is very simple and includes a brief Description that is displayed as a Tooltip to support Selection from the List:

<KeyWords style=".Standard">
  <Key Word="xml:lang=" content="Universal Attribute to indicate the Language of the (Text-)Content"/>
  <Key Word="xml:space='preserve'" content="preserve|default Universal Attribute to control the Handling of Whitespace in the (Text-)Content"/>

... 

</KeyWords>


Keywords are like <Match>es, except that they don't use regular Expressions and therefore don't need to be Regex-escaped; they only need the Escaping required by the XML or CSS File they are defined in.
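For a hypothetical Language, a <KeyWords> Block might look like the one below. The Words, Descriptions and the .Keyword Style Name are invented; the Point is the Escaping. The "." in "Math.PI" needs no Regex Escape because Keywords are literal, whereas "<=" still has to be written as "&lt;=" because the Block lives in an XML File.

```xml
<Grammar name="MyLang">
  <KeyWords style=".Keyword">
    <!-- Literal text: no Regex escaping, so "." stands only for a dot -->
    <Key Word="Math.PI" content="Built-in Constant for the Ratio of a Circle's Circumference to its Diameter"/>
    <Key Word="while" content="Loop that repeats while its Condition holds"/>
    <!-- XML escaping still applies: this Keyword is the literal text <= -->
    <Key Word="&lt;=" content="Less-than-or-equal Operator"/>
  </KeyWords>
</Grammar>
```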

<Style> Elements to define CSS Property Values

As mentioned above, You don't have to separate Styling and Grammar. You can define Styles anywhere in the *.lang File using the supported subset of CSS Properties:


<Style name=".Comment" color="Green" content="&lt;!-- comment --&gt;" />
<Style name=".CData" color="Blue" content="&lt;![CDATA[data]]&gt;" />
<Style name=".XmlTag" color="DarkMagenta" content='&lt;tag attribute="value" /&gt;' />
<Style name=".AttributeName" color="Red" content='&lt;tag attribute="value" /&gt;' />
<Style name=".AttributeValue" color="Blue" content='&lt;tag attribute="value" /&gt;' />
<Style name=".Entity" color="Teal" content="index.aspx?a=1&amp;amp;b=2" />
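Styles for a hypothetical Language can be declared the same Way. The Sketch only uses the name, color and content Attributes seen above, since the Article does not list the supported CSS Properties. Judging from the XML Example, content holds a short Sample Text, presumably for previewing the Style; the Style Names and Colors are illustrative.

```xml
<!-- Each name refers to a style Attribute used by Ranges, Matches or KeyWords -->
<Style name=".Comment" color="Green" content="/* comment */" />
<Style name=".String" color="Brown" content='"text"' />
<Style name=".Keyword" color="Blue" content="while" />
```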

Styles and Keywords from the CSS File

Styles and Keywords are better defined in CSS Files, because CSS is much easier to write than the XML of a *.lang Grammar. Additionally, You can specify HTML Translations there using the target-name Property. The only disadvantage is that such Definitions are global.

And that's all about Lang Files; now go and specify the Styling for Your own Language.