File Format For JDesigner

I’m in the middle of redesigning JDesigner’s file format. I’ve been using SBML since its inception in 2000, and JDesigner uses SBML as its native format, relying on the annotation mechanism to store all the layout and rendering information plus other metadata relevant to JDesigner. Because SBML evolves, JDesigner also tries to support the different levels and has to adapt as new requirements for the standard are published.

Since JDesigner was first developed, SBML has also acquired a library to help developers, libSBML. However, this came out some years after I had rolled my own SBML parser (one claim to fame I can make is that I wrote the very first SBML parser; it was used in an early version of JDesigner, and Gepasi used it to import SBML models). I never had the energy to move over to libSBML, particularly since I would have had to incorporate all the other metadata I had. As they say, if it ain’t broke, don’t fix it.

The problem, however, is that the SBML code in JDesigner is complex, untidy and needs updating to keep up with the standard. With the new SBML extensions coming out, maintaining any semblance of compliance will become hopeless. In fact, there is a danger that one will end up spending most of the development time trying to keep up with SBML rather than adding new functionality to the application. It’s time for a change.

I’ve been looking around for formats that could replace SBML, and I came up with the following list:

XML
JSON
YAML
TOML

Of these, XML is already used to encode SBML. I’ve never particularly liked XML: it’s too verbose, too big a technology for a simple file format, and not that easy to parse, although modern software libraries make this easier. So XML was out. Here is a sample I borrowed from Wikipedia; angle brackets are its defining visual trait:

<person>
  <firstName>John</firstName>
  <lastName>Smith</lastName>
  <age>25</age>
  <address>
    <streetAddress>21 2nd Street</streetAddress>
    <city>New York</city>
    <state>NY</state>
    <postalCode>10021</postalCode>
  </address>
  <phoneNumbers>
    <phoneNumber type="home">212 555-1234</phoneNumber>
    <phoneNumber type="fax">646 555-4567</phoneNumber>
  </phoneNumbers>
  <gender>
    <type>male</type>
  </gender>
</person>
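To give a feel for what “modern software libraries” means here, the following sketch reads the sample with Python’s standard xml.etree.ElementTree module. Python is used purely for illustration. Element text always comes back as a plain string, so a value such as age has to be converted by hand. Attributes such as type="home" are read through a separate call from element text.

```python
import xml.etree.ElementTree as ET

# The Wikipedia <person> sample from above, trimmed to the parts read below.
PERSON_XML = """
<person>
  <firstName>John</firstName>
  <lastName>Smith</lastName>
  <age>25</age>
  <address>
    <streetAddress>21 2nd Street</streetAddress>
    <city>New York</city>
    <state>NY</state>
    <postalCode>10021</postalCode>
  </address>
  <phoneNumbers>
    <phoneNumber type="home">212 555-1234</phoneNumber>
    <phoneNumber type="fax">646 555-4567</phoneNumber>
  </phoneNumbers>
</person>
""".strip()

root = ET.fromstring(PERSON_XML)

# Element text is always a string; turning "25" into a number is up to us.
first_name = root.findtext("firstName")
age = int(root.findtext("age"))
city = root.findtext("address/city")   # simple path syntax into nested elements

# Attributes (type="home") are read with .get(), element text with .text.
phones = {p.get("type"): p.text for p in root.iter("phoneNumber")}

print(first_name, age, city, phones)
```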

In fact, the apparent failings of XML have resulted in a number of new formats being introduced in recent years. These include:

JSON

JSON is a popular format that originated in the JavaScript world. I’ve written a reasonably large application (EasyGraph) that used JSON as its file format. It’s certainly not difficult to parse, but visually I just couldn’t handle all the quotation marks it uses. This example was taken from Wikipedia:

{
    "firstName": "John",
    "lastName": "Smith",
    "isAlive": true,
    "age": 25,
    "height_cm": 167.64,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
    },
    "phoneNumbers": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "office",  "number": "646 555-4567" }
    ]
}
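For comparison, here is a minimal sketch of how little work JSON parsing takes. It runs Python’s built-in json module on the same Wikipedia sample. A single call produces nested dictionaries and lists, with booleans and numbers already typed.

```python
import json

# The Wikipedia JSON sample from above, quotation marks and all.
PERSON_JSON = """
{
    "firstName": "John",
    "lastName": "Smith",
    "isAlive": true,
    "age": 25,
    "height_cm": 167.64,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
    },
    "phoneNumbers": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "office",  "number": "646 555-4567" }
    ]
}
"""

# One call turns the text into nested dicts and lists.
person = json.loads(PERSON_JSON)

# JSON true -> bool, 25 -> int, 167.64 -> float: no manual conversion needed.
assert person["isAlive"] is True
assert isinstance(person["age"], int)
assert isinstance(person["height_cm"], float)

print(person["address"]["city"])
print([p["number"] for p in person["phoneNumbers"]])
```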

YAML

YAML is another newer format to emerge for those disappointed with XML. This time the flavor is Pythonic: as in Python, whitespace matters, and how much you indent means something. This idea has never really appealed to me, and only recently have I got used to indenting in Python. Here again is a sample from Wikipedia:

---
  firstName:  John
  lastName:  Smith
  age: 25
  address: 
        streetAddress: 21 2nd Street
        city: New York
        state: NY
        postalCode: 10021

  phoneNumber: 
        -  
            type: home
            number: 212 555-1234
        -  
            type: fax
            number: 646 555-4567
  gender: 
        type: male

Of the three, YAML is clearly the easiest to read; ease of parsing will depend on the library one uses.

TOML

Finally, it is worth mentioning TOML. I’m not sure whether it is still being developed, but it’s a simple-looking format that resembles Windows INI files, which makes it easy to parse. Here is an example from the TOML GitHub page:

# This is a TOML document. Boom.

title = "TOML Example"

[owner]
name = "Tom Preston-Werner"
organization = "GitHub"
bio = "GitHub Cofounder & CEO\nLikes tater tots and beer."
dob = 1979-05-27T07:32:00Z # First class dates? Why not?

[database]
server = "192.168.1.1"
ports = [ 8001, 8001, 8002 ]
connection_max = 5000
enabled = true

[servers]

  # You can indent as you please. Tabs or spaces. TOML don't care.
  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [servers.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

[clients]
data = [ ["gamma", "delta"], [1, 2] ]

# Line breaks are OK when inside arrays
hosts = [
  "alpha",
  "omega"
]

I am not sure where TOML is going; it seems to be stuck at version 0.2. Of the four possibilities, TOML appeals to me the most. So what am I going to choose? It turns out, none of them. At the end of the day, what I need is full control over my native format. Some might say that if I used JSON, other people could read my files. Good point, but in almost 15 years no one has read JDesigner files (apart from Frank Bergmann, whose viewer can read the format), so that argument doesn’t really apply.

After much experimentation, I’ve come up with the following syntax. It’s nothing revolutionary, and I am sure someone has such a format somewhere; in any case, it’s quite similar to JSON but without the quotation marks around names. I will call the format Quintus, named after Quintus Sertorius, a Roman statesman and general (126 BC – 72 BC).

The basic unit of Quintus is the element:

[tag ....]

The .... can be replaced by any mix of elements and properties. A property is simply a value assigned to a name, for example dove = "bird". Currently I only need strings, booleans, integers and doubles as values. For example:

name = "David"
isBlue = true
numberOfPoints = 100
stepSize = 0.5
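The four value types can be told apart by their spelling alone. The sketch below shows one way to do that in Python. The literal rules are assumptions, since they are not pinned down here:

- strings have no escape sequences;
- booleans are lowercase true/false;
- a run of digits is an integer;
- anything else is tried as a double.

parse_value and parse_property are hypothetical helper names, and they only handle a single property on its own, not a whole element.

```python
def parse_value(token: str):
    """Convert one Quintus value literal to a Python value.

    Assumed literal rules (illustrative, not a specification):
      "..."         -> str  (quotes stripped, no escape sequences)
      true / false  -> bool
      -?digits      -> int
      anything else -> float, e.g. 0.5 or 1e-3
    """
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    if token == "true":
        return True
    if token == "false":
        return False
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        raise ValueError(f"not a valid Quintus value: {token!r}") from None


def parse_property(text: str):
    """Split a single 'name = value' property into (name, value)."""
    name, sep, literal = text.partition("=")   # split at the first '='
    if not sep:
        raise ValueError(f"missing '=' in property: {text!r}")
    return name.strip(), parse_value(literal.strip())


for line in ['name = "David"', "isBlue = true", "numberOfPoints = 100", "stepSize = 0.5"]:
    print(parse_property(line))
```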

Whitespace is irrelevant other than for separating things: indentation, carriage returns and so on don’t matter, as they are all treated as if they were space characters (spaces inside a quoted string value are, of course, kept). And that’s about it.
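A hypothetical tokenizer shows why layout stops mattering. Whitespace of any kind (spaces, tabs, carriage returns or newlines) is only skipped between tokens. A quoted string is kept as one token, so its internal spaces survive. The token rules are assumptions made for the sake of the sketch:

- strings contain no embedded double quotes;
- a name or number is any run of characters other than whitespace, brackets, = or ".

```python
import re

# Hypothetical token rules (only [ ] and = are fixed by the format itself):
#   "..."   a string literal, kept whole so its internal spaces survive
#   [ ] =   single-character punctuation
#   other   any run of characters that are not whitespace, brackets, = or "
WHITESPACE = re.compile(r"\s+")
TOKEN = re.compile(r'"[^"]*"|[\[\]=]|[^\s\[\]="]+')


def tokenize(text: str) -> list[str]:
    """Split Quintus text into tokens.

    Spaces, tabs, carriage returns and newlines all count as whitespace,
    so they are skipped alike and the layout carries no meaning.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        gap = WHITESPACE.match(text, pos)
        if gap:
            pos = gap.end()        # skip whitespace between tokens
            continue
        tok = TOKEN.match(text, pos)
        if tok is None:            # e.g. a quote with no closing quote after it
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(tok.group())
        pos = tok.end()
    return tokens


# The same element written on one line and spread over several lines.
compact = '[address city = "New York" state = "NY"]'
spread = """
[address
    city   =   "New York"
        state = "NY"
]
"""
assert tokenize(compact) == tokenize(spread)
print(tokenize(compact))
```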

Here is the same example used above but expressed in Quintus format:

[person
    firstName = "John"
    lastName = "Smith"
    isAlive = true
    age = 25
    height_cm = 167.64
    [address
       streetAddress = "21 2nd Street"
       city = "New York"
       state = "NY"
       postalCode = "10021-3100"
    ]
    [phoneNumbers
       [phonenumber 
         type = "home"
         number = "212 555-1234" 
       ]
       [phonenumber 
         type = "office"
         number = "646 555-4567"
       ]
    ]
]
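Going the other way, the following sketch writes nested Python data, shaped like the JSON sample, out as Quintus text. It mirrors the mapping used in the example:

- objects become elements;
- scalar values become name = value properties;
- a list becomes an element holding one child element per item.

Quintus has no array syntax, so the tag for each list item has to come from somewhere. Here it is supplied through a hypothetical item_tags mapping. to_quintus and format_value are likewise illustrative names, not part of JDesigner, and list items are assumed to be dictionaries.

```python
def format_value(value) -> str:
    """Render a scalar as a Quintus literal."""
    if isinstance(value, bool):            # check bool first: True is also an int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("no escape for embedded quotes is assumed")
        return f'"{value}"'
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def to_quintus(tag: str, data: dict, item_tags: dict, indent: int = 0) -> str:
    """Write a dict as a Quintus element, four spaces per nesting level."""
    pad = "    " * indent
    inner = pad + "    "
    lines = [f"{pad}[{tag}"]
    for name, value in data.items():
        if isinstance(value, dict):        # nested object -> child element
            lines.append(to_quintus(name, value, item_tags, indent + 1))
        elif isinstance(value, list):      # list of dicts -> element, one child per item
            lines.append(f"{inner}[{name}")
            for item in value:
                lines.append(to_quintus(item_tags[name], item, item_tags, indent + 2))
            lines.append(f"{inner}]")
        else:                              # scalar -> name = value property
            lines.append(f"{inner}{name} = {format_value(value)}")
    lines.append(f"{pad}]")
    return "\n".join(lines)


person = {
    "firstName": "John",
    "isAlive": True,
    "age": 25,
    "height_cm": 167.64,
    "address": {"city": "New York", "state": "NY"},
    "phoneNumbers": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "office", "number": "646 555-4567"},
    ],
}
print(to_quintus("person", person, item_tags={"phoneNumbers": "phonenumber"}))
```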

In EBNF, the format is defined by:

value ::= string | boolean | integer | double
property ::= name "=" value
content ::= property | element
element ::= "[" name { content } "]"
rootDocument ::= element
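With one token of lookahead, the two kinds of content can be told apart: a [ starts a nested element, and anything else must begin a property. The grammar therefore maps directly onto a recursive-descent parser with one method per rule. The Python sketch below assumes the lexical details the EBNF leaves open (what counts as a name, a string, an integer or a double). It stores each element’s properties in a dictionary and its child elements in a list. Element, Parser and tokenize are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Assumed lexical rules; the EBNF leaves these terminals undefined.
TOKEN = re.compile(r'\s+|"[^"]*"|[\[\]=]|[^\s\[\]="]+')
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"-?\d+")
DOUBLE = re.compile(r"-?\d+\.\d*(?:[eE][-+]?\d+)?|-?\d+[eE][-+]?\d+")


@dataclass
class Element:
    tag: str
    properties: dict = field(default_factory=dict)  # name -> value
    children: list = field(default_factory=list)    # nested Elements, in order


def tokenize(text):
    """Split text into tokens; whitespace of any kind is discarded."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if not m.group().isspace():
            tokens.append(m.group())
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, wanted):
        tok = self.advance()
        if tok != wanted:
            raise SyntaxError(f"expected {wanted!r}, got {tok!r}")

    def name(self):
        tok = self.advance()
        if not NAME.fullmatch(tok):
            raise SyntaxError(f"invalid name {tok!r}")
        return tok

    def value(self):  # value ::= string | boolean | integer | double
        tok = self.advance()
        if tok.startswith('"'):
            return tok[1:-1]
        if tok in ("true", "false"):
            return tok == "true"
        if INTEGER.fullmatch(tok):
            return int(tok)
        if DOUBLE.fullmatch(tok):
            return float(tok)
        raise SyntaxError(f"invalid value {tok!r}")

    def element(self):  # element ::= "[" name { content } "]"
        self.expect("[")
        node = Element(self.name())
        while self.peek() != "]":
            if self.peek() == "[":             # content ::= element
                node.children.append(self.element())
            else:                              # content ::= property
                key = self.name()              # property ::= name "=" value
                self.expect("=")
                node.properties[key] = self.value()
        self.expect("]")
        return node

    def document(self):  # rootDocument ::= element (nothing may follow it)
        root = self.element()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return root


person = Parser("""
[person
    firstName = "John"
    age = 25
    [address city = "New York" state = "NY"]
    [phoneNumbers
        [phonenumber type = "home" number = "212 555-1234"]
        [phonenumber type = "office" number = "646 555-4567"]
    ]
]
""").document()
```

Malformed input, such as age := 25 or a closing } instead of ], is rejected with a SyntaxError rather than being silently accepted.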

This entry was posted in Programming, SBML, Software.