## Repeatable, Reproducible [and Replicable]

There appears to be great confusion in the natural and social sciences communities about the meaning of words related to certain aspects of the scientific method. The arXiv paper by Lorena Barba, “Terminologies for Reproducible Research”, highlights the confused state that has developed over the last 20 years. The words in question include:

Repeatability
Reproducible
Replication

I will dispense with replication simply because it’s too hard to say quickly (especially replicability) but see below for a more serious reason. The contention appears in the meaning of reproducibility or to reproduce. As Barba points out there are at least three ‘camps’ in this community which she labels, A, B1, and B2. The A camp makes no distinction, so we’ll forget about those. To describe B1 and B2 we must look at two extreme scenarios with respect to an experiment:

1. An experiment is carried out and is done again by the same author, using the same equipment, same methods, basically the same everything.

2. The experiment is carried out by a third party using different equipment, different methods, etc. Basically, everything is different.

In between these two extremes are variants. For example, the third party could use the same methods but implement them independently of the original author, usually by reading the description given in the original published paper.

Given these descriptions, the B1 group calls the first scenario ‘to reproduce the experiment’, while the B2 group calls that same scenario ‘to replicate the experiment’, and there lies the contention.

Personally, I don’t like either of these terms as used here. As I mentioned before, replicability is a hard word to say. But not only that, from a dictionary perspective, it means the same thing as reproducibility. The Oxford English Dictionary describes replicability as “The quality of being able to be exactly copied or reproduced.” So why use two words for two quite different things when the two words have essentially the same meaning?

My personal choice is the following two words. Rather than use the word replicability, which seems redundant, I choose repeatability, hence:

Repeatability: means ‘to repeat the experiment again’. The word implies that the experiment was done exactly as before. This is Scenario 1.

Reproducibility: means ‘to recreate the experiment anew’. To reproduce implies creating a new thing, independently of the old. This is Scenario 2.

Of course one can get much more fine-grained, especially when it comes to computational experiments. But the fine graining can be included as levels within the class reproducibility.

Other than a change in wording from replicability to repeatability, I appear to belong to camp B2. Others in camp B2 include FASEB, NIST, Six Sigma, ACM, Wikipedia, and the Physiome Project. I am sure there are others. For example, ACM writes:

Repeatability (Same team, same experimental setup)

The measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For computational experiments, this means that a researcher can reliably repeat her own computation.

Reproducibility (Different team, different experimental setup)

The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently.

These are essentially the same definitions I gave above.

The National Academies recently studied this issue closely and is soon coming out with a report. It is apparently in favor of the more confusing option.

## How to do a simple parameter scan using Tellurium

A common task in modeling is to see how a parameter influences a model’s dynamics. For example, consider a simple two-reaction pathway:

-> S1 ->

where the first reaction has a fixed input of vo and the second reaction has a first-order rate law, k1*S1. The task is to investigate how the time course of S1 is influenced by vo.

The script below defines the model, then changes vo in increments and plots the effect on the pathway via a time course simulation. To do the parameter scan we exploit plotArray. We initially prevent each plot from being shown by passing show=False, and we make sure that each plot gets a different color by passing resetColorCycle=False. Finally, we show the combined plot using show(). To make things more interesting we also add a legend entry for each plot.

Note we call reset each time we run a simulation to ensure that S1 is reset back to its initial condition.
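As a library-free illustration of the same scan, here is a minimal sketch that uses the closed-form solution of dS1/dt = vo - k1*S1 instead of a Tellurium simulation. It is not the Tellurium script described above: the value k1 = 0.1, the initial condition S1 = 0, the scanned values of vo, and the helper names `s1_time_course` and `scan_vo` are all assumptions made for the example. Starting every curve from the same initial condition plays the role of calling reset before each simulation.

```python
import math

K1 = 0.1  # assumed first-order rate constant (not taken from the original script)


def s1_time_course(vo, k1=K1, s1_0=0.0, t_end=50.0, n_points=6):
    """Sample the analytic solution of dS1/dt = vo - k1*S1 on a uniform time grid."""
    times = [t_end * i / (n_points - 1) for i in range(n_points)]
    s_ss = vo / k1  # steady state, where input vo balances consumption k1*S1
    return [(t, s_ss + (s1_0 - s_ss) * math.exp(-k1 * t)) for t in times]


def scan_vo(values, **kwargs):
    """Run one time course per value of vo, each from the same initial condition."""
    return {vo: s1_time_course(vo, **kwargs) for vo in values}


scan = scan_vo([0.5, 1.0, 1.5, 2.0])
for vo, course in scan.items():
    t_final, s1_final = course[-1]
    print(f"vo={vo:.1f}  S1(t={t_final:.0f})={s1_final:.3f}  steady state={vo / K1:.1f}")
```

Each time course levels off at vo/k1, so increasing vo raises the plateau without changing how quickly it is approached.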

The following figure shows the resulting plot:

Thanks to Kiri Choi for pointing out how to use plotArray in this way.

## How to plot a grid of phase plots using Tellurium

Let’s say you have a chemical network model where the species oscillate and you’d like to plot every combination of these on a grid. If so, then this code might be of help.

This code will generate:

Here is an alternative that has removed the internal ticks and labels and arranges the column and row labels as contiguous. This is probably more what one might expect. This version plots all combinations including the transpose combinations.

The following shows the steady state oscillations. This was done by simulating the model twice and plotting the second set of simulation results.

## A look at Euler’s number: e

I’ve never particularly liked the way $e$, Euler’s number, is introduced in textbooks. Most approaches give me a very limited intuitive feel for what $e$ actually is. Modern textbooks appear to use one of four common ways to introduce $e$, and only one of them gives me a semblance of an intuitive feel for $e$. These approaches include computing compound interest (I think this is one of the worst); integrating $1/x$ and noting that the area from 1 to the value of $e$ is one (amazing but so what?); looking at the slope of $a^x$ at $x = 0$, where the slope is one when $a = e$ (yes ok, and…); and finally the one I like, which is starting with $dy/dx = y$, finding the power series solution, and then using this to define $e$.

Before continuing let’s state what e is numerically equal to:

e = 2.71828….

For normal algebra, all we need are addition and multiplication (subtraction and division are just alternative forms of these). For convenience, we also introduce a power notation such as $x^2$ and $x^3$, but these are just a short-hand notation for doing lots of multiplications at once. With the introduction of trigonometry, which involves relationships between sides and angles of triangles, the basic algebra becomes cumbersome because it involves the use of infinite series. Rather than writing the series down all the time, we define short-cut names such as sine, cosine etc. Calculus brings us another special type of series which involves the solution to differential equations. The short-cut notation for this series is $e^x$.

Note that in all these cases we are still only doing addition and multiplication. The functions sine and $e^x$ are just short-hand for particular combinations of addition and multiplication that happen to be in the form of an infinite series.

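Given that $e$ pops up in the form of $e^x$ when solving differential equations, I think this is the place to start. Let’s consider the simplest possible non-trivial differential equation:

$$\frac{dy}{dx} = y$$

This equation is saying something quite interesting, that the derivative of $y$ is the same as $y$. What this means is that if we were to find a solution to this differential equation, the solution would have to also equal its own derivative, $dy/dx$. We can express the above equation in the form:

$$f'(x) = f(x)$$

This perhaps makes it more obvious that the derivative is the same as the function $f(x)$. Can we find an actual function $f(x)$ that is like this? The way to find this answer is to define $f(x)$ as a general power series:

$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots$$

We now differentiate this with respect to $x$ to give:

$$f'(x) = a_1 + 2 a_2 x + 3 a_3 x^2 + \ldots$$

To find the function $f(x)$ that also equals $f'(x)$ we need to discover the values for the coefficients, $a_0$, $a_1$, $a_2$, etc. To make it easier, let’s decide that when $x = 0$, the value of $f(x)$ is one, i.e. $f(0) = 1$. This will mean that $a_0 = 1$. What we’ll now do is match up the pairs of terms in $f(x)$ and $f'(x)$, that is, set $a_0 = a_1$, $a_1 x = 2 a_2 x$, $a_2 x^2 = 3 a_3 x^2$ and so on.

This allows us to state that $a_1 = a_0$, $a_2 = a_1/2$, $a_3 = a_2/3$, and so on. Working back by substituting $a_1$ into the $a_2$ equation and $a_2$ into the $a_3$ equation, and so on, leads to the result that:

$$a_1 = 1, \quad a_2 = \frac{1}{2!}, \quad a_3 = \frac{1}{3!}, \quad \ldots, \quad a_n = \frac{1}{n!}$$

We therefore conclude that the function that satisfies $f'(x) = f(x)$ is:

$$f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots \qquad (1)$$

You can easily check this by differentiating $f(x)$; you’ll get $f(x)$ again. The solution to $dy/dx = y$ must therefore also be:

$$y = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots$$

Not to labor the point too much, but differentiate $y$ and you’ll get the equivalence $dy/dx = y$.

Let us define the value of the series when $x = 1$ to be:

$$f(1) = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \ldots$$

For convenience we will call this value $e$:

$$e = f(1) = 2.71828\ldots$$

The next question is, if $f(1) = e$ then what does $f(x)$ equal?

I’m going to make a jump here and propose that $f(x) = e^x$. From first principles, we can work out the derivative of $e^x$ and remarkably it is $e^x$! You’ll find a proof at the excellent Paul’s Online Math Notes.

Let’s now use the Maclaurin series to find an approximation to $e^x$. The Maclaurin series is a Taylor series centered on zero, that is:

$$f(x) = f(0) + f'(0)\,x + \frac{f''(0)}{2!}\,x^2 + \frac{f'''(0)}{3!}\,x^3 + \ldots$$

Since the derivative of $e^x$ is $e^x$, as are the second, third, fourth and fifth derivatives etc., and noting that $e^x$ at $x = 0$ is one, we can write:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots$$

But this is just the solution to $dy/dx = y$ (see equation (1)), therefore we conclude that:

$$y = e^x$$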

It is also worth noting that the derivative for the general exponential $a^x$ is given by:

$$\frac{d a^x}{dx} = a^x \ln a$$

In other words the derivative of $e^x$ is the special case where $a = e$. One reason why $e^x$ is special is that it’s the purest exponential in the sense that the derivative has no scaling factor. It’s the canonical exponential, the simplest possible. It’s the only exponential function where the function and its derivative are identical.
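The series argument can be checked numerically. The sketch below rebuilds the coefficients from the term-matching rule (each coefficient is the previous one divided by its index, starting from a value of one at x = 0) and compares partial sums with Python’s `math.e` and `math.exp`. The function names are hypothetical, and exact fractions are used only to make the 1/n! pattern visible.

```python
from fractions import Fraction
import math


def series_coefficients(n_terms):
    """Coefficients of the power series whose derivative equals itself."""
    coeffs = [Fraction(1)]             # first coefficient is 1 because f(0) = 1
    for n in range(1, n_terms):
        coeffs.append(coeffs[-1] / n)  # matching terms: a_n = a_(n-1) / n
    return coeffs


def partial_sum(x, n_terms):
    """Evaluate the first n_terms of the series at x."""
    return sum(float(a) * x ** n for n, a in enumerate(series_coefficients(n_terms)))


coeffs = series_coefficients(6)
print([str(a) for a in coeffs])        # exact fractions 1/n!

for n_terms in (2, 5, 10, 15):
    print(n_terms, partial_sum(1.0, n_terms))
print("math.e =", math.e)
```

The partial sums at x = 1 close in on 2.71828… quickly because the factorial in the denominator grows much faster than any fixed power of x.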

What else can we say about $e^x$? What about its rate of increase? We know that the rate of increase is itself, $e^x$, but how fast is that? One way to look at this is to derive the relative increase in $e^x$. The relative growth rate of a function $y$ is defined by:

$$\frac{dy}{dx}\,\frac{x}{y}$$

Given this definition, let’s look first at how fast $a^x$ is increasing by computing

$$\frac{d a^x}{dx}\,\frac{x}{a^x} = a^x \ln a \,\frac{x}{a^x} = x \ln a$$

If we now do the same for $e^x$ we find:

$$\frac{d e^x}{dx}\,\frac{x}{e^x} = x$$

In other words $e^x$ is the only exponential function that increases at a rate equal to $x$. For example if $x = 1$, it means that a 1% increase in $x$ leads to a 1% increase in $e^x$, while if $x = 10$ then a 1% increase in $x$ leads to a 10% increase in $e^x$. This is the characteristic of exponential functions: they increase at an ever increasing rate. Contrast this with a power term such as $x^2$. If we compute the relative increase for $x^2$ we find it has a fixed increase of 2:

$$\frac{d x^2}{dx}\,\frac{x}{x^2} = 2x\,\frac{1}{x} = 2$$

The growth of a bacterial colony follows this pattern, where the growth in the colony is a fixed percentage over time.
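Here is a quick numerical check of the relative increase, assuming it is taken to be the derivative multiplied by x and divided by the function value (the definition that gives the results quoted above: x for the exponential with base e, and 2 for x squared). The derivative is estimated with a central difference; the name `relative_increase` and the step size are choices made for this sketch.

```python
import math


def relative_increase(f, x, rel_step=1e-6):
    """Estimate (df/dx) * x / f(x) using a central-difference derivative."""
    h = rel_step * max(1.0, abs(x))
    slope = (f(x + h) - f(x - h)) / (2.0 * h)
    return slope * x / f(x)


# Exponential with base e: the relative increase tracks x itself.
for x in (1.0, 10.0):
    print("exp at", x, "->", relative_increase(math.exp, x))

# General exponential a**x: the relative increase is x * ln(a).
print("2**x at 3 ->", relative_increase(lambda x: 2.0 ** x, 3.0),
      "compare", 3.0 * math.log(2.0))

# Power term x**2: the relative increase is the same at every x.
for x in (0.5, 5.0, 50.0):
    print("x**2 at", x, "->", relative_increase(lambda x: x ** 2, x))
```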

## Smallest Chemical Reaction System that is Bistable

A while back Thomas Wilhelm published a paper that described the smallest chemical network that could display bistability. The paper that describes this result is:

Wilhelm, T. (2009). The smallest chemical reaction system with bistability. BMC systems biology, 3(1), 90.

This is a diagram of the network generated using pathwayDesigner:

Here is a Tellurium script that uses Antimony to define the model (note that $P means that species P is fixed). The S term in the first reaction is supposed to represent an input signal.

Using the auto2000 extension to roadrunner we can plot the bifurcation diagram for this system as a function of the signal S. If S is below 0.8, only one stable steady state exists, and the values for X and Y are both zero. Above S = 0.8 we see three steady states emerge. Where the system shows three steady states, one is at zero concentration and is stable, but it is not shown in the plot. Of the other two, one is marked by the horizontal line roughly at 1.5 on the y-axis and is unstable. The other is represented by the line that moves up from the turning point at about x = 0.8; this steady state is stable. The unstable branch appears to asymptotically approach a limiting value at high S values, 1.5 for X and approximately 0.00028 for Y.

The paper also describes what happens when we add a fixed input flux to X at a rate of 0.6. This can be done simply by adding the line J5: ->X; 0.6; to the model. This change results in a more classical look for the bifurcation plot, as shown below:

The Tellurium script for generating these bifurcation plots is shown below:

## Smallest Chemical Reaction System with Hopf Bifurcation

A while back Wilhelm and Heinrich published a paper that described the smallest chemical network that could display a Hopf bifurcation. That is, the chemical species oscillated. The paper that describes this result is:

Wilhelm, Thomas, and Reinhart Heinrich. “Smallest chemical reaction system with Hopf bifurcation.” Journal of mathematical chemistry 17.1 (1995): 1-14.

This is a diagram of the network taken from their paper:

Here is a Tellurium script that uses Antimony to define the model (note that $A means that species A is fixed):

The paper gives parameter values that result in oscillations but I wanted to find some other parameter sets. One way to do this is to load the model into the SBW slider control. By changing the sliders one can observe the dynamics. Here I used pathwaydesigner (pathwaydesigner.org) to do the same thing. I first exported the model as SBML using:

I loaded the saved SBML file into pathwayDesigner and used the autolayout plugin to get the following network:

I started the slider plugin and varied the sliders until I saw oscillations. In fact, it didn’t take much to get oscillations, all I had to do was increase the value of k1:

I copied the new value of k1 to the Tellurium model (see above) and got the following output:

## C Based Reduced Row Echelon Code

I recently needed some code to compute the reduced row echelon form of a matrix. Applications such as Matlab, Mathematica, sympy and R support this functionality out of the box. Libraries such as LAPACK do not, and neither does the linear algebra package in Python. The linear algebra package in Python has some surprising omissions which appear to be by design.

Here I include a C based function that will compute the reduced row echelon form. It uses partial pivoting to improve numerical stability, but I don’t know how it would fare when confronted with a large matrix (i.e. > 500 rows or columns).

It uses a simple matrix type called TMATRIX. It shouldn’t be difficult to modify the code if you use a different matrix structure.

To call rref use:

The 1e-9 argument is the setting that decides whether a value is zero or not. Numbers below tolerance are considered zeros.
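For readers who want to experiment without compiling anything, here is a minimal Python sketch of Gauss-Jordan elimination with partial pivoting and a zero tolerance. It is not the C function described in this post and does not use the TMATRIX type; the name `rref`, the list-of-rows representation, and the example matrix are assumptions for illustration.

```python
def rref(matrix, tol=1e-9):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    a = [[float(v) for v in row] for row in matrix]  # work on a copy
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row >= n_rows:
            break
        # Partial pivoting: use the remaining row with the largest |entry| in this column.
        best = max(range(pivot_row, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[best][col]) < tol:
            continue  # column is numerically zero from pivot_row down, so no pivot here
        a[pivot_row], a[best] = a[best], a[pivot_row]
        pivot = a[pivot_row][col]
        a[pivot_row] = [v / pivot for v in a[pivot_row]]  # scale so the pivot is 1
        for r in range(n_rows):
            if r != pivot_row and a[r][col] != 0.0:
                factor = a[r][col]
                # Eliminate this column from every other row, above and below the pivot.
                a[r] = [v - factor * p for v, p in zip(a[r], a[pivot_row])]
        pivot_row += 1
    # Values below the tolerance are treated as exact zeros.
    return [[0.0 if abs(v) < tol else v for v in row] for row in a]


example = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
for row in rref(example):
    print(row)
```

The tolerance plays the same role as the 1e-9 argument above: it decides when a candidate pivot or a leftover value is small enough to be called zero.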

The output matrix given the input should be:

## Plotting Bar graph of Species Concentrations in Tellurium

I had a model with 27 floating species and I wanted to plot the steady state concentrations on a bar graph where the labels were the names of the different species. Here is a general purpose script that will do that:

The function has a default width and height for the resulting plot. The default widens the usual size so that the labels don’t collide. The first argument is the libroadrunner instance. To give a contrived example, the following is a model of 8 floating species that form a linear chain governed by simple irreversible mass-action kinetics.

The resulting plot is shown below:

A summer student working in my lab, Ming Hong Lui from Hong Kong University (HKUST), worked on the perturbation analysis of signaling cascades, and in his writeup he used TikZ to draw a nice cascade diagram which I present here.

This code draws the following figure:

## Another Inhibition Pathway Diagram using TikZ

Here is another pathway diagram I needed to draw using TikZ. In this case I needed an inhibited step. This was more tricky because I needed the inhibition line to point midway to a reaction but without touching the reaction itself.

To solve this I had to ask stackoverflow and within 2 hours I had an answer. The linear portion of the pathway is straightforward:

The question was how to draw the inhibition line from inhibitor I to reaction v2. The trick is to use the TikZ calc library, which can be used to do more advanced coordinate calculations.

\usetikzlibrary{calc}

The line to add that will draw the vertical inhibition line is:

\draw[|-,line width=1.2pt] ([yshift=4pt]$(S)!.5!(P)$) --++(0,1.5cm)node[above]{$I$};

Let’s decompose this. The first two arguments in the \draw command, [|-,line width=1.2pt], simply set the arrow style (blunt end) and the line width. The second part specifies the coordinates of the line itself, and the remainder, node[above]{$I$};, indicates what text to draw and where to draw it relative to the line. The real meat is in the drawing line section, that is:

([yshift=4pt]$(S)!.5!(P)$) --++(0,1.5cm)

The basic syntax for a line coordinate is (x1,y1) -- (x2,y2). In the example there is a modification to this where the coordinate is specified as (x1,y1) -- ++(x2,y2). The ++ means that the coordinate at x2,y2 is computed relative to x1,y1, that is, the coordinate at x2,y2 is actually (x1,y1)+(x2,y2). The (0,1.5cm) then means that the coordinate for the end point has the same x coordinate but the y coordinate is displaced upwards by 1.5cm. Note that the TikZ y coordinate is like a normal graph plotting axis.

The trickiest bit is computing the starting point for the vertical line. Note that this has to be the midpoint between nodes S and P. This requires some coordinate calculations. The x,y coordinate is specified by $(S)!.5!(P)$. The bit in front, [yshift=4pt], just moves the computed y coordinate up by 4pt.

The text inside the dollar signs means that we are doing a coordinate calculation, i.e. $…$ represents a math calculation. (S) and (P) represent the coordinates of the nodes S and P respectively. The important bit is !.5!. The exclamation point is a partway modifier and can be put between two coordinates.

(S)!.5!(P) means compute the coordinate that is half way between (S) and (P).
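The arithmetic behind these coordinate expressions can be sketched in a few lines of Python. This is only an illustration of the calculation, not TikZ itself: the node positions for S and P are hypothetical, and the helper names `partway` and `shift` are made up for the example. Lengths are in centimetres, with 72.27 TeX points to the inch.

```python
PT_PER_CM = 72.27 / 2.54  # TeX points per centimetre


def partway(a, b, fraction):
    """Point that lies `fraction` of the way from a to b, like (a)!fraction!(b)."""
    return (a[0] + fraction * (b[0] - a[0]),
            a[1] + fraction * (b[1] - a[1]))


def shift(point, dx=0.0, dy=0.0):
    """Offset a point, like a relative ++(dx,dy) move or a yshift option."""
    return (point[0] + dx, point[1] + dy)


S = (0.0, 0.0)  # hypothetical position of node S, in cm
P = (4.0, 0.0)  # hypothetical position of node P, in cm

mid = partway(S, P, 0.5)                # (S)!.5!(P)
start = shift(mid, dy=4.0 / PT_PER_CM)  # [yshift=4pt]
end = shift(start, dy=1.5)              # ++(0,1.5cm)

print("start of inhibition line:", start)
print("end of inhibition line:  ", end)
```

Changing the fraction from 0.5 to, say, 0.25 would move the foot of the inhibition line closer to S without touching the rest of the drawing.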

The last thing to add is that I noticed that the blunt end was a bit too short. To widen the blunt end I used the arrows.meta library. This allows one to modify the size of the arrow heads; in this case I used the following to widen the blunt end to 5mm.

{|[width=5mm]}-