Friday, November 30, 2012

Using GitHub For JavaScript and VBA Code Examples

From now on, I'm going to use GitHub to share, version control, and store code examples.  I'll continue to place small code snippets in the blog text but will add GitHub links for larger code examples.

My motivation for doing this is as follows:
  • It forces me to learn the Git revision control system.
  • I can make working code more accessible.
  • The blog text can concentrate on explaining code.

Click here to view my GitHub account.

GitHub Contents as of Today
Currently, there is a single repository in my GitHub area. It is called VBA and it contains two class files.  One implements a simple Dictionary-type object using collections.  I wrote this because the VBA Dictionary object is not available on the Mac version of Excel, anyone running Excel on Windows will not need this class.
The second class file implements a DNA-to-protein sequence translator.  The only interesting aspect of this class is that it can deal with DNA sequence that contains IUPAC DNA nucleotide ambiguity codes.  It therefore can report amino acid mixtures for polymorphic sites so is useful for processing sequences from viruses such as HIV and Hepatitis C.
Regarding IUPAC codes and translation tables, see these links for more information:

In future blog entries, I will re-write the DNA translation code in JavaScript and will add it to a JavaScript repository in my GitHub area.  Comparing and contrasting the VBA and JavaScript implementations should be informative.

The code is heavily commented, so if you are interested, take a look in GitHub!

The VBA code assumes some knowledge of genetics/microbiology but I chose it because it is a problem domain that I am familiar with and dealing with IUPAC ambiguity codes complicates matters considerably.

DNA Translation - An Aside
  • Each strand of the iconic double helix is composed of nucleotides (also called bases) denoted by the single-letter abbreviations A, C, G, and T.
  • DNA is translated into protein via intermediate RNA but that fact can be ignored here without affecting the results.
  • The DNA code is read in sets of three that correspond to codons in the intermediate RNA
  • The input DNA is said to be "in-frame" if it begins with a triplet that corresponds to a codon.
  • The DNA in the code examples is in-frame, try removing the initial character and then do the translation!  The output will contain multiple asterisk characters (*), a sure sign that the translation is not in-frame.  Removing another leading nucleotide will return another out-of-frame translation but removing a third will put the sequence back in-frame because the first three nucleotides will correspond to a codon.
  • Translation of DNA is more complicated if the input DNA contains IUPAC ambiguity codes that denote the presence of a nucleotide mixture at that location, e.g. R denotes an A and G mixture.

For example, consider this VBA code:

Sub testAmbig()
    Dim dnaSeq As String: dnaSeq = "CCY-CAG-RTC-ACT-CTT-TGG-CAA"
    Dim transl As clsTranslateDna: Set transl = New clsTranslateDna
    dnaSeq = Replace(dnaSeq, "-", "") 'remove the "-" from the DNA sequence
    Debug.Print Join(transl.AminoAcidsForDNA(dnaSeq, True), " ") ' prints "P Q IV T L W Q"
End Sub

  • The two ambiguity codes are highlighted in bold and the DNA is broken into in-frame triplets by the "-" character which is removed before translation.
  • The space-separated translated output shows the effect of the ambiguity codes. For the triplet with the "Y" a single P is given because Y codes for C or T so the triplet is a mix of CCT or CCC but both of these encode P (Proline).  The genetic code is said to be "degenerate", i.e. there is in-built redundancy especially with the third nucleotide.  
  • However, for the R in the second triplet the two possible triplets are ATC or GTC, ATC encodes I (Isoleucine) while GTC encodes V (Valine) so the output is given as IV.

VBA Code Notes
The two VBA classes given provide concrete examples of doing object-oriented (OO) programming in the language.  The classes have methods, constructors, and properties so they may be of interest to anyone embarking on OO in VBA.  The DNA translation class is a re-write of some old procedural code I wrote a long time ago.  The dictionary implementation is very simple but does show usage of the oft neglected VBA Collections class.

Finally, the VBA code referred to here should run in any application that hosts VBA.  It was developed in Excel but it should run equally well in MS Word, MS Access etc.  When I first wrote VBA, I used to mix GUI code and code that accessed the application classes with the business logic code.  Doing so makes code re-use difficult or impossible. The translator class given here could be used directly in Excel as a user-defined function though the call might require a short calling function.

This lesson also applies to Google App scripting with JavaScript.  By separating GUI, and Spreadsheet object code from business logic, there is a good chance that we can write JavaScript functions that can be run in the browser, in Google spreadsheets, or even on the server using Node.js.  Take the VBA translator code for example, I will re-write it in JavaScript so that it can be executed in the browser, in Google spreadsheets (even as a user-defined function), or from Node.js!  JavaScript is now ubiquitous so knowledge gained in one setting can be applied to others.  The days when JavaScript was a derided browser-only language are long gone!

Thursday, November 29, 2012

Creating a GUI Application (Part 2)

Note Added 2013-06-02: The GUI Builder is being phased out, Html Service is now the preferred GUI tool, see blog entry on this.
Creating A web Service
  • The mortgage app from part 1 can also be run as a web service and to do so requires only a few more steps but no changes to the GUI itself.
  • The important point here is that the web service, although it was created in a Google Spreadsheet, runs completely independently of the spreadsheet and could just as easily and effectively been created from a Google Site.
The code in “calculate()” remains the same but the following code replaces “onOpen()” and “showGui()” in the spreadsheet version:

 "use strict;"
function doGet() {
  var app = UiApp.createApplication();
  var guiName = 'MortgageGui';
  return app;

To deploy as a web service:
  • There needs to be a function named “doGet()” as above that returns the app.
  • In the script editor, you need to select File->Manage Versions....  Enter some something and save the version.
  • Then, once again in the script editor, select Publish->Deploy as web app...
  • Copy the URL that is generated and paste it into the browser to view the web page where an ugly but functional form will appear.
  • The URL I created is here:
You will need to authorize it.
The previous blog entry showed a screen shot that displayed the web application so it has not been re-shown here.

The example given here, though very simple and not visually pleasing, does hint at the power of this approach to deliver sophisticated applications over the web.  Here the form just takes some input, does some calculations on it, and then displays the results on the form.  But these types of applications could be hooked up to back-end persistent data repositories such spreadsheets or cloud MySQL databases to enter, retrieve, or analyze data.
This examples requires code version management before the app can be published.  Code versioning is a very good thing and is often neglected by Excel VBA developers.

Friday, November 23, 2012

Creating a GUI Application (Part 1)

Note Added 2013-06-02: The GUI Builder is being phased out, Html Service is now the preferred GUI tool, see blog entry on this.
  • Use the Google GUI Builder tool and JavaScript to build a web application that implements the functionality of a well-known single page application that does mortgage calculations.
  • Demonstrate how input values are retrieved from GUI textboxes and how JavaScript-calculated values are written back to other textboxes.
  • Highlight a few “gotchas”.
  • Show how the application deals with bad input.
  • Show the differences between deploying the application as a web application as against a spreadsheet-bound one (next blog entry).

An earlier blog post covered the basics of adding buttons and menus to Google Spreadsheets in order to improve the user experience.  However, the Google web toolkit allows us to do better and provides a GUI builder.  We can use this GUI builder to quickly knock up an application which can be run either:
1: From the containing Google spreadsheet where we can use a menu item to display the application.  This approach is closely analogous to how we can build a GUI form in Excel VBA and display it using a menu.
2: As a web service where the application can be built from either Google Sites or Google Spreadsheets.  The important difference here is that the application can run entirely independently of the spreadsheet. (The next blog entry)

If you’ve used VB/VBA to build GUIs, you will appreciate how convenient these tools are, especially if you have also tried something like Tkinter in Python as I have.  I found Tkinter usable but very frustrating and felt very much spoiled by the VB drag-and-drop approach  However, desktop applications are not nearly as important as they once were and the web is undoubtedly now the dominant platform for delivering functionality to users.

Here is the link to the example web page whose functionality we would like to mimic, here’s a screenshot of the form:

To use, you simply enter the figures in the four textboxes in the top part of the form and the results are posted to the three lower textboxes.  The link above gives the actual JavaScript code that does the calculations.

The Google web service application is here (To view, you'll be asked to authorize it):
and looks like this:
It wins no design awards but it does implement the basic functionality of the web page and it was easy create with the GUI builder.

To build the above, start with a Google Spreadsheet and in its Script Editor select menu item File->Build a user interface

The GUI shown above is composed of four Vertical Panels with the two on the left containing labels only and the two on the right containing textboxes only.

At the top, above the panels, there is one long label with the text “MORTGAGE MINDER” and the bottom there is a single button that when clicked calls code to do the mortgage calculations and writes the output to the textboxes in the bottom right panel.

The application now has two components:

  1. The JavaScript code
  2. The GUI

Linking the two together is not difficult but there are a few traps for the unwary:
1: To retrieve input values from GUI textboxes, you need to give the textboxes names!  The ID and the name are not the same, see this screenshot:

2: To link a callback to button action where the callback needs to be able to read input values in a textbox, the GUI builder needs to be filled in as in this screenshot:

Under the Mouse Click event for the button labelled calculate select the function named “calculate”, this function had already been written in the Script Editor and the GUI Builder is clever enough to load all available functions into the drop-down list.  To the right of the box above containing “calculate”, there is a +/- toggle.  To enable the callback to be able to access textbox input values, each of the input textbox names must be entered here as a comma-separated list.  For this application, although not visible because the box is not expandable, there are four, they are txtHousePrice,txtDownPayment,txtAnnualInterest,txtTerm. These are the names of the four input textboxes on the top-right panel in the GUI.

The Callback Code

Notes on Callback Function “calculate()”

  • Line number 19 gets an instance of the active application
  • Lines 20 and 21 set the text and color of the top-most label.  Code further down will set res-set the color to red and display a label indicating if the user input is “bad” so these lines just ensure that after each click of the button “calculate” that the label is re-set to its defaults.
  • Lines 22-25 retrieve the textbox user inputs.  To do this, the name of the textbox is required as described above.  The call to parseFloat() also includes the second radix parameter to ensure input numbers are interpreted as base 10.
  • Lines 26-29 use the “isNaN()” function to check that the four calls to parseFloat() returned numbers, if not the top-most label color is set to red and the text “BAD INPUT” is displayed.  The GUI does not disappear, however, so the user can fix the input and click the “calculate” button again.
  • Lines 30-33 do the calculations and assign the values to variables.
  • Lines 35-37 assign the variable values to the output textboxes.
  • Line 39 can be omitted and no error is generated.  However, the output will not appear in the GUI so forgetting to include it means the application will not work.

Displaying The GUI From a Spreadsheet
The allows the GUI to be displayed from a spreadsheet and when the spreadsheet is opened a menu is added that has one item that, when selected, loads the GUI.

GUI Display Code

Code Notes

  • Line 1: ECMAScript5 includes a strict mode that is highly recommended, I now always use.  To learn why it’s a good idea, consult this authoritative source (Nicholas Zakas).
  • The “onOpen()” function was discussed in a previous post and allows us to add a menu item dynamically when the spreadsheet is opened.
  • The single menu item is used to call the “showGui()” function.
  • Line 12 above is where the form created using the GUI builder is loaded into the application.

This GUI runs but is tied to a spreadsheet as shown below. The code itself in the "calculate()" function is very simple and the calculations could of course have been done in the spreadsheet itself. However, in the next blog entry, I will demonstrate how this spreadsheet application can be very easily turned into a web application that is entirely independent of the spreadsheet.

The application looks like this when loaded from the spreadsheet:
The added GUI menu is just visible on the right-hand side. 

(Drop me an email if you'd like the full spreadsheet from which this was built!)

Tuesday, November 13, 2012

Enhancing Statistical Capabilities with User-Defined Functions

The code presented here was originally written in Excel VBA to calculate confidence intervals for sample means.  Although both Excel and Google spreadsheets have built-in functions to do this, these functions make certain assumptions about the underlying data distribution that may not always be valid so a well known alternative method was implemented in both JavaScript and VBA to do this.

Main Points
  • Use of bootstrapping to derive the confidence interval for the sample mean
  • User-defined JavaScript function to perform this task
  • Extensive use of JavaScript arrays and array methods to implement the bootstrapping function
  • Highlight some potential traps when copying and pasting JavaScript code from the internet
  • Present Excel VBA code to compare and contrast with the JavaScript version

Note: Use statistical packages like R for serious statistics and make sure you understand the tests you are using and the assumptions such tests make.  If possible, consult with a statistician at all stages from experimental design to data analysis.

Confidence intervals are widely used in statistics for evaluating the precision of a parameter estimated by sampling a population.  A common example is the confidence interval of a sample mean and a commonly reported confidence interval is 95%.  A 95% confidence interval of the sample mean tells us that if we were to repeatedly calculate sample means for the population we would expect our true population mean to be included within the confidence interval 95% of the time.  Confidence intervals of the mean can be calculated by both Excel and the Google spreadsheet application by using the CONFIDENCE function, see the Microsoft documentation for this function.  This function is fine for data that at least approximates to a Normal distribution.  There is an alternative method that can be used based on bootstrapping (Wikipedia link) that has only become feasible since computerised statistical applications and electronic spreadsheets became widely available.  It relies on hundreds of repeated estimations of the mean from the sample data so some programming is required to use it. Since I am not a statistician, I feel uncomfortable about making too many assumptions about the data so the bootstrapping method is appealing.  However, to use it in either Google spreadsheets or Excel requires some coding which is the subject of this blog.

Both VBA and JavaScript code are given here to show how the bootstrapping can be implemented for estimation of the confidence interval of a sample mean.  Comparing and contrasting the code in the Excel and Google versions highlights some important differences between the functionality of the two languages.

Sample Data
Firstly, for some sample data here are 12 human body temperature measurements:
The above data was taken from Intuitive Biostatistics by Harvey Motulsky, a highly recommended book if, like me, you are interested in statistics but are not so mathematically inclined.
If these data points are entered in either an Excel or a Google spreadsheet, the 95% confidence interval of the mean can be calculated using the formula:
See the Excel documentation link above for further information on the CONFIDENCE function.

To calculate the 95% confidence using the bootstrapping method, we need code that does the following:
  • Randomly select a sample of the same size from the 12 values allowing the same value to be chosen more than once, that is, sampling with replacement.
  • Calculate a mean for new sample
  • Repeat the previous two steps multiple times (1000 times in this example) and store all the 1000 means that were calculated by re-sampling the original sample set.
  • Sort the 1000 calculated means in ascending numerical order.
  • The confidence interval corresponds to the 26th and 975th values.  2.5% of the calculated means are below the lower value and 2.5% are above the upper value.

To code this in JavaScript, we need to:
  • Transfer the sample values from a spreadsheet range to a JavaScript array.
  • Perform a number of random re-samplings of this array, 1000 was chosen in the example code.
  • Calculate the means for each of the arrays, store the means, and, after the 1000 trials are complete, sort the array of means, identify the 26th and 975th elements, and return the result as a string.
// A function to calculate 95% confidence intervals of the sample mean by the bootstrapping method.
function calculateCI(values) {
 var n = values.length;
 // This check is required because if called as a user-defined function values will be an array of arrays!
 if( values[0] instanceof Array ) {
//Could also use Array.isArray(values[0]) above to check for “arayness”
    values = function(value) { return value[0]; } );
 var rand_idx;
 var arr;
 var arr_permuted = [];

 var permute_times = 1000;

 for( var i = 1; i <= 1000; i++ ) {
   arr = [];
   for( var j = 0; j < n; j++ ) {
       rand_idx = Math.floor(Math.random() * n);
       arr[j] = values[rand_idx];

 arr_permuted.sort(function(a,b){return a > b ? 1 : a < b ? -1 : 0;});

 return  arr_permuted[25].toFixed(2) + " - " + arr_permuted[974].toFixed(2);


// Return the arithmetic mean (average) for an array of numbers.
function mean(arr) {
   var element_count = arr.length;
   var total = 0;
   for( var idx = 0; idx < element_count; idx++ ) {
       total += arr[idx];
   return total/element_count;

Notes On The Code Above
  • The function calculateCI(values)  can be called as a user-defined function that can take a range as of cell values as input.  To do this, the “instanceof” check on the first value is required because the range of input values is in the form of an array of arrays.  If the first input value is an array, this is taken to mean that the entire input is composed of an array of arrays so a map function is applied to build a new array of just the first elements of each sub-array.  A consequence of this is that the input values must all come from a single column.
  • Something to be aware of with JavaScript is that the code will run without error without the instanceof check but it will always return “NaN - NaN” when called as a user-defined function.  It runs fine when called like so:
function test() {
 var temps = [37,36,37.1,37.1,36.2,37.3,36.8,37,36.3,36.9,36.7,36.8];
  • It took me some time to figure out why it didn’t work as expected when called as a user-defined function though I had already identified the potential snag earlier in a posting about arrays.  This point also emphasizes the fact that JavaScript’s flexibility and dynamic typing come at a price.  Checking and testing is paramount!
  • The method used for generating the random indexes used to sample the array of values was taken from this source.  The linked article highlights the importance of choosing the correct Math method to avoid bias.  We all cobble together code from Google searches but there is an inherent danger in doing so if we do not fully understand what the code does.  I was unaware of the bias in using round() method until I read that blog.  Once again, testing output against expectations is crucial and all code, including code on this blog, should be treated as experimental until it has been rigorously tested for its intended purpose.  I cannot emphasize this point strongly enough: blithely copying and pasting code, for any programming language, into your project with proper scrutiny and testing, is a recipe for disaster!
  • After the 1000 iterations the re-sampled means are stored in an array that needs to be sorted numerically.  This brings up another issue because the JavaScript default sort is textual but a function can be passed to the sort() method to make the sort numeric ascending, the code used was taken from this source where a description is given of the sort() method.
  • Once the array is sorted, the 26th and 975th elements are taken as the lower and upper confidence limits of the 95% confidence interval.

 Excel VBA Code (comments in green, code in blue)
Option Explicit
'Return an array of numerical values from an input range
‘This could be simplified as a range can return an array of variants but the
‘more verbose code was preferred to ensure all input was numeric and the array generated
‘was one-dimensional and zero-based.
Function GetArrayFromRange(rng As Range) As Double()
    Dim cell As Range
    Dim ArrayFromRange() As Double
    Dim idx As Long: idx = 0
    For Each cell In rng.Cells
            ReDim Preserve ArrayFromRange(idx)
           ArrayFromRange(idx) = cell.Value
           idx = idx + 1
    Next cell
    GetArrayFromRange = ArrayFromRange
End Function
'Return the mean of an input array of numbers using the built-in spreadsheet function "AVERAGE"
Function GetArrayMean(arr() As Double) As Double

    GetArrayMean = Application.WorksheetFunction.Average(arr)

End Function
'Taking an input array of numbers, return a new array of the same size by re-sampling
'Check this link to see how the random indexes are generated:
Function GetNewArrayBySampling(arr() As Double) As Double()
    Dim arrSampled() As Double
    Dim i As Long
    Dim randIndex As Long
    Dim lastElementIndex As Long: lastElementIndex = UBound(arr)
    For i = 0 To lastElementIndex
           ReDim Preserve arrSampled(i)
           randIndex = Int(Rnd() * lastElementIndex)
           arrSampled(i) = arr(randIndex)
    Next i
    GetNewArrayBySampling = arrSampled
End Function

'The subroutine that calculates the confidence interval
' It prompts for the range
' Extracts the numerical values from that range into an array
' Adds a new worksheet to be used to hold and sort the 1000 re-sampled means
' Generates the 1000 re-sampled means and writes them to the new worksheet.
' Perform a numerical ascending sort of the 1000 re-sampled means
' Displays a MsgBox with with the 26th and 975th cells in the sorted range
'Cleans up by deleting the new worksheet
Sub Main()
    Dim tmpSpreadsheetName As String: tmpSpreadsheetName = "tmpResampled"
    Dim resampleTimes As Long: resampleTimes = 1000
    Dim inputDataRng As Range
    Dim inputDataArray() As Double
    Dim newWorksheet As Worksheet
    Dim i As Long
    Dim mean As Double

    Set inputDataRng = Application.InputBox("Select data for Confidence Interval", "Confidence Interval Calculation", , , , , , 8)
    inputDataArray = GetArrayFromRange(inputDataRng)
    Set newWorksheet = ActiveWorkbook.Worksheets.Add
    newWorksheet.Name = tmpSpreadsheetName
    For i = 1 To resampleTimes
           mean = GetArrayMean(GetNewArrayBySampling(inputDataArray))
           newWorksheet.Range("A1").Offset(i - 1, 0).Value = mean
    Next i
    newWorksheet.Range("A1:A1000").Sort key1:=Cells(1, 1), order1:=xlAscending
    MsgBox "CI (95%): " & newWorksheet.Range("A26").Value & "-" & newWorksheet.Range("A975").Value
    Debug.Print "CI (95%): " & newWorksheet.Range("A26").Value & "-" & newWorksheet.Range("A975").Value
End Sub

Notes ON The VBA Code
  • The VBA code is considerably more verbose than the JavaScript version but it could be shortened.  
  • VBA is strongly typed (at least when we avoid the variant type) and this can help to pick up erroneous input.
  • There is no need to write a “mean()” function because in Excel VBA it is possible to call a worksheet function just as if it were a VBA function.  This is a nice feature that does not appear to be possible in Google spreadsheets using JavaScript. Please let me know if there is a way to do this aside from the hack of writing the formula to a cell and then retrieving the result!
  • There is no built-in VBA array sort function or method, various array sorting code is available but writing the re-sampled means to a temporary sheet and doing the sorting there is an acceptable workaround.

I'd like to re-emphasize that serious statistical analysis merits consideration of serious statistical packages.  Spreadsheets are fine for basic exploratory-type work but concerns have been raised repeatedly about Excel in this regard.  Many of the problems may have been fixed but it is worth reading this piece before doing any heavy analysis in Excel.  As for Google spreadsheets, I do not know how it fares in this regard but it is probably best to treat it with caution too.  R is an excellent free and open source statistical package worth exploring.  However, I found the learning curve quite steep though the R programming language does, especially in its functions and functional approach, bear some resemblance to JavaScript.  Finally, JStat which runs in the browser came to my attention recently.  This looks very promising although the documentation is sparse at the moment but it's certainly one to watch!