Friday, November 30, 2012

Using GitHub For JavaScript and VBA Code Examples

From now on, I'm going to use GitHub to share, version control, and store code examples.  I'll continue to place small code snippets in the blog text but will add GitHub links for larger code examples.

My motivation for doing this is as follows:
  • It forces me to learn the Git revision control system.
  • I can make working code more accessible.
  • The blog text can concentrate on explaining code.

Click here to view my GitHub account.

GitHub Contents as of Today
Currently, there is a single repository in my GitHub area. It is called VBA and it contains two class files.  One implements a simple Dictionary-type object using collections.  I wrote this because the VBA Dictionary object is not available on the Mac version of Excel, anyone running Excel on Windows will not need this class.
The second class file implements a DNA-to-protein sequence translator.  The only interesting aspect of this class is that it can deal with DNA sequence that contains IUPAC DNA nucleotide ambiguity codes.  It therefore can report amino acid mixtures for polymorphic sites so is useful for processing sequences from viruses such as HIV and Hepatitis C.
Regarding IUPAC codes and translation tables, see these links for more information:
http://en.wikipedia.org/wiki/Nucleic_acid_notation
http://en.wikipedia.org/wiki/DNA_codon_table


In future blog entries, I will re-write the DNA translation code in JavaScript and will add it to a JavaScript repository in my GitHub area.  Comparing and contrasting the VBA and JavaScript implementations should be informative.

The code is heavily commented, so if you are interested, take a look in GitHub!

The VBA code assumes some knowledge of genetics/microbiology but I chose it because it is a problem domain that I am familiar with and dealing with IUPAC ambiguity codes complicates matters considerably.

DNA Translation - An Aside
  • Each strand of the iconic double helix is composed of nucleotides (also called bases) denoted by the single-letter abbreviations A, C, G, and T.
  • DNA is translated into protein via intermediate RNA but that fact can be ignored here without affecting the results.
  • The DNA code is read in sets of three that correspond to codons in the intermediate RNA
  • The input DNA is said to be "in-frame" if it begins with a triplet that corresponds to a codon.
  • The DNA in the code examples is in-frame, try removing the initial character and then do the translation!  The output will contain multiple asterisk characters (*), a sure sign that the translation is not in-frame.  Removing another leading nucleotide will return another out-of-frame translation but removing a third will put the sequence back in-frame because the first three nucleotides will correspond to a codon.
  • Translation of DNA is more complicated if the input DNA contains IUPAC ambiguity codes that denote the presence of a nucleotide mixture at that location, e.g. R denotes an A and G mixture.

For example, consider this VBA code:

Sub testAmbig()
    Dim dnaSeq As String: dnaSeq = "CCY-CAG-RTC-ACT-CTT-TGG-CAA"
    Dim transl As clsTranslateDna: Set transl = New clsTranslateDna
    dnaSeq = Replace(dnaSeq, "-", "") 'remove the "-" from the DNA sequence
    Debug.Print Join(transl.AminoAcidsForDNA(dnaSeq, True), " ") ' prints "P Q IV T L W Q"
End Sub

  • The two ambiguity codes are highlighted in bold and the DNA is broken into in-frame triplets by the "-" character which is removed before translation.
  • The space-separated translated output shows the effect of the ambiguity codes. For the triplet with the "Y" a single P is given because Y codes for C or T so the triplet is a mix of CCT or CCC but both of these encode P (Proline).  The genetic code is said to be "degenerate", i.e. there is in-built redundancy especially with the third nucleotide.  
  • However, for the R in the second triplet the two possible triplets are ATC or GTC, ATC encodes I (Isoleucine) while GTC encodes V (Valine) so the output is given as IV.

VBA Code Notes
The two VBA classes given provide concrete examples of doing object-oriented (OO) programming in the language.  The classes have methods, constructors, and properties so they may be of interest to anyone embarking on OO in VBA.  The DNA translation class is a re-write of some old procedural code I wrote a long time ago.  The dictionary implementation is very simple but does show usage of the oft neglected VBA Collections class.

Finally, the VBA code referred to here should run in any application that hosts VBA.  It was developed in Excel but it should run equally well in MS Word, MS Access etc.  When I first wrote VBA, I used to mix GUI code and code that accessed the application classes with the business logic code.  Doing so makes code re-use difficult or impossible. The translator class given here could be used directly in Excel as a user-defined function though the call might require a short calling function.

This lesson also applies to Google App scripting with JavaScript.  By separating GUI, and Spreadsheet object code from business logic, there is a good chance that we can write JavaScript functions that can be run in the browser, in Google spreadsheets, or even on the server using Node.js.  Take the VBA translator code for example, I will re-write it in JavaScript so that it can be executed in the browser, in Google spreadsheets (even as a user-defined function), or from Node.js!  JavaScript is now ubiquitous so knowledge gained in one setting can be applied to others.  The days when JavaScript was a derided browser-only language are long gone!

No comments:

Post a Comment