Wednesday, April 1, 2015

Automated Generation Of SQL From Google Spreadsheets Using The Revealing Module Pattern

Update 2015-06-02: Fixed a bug that affected code generated for MySQL only where the CREATE TABLE statement column types were all assigned "undefined". Thanks to Ivan for spotting that and alerting me. Code on GitHub now fixed too.

Update 2015-06-09: Some users have reported problems with the shared sheet not generating the SQL as expected. If you encounter problems, try creating your own spreadsheet and then add the three code files (either from the shared sheet or from the GitHub link).

Introduction

I am currently writing an update of my Google Spreadsheet Programming book (free for those who buy, or already have bought, the current version). I want to include a new chapter that describes a large and potentially useful application that brings together, in a coherent manner, many of the topics described in the book.
I work with spreadsheets and relational databases on a daily basis and frequently have to move data between them. I have written some scripts in the past in shell and Python that automated/semi-automated the process of transferring a tab-separated file into a relational database. These scripts generated the SQL CREATE TABLE statement using the column headers and also generated the upload scripts to transfer the data.
I decided to write a similar application in GAS that uses the column headers to generate the CREATE TABLE statement and then generates INSERT statements for each data row in the input sheet. Having now written the application, I am finding it more useful than I expected so I am hoping that others might also find it useful.
If you use relational databases and would like to automate generation of both CREATE TABLE and SQL INSERT statements, then read on. Even if this is not an area of interest for you but you are interested in GAS/JavaScript, then you may derive some benefit from it too as it employs a useful JavaScript pattern and builds a practical, if plain, GUI.

 

 Setting Up The Application


Here is a link to the spreadsheet containing the application. This link lets you view only. To make your own copy of the application:
  • Create a new spreadsheet and add a sheet named "PasteTableDataHere". The name you give to the spreadsheet is unimportant but the application expects to find the input data in a sheet with this name.
  • Got to the GAS script Editor and create a project. The name does not matter, I called mine "spreadsheet2sql_code".
  • Get the three files from this GitHub repository (Code.gs, index.html and spreadsheet2sql.gs). It is best to check out the code using a Git client but you can copy the data directly. It is best to copy from the GitHub "Raw" view because indentation is retained when copied from this source.
  • In the GAS editor of your spreadsheet, create three empty files with these names and then copy and paste the code from each to your corresponding file.
  • When you re-open the spreadsheet, the menu should appear and you will be able to display the form and run the application.
  • If you need test data to get started, copy the data from the linked spreadsheet above into the "PasteTableDataHere" sheet and run the application on this small data set.
  • Execute the generated SQL in the relational database that you chose on the form.
  • Please feel free to email me if you have any difficulty (mick@javascript-spreadsheet-programming.com).

Overview Of Techniques Covered

Apart from the practical utility of this application, the code also covers:
  • The revealing module pattern
  • HtmlServices GUI building
  • Client side JavaScript
  • Menus
  • JavaScript array manipulation
  • General GAS programming techniques using,  among other things, Range and Sheet objects

Application Requirements Overview

  • Run from a HtmlService GUI where the GUI is displayed using a custom menu
  • Output consisting of CREATE TABLE and INSERT SQL statements is written to a new spreadsheet
  • Generates a primary key column in the output
  • The output SQL statements can be written in one of three open source relational database SQL "dialects"; PostgreSQL, MySQL and SQLite
  • The output can be pasted into a suitable database query tool (for example pgAdmin3 for PostgreSQL) and executed or can be saved to an SQL source file and executed by a database client program such as psql for PostgreSQL, mysql for MySQL and sqlite3 for SQLite
  • Can assign data types to columns. Distinguishes between text, integer, floating point number and date types
  • Able to recognise empty cells as NULLs that are accounted for correctly in the generated INSERT SQL statements

Images Of Input Sheet Data, The GUI, and Output Generated By Application

Screenshot of input data in spreadsheet

The Input Data




Above is a screenshot showing the input data. Those who have learned Oracle SQL some years ago may recognise it as being taken from the emp table of the old teaching data schema scott. I changed one value in the column entitled comm from 500 to 500.5 in order to force the application to detect it as a floating point number and thereby generate a floating point column type. The application is hosted in the spreadsheet named "SpreadsheetToSQL", this is shared below. Note that there is a custom menu entitled "Create SQL" on the right-hand side that appears when this spreadsheet is opened.

The GUI

Selection the "Create SQL" menu item "SQL Create Form" displays the following GUI:

There are four inputs (top to bottom):
  1. The name you want to give to the database table, emp here
  2. The target relational database management system (RDBMS), PostgreSQL in this example. The drop-down also has values for MySQL and SQLite.
  3. The name for the spreadsheet to which the SQL statements are written, ScottEmp in this example. The spreadsheet will be created and stored in the user's Google Drive.
  4. The name for the sheet to which the output SQL statements are written, scott_emp in this example.
Once the "Make SQL" button is clicked, the application begins generating the output SQL. When finished, a "Done" message box appears and the GUI disappears. If there has been an error, check the View->Logs menu in the script editor window.

The SQL Output

When the application has completed and the GUI has disappeared, go to your Google Drive (root folder) and open the newly generated spreadsheet. Here is what you should see:



All the output data is in column A. Cell A1 contains the CREATE TABLE statement and the rest are the INSERT statement for the data rows. Each line is terminated with the SQL semi-colon command termination characte.

The RDBMS chosen in the GUI was PostgreSQL so we can copy the statements above and paste them into the pgAdmin3 query window as shown below:



Hitting the execute button in the pgAdmin3 Query window will create the table emp and insert the 14 data rows into it. Note a minor inconvenience: The CREATE TABLE statement is enclosed in double quotes to preserve the embedded new line characters, these quotes have to be removed before executing the statement.

Code Discussion

I am not including the code here because it is quite long and it is extensively documented. If you wish to understand the code, then copy the spreadsheet and then read the comments given in the three files where the application is defined.
The application code is divided into three files:
  1. Code.gs is a standard GAS code file containing functions that  display the menu, call the code to build the GUI, take the GUI input and pass it to a function that uses the module (spreadsheet2sql) to create the SQL statements, and write these statement to a newly created spreadsheet.
  2. The index.html file defines the GUI. The GUI input elements are defined in an HTML table. although this is not best practice, it works quite well for a simple GUI like this one. There is one client side JavaScript function defined in this HTML file that is sued to execute the GAS getValuesFromForm() function defined in Code.gs. This ability of JavaScript in the client to call GAS on the server is crucial to how HtmlServices GUI's work.
  3. The heart of the application is defined in the file spreadsheet2sql. The application is implemented, as stated above using the revealing module pattern. I gave a link above to a discussion of this pattern. The code comments should help in understanding the pattern and a Google search will bring back lots of useful resources that can describe it better than I can. An earlier blog entry discussed the Constructor-Prototype pattern and this is another well known JavaScript pattern for building scalable applications. 

Concluding Remarks

The best way to learn from this blog entry is to make a copy of the spreadsheet hosting the application and to experiment with it and read the code and code comments. The application is useful to me and I hope it will also be useful to others. I have tested it quite heavily and used it frequently and, so far, it has worked quite well. However, I can offer no guarantees so please check inputs and outputs carefully and let me know of any bugs. I have used it for quite large data inputs of up to 30,000 data rows and it worked fine. Google Spreadsheets have improved enormously recently and a row count of 30,000 would not have been feasible in earlier versions. However, here are two important potential problems to bear in mind
  1. Floating Point Number Precision: For MySQL this application uses the DECIMAL type. If you need very high precision, check the data types defined for "float" for each of the three RDBMs and alter accordingly.
  2. Copying Large Amounts Of Data In Google Sheets: When I tried to copy the 30,000 rows to the sheet named "PasteTableDataHere", only about 1,000 rows were copied even though I selected the full sheet. I circumvented this problem by resorting to GAS to copy the full target sheet that I then re-named to "PasteTableDataHere". I will post an entry on this in the near future. 

This application could be extended in a number of ways. PostgreSQL and MySQL support the concept of metadata where documentation can be entered into the data dictionary to describe database objects such as tables and columns. In PostgreSQL the SQL syntax to this is like:
COMMENT ON COLUMN my_table.my_column IS 'Employee ID number';
One approach to document the columns could be to add notes to the header row cells that could then be used to generate comments like the one given above. A textarea input could also be added to the HTML form to allow the user to enter multi-line table documentation into the RDBMS data dictionary. Believe me, this type of documentation is crucial when dealing with complex database schemas.
As stated earlier, I will add a chapter to my book that will discuss this application and the underlying code in greater detail. I will also re-visit this blog entry from time to time to improve it. The linked spreadsheet application will also be reviewed and  improved so, if interested, please re-visit this page to see how it has changed. Meanwhile, happy coding!