Friday, December 12, 2014

Copying Tab-Separated Variable (TSV) File Contents Into A Google Spreadsheet

Note: Google Spreadsheets comes with a ready-made function for importing data tables (including tab-separated and comma-separated tables), it is called IMPORTDATA here is the Google documentation for it.  The code below is useful if you need to grab the data and process it first or do something with it other than just write it to a spreadsheet. I wrote the code here before I was even aware of the function IMPORTDATA.


Tab-separated value (TSV) files are a a simple and popular data interchange that are generated by many tools and applications. They can be easily imported into spreadsheet applications like Excel. I frequently like to view and share them in Google Spreadsheets so, to facilitate this, I wrote a few GAS functions to take an uploaded TSV file in Google Drive and copy its contents into a sheet. Links to the sample data and the spreadsheet output with the GAS code are available below.


Summary of the GAS Code
  • Given a TSV file name, return its file ID
  • Read the entire file contents into a single string
  • Create an array by splitting the string using the newline (\n) character
  • Loop over each line and split line on tabs (\t) thereby building an array of arrays
  • Copy the array to a sheet

Here are the functions that implement the above tasks

Get the file ID for the target file
function getFileIDFromName(fileName) {
  // Return the file ID for the first
  // matching file name.
  // NB: DocsList has been deprecated as of 2014-12-11!
  // Using DriveApp instead.
  var files = DriveApp.getFiles();
  while (files.hasNext()) {
    var file = files.next();
    if (file.getName() === fileName) {
      return file.getId();
    }
  }
  return;
}
Return the file contents as an array of arrays
function getTsvFileAsArrayOfArays(tsvFileID) {
  // Read the file into a single string.
  // Split on line char.
  // Loop over lines yielding arrays by
  //  splitting on tabs.
  // Return an array of arrays.
  var txtFile = DriveApp.getFileById(tsvFileID),
      fileTextObj = txtFile.getAs('text/plain'),
      fileText = fileTextObj.getDataAsString(),
      lines = fileText.split('\n'),
      lines2DArray = [];
  lines.forEach(function (line) {
                  lines2DArray.push(line.split('\t')); 
               });
  return lines2DArray;
}
Write the array of values to the active sheet
function writeArrayOfArraysToSheet(tsvFileID) {
  // Target range dimensions are determine
  // from those of the nested array.
  var sheet = SpreadsheetApp.getActiveSheet(),
      arrayOfArrays = getTsvFileAsArrayOfArays(tsvFileID),
      dimensions = {rowCount: Math.floor(arrayOfArrays.length),
                 colCount: Math.floor(arrayOfArrays[0].length)},
      targetRng;
  targetRng = sheet.getRange(1, 1, 
                             dimensions.rowCount,
                             dimensions.colCount);
  targetRng.setValues(arrayOfArrays);
}
Prompt user for a file name and run the TSV to sheet operation
function runTsv2Spreadsheet() {
  // Call this function from the Script Editor.
  var fileName,
      fileID,
      file;
  fileName = Browser.inputBox('Enter a TSV file name:',
                               Browser.Buttons.OK_CANCEL);
  fileID = getFileIDFromName(fileName);
  if (fileID) {
    writeArrayOfArraysToSheet(fileID);
  }
}
To run, you can paste the above functions into a Script Editor and try them out on your own data. To run, just upload a TSV file to your Google Drive, and execute the function named runTsv2Spreadsheet().
If you want some sample data, I have made my test TSV file public so you can just copy it, link here!
If you wish to see what the spreadsheet output should look like, the generated sheet is linked here! The code is also there, so just open the script editor and have a look. You can copy both the spreadsheet and TSV file to have a play with them.

Important Points About The Code Above

  • Memory usage: The code given reads the entire file into a GAS string. I have used it for files with circa 13,000 rows but if the target TSV file is very big, you may run into problems.
  • The code assumes each record is separated by a new line and that each filed (column) in the record is tab-separated.
  • Duplicate file names: The function used to retrieve the file ID, getFileIDFromName(), returns the ID for the first matching file name. You can have duplicate file names in the same folder in Google Drive because they are distinguished by unique file IDs and not names!
  • DocsList has been deprecated as of 2014-12-11 so DriveApp has been used in its place in function  getFileIDFromName()
  • The examples above provide a re-cap on subjects covered in earlier blog entries and in my Leanpub book.

No comments:

Post a Comment