Cookie Notice

As far as I know, and as far as I remember, nothing in this page does anything with Cookies.

2015/12/10

Trying to minimize in-page Javascript



An overstatement, but a funny one. When everything on the web was new, we put all our JS code into the page itself, because there wasn't much other choice. We put our triggers into our tags and in script blocks at the top of the page, as we did with our CSS in style blocks.

Eventually, we decided that this is bad. In part, consistency across several pages requires each page to have access, so we pulled our style into CSS files and our code into JS files, and it was good.

And then came tablesorter.

In my lab, I deal with data a lot, and I create many web pages where the data is put in tabular form. This is the use of tables that Zeldman wouldn't yell at me about. We use tablesorter to allow us to do cool things with the tables, and if we were doing it vanilla, we could just make tablestarter.js that reads something like $(function() { $('.sorted').tablesorter() } ), but instead, we parse dates into sortable form and set widget options and all sorts of stuff, which isn't necessarily going to transfer between tables. So, I have a script block that's ever-growing, mostly with config. I've set up a tablesorter_tools library that'll allow me to start pulling code out of the pages, but the config? I will have to find a better solution.

2015/12/03

Working with Perl Critic around Exporter

I am working on a framework for modern web APIs using CGI and old-school Perl objects, inspired by what I learned from perlbrew. This is more or less how the included modules go. This auto-exports every function that starts with 'api_', so that I could write helper_one() and helper_two() with no worries about them being exported, and without having to add api_butt, api_stump, and api_tail to @EXPORT, because it's striking me that the export list follows documentation as a pernicious form of code duplication.

package API_Proof ;use strict ;
use warnings FATAL => 'all' ;

use Exporter qw{import} ;

our $VERSION = 0.1 ;

our @EXPORT ;
for my $entry ( keys %API_Billing_0_1:: ) {
    next if $entry !~ /^api_/mxs ;
    push @EXPORT, $entry ;
    }

sub api_stub {
    # code goes here
    }

I intend to put the whole deal up on +GitHub eventually, but to avoid complication, I'll just get to the point where it's used, below. I'm exporting everything that starts with api_, so it's all available for $api->run() when we get there. (It's all in previous posts.)

#!/usr/bin/env perl

use strict ;
use warnings ;
use lib '/path/to/lib' ;

my $api = API->new( @ARGV ) ;
$api->run() ;

package API ;
use lib '/path/to/lib' ;
use base 'API_Base' ; use API_Stub ;

And here is where we run into Perl Best Practices and Perl::Critic. I've been running some modules through perlcritic -1 for sake of being complete, and it lead me to do some changes, and there's one that is keeping me from being clean. It's that I'm using @EXPORT. I should be using @EXPORT_OK or %EXPORT_TAGS instead, it says. This means, that first code block should have something like this instead.

my @export ;
for my $entry ( keys %API_Bard:: ) {
    next if $entry !~ /^api_/ ;
    push @export, $entry ;
    }

$EXPORT_TAGS{ api } = [ @export ];
our @EXPORT_OK = ( @{ $EXPORT_TAGS{ 'api' } } ) ;

And then use API_Stub qw{:api}. I am not quite convinced. I'm open, though. I guess I would just like to know what the problem with export by default is, but this isn't in PBP.

2015/12/01

Thinking Aloud about Testing a Module

I have a module that handles getting a DBI object connecting to MySQL databases. With some work, it could be made to talk to others, but since I don't really have any Oracle or PostgreSQL databases to connect to, it hasn't been worth it.

I have another module that handles calling the first module, managing credentials so they don't show up in code, getting and holding the DBI object and doing the actual calls to the database. This way, I can set a variable saying this DB is named 'foo' and say "I want an arrayref containing the result of this query" without dealing with all the nooks and crannies of DBI.

I now have a module that uses that second module to do work. Get information from the database, throw it into an object and kick it out to be converted to JSON later. And I want to write tests for it. But I have thought and continue to think that having a test that queries a database in a way that requires a dynamic resource to be static is stupid.

Is the way to test it to feed in a "this is a test" flag somewhere, stick a bunch of canned output into the functions to output if the flag, because what I should be testing here is that the module dumps things the right way, right?

2015/11/12

Considering committing Clever with jQuery XHR code

A code snippet

    if ( protecting_namespaces.data.hasOwnProperty( 'xhr' ) ) { 
        submission.data.xhr.abort() ; 
        }
    protecting_namespaces.data.xhr = $.ajax({
        url: url ,
        data: object ,
        method: 'POST',
        dataType: 'json'
        }) ;

Seems that Syntax Highlighter is happy with JavaScript. OK.

I'm using jQuery's .ajax() interface, and what this does is, should the function that does this get called again, aborts an unfinished AJAX call. This is wonderful when this will be called from one place, but it sucks if you call it from several places to do a few different things.

And, by wonderful, I mean useful for the UX. If you tell the web server to do a thing, you cannot go back and say "No, I didn't mean it." You can tell it "Do something else to get us back to the first state", but you cannot stop it once it has started, unless you're logged into the server and can kill -9.

So, I am considering making xhr into an object, based on a unique identifier for what's actually being done, which should give me protecting_namespaces.data.xhr[id], so I could have several XHR tasks going on in parallel.

2015/11/06

Trying to Read MongoDB dump files using BSON

I've been looking at increasing the amount of MongoDB we use at work, and this includes backing up the data. Due to my own confusion, I had a little issue getting mongodump to work, but I have been able to dump from one Mongo database and restore to another.

mongodump writes in a format called Binary JSON, or BSON. I installed BSON.pm with the intention of reading the BSON file and ensuring it works. with small tests, I was able to put objects into BSON, write to file, read from file, and use Data::Dumper to display the object is what I wanted it to be.

But, I find I cannot read the file, because BSON.pm reports it as having an incorrect length.


I fully expect that I'm doing something subtly stupid, but what it could be isn't immediately obvious. the contents of $bson should be exactly as written to file, and mongorestore was happy with it. encode() and decode() had worked acceptably, but admittedly on a much smaller dataset than the one I'm working with here, which contains several months of status updates.

I suppose I don't need to read the BSON files, but I do like being able to check the contents of a database before restoring.

2015/10/17

Long, Boring To-Do Post

I'm considering stepping up my web presence, and as a precursor to that, I've created a twitter account specifically connected to this blog, @varlogrant. So, I need to do things to make it look more like I'm using it and less like it's an egg. (If I made the account picture the cover of that one Wilco album, would people get the joke?)

I certainly can automate a lot of stuff, going through the back catalog and retweeting old posts, but the question is, how much of that is just spammy? And, to what extent should I move my opinionated tech tweeting from @jacobydave to @varlogrant?

Beyond that, it strikes me that blogs where the blogger is more-or-less talking to himself are self-limiting, so I should start blogging more about certain subjects and less about things that are annoying me right this minute. 

That being said:
  • I am involved in a group creating an annual event. Specifically, I'm the web guy. There are some administrivia things going on, creating the pages for the team. This is a small matter of Wordpress time, so not hard. 
  • A harder thing is this: We have photos of previous presenters, which were existing head-shots of them, from before their presentations. We also have a large library of photos from the events. I've decided that the smart move is to use Flickr's API and WebService::Simple to grab all the old photos, use smartcrop.js to crop them to the appropriate size, and either personally chose a good one or make a web tool to crowdsource that amongst the group. This process seems more fun to me than the other thing.
  • I promised a while ago to contribute some R-specific stuff to Bobby Tables, and have done jack since. I made some progress on it recently, but need to install a more modern version of R to do appropriate testing before I make a PR. When I first looked into it, I saw no SQL escaping and no placeholders, but now I'm seeing some progress. Nothing's quite up to snuff, in my opinion, but it's better. 
  • A side project I'm involved in has need of Bluetooth Low Energy support, and I've done the slightest poking with it. I need to do more. It seems that a few necessary tools for debugging are Unix/Linux/Mac only, and my dev laptop is Windows, so I need to either get going with VirtualBox, figure things out in Visual Studio or let it go.
  • There's also need for a smartphone app, and my experiences with Eclipse and Android Studio haven't been pleasant. I know there's Cordova integration with Visual Studio, so that seems to be the quick way in. I don't know if I can do any BLE stuff within a Cordova app, but we'll get there when we get there.
  • There's another side project I'm involved in, called Lafayettech. Specifically, I'm in the proto-makerspace corner, Lafayettech Labs. And it seems like I'm the only one involved in it. So, I am thinking of stopping. Right now, there's a few self-perpetuating scripts in a crontab file that do much of the work. I need to decide something about this.
There's a few more things that should be here, but I don't have together enough to even make a lame bullet point for.

2015/10/11

Thinking Aloud: Power Issues for a Raspberry Pi as a Car Computer

We could switch from a Raspberry Pi to an oDroid or another sort of low-power computer-on-a-board. My Pi has a task right now, so if I was to go forward with this, I'll have to get something new anyway, but for sake of this discussion, we'll assume this is it.

I own a USB GPS unit. I own a OBDII-to-USB unit. I own a small VGA monitor for Pi use. A thing that would be useful is a thing that does some networking over the cellphone network, but if it just dumps to my home network when I get home, that'd be good enough.

Here's a niggly bit or me: I start the vehicle and the Pi gets power. I stop the vehicle and the power cuts, leading the computer shutting down suddenly. This is not a happy thing with computers. In fact, I think I can say they hate that, and eventually, the SD card will say enough with this and not boot.

So, the proper solution is to have a power circuit with a battery, that allows it to boot when the car starts and sends the shutdown signal when it stops, but providing enough juice in the battery for the Pi to shut down nicely.

Google told me how to trigger the shutdown when wanted. Just need to figure out how to know what's going on with power.

Overkill II: The Quickening

Previously on /var/log/rant, I talked about using recursion to brute-force a middle-school math problem. Because I learned a little bit about using Xeon Phi co-processor (the part formerly called video cards), I thought I'd try C++. And found that, while the Perl version ran for about a minute and a half, the C++ version took about a second and a half.

I then tried a Python version, using the same workflow as with the C++. I backed off on the clever for the testing because I am not as sure about using multidimensional arrays in C++ and Python as I am in Perl. When you only code in a language about once every 15 years, you begin to forget the finer points.

Anyway. the code follows. I don't believe I'm doing a particularly stupid thing with my Perl here, but it's hard to ID particularly stupid things in languages sometimes. Here's the code, now with your USDA requirement of Node.js.

2015/10/09

Overkill: Using the Awesome Power of Modern Computing to Solve Middle School Math Problems

I was helping my son with his math the other night and we hit a question called The Magic Box. You are given a 3x3 square and the digits 3,4,5,6,7,8,9,10,11, and are expected to find a way of placing them such that each row, each column, and each diagonal adds up to 21.

I'm a programmer, so my first thought was, hey, I'll make a recursive algorithm to do this. The previous question, measuring 4 quarts when you have a 3 quart measure and a 5 quart measure, was solved due to insights remembered from Die Hard With A Vengeance, so clearly, I'm not coming at these questions from the textbook.

With a little more insight, I solved it. 7 is a third of 21, and it the center of an odd-numbered sequence of numbers, so clearly, it is meant to be the center. There is only one way you can use 11 on a side, with 4 and 6, so that a center column or row will be 3 7 11. If you know that column and the 4 11 6 row, you know at least this:

.  3  .
.  7  .
4 11  6

And because you know the diagonals, you know that it'll be

8  3 10 
.  7  .
4 11  6

And you only have 5 and 9 left, and they're easy to just plug in

8  3 10
9  7  5
4 11  6

So, that's the logical way to to solve it. Clearly, order isn't important; it could be reversed on the x and y axis and still be a solution. But, once the thought of solving with a recursive algorithm came into my head, I could not leave well enough alone. So, here's a recursive program that finds all possible solutions for this problem.


2015/10/03

You Know Where You Stand In Your Hellhole

Sawyer X tweeted this:
I said "Deep".

This can be a variation of "He says he has 5 years of experience, but he really has 1 year of experience 5 times." Except not really.

I've worked as a developer for years, and it took me years before I started writing modules. It took a while after before I started having those modules being more than bags of vaguely related functions. And it was just this year before I looked into and started contributing patches to open source projects..

So, one way of looking at this is "I have one year experience as a newbie which I repeated for five years, one year of being a coder which I repeated for five years, and I've just finished a year of being a developer making modern tools for other developers, which I haven't repeated." Or the like.

There isn't necessarily anything wrong with this. In the year where you've been coding, you're doing a thing. You aren't growing, you aren't taking it to the next level, but you are creating and maintaining code, and you are making something that provides value to someone.

Or, you can think of Sawyer's statement more like I've been coding, working at the a well-trod level, bit-twiddling and the like, but not doing anything interesting. This is the feeling I get when I get close to more computer-theoretical things. I have access to computers with massive amounts of cores, massive amounts of memory, but don't see where my problems map to those resources. Until I do interesting things with interesting data on interesting hardware, I'm a coder, not a programmer.

I'm interested in improving, in coding less and programming more. Or, perhaps, interested in aspects of improvement but less interested in doing the work. There's a certain safety in knowing that you're doing what you're experienced with, not reaching out. Perhaps David St. Hubbins and Nigel Tufnel say it best in the last chorus: "you know where you stand in a hell hole".


I'm trying to take steps to move forward. Maybe find myself a new, more interesting hell hole to program in. 

2015/09/25

a DRY KISS

I've been working on a tool. I discussed a lot of it yesterday.

I had a model to get the information based on PI, and I wanted to get to what I considered the interesting bit, so it was only after the performance went from suck to SUCK that I dove back, which is what I did yesterday. Starting with the project instead of the PI made the whole thing easier. But now I'm stuck.

The only differences between these queries are on lines 14, 19 and 20. Which gets to the problem. I know that I don't need half of what I get in lines 1-11, but when I pull stuff out, I now have two places to pull it.

I have a great 90-line majesty of a query that includes six left joins, and I append different endings depending on whether I want to get everything, or a segment defined by A or B or something. I could probably tighten it up so I have SELECT FROM, the different blocks, then AND AND ORDER BY. But there we're adding complexity, and we're butting Don't Repeat Yourself (DRY) against Keep It Simple, Stupid (KISS).

I'm happy to keep it as-is, as a pair of multi-line variables inside my program. I think I'd rather have the two like this than gin up a way to give me both, so KISS over DRY, in part because I cannot imagine a third way I'd want to access this data, so we hit You Ain't Gonna Need It (YAGNI).

But if there's strong reasons why I should make the change, rather than package it up and put it in cron, feel free to tell me.

2015/09/24

This is only kinda like thinking in algorithms

I have a program. It has two parts: get the data and use the data.

"Get the data" involves several queries to the database to gather the data, then I munge it into the form I need. Specifically, it's about people who generate samples of DNA data (called "primary investigator" or PI for those not involved in research), a little about the samples themselves, and those that the data are shared with.

"Use the data" involves seeing how closely the reality of the file system ACLs is aligned with the ideal as expressed by the database.

I expected that I'd spend 5% of my time, at worst, in "get the data" and 95% of my time in "use the data". So much so, I found a way to parallelize that part so I could do it n projects at a time.

In reality, it's running 50-50.

It might have something to do with the lag I've added, trying to throw in debugging code. That might've made it worse.

It might have something to do with database access. For this, I think we take a step back.

We have several database tables, and while each one rarely changes, they might. So, instead of having queries all over the place, we write dump_that_table() or the like. That way, instead of digging all over the code base for SELECT * FROM that_table (which, in and of itself, is a bug waiting to happen) (also), you go to one function and get it from one place.

So, I have get_all_pi_ids() and get_pi(), which could not be pulled into a single function until I rewrote the DB handling code, which now allows me to make [ 1 : { id:i, name:'Aaron A. Aaronson", ... }, ... ] to put it in JSON terms. Right now, though, this means I make 1 + 475 database calls to get that list.

Then I get all that PI's share info. This is done in two forms: when a PI shares a project and when a PI shares everything. I start with get_own_projects() and get_other_pi_projects(), which get both cases (a project is owned by PI and a project is shared with PI). That makes it 1 + ( 3 * 475) database calls.

I think I'll stop now, because the amount of shame I feel is still (barely) surmountable, and I'm now trying to look at the solutions.

A solution is to start with the projects themselves. Many projects are on an old system and we cannot do this mess with, and there's a nice boolean where we can say AND project.is_old_system = 0 and just ignore them. Each project has an owner, and so, if we add PI to the query, we lose having to get it special. Come to think of it, if we make each PI share with herself, we say goodbye to special cases altogether.

I'm suspecting that we cannot meaningfully handle both the "share all" and the "share one" parts in one query. I'm beginning to want to add joins to MongoDB or something, which might just be possible, but my data is in MySQL. Anyway, if we get this down to 2 queries instead of the nearly 1500, that should fix a lot of the issues with DB access lag.

As, of course, will making sure the script keeps DB handles alive, which I think I did with my first interface but removed due to a forgotten bug.

So, the first step in fixing this mess is to make better "get this" interfaces, which will allow me to get it all with as few steps as possible.


(As an aside, I'll say I wish Blogger had a "code" button along with the "B" "I" and "U" buttons.)




2015/09/18

Not Done, But Done For Now

I spent some more time on it, and I figured something out.

I looked at the data, and instead of getting 1 2 3 4 NULL NULL 5 6 7, I was getting 1 2 3 4 NULL NULL 7 1 2, starting at the beginning again. So, I figured out how to do loops and made a series of vectors, containing the dates in one, and load averages per VM.


Lev suggested that this is not how a real R person would do it. True. But this works, and I know how to plot vectors but not data tables. So, a few more changes (having the date in the title is good) and I can finish it up and put it into my workflow. Yay me.

2015/09/17

Logging, Plotting and Shoshin: A Developer's Journey

I heard about Log::Log4perl and decided that this would be a good thing to learn and to integrate into the lab's workflow.

We were having problems with our VMs and it was suggested I start logging performance metrics, so when we go to our support people, we have something more than "this sucks" and "everything's slow".

So, I had a reason and I had an excuse, so I started logging. But logs without graphs are boring. I mean, some logs can tell you "here, right here, is the line number where you are an idiot", but this log is just performance metrics, so if you don't get a graph, you're not seeing it.


That tells a story. That tells us that there was something goofy going on with genomics-test (worse, I can tell you, because we had nothing going on with genomics-test, because the software we want to test is not working yet. There was a kernel bug and a few other things that had fixed the other VMs, but not that one, so our admin took it down and started from scratch.

Look at that graph. Do you see the downtime? No?

That's the problem. This shows the last 100 records for each VM, but for hours where is no record, there should be -1 or a discontinuity or something.

I generally use R as a plotting library, because all the preformatting is something I know how to do in Perl, my language of choice, but I've been trying to do more and more in R, and I'm hitting the limits of my knowledge. My code, including commented-out bad trails, follows.


My thought was, hey, we have all the dates in the column final$datetime, and I can make a unique array of them. My next step would be to go through each entry for dates, and, if there was no genomicstest$datetime that equalled that date, I would throw in a null or a -1 or something. That's what the ifelse() stuff is all about.

But, I found, this removes the association between the datetime and the load average, and the plots I was getting were not the above with gaps, as it should be, but one where I'm still getting high loads today.

Clearly, I am looking at R as an experienced Perl programmer. You can write FORTRAN in any language, they say, but you cannot write Perl in R. The disjunct between a Perl coder codes Perl and an R coder codes R is significant. As a Perl person, I want to create a system that's repeatable and packaged, to get the data I know is there and put it into the form I want it to be. The lineage of Perl is from shell tools like sed and awk, but it has aspirations toward being a systems programming language.

R users are about the opposite, I think. R is usable as a scripting language but the general case is as an interactive environment. Like a data shell. You start munging and plotting data in order to discover what it tells you. In this case, I can tell you that I was expecting a general high (but not terribly high; we have had load averages into the hundreds from these VMs), and that the periodicity of the load came as a complete surprise to me.

(There are other other differences, of course. Perl thinks of everything as a scalar, meaning either a string or a number, or an array of scalars, or a special fast array of scalars. R thinks of everything as data structures and vectors and such. Things I need to integrate into my head, but not things I wish to blog on right now.)

The difference between making a tool to give you expected results and using a tool to identify unexpected aspects is the difference, I believe, between a computer programmer and a data scientist. And I'm finding the latter is more where I want to be.

So, I want to try to learn R as she is spoke, to allow myself to think in it's terms. Certainly there's a simple transform that gives me what I want, but I do not know it yet. As I do it, I will have to let go of my feelings of mastery of Perl and allow myself to become a beginner in R.

But seriously, if anyone has the answer, I'll take that, too.

2015/09/14

Thoughts on Machine Learning and Twitter Tools

I have a lot of Twitter data.

I decided a few months ago to get my list of those I follow and those following me, and then find that same information about each of them.

This took quite some time, as Twitter stops you from getting too much at a time.

I found a few interesting things. One was, of all the people I follow and all following me, I had the highest number of bi-directional connections (I follow them, they follow me; let's say "friends") of any, at something like 150.

But I'm thinking about writing the next step, and I gotta say, I don't wanna.

In part, that data's dead. It's Spring me, and my Spring followers. It's Fall now, and I don't know that the data is valid anymore.

So I'm thinking about the next step.

If "Big Data" is more than a buzzword, it is working with data that never ends. I mean, my lab has nodes in a huge research cluster, that contain like a quarter-TB of RAM. That's still a mind-boggling thing when I think about it. And it's not like we let that sit empty; we use it. Well, I don't. My part of the workflow is when the users bring their data and, increasingly, when it's done and being sent on. It clearly is large data sets being handled, but it isn't "Big Data", because we finish it, package it, and give it to our users.

"Big Data", it seems to me, means you find out more now than you did before, that you're always adding to it, because it never ends. "The Torture Never Stops", as Zappa said.

In the case of Twitter, I can get every tweet in my feed and every tweet I favorite. I could write that thing and have it run regularly, starting now. Questions: What do I do? How do I do it?

Here's the Why:

There's tools out there that I don't understand, and they are powerful tools. I want to understand them. I want to write things with these things that interest me, because it's an uncommon day that I'm interested in what I code most of the time.

Then there's Twitter. You can say many many things about Twitter, and it's largely true. But, there are interesting people talking about the interesting things they do, and and other people retweeting it. With luck, I can draw from that, create connections, learn things and do interesting things.

So, while there is the "make a thing because a thing" part, I'm hoping to use this.

So, the What:

A first step is to take my Favorites data and use it as "ham" in a Bayesian way to tell what kind of thing I like. I might need to come up with a mechanism beyond unfollowing to indicate what I don't want to see. Maybe do a search on "GamerGate", "Football" and "Cats"?

Anyway, once that's going, let it loose on my feed and have it bubble up tweets that it says I might favorite, that "fit the profile", so to speak. I'm thinking my next step is installing a Bayesian library like Algorithm::Bayesian or Algorithm::NaiveBayes, running my year+ worth of Twitter favorites as ham, and have it tell me once a day the things I should've favorited but didn't. Once I have that, I'll reorient and go from there.

2015/09/03

So You Think You CAN REST

This is notes to self more than anything. Highly cribbed from http://www.restapitutorial.com/

First step toward making RESTful APIs is using the path info. If you have a program api.cgi, you can post to it, use get and api.cgi?foo=bar, or you can use path info and api.cgi/foo/bar instead.

You can still use parameters, but if you're dealing with a foo named bar, working with $api.cgi/foo/bar is shorter, because you're overloading it.

Generally, we're tossing things around as JSON, which, as object notation, is easier to convert to objects on either side of the client/server divide than XML.

You're overloading it by using request method. Generally using POST, GET, PUT and DELETE as the basic CRUD entities. You can browse to api.cgi/foo/bar and find out all about bar, but that's going to be a GET command. You can use curl or javascript or other things where you can force the request method to add and update.

This means that, in the part that handles 'foo', we handle the cases.

For read/GET, api.cgi/foo likely means you want a list of all foos, maybe with full info and maybe not, and api.cgi/foo/bar means you want all the information specific to the foo called bar.

For the rest of CRUD, api.cgi/foo is likely not defined, and should return an error as such.

So, in a sense, sub foo should be a bit like this:
sub foo {

my $id  = undef ;
if ( scalar @pathinfo ) { $id = $pathinfo[-1] }

# READ
if ( $method eq 'GET' ) {
    if ( defined $id ) {
        my $info = get_info($id);
        status_200($info) if $info;
        status_204();
    }
    my $list = get_list();
    status_200($list) if $list;
    status_204();
}

# CREATE
if ( $method eq 'POST' ) {
    if( $param ) {
        my $response = create($param);
        status_201() if $response; 
        status_409() if $response == -409 ; # foo exists
        status_400() if $response < 0 ;
        status_204();
    }
    status_405();
}

if ( $method eq 'PUT' ) {
    my $id  = undef ;
    if ( scalar @pathinfo ) { $id = $pathinfo[-1] }
    if ( $id && $param ) {
        my $response = update( $id, $param ) ;
        status_201() if $response; 
        status_400() if $response < 0 ;
        status_204();
        }
    status_405();
}

if ( $method eq 'DELETE' ) {
    if ( defined $id ) {
        my $response = delete($id);
        status_400() if $response < 0 ;
        status_204(); # or 201
    }
    status_405();
    }
status_405();
}

And those are standard HTTP status codes. Here's the Top Ten:
  • Success
    • 200 OK
    • 201 Created
    • 204 No Content
  • Redirection
    • 304 Not Modified
  • Client Error
    • 400 Bad Request
    • 401 Unauthorized
    • 403 Forbidden
    • 404 Not Found
    • 409 Conflict
  • Server Error
    • 500 Internal Server Error
Right now, I have the pathinfo stuff mostly handled in an elegant way. I see no good way of creating a big thing without using params, and the APIs generally use them too, I think.

My failing right now is that I'm not varying on request method and I'm basically sending 200s for everything, fail or not, and my first pass will likely be specific to individual modules, not pulled out to reusable code. Will have to work on that.

2015/08/28

But Wait! There's More! Extendable DRY API Code

The previous post wasn't built in a vacuum.

I was chasing down ideas for a purpose. I've been building JSON APIs for the lab for quite some time, each with as much knowledge about them as I had at that time, which means I have a number of bad APIs out there, and I'm in a position now where I'd like to unify them into either task-specific or access-level-specific groups.

That last piece of code was me finally able to take the lesson I learned from perlbrew about the symbol table being usable as a dispatch table. But it had a problem.

That problem was that, to use it, I would have to use or require whatever module I want to access into that one module. This is a problem because then I'd need a different API.pm for each version of the API I wanted to make. This is not good.

I remembered code that I used to use to download Dilbert comics. I subclassed HTML::Parser, adding functions that were specific to the task of grabbing images. Which is exactly what I do here.
#!/usr/bin/env perl
# the application code

use feature qw{ say } ;
use strict ;
use warnings ;
use lib ;

my $api = API->new( @ARGV ) ;
$api->run() ;

package API ;
use base 'API_Base' ;
use lib ;
use API_PED ;

Above, I make a version of API that includes API_PED.

API_Base is conceptually very perlbrew-influenced, but I totally dropped the "I think you mean" because this is a JSON API, not a command-line program.
package API_Base ;
use feature qw{ say } ;
use strict ;
use warnings ;
use CGI ;
use Getopt::Long ;
use Data::Dumper ;
use JSON ;

# Yes, I still use CGI. I kick it old-school.

sub new {
    my ( $class, @argv ) = @_ ;
    my $self ;
    my $cgi = CGI->new() ;
    %{ $self->{param} } = map { $_ => $cgi->param($_) } $cgi->param() ;
    ( undef, @{ $self->{pathinfo} } ) = split m{/}, $cgi->path_info() ;
    return bless $self, $class ;
    }

sub run {
    my ($self) = @_ ;
    $self->run_command( $self->{pathinfo}, $self->{param} ) ;
    }

sub run_command {
    my ( $self, $pathinfo, $param ) = @_ ;
    my $command = $pathinfo->[0] || 'test' ;
    my $s = $self->can("api_$command") ;
    unless ($s) {
        $command =~ y/-/_/ ;
        $s = $self->can("api_$command") ;
        }
    unless ($s) {
        $self->fail( $pathinfo, $param ) ;
        exit ;
        }
    unless ( 'CODE' eq ref $s ) { $self->fail( $pathinfo, $param ) }
    $self->$s( $pathinfo, $param ) ;
    }

sub fail {
    my ( $self, $pathinfo, $param ) = @_ ;
    say 'content-type: application/json' ;
    say '' ;
    say encode_json {
        status   => 'fail',
        param    => $param,
        pathinfo => $pathinfo
        } ;
    print STDERR 'INAPPROPRIATE USAGE: '
        . 'desired path = '
        . ( join '/', '',@$pathinfo ) ;
    }

sub api_test {
    my ( $self, $pathinfo, $param ) = @_ ;
    say 'content-type: application/json' ;
    say '' ;
    say encode_json { status => 1, param => $param, pathinfo => $pathinfo } ;
    }
1 ;

We go into the symbol table twice. one to export api_PED, which would become api.cgi/PED, and one to make subroutines named ped_* into a dispatch table, allowing api.cgi/PED/test, api/PED/mail and api.cgi/PED/lookup.
package API_PED ;
use feature qw{ say } ;
use warnings ;

use Exporter qw{import} ;
use JSON ;

# PED is our campus LDAP server
use lib '/depot/gcore/apps/lib/' ;
use PED qw{ purdue_ldap_lookup } ;

our @EXPORT ;
for my $entry ( keys %API_PED:: ) {
    next if $entry !~ /^api_/ ;
    push @EXPORT, $entry ;
    }

# The goal here is to do as much as we can without repetition
# we export api_PED so that API accept this as a command without
# having to write it into API.pm

# api_PED checks for any subroutine starting with 'ped_' and
# runs it

# so in essence, exporting a sub starting with api_ adds it to the
# API dispatch table, and writing a sub starting with ped_ adds it
# to this module's dispatch tableee 

sub api_PED {
    my ( $self, $pathinfo, $param ) = @_ ;
    my %commands ;
    shift @$pathinfo ;
    foreach my $entry ( keys %API_PED:: ) {
        next if $entry !~ /^ped_/ ;
        $commands{$entry} = 1 ;
        }
    my $sub_name = shift @$pathinfo ;
    my $command  = 'ped_' . $sub_name ;
    if ( $commands{$command} ) {
        my $full = join '::', 'API_PED', $command ;
        &{$full}( $pathinfo, $param ) ;
        exit ;
        }
    else {
        say 'content-type: application/json' ;
        say '' ;
        say encode_json { c => \%commands, p => $pathinfo, e => $command } ;
        }
    }

sub ped_test {
    my ( $pathinfo, $param ) = @_ ;
    say 'content-type: application/json' ;
    say '' ;
    say encode_json { result => 1 } ;
    exit ;
    }

sub ped_mail {
    my ( $pathinfo, $param ) = @_ ;
    my $name   = $pathinfo->[0] ;
    my $lookup = purdue_ldap_lookup($name) ;
    say 'content-type: application/json' ;
    say '' ;
    say encode_json {
        status => ( scalar keys %$lookup ? 1 : 0 ),
        mail => $lookup->{mail},
        } ;
    exit ;
    }

sub ped_lookup {
    my ( $pathinfo, $param ) = @_ ;
    my $name   = $pathinfo->[0] ;
    my $lookup = purdue_ldap_lookup($name) ;
    say 'content-type: application/json' ;
    say '' ;
    say encode_json {
        status => ( scalar keys %$lookup ? 1 : 0 ),
        lookup => $lookup,
        } ;
    exit ;
    }
1 ;

I'm not the happiest. Each sub handles encoding the output itself. I normally use JSON, but I could imagine exporting JSONP, XML, CSV or something else, and I could imagine passing back the data and an indication as to how it should be handled. I think I might have that somewhere, like I had the base code in a web comic reader from the late 1990s.

To sum up, I'm in the middle panel of this:

Three Panel Soul. I think the artist turned this into a t-shirt.

Filling a Dispatch Table For Ease of Development and DRY

I hate magic.

Don't get me wrong. If the night's entertainment is going to see Penn & Teller, I'll enjoy myself like anyone else, but when I have to work with it, I hate magic.

I hate magic because there's an implicit promise of "this will always work", and when it doesn't work, because eventually it won't, I won't know how to fix it because I was never told how it works in the first place.

On the other hand, I hate repeating myself.

Consider this code:

package Me ;
use Exporter qw{import} ;
our @EXPORT =  qw{add} ;

# adds two variables
sub add {
    my ($i,$j)=@_;
    return $i + $j ;
    }
1;

It was recently suggested to me that documentation like the above is essentially code duplication, and if I made substantial changes, assuming a more complex concept than addition, then the comments are out of date. And because writing in code and writing in English are different, switching between the two can be taxing, so developers often say "I'll get to it later" and forget.

But there's another point of code duplication.

Not only are we saying add() is a subroutine with sub add(), we're saying it with our @EXPORT = qw{add}. I don't want that code repetition. So, what can we do to make it better?

Look at the symbol table!

package Me ;
use Exporter qw{import} ;
our @EXPORT ;
for my $entry ( keys %Me:: ) {
    next if $entry !~ /^my_/ ;
    push @EXPORT, $entry ; 
    }

# we do our own addition because I don't trust the built-in
sub my_add {
    my ($i,$j)=@_;
    return $i + $j ;
    }
1;

I've changed the documentation to why I've done such a silly thing as re-implemented add(), given add() a name much more likely to avoid namespace collisions, and made @EXPORT auto-filling. If I wanted to created my own subtraction, I'd just write my_sub() and away we'd go. Additionally, if I wanted a non-exported subroutine log_usage() to add to these subroutines and see how often my code is used, I could just write and use that and it won't be exported.

But, because I see how @EXPORT is filled, it will be apparent when I return to maintain this code how that works, so something else will be the biggest "WTF?" in the code base. No magic. No Three of Clubs. No Jack of Spades spitting cider into my ear. More than likely, it'd be "Why did I roll my own buggy addition? What was I even thinking?"

Which is good. Because, while I love clever, I hate magic.

2015/08/26

Now I Am A C# Developer

It started with Instagram.

I wanted to be able to have my most recent Instagram picture become my background.

For my Android, tablet, that was easy. I just used IFTTT to connect the Instagram channel to the Android Device channel and that was that.

I wrote a Perl module, which I should put on GitHub and maybe then CPAN one of these days, that interacts with the Instagram API, and wrote a program that downloads the latest and uses gsettings to set it as background. That was easy, and I suppose I could use Strawberry Perl to make it all work on Win32, but I thought "Hey, Microsoft just told me that Visual Studio Community was free to use, and I just moved my laptop to Win10, so why not learn a thing?"

So, I learned a thing. I have used Visual Studio to write four C# programs:

  • Hello World, because you need to learn how to do I/O and get the compiler to run
  • Iterative Fibonacci, because you need to understand how to use control structures in a language
  • Recursive Fibonacci, because you need to understand how to use subroutines in a language
  • SetBackgroundImage, which used inputs and SystemParametersInfo to allow me to type a path to an image and set that as my background.
Before: picture of my whiteboard

After: another picture of my whiteboard
The only part of the source that I had more than minor interaction with is Program.cs, and that isn't much. I'm not a fan of default bracing style, having my own preferences, but knowing that Ctrl-K Ctrl-D (I think) will tidy whatever I write into correct formatting is a good thing.

I'm a vi man, so living la vida IDE is a stretch for me. I had to Google to find out how to turn on line numbers, but to be fair, I was not born with the knowledge of :se nu, either, so that's fair. I generally use Windows machines that give me a browser and PuTTY, which let me connect to real computers, so this is me learning how Windows does Windows.

Anyway, that's my code, up there on GitHub. If you have suggestions for program #5, I'm all ears.

2015/08/21

Perl Module for interacting with Globus

Ever heard of Globus?

No?

I'm kinda not surprised. It's a bit of an obscure thing.

Imagine you have research data. A collaborator wants to analyse it. Thing is, you're where you are and the collaborator is on the other side of the continent. There are many ways to move things, but many of them involve you having to be fairly technically sophisticated in order to make them work. Or your collaborator, who is an expert in your domain but not necessarily in file transfer. Plus, there's permissions and security and all that.

Or, you can use Globus.

We're changing how we handle data sharing here in the lab. We just moved file systems, moving to one that supports access control lists (ACLs), which means that, instead of playing games with symbolic links in order to share a researcher's work, we could simply setfacl -R -m 'u:user:r-x' and go from there.

That was fun, but because an ACL has to be set for every file, changes take a long time, so testing takes a long time, and updates take a long time. But it means that local researchers have access on the Big Iron.

But not everyone we want to share with is local. Some are on the other side of the continent. Thus Globus.

And, soon, Net::Globus.

It uses their first API, which is exposed via SSH. This is not my preferred interface, but I was able to get it to do the subset I needed fast enough, and I'll go back and make a thing work with their REST API in my copious free time. I mean, I could just use the Python API, right?

If you were to take my Globus.pm and copy it into your ~/lib directory, and you had the right stuff set up with SSH keys, it'd work. For the subset that isn't stubtastic, it's there.

I started of this project with dzil new, which is as much of Dist::Zilla as I know how to use. (I didn't start this paragraph intending an Elvis Costello reference, but there you go.) I know how to use git, how to make pull requests and merge them. My contribution graph is more white than green, but GitHub is not integral to my workflow, containing more my toys than my work. So, I'm pretty close to OK for the managing of a repo, but the steps toward getting it onto CPAN will involve me reading and asking a lot of stupid questions irc.perl.org.

Plus, the tests. Oh, man, the tests....

Perl is good because Perl is tested, and the tests I know how to write are tests I consider stupid. So, I expect to spend some time in chromatic's book of testing and in the tests for Net::Twitter and the like, trying to figure out what needs to be tested and how to write them.

So, I'm not ready to put it up to CPAN yet, but I see the path, because I finally have a module I could imagine someone else needing.

So, if this is a thing you think you have a use for, try it. If you think I did something stupid, I'm sure I did. Suggestions are great; pull requests are better. And thanks.

2015/08/19

I Need To Write This Up To Understand It, or The Epic Battle between Wat and Derp!

I've signed up for Neil Bowers' Pull Request Challenge, as I've blogged about before.

This month, I'm working with System::Command. It's neat, a step up from system calls, and as I understand it, integral to handling Git programmically, within Git::Repository. If I wasn't elbow-deep, I might try to use it as part of my Net::Globus project.

But the challenge isn't to use a module, it's to commit changes to a project. I wrote book, and he suggested writing a test to catch a bug that shows up in Windows. There's lots of "what even?" that's been going along with this, the great majority has a lot to do with the difference between coding for your self and your lab and coding for a community. I mean, I really do not get what's going on here, and you know that old joke? "Doctor, it hurts when I do this!" "Don't do that, then!" I keep thinking "If this is breaking your code, do something else!"

But, that sort of thinking doesn't get your code merged. So, I'm proceeding, my mind becoming the ring for an epic cage match between Wat and Derp.

One of the points of Derp (thanks #win32 on irc.perl.org) is that warnings are not exceptions. Since the error as given has a distressingly consistent ability to lock up Powershell session, I hadn't thought of that. It's lead me to think about cutting back to the least amount of code you can include to get a behavior you want to test for. In this case, it isn't just giving the wrong result, it's kicking up a lot of errors. Or, rather, warnings.


Among the questions on the $64,000 Pyramid are:

  • "Why does Git Bash shell show the IPC::Run::Win32Pump errors, after the next prompt shows up, while Powershell doesn't?" 
  • "Is that significant?"
  • "What, then, is killing the great Powershells of my laptop?"
  • "How do you trap that?" 
  • "Isn't that more a test issue or a program issue?"
  • "How, then, do you turn this mess into a meaningful test?"
That last one is what I'll be sleeping on tonight. On the one hand, turn this into a .t file and any warning, any at all, would be enough to trigger the test, as it's demonstrated that no errors occur in Linux. On the other, a test that says "Yeah, this kicks up errors" seems less-than-helpful, so specifying the GEN7 and GEN11 seems like the kind of thing that could break on another system.

I suppose I could write both a general and specific error.

Any comments, either supportive ("Yeah, that's exactly the kind of test you need!"), inquisitive ("Can I use BEGIN blocks in my other work?") or dismissive ("_That_ is a stupid way to test this out!") will be read and learned from. 

Meanwhile, I wrote this after my bedtime, so good night.

2015/08/17

I Am Unlikely To Move To Perl6

A camel in Busch Gardens in St. Louis.
I'm working on a project, which I will blog otherwise. This project is my hope to get into CPAN and level up my Perl game. I am still in progress, so it sits in the earliest stages of construction, with the bare minimum calling dzil new Foo::Bar will give you.

A friend asked me to rewrite it in Perl6.

He has also started putting out challenges, saying "Do this task in Perl5 without modules and I'll do it with Perl6.

I have several problems here. This project started with code I wrote to do something for work, and first and foremost, it had to do the work thing. That one part of the code could be useful to others is a perk. We're a Perl5 shop, so spending more than a few moments trying to do this task in Perl6 is just as counterproductive as trying to do it in Python, in Haskell, in Visual Basic or in x86 assembly. The library doesn't do everything, but it does everything I need, and the move from being a Perl developer to a CPAN developer is steep enough, thanks.

As for the challenges, I find them fundamentally unjust. If you have language A and language B, where feature X is available built into A but exists as a module in B, a module that comes with the core distribution of B, and this might not be a thing you use in most programs, then, to me, A puts unnecessary code into memory, while B is respectful of your requirements by not including things you don't need. My friend thinks the inclusion of feature X into A is a thing to be celebrated, a selling point he's trying to use to build excitement, while I regard it as a stacked deck.

But even as I write that, and intellectually agree with the logic I created, it isn't my fundamental objection. I could probably make a reverse argument if the tables were turned, and in the age where we have multiple gigabytes of RAM, and machines I have access to have a quarter-terabyte available, being so parsimonious with memory is out of style. As the saying goes, you cannot reason a person out of a position they didn't reason themselves into.

I started out as a Computer Science student, getting a second degree because my Journalism degree gave me no job. My school taught C as a subset of C++, using gcc on Solaris systems. This involved the use of stdio.h, and strings as arrays of characters. Intellectually, I get that it still is, but as a practical matter, it was not easy and it was not fun.

I worked helping to maintain the websites for a part of the school, and at first wrote HTML and documents in HTML, and also maintained the documentation library for the system administrators. Physical library. This is where I learned to love O'Reilly and Amazon. And, as I learned to program by learning CGI by learning Perl, where I learned that dealing with strings as scalars instead of arrays pointers to arrays of characters was a lot easier. This very much was where I learned to love Perl5.

This was an age of underpowered computers, and the computers I worked with were, in general, more underpowered than most. So weak, actually, that they could not run a web browser, which makes working as a web developer difficult. This is why I learned about X redirection. And, the minute I added any Javascript code to do anything on a page, this meant that the pages became unusable on the (for the time) beefy servers I was running the browsers on. I had reasoned myself into the position "pages + Javascript = slow and unusable".

Two things changed. The first was Time + Moore's Law = better computers. The second thing was the discovery of XMLHttpRequest, aka XHR or AJAX. This meant that Javascript was capable of more than display hacks, that you could do actual things with it. So, I grew to like Javascript, almost but not quite to the extent I like Perl.

A few years later, I had graduated and been hired by a local medical clinic. This was more IT than development, and I should've left early than I did, but while I was there, many of my friends had started to move to Python.

There was a computer lab where I had spent a great amount of time, and someone I knew had code that would identify who was at which SPARC5 machine, and which machines were empty, so you could know if it was worth your time to even go to that part of campus. I didn't need that information anymore, but what I needed was working Python code so I could learn how the language worked.

However, the program was indented with tabs. Python, for those unfamiliar with the Dynamic Language Wars, is a language where control structures are determined by indentation, instead of brackets, like most C-related languages. "Significant White Space" is the Ypres of the war between Perl and Python, the point where neither side will give ground.

The problem with tabs is that they tend to hide spaces. Which is to say, unless you have your editor set to show you the difference, SpaceTab looks exactly like Tab. This was exactly the problem that I found myself having. I was expecting, even wanting, to learn how if statements and for loops look in Python, and instead I found myself having to learn how Python error messages work.

The argument for white space is that, if you allow code to look like a one-line hairball, coders will make it look like a one-line hairball, but if your language design enforces coding standards, you will have readable code. I like reading readable code, and, as I said, I came from a Journalism background where I learned the rules to make printed documents more readable. It should be easy to reason me into appreciating Python, but having spent years learning Perl made me defensive, and spending hours struggling with with code entrenched my disgust. I viscerally dislike Python. I use it; my FitBit code is written in it because I found working through OAuth in Perl to be difficult. I'm willing to suck it up when necessary. But it is not my first choice of languages, and likely will never be, for entirely emotional reasons.

When I first started hearing about Perl6, I liked the idea of it. I wanted it to succeed. Perl is complex, they said. Only Perl can parse Perl, which isn't good. To hack Perl, to improve the base language, you need to know a bunch of domain-specific hackery, which means that the number of people who can make meaningful changes to the language is small and getting smaller. If we back up and start again, we can remove a bunch of cruft. This is fine.

And, since we're starting from scratch, we'll identify things we want, things that'll make the language better, and build those in. Logically, this works.

The brains of the Perl community went to work on Perl6, on the various platforms they tried to build it on. I can't create a strong argument against any individual technical decision. Use a VM to make it portable across many languages? Sure. Make our own VM system so we aren't reliant on someone else with interests unaligned with our own? Yeah, that's probably smart. But taken as a whole, that's a lot of things you need to do, with a significant chance of failure. The charitable might take it as a gamble worth taking, as a thing worth trying, even if it doesn't work right.

Not everyone is that charitable.

Meanwhile, those of us trying to use the language to do things to do things were left with, basically, a dead language. That is, until a bunch of us started to say "No!" We decided that, as we can, the good things about Perl6 will be taken and backported to Perl5, and that, while the world may identify the moment Jon Orwant's coffee mug hit the wall as the last moment they gave a shit about Perl, we have an enormous collection of pre-existing wheels called CPAN, which is considered the standard by which other languages' library systems are judged and found wanting.

So, to me, Perl5 is the language that saved me from C-type strings, the language that opened up computing for me and that has provided the majority of my paychecks for well over a decade. Perl6 is the language that has made me feel like a laughingstock for a good portion of that time. Perl5 is the a language that supported me, and Perl6 is the language that betrayed me.

Intellectually, I see that Perl6 has a real release date, that we're one community, that Perl6 isn't just a skunkworks for features to pull into Perl5. But I do not foresee my current workplace moving from Perl5 to Perl6, and I do not expect most place I might get hired into to be enthusiastic in embracing the change either.

I'm willing to change my mind. This is me trying to work through my issues enough to be open to Perl6. But it remains to be seen if I can change my heart.

2015/08/01

What Language Should I Learn? Three Answers

A friend of mine, who works in IT but is not a developer, asked me a question during lunch today.

"I want to learn to program. What language should I learn first?"

This is a common question, one I have answered before. But, because I've blogged a lot recently, I've decided to write it up and post here.

I told him I have three answers.

The first is somewhat sarcastic. "I'm going to Europe. What language should I learn?" There just is not enough information to really answer that question, because many languages are very context-specific. If you're hoping to get into programming apps for the iPhone, your best choice is Objective C. If you want to code for Arduino microcontrollers, you'll want to start with the Arduino IDE and it's very C-like language. And, of course, there's JavaScript, your only choice on web browsers.

Where you want to go determines what you should learn.

But there's more to it than that.

There's a thing called the Church-Turing Theory, which states that any real-world calculation can be computed. Turing postulated using a Turing Machine, while Church referenced the Lambda calculus.

We get to a concept called Turing Completeness. A thing that can be computed in one Turing-Complete machine can be simulated in another Turing-Complete machine. The first real use of this was the creation of compilers, of higher-level languages that developers can use which compile to machine code that the hardware itself can run. What it means, for those learning, is that it doesn't really matter what language you learn, that anything one language does can be done by another language.

So the second answer is, Alan Turing would tell you it just doesn't matter which language you choose, that what you do and learn in one language can be simulated or applied in another language. So it doesn't really matter which you choose.

When Jeff Atwood of Coding Horror coined Atwood's Law -- any application that can be written in JavaScript, will eventually be written in JavaScript -- he didn't know the half of it. He knew that graphical applications were starting to be done within web browsers, like Gmail, He didn't know that web server applications and even command-line applications could be written in JavaScript via Node.js. He didn't know that a framework for creating cross-platform mobile applications using web technologies including JavaScript called Cordova would come along. He didn't know that Microsoft would allow developers to create Windows applications using HTML and JavaScript. He didn't know that Open Source microcontrollers such as Arduino would be developed, and frameworks such as Johnny Five would come to allow you to develop Internet of Things projects and even robots with Javascript. It might be a bit more complex to set it up to do these things with JavaScript, but they are possible.

Plus, if your code plans are more functional and computer-theoretical, you'd be glad to know that JavaScript is a Lisp.

If you want to code Objective-C, you need a Mac and the Apple development tools. If you want to code C#, you'll need to install Visual Studio tools from Microsoft (or Mono on Linux). If you want to code JavaScript, you'll need a text editor (and one comes with your computer, I promise) and a web browser (and one comes with your computer, I promise), plus there are places like CodeBin where you can enter your code into the browser itself .

If you're going to be writing an operating system, device drivers, you will want something that compiles to native machine code. If you're looking to get into a specific project, you'll want to know the language of that project. But the corners of the development landscape where JavaScript is the wrong choice are small and shrinking. So, the third answer is, it might as well be JavaScript.

This rubs me a bit wrong. I've set my rep as a Perl Monger, and I always feel like that's where you should start. But while my heart feels that, my mind argues the above, that the greater forces of modern computing are pushing to give JavaScript a front-row seat in the language arena.

But I'm willing to be wrong, and if I am, I want to know. Where am I wrong? What would you tell someone wanting to learn to program?

2015/07/29

Justifying My Existence: Indent Style

I just got dinked for my indent style on StackOverflow.

I wrote an answer to someone's question, even as I figured there are much better ways to handle his greater issue than store everything in text files like that.

Nearly immediately, I see this: For uncuddled elses the practice is to let the closing block brace } be vertically aligned with the following elsif.

I remember when I started coding oh so many years ago. I remember looking at GNU style and not liking it.

if ($boolean)
{
    ...
}

"You've disconnected the beginning brace from the conditional", I thought. "I can't code like that."

The other primary style people talk about is K&R.

if ($boolean) {
    ...
}

"Better", I thought. "The beginning of the block is connected to the conditional, so that's good. But the end. The brace at the end of the block won't tell you it's connected to the block at all. Nope."

It's about the readability. The part that's in the main part of the code is the if statement. The block should be separate from the sounding code, and this style (I'm told it's Ratliff style) is what I've been using since.

if ($boolean) {
    ...
    }

My first degree was in journalism, and part of how I proved myself in my first computing jobs is making the large text blocks of the early web attractive and readable. At least, as far as early web browsers let you. And, while I am a vocal Python hater, and a hater of significant white space in programming languages in general, by-the-books Python is close to how I like to format code. (Just be sure to tell your editor that \t is the devil's work and should be shunned.)

Below is my .perltidyrc. I believe I first started using that tool soon after I read Perl Best Practices by Damian Conway. Ironically, perhaps, I moved to the long form because I found the terse form found in PBP to be unreadable and uneditable, anyway.


If you have a problem with my code alignment, perltidy exists. Use it.

I'd rather offend you with substance than with style, anyway.

2015/07/28

Threads Unspooling, or "What's my problem NOW?"

I have never done much with access control lists (or ACLs), as most of my time as a Linux and Unix user has been in positions where everything needed to control access could be done with standard unix permissions: owner/group/all and read/write/execute.

Also, most of the file systems were not set up to support them, which makes the barrier to entry enough that I never got beyond "I wonder how ACLs work".

I work with genomics data on the top university supercomputing cluster in the US, and we generate lots of data for lots of users, and we had been doing some ugly hacks to share data with our users, but with the new file system, we have ACLs, which makes it as easy as setfacl -R -m "u:username:r-x" /path/to/research.

ACLs are not actually my problem.

The length of time it takes to set ACLs on a large data set is my problem.

Running the tool to set everything takes five minutes. With a subset of our total data. Which is only going to get bigger. If we're talking about a daily "get everything back to proper shape", that's well within bounds. If it's something a user is supposed to run, then no.

So, I'm looking into threads, and I can set all my ACLs in parallel using Parallel::ForkManager, and while I'm not sure threads are the asynchronous solution for Modern Perl, they work and I can get a number of directories getting their ACLs recursively set at once.

Sometimes, however, because machines go down or NFS mounts get hosed or you kill a process just to watch it die, the setting process gets interrupted. Or, you do more work and generate more data, and that goes into a subdirectory. Then, the ACLs at the top of the directory tree may be correct, but the deep nodes will be wrong, and it's best to not wait until the end-of-the day "set everything" process to get your bits in order.

So you want to set a flag. If the flag is set, you do it all. And when I try to set flags in the threaded version, I get an error.

Threads are not actually my problem.

I have the threads in the database, which makes both the incomplete-pass and the add-new-data options equally easy to handle. And, to make databases easier to handle, I have a module I call oDB which handles database access so I don't have to worry about correctness or having passwords in plaintext in my code. It uses another module I wrote, called MyDB, to connect to MySQL in the first place. I share the gist above, but I cut to the chase below.

my $_dbh ;               # Save the handle.

sub db_connect {
    my ( $param_ptr, $attr_ptr ) = @_ ;
    my $port = '3306' ;

    # ...

    if ( defined $_dbh
        && ( !defined $param_ptr || $param_ptr eq '' ) ) {
        return $_dbh ;
        }

    # ...

    if ( defined $_dbh && $new_db_params eq $_db_params ) {
        return $_dbh ;
        }

    # ...

    $_dbh = DBI->connect( 
        $source, 
        $params{ user }, 
        $params{ password }, \%attr )
        or croak $DBI::errstr ;

    return $_dbh ;
    }    # End of db_connect


Essentially , the "right thing" in this case is to generate a new DB handle each and every time, and my code is doing everything in it's power to avoid creating a new DB handle.

My problem is that I didn't write this as thread-safe. Because doing so was the furthest thing from my mind.

My problem is a failure of imagination.

2015/07/27

Things I learned for perlbrew: Config

Config.

Mostly, I haven't developed for Perl, I've developed for the perl on this machine. Sometimes, my perl on this machine, with this set of modules.

With perlbrew, you're starting with whatever Perl is available, and sometimes, upgrading the perl you're using. So, it's good to know something about that perl, isn't it?

So, how do we do that?

Config.

Look in there and you get api_revision, api_version and api_subversion, which allows you to know which version you are running.

Which makes me think that there are options here, if you're deploying software to places where they're not using the most recent perl.

In Javascript, they have a concept of polyfills, so that, if your page is loaded on an older browser with no HTML5 support, you can install a module that gives your old browser the capabilities it needs to do that.

Now, honestly, that seems a nicer way to handle things than
use 5.14 ; # Here's a nickel, buy yourself a Modern Perl
Doesn't it?

# pseudocode ahead. I'm just thinking through this
# polyfill_say.pl
use Config ;

use lib '$HOME/lib/Tools' ;

BEGIN {

if ( $Config{ 'api_revision' } < 5 || 
     $Config{ 'api_revision' } == 5 && $Config{ 'api_version' } < 10 ) {
        require Tools::JankySayReplacement qw{ say };
    }
}

say 'OK' ;

Of course there's perlbrew, plenv and just putting a modern perl in /opt/bin/perl or ~/bin/perl and being done with it. Just because I'm jazzed by an idea, that doesn't mean it's a good idea. But aren't a lot of the new cool things in Perl 5 just polyfilled back from Perl 6 these days? Shouldn't we be as nice to those stuck in 5.old as the Perl 6 people are to us?

Anyway, Config. It is in Core and it is cool.

2015/07/26

What I learned from perlbrew

I signed up for Neil Bowers' CPAN Pull Request Challenge, and the first module I got was App::perlbrew. After some looking and guessing, gugod pointed me to one of his problems, and after some time reading and understanding how things work, I got it done.

It took me a while to figure out how it worked. I had seen and used something like it — I had found out about dispatch tables from my local friendly neighborhood Perl Mongers — and I have started to use old-school Perl object orientation on occasion, but this combined them in a very interesting way.

A lot of the clever, however, isn't where I thought it was, which I didn't realize until now. The symbol-table manipulation isn't about making the commands work, but rather guessing what you meant if you give a command it can't handle. The "magic" is all about $s = $self->can($command) and $self->$s(@$args).

I wrote a quick stub of an application that would show off how to this works, with lots of comments that are meant to explain what's meant to happen instead of how it's supposed to work, as "Most comments in code are in fact a pernicious form of code duplication".

If you try symtest.pl foo, it will print 1 and foo. If you try symtest.pl food, it'll just print 1. If you instead try symtest.pl fod, it'll print "unknown command" and suggest foo and food as alternate suggestions. Like a boss.

One of the coolest things, I think is that you can put your user-facing methods in a second module. Or, perhaps I just have a low threshold for cool.

If you have questions about the code, or understand the things I handwave and know you can do better, please comment below.


2015/07/11

Interview-Style Coding Problem: Estimate Pi

Saw this as an example of the kind of programming test you get in interviews, so I decided to give it a try.

Just to report, it gets there at $i = 130657.

#!/usr/bin/env perl

use feature qw{ say  } ;
use strict ;
use warnings ;
use utf8 ;

# Given that Pi can be estimated using the function 
#   4 * (1 – 1/3 + 1/5 – 1/7 + …) 
# with more terms giving greater accuracy, 
# write a function that calculates Pi 
# to an accuracy of 5 decimal places.

my $pi = '3.14159' ;

my $c ;
for my $i ( 0..1_000_000 ) {
    my $j = 2 * $i + 1 ;
    if ( $i % 2 == 1 ) { $c -= 1 / $j  ; }
    else { $c += 1 / $j ; }
    my $p = 4 * $c ;
    my $p2 = sprintf '%.05f' , $p ;
    say join ' ' , $i , $pi , $p2 , $p  ;
    exit if $p2 eq $pi ;
    }

2015/07/08

Because everything can jump the rails...


I will have to do a write up. (While you wait for me, read Net::OpenSSH on MetaCPAN and know the key is keypath.) The thing to remember is that this means I can write complex programs that connect to other machines while I'm not there.

I've been able to do similar things with bash scripts for a while, but there's a complexity you can only get once you step away from a shell and move to a language.

That complexity has consequences. If you've never written a thing that went out of control and had unexpected destructive consequences, you're not a programmer. I'd go as far as to say that everyone has written rm -rf foo. * instead of rm -rf foo.* at least once.

This is why computer people strive to be very detail oriented. We wanted remove all the things, be they file or directory, if they start with the string "foo.", not REMOVE ALL THE THINGS!!! BUT GET THAT 'foo.' THING ESPECIALLY!!!! The stereotypical geek response starts with "Well, actually...", because "Well, actually, there's a space in there that'll ruin everyone's day" keeps everyone from spending the next few hours pulling everything off tape backup, or potentially never having those pictures from your wedding ever again.

One of the arguments toward "AI means we're doomed" is that of the stamp collector. Watch the Computerphile video, but the collector wants stamps and tells his AI "I want more stamps to fill out my collection". This is clearly a general statement, conversationally a wildcard, and the AI can take this several different ways, going from going to eBay and buying a few things with your credit card to hacking several printing presses and printing billions and billions of stamps, and to harvesting living beings to be turned into paper ink and glue.

I have a response to this thought experiment, but a part of my problem that I didn't get into is that deleting all your files is easy, spending all your money on eBay is slightly harder, but controlling things on another computer is far more difficult. If you have an open API on a machine, all I can do is things that the API lets me do, and if you have a delete everything option, you've probably done it wrong. (Or you're a Snowdenesque paranoid hacker, in which case, you know what you're doing and that's fine.)

Which brings us back to Net::OpenSSH. The first step is "connect to that server", and once you realize it's going to prompt you for a password, the second step becomes "hard code your password to make it work" and the quick follow up is "Use config files or enable ssh keys or anything that allows you to not have your password in clear text, you idiot!"

Because, with an SSH shell controlled by a program, you grant the program permissions to do every command you're capable of on that system, and for many systems, you have the ability to be very destructive.

And I have that between a number of systems, because I'm trying to make a thing work that has SSH not AJAX and JSON as the API and need to know it works outside of that. I do know, however, that it means I have the capability to run code on another machine.

Which I'm not necessarily logged on and not necessarily monitoring.

Where I'm not the admin, nor the sole user.

Where I can ruin the days of myself and many many others.

So while I code, I feel the same fear I feel while standing in line for that rickety-looking wooden roller coaster at an amusement park. 

2015/07/01

Unstuck in Time and Space: An Investigation into Location over WiFi.

I track my location with Google and my phone, because I lack sufficient paranoia. To the right is my June 30.

I swear that I didn't leave the Greater Lafayette area. I certainly didn't teleport to the southern suburbs of Indianapolis.

This happens to me all the time, and it has bugged me a lot. But, normally I've just looked and griped, rather than trying to work it out.

Today, however, I'm watching a compiler or two, so I have some time I can use to work this out.

The protocol is KML, and this is what it looks like:

That isn't all day's results, merely the point in time I jumped 67 miles to the southeast. I was going to try to use a KML-specific Perl module, but found that the ones I could find were more about generating it than parsing it, and it's XML, so I figured what the heck.

I had previous code to work out the distance between two points, so it was an easy case of parsing to find the jump:

Breaking it down, at 2015-06-30T13:13:05.103-07:00 I go 67 miles to Greenwood, and at 2015-06-30T13:53:31.467-07:00 I pop back.

Let me bring up an other map.

 I didn't have any mapping software going, and I was using wifi, so this data is location via wifi not GPS. I know, though, that the group that runs my servers has a weekly "coffee break" on Tuesdays, that I met with my admin there, and I walked around toward his office before goign back to mine. His building is off S. Grant St., and I walked next to Hawkins Hall, in front of Pao Hall, near the Forestry Building and down to my office in Whistler.

So, question is, how does location over WiFi work? I recall hearing that routers and access points report location, but I'm not sure of the protocols involved. I can imagine two possible scenarios that cause this.

First is that one of Purdue's routers is misreporting location, either in Forestry or Pao. This is possible; I have another issue that I haven't worked through yet where I leap instantly to the EE building, and it seems that's entirely based on location within my building.

The second scenario, one I'm taking more and more seriously, is that there's a replicated MAC address or something between the apartments across from Pao Hall. I say "or something" because MAC addresses should be unique. The thing that makes me suspect this is that it points me to a residential neighborhood south of Indy, and I could see that mistake happening with two residential routers or two experimental electronics projects.

I'm curious about how to test this, because I do want to know it has something to do with Purdue's networking gear before I complain. I'm also curious about how these things actually work. I could very much see me walking around, looking at Google Maps and tripping over things, then trying to dump my ARP tables or something.

2015/06/29

Fixing an old logic issue

I am not especially proud of the code below.
It does it's job. Give it a request and a number of accessions and the names you want them to go by, and it changes them in the database.

Except...

Accessions are defined as zero-padded six-digit numbers, so instead of 99999, you'd have 099999. If you're strict, everything's fine.

But user's are not always strict. Sometimes they just put in 99999, expecting it to just work.

Oh, if only it were that easy.

I have requests here for the purpose of ensuring that for request 09999, you can only change accessions associated with that request. This is what lines 27-29 are for, to get the set of accessions that are entered by the user and one of the given request's accessions.

Yes, requests are defined as zero-padded five-digit numbers.

If I don't zero-pad the accessions, I get nothing in @accessions.

But if I do zero pad, I get no library name from $param->{ $acc }.

There is a fix for it. I could go back to the source and ensure that this sees no un-padded numbers. I could run through the $param hashref again, But clearly, this is something I should've built in at first.

2015/06/22

"Well, That Was Strange": Hunting Gremlins in SQL and Perl

The query base is 90 lines.

Depending on what it's used for, one specific entry or the whole lot, it has different endings, but the main body is 90 lines. There are 20 left joins in it.

It is an ugly thing.

So ugly, in fact, that am loath to include it here.

So ugly that I felt it necessary to comment and explain my use of joins.

This is where the trouble started.

I noticed it when I was running tests, getting the following error.


Clearly, it needed a bind variable, but something along the line blocked it.

I had this problem on Friday morning on our newer server, then it stopped. Honestly, it was such a fire-fighting day that I lost track of what was happening with it.

Then the module was put on the old server and the problem rose up again.

Whether that makes me Shatner or Lithgow
remains an exercise for the reader.
I said "my code has gremlins" and went home at 5pm.

When I got back to the lab this morning, I made different test scripts, each identical except for the hashbang. I set one for system Perl, which is 5.10, one for the one we hardcode into most of our web and cron uses, which is 5.16, and the one we have as env perl, currently 5.20.

The cooler solution would've been to have several versions of Perl installed with Perlbrew, then running perlbrew exec perl myprogram.pl instead, but I don't have Perlbrew installed on that system.

The error occurs with 5.10. It does not with 5.16 or 5.20.

And when I run it against a version without the comments in the query, it works everywhere.

I don't have any clue if the issue is with Perl 5.10 or with the version of DBI currently installed with 5.10, and I don't expect to. The old system is a Sun machine that was off support before I was hired in, and the admin for it reminds us each time we talk to him that it's only a matter of time before it falls and can no longer get up. I haven't worked off that machine for around two years, and this query's move to the old server is part of the move of certain services to the new machine.

And, as everything is fine with Perls 5.16 or higher, I must regard this as a solved problem except with legacy installs.

I know that MySQL accepts # as the comment character, but Sublime Text prefers to make -- mean SQL comments, so when I commented the query, I used the double-dash, and our solution is to remove the comments when deploying to the old server. It's a temporary solution, to be sure, but deploying to the old server is only temporary, too.

It's a sad and strange situation where the solution is to uncomment code, but here, that seems to be it.

Update: Matt S. Trout pushed me to check into the DBD::mysql versions, to see which versions corresponded to the error. The offending 5.10 perl used DBD::mysql v. 4.013, and looking at the DBD::mysql change log, I see bug #30033: Fixed handling of comments to allow comments that contain characters that might otherwise cause placeholder detection to not work properly. Matt suggests adding "use DBD::mysql 4.014;", which is more than reasonable.