Benchmarking Asynchronous PHP vs NodeJS Properly

I had fun this evening working on this with Samuel Reed. Performance and programming language choice is always a hotly debated topic, and it's always fascinating to explore. It was truly interesting to see async PHP code. NodeJS is known for it, but PHP really isn't at all.

The actual article is here:

Analyzing the EIG (HostGator, BlueHost, HostMonster, JustHost) Outage

I just published an article looking at the impact of the major outage that occurred yesterday (August 2, 2013) when EIG's Provo, UT datacenter failed. I also predict what to expect based on previous major outages.

There was definitely a major spike in data produced and I got down to analyzing it.

Full Story:

National Day of Civic Hacking: Exploring Consumer Financial Protection Bureau Data

This weekend I participated in the National Day of Civic Hacking.

The project I decided to work on was working with the CFPB data (and also used some census data).


The CFPB released a large complaints database that contained information about which types of financial products people are complaining about. It also gave information about where the complaints came from, what they were about and how they were resolved. Some of the data was released literally a day earlier, so I was given a chance to look at, analyze and visualize information that nobody had really seen yet.

It was an exciting and interesting opportunity. Since it was very fresh data with little to no previous work, much of what I got to do was more general analysis. I created a handful of graphics (click them to see full size) and maps (click to use the map) which I have included below:

What products are people complaining about the most?

The product people complain about most is mortgages. There is a category for other mortgages that people can choose, and most people seem to select it. I wasn't sure why until I looked at the issues people were having.

What are the most common issues people have?

Foreclosure, Loan Modification and Collection. Intuitively, this makes a lot of sense. I probably don't care what type of mortgage it is when they are trying to take my house away. This particular issue dwarfed everything else, so I had to use a log scale to even see the other issues people were facing.

What are the most common issues people have? (log scaled)

This gives a more in-depth picture of the issues, but the first graphic really shows you the most common and/or pressing financial issue for people.

Which companies had the most complaints?

Then I explored which companies were receiving the most complaints. This data is NOT normalized. That means a company with more complaints is not necessarily worse than one with fewer. For example, if company A had 10 complaints and 100 customers and company B had 5 complaints and 20 customers, company B would be worse (if we measured complaints as a % of customers). I didn't have easy access to a dataset that would let me normalize these banks, so this is for curiosity more than any meaningful insight. It probably is a proxy for the largest players in consumer finance though.
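
To make the normalization point concrete, here is a minimal JavaScript sketch using only the made-up figures from the example above. The names `complaintsPer100` and `rankByRate` are hypothetical helpers, not part of any CFPB tooling.

```javascript
// Made-up figures from the example in the text; not real company data.
var companies = [
  { name: 'Company A', complaints: 10, customers: 100 },
  { name: 'Company B', complaints: 5, customers: 20 }
];

// Complaints per 100 customers (hypothetical helper).
function complaintsPer100(company) {
  return (100 * company.complaints) / company.customers;
}

// Rank from highest complaint rate to lowest without mutating the input.
function rankByRate(list) {
  return list.slice().sort(function (a, b) {
    return complaintsPer100(b) - complaintsPer100(a);
  });
}

rankByRate(companies).forEach(function (c) {
  console.log(c.name + ': ' + complaintsPer100(c) + ' complaints per 100 customers');
});
```

Ranked by raw counts, Company A comes first; ranked by rate, Company B does. That difference is why the raw chart should be read as a rough proxy for company size.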

[MAP] Where are the complaints coming from in the US? (Normalized for state population)

This map shows where people are complaining the most. DC won that dubious honor. Maryland, Delaware, New Hampshire, California and Florida were also high on the list. All the numbers were adjusted to reflect complaints relative to each state's population. We can clearly see differences between states, and we can probably make educated guesses about some of them. DC, for example, is probably the highest because it has the highest awareness of the CFPB (since it's based in DC). Maryland probably has a high awareness too. Florida and California had big real estate bubbles and perhaps were hit especially hard. New Hampshire and Delaware are a mystery to me, although a woman from New Hampshire at my event told me that she complained to the CFPB and told all her friends as well. Perhaps an above-average awareness of the CFPB caused the higher complaint rate.

Top 10 Companies by Complaints and their Disputed Resolution Rates

Next I explored disputed resolutions. Companies alert the CFPB when the matter is resolved and consumers are allowed to tell the CFPB they were not satisfied with the resolution. I graphed the top 10 companies by complaint volume and what their disputed resolution rates looked like. It's interesting to see such big differences between companies but without further information about how they handle disputes, it's impossible to say anything confidently comparing one company to another.

[MAP] Disputed Resolutions by State

Finally, I created a map graphing disputed resolutions by state. There was a surprisingly large variation between states. Alaska, for instance, had a disputed resolution rate of over 26% while Wyoming had a measly 16%. I have no idea why, but it's interesting and worth looking into further.


I had a lot of fun exploring this fresh set of data and there is a lot more to be learned from it. I want to give a big thanks to Ana from the CFPB and Logan from the Census Bureau who attended the event and helped participants navigate the data provided by their respective organizations.

Small Programming Decisions that Expose More Information than Intended

Sequential Account Numbers + Affiliate Program = Financial Transparency


Accounts generally come in two flavors: name based and numerical. Name based systems use a text representation for an account (username). Numerical systems use an id number to identify accounts. (Note: these aren't mutually exclusive)

The affiliate program is nearly a staple of any online business these days. Companies give people a cut for referring them new customers that make a purchase. I have used dozens, if not hundreds of them. But what sort of information can be gleaned from these affiliate interfaces?

What's happening?

Some companies use sequential account numbers and their affiliate programs report the account id when you refer a sale.

Why would that matter?

It matters because I can measure your company's growth and revenue. It's pretty simple to take a few sales and calculate the time between them and see how many accounts were created on average per unit time. If the accounts started at 1, it's pretty easy to see how many customers have signed up too. If the company is selling one product, this pretty much gives away the keys to the castle in terms of the company's revenue. It's slightly more complex if there are different products and prices, but with enough data, you could create an estimated average sale value.
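
The arithmetic is simple enough to sketch in a few lines of JavaScript. Everything below is hypothetical: the account ids, dates and average sale value are invented for illustration, and `estimateSignupsPerDay` is just a name picked for this example. It also assumes ids are handed out one per account with no gaps.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Estimate new accounts per day from account ids seen in affiliate reports.
// Each observation is { accountId: number, time: milliseconds since epoch }.
function estimateSignupsPerDay(observations) {
  if (observations.length < 2) {
    throw new Error('Need at least two observations');
  }
  // Sort a copy by time so the caller's array is left untouched.
  var sorted = observations.slice().sort(function (a, b) {
    return a.time - b.time;
  });
  var first = sorted[0];
  var last = sorted[sorted.length - 1];
  var days = (last.time - first.time) / MS_PER_DAY;
  if (days <= 0) {
    throw new Error('Observations must span some time');
  }
  return (last.accountId - first.accountId) / days;
}

// Invented data: two referred sales a month apart.
var observed = [
  { accountId: 10500, time: Date.UTC(2013, 0, 1) },
  { accountId: 11400, time: Date.UTC(2013, 0, 31) }
];
var perDay = estimateSignupsPerDay(observed);
var averageSale = 20; // assumed average sale value, in whatever currency applies

console.log(perDay + ' new accounts per day');
console.log('roughly ' + (perDay * averageSale) + ' in new sales per day');
```

With more observations you could fit a trend instead of using only the endpoints, but even two data points give a usable first estimate.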

How is exposing this information problematic?

If someone, like a competitor or analyst, were trying to estimate or value your company, this would be a pretty simple (and possibly cheap) way to get that information. For a private company that doesn't want to give away their financials, this is a fairly direct way to get one of the key numbers (revenue).

This was just an interesting example of how little programming decisions might expose a lot more than you had planned. I bet there are many others that you may have encountered and I would love to hear of other seemingly correct programming decisions that might be wrong with more context.

PHP APC Performance Improvement

I installed PHP APC today and was simply shocked by some of the performance gains I was getting from it.

I had a problematic page that was calculating a lot of info (the same info) every time it loaded. Those were costly calculations.

APC gave me about 80ms faster load off the bat. But the performance was REALLY noticeable after a lot of users were hitting the site. (I used to do load testing)

It went from 2.5 second response time with 260 users to 400ms with 260 users. Without APC I had a 600ms load time with only 81 users.
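
APC caches compiled PHP scripts and also provides a key-value store for application data. As a rough illustration of why caching a costly, repeated calculation helps, here is a minimal JavaScript sketch. It is not APC and not the code from the page described above; `cacheResult`, `ttlMs` and `expensiveReport` are hypothetical names.

```javascript
// Wrap a costly function so its result is reused until it is ttlMs old.
// `clock` is optional and only exists to make the sketch easy to test.
function cacheResult(compute, ttlMs, clock) {
  var now = clock || Date.now;
  var hasValue = false;
  var value;
  var expiresAt = 0;
  return function () {
    if (!hasValue || now() >= expiresAt) {
      value = compute();          // the expensive part runs only on a miss
      expiresAt = now() + ttlMs;
      hasValue = true;
    }
    return value;
  };
}

// Stand-in for a page that recalculates the same info on every load.
function expensiveReport() {
  var total = 0;
  for (var i = 0; i < 1000000; i++) {
    total += i % 7;
  }
  return total;
}

var getReport = cacheResult(expensiveReport, 60 * 1000);
getReport(); // computed
getReport(); // served from memory for the next minute
```

Under load the saving compounds, since every request that hits the cache skips the calculation entirely.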

The graphs below really speak for themselves.

TL;DR: caching = win.

No Caching Graph:
no apc

With Caching (APC)
with apc

Potentially Malicious Fake Advertiser using Wordpress Plugin (

It starts with an innocuous email:


I am sorry I have to write you to e-mail from whois information of the domain. But I could not find contact e-mail or feedback form on your site.
We are looking for new advertisement platforms and we are interested in your site %DOMAIN%
Is it possible to place banner on your site on a fee basis?

Best regards,
Nicolas Gauthier

But it quickly turned strange:


Thanks for reply to our proposal!
We like your price.We would like to place 160x600 banner.

To pass to the banner control system follow the link
To enter use the following data:

login: %DOMAIN%
password: %PASSWORD%

Client Side Tweet Parser in JavaScript (jQuery)

I just published a simple JavaScript script that helps websites comply with Twitter's Display Guidelines. It helps you comply with issues 2, 3 and 4.

It automatically links URLs, hashtags and mentioned usernames. You simply set the div class to 'tweet' (or whatever you change it to in the code) and link to twitter.js. Make sure you also have jQuery (only tested with 1.6.4) loaded before twitter.js, since it is a dependency.

Thanks go to Raphael Mudge who helped with some of the regular expressions.
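
For readers who want the general idea without jQuery, here is a minimal sketch of the linking step in plain JavaScript. It is not the published twitter.js; `escapeHtml` and `linkifyTweet` are hypothetical names, and the link targets (twitter.com profile and search URLs) are assumptions. It handles URLs, mentions and hashtags in a single pass so that a `#` or `@` inside a URL is not linked a second time.

```javascript
// Escape text before inserting it into HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn URLs, @mentions and #hashtags in a tweet into links.
function linkifyTweet(text) {
  // Alternation: a URL, or a sigil (@ or #) not preceded by a word character.
  var pattern = /(https?:\/\/\S+)|(^|[^a-z0-9_])([@#])([a-z0-9_]+)/gi;
  return escapeHtml(text).replace(pattern, function (match, url, prefix, sigil, name) {
    if (url) {
      return '<a href="' + url + '">' + url + '</a>';
    }
    // Assumed link targets: profile page for mentions, search for hashtags.
    var href = sigil === '@'
      ? 'https://twitter.com/' + name
      : 'https://twitter.com/search?q=%23' + name;
    return prefix + '<a href="' + href + '">' + sigil + name + '</a>';
  });
}

console.log(linkifyTweet('Thanks @bob, see http://example.com/#top #php'));
```

With jQuery loaded, the result could be applied to every element with the 'tweet' class by setting each element's HTML to `linkifyTweet` of its text.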

Future improvements: this code isn't perfect by any means. One improvement would be merging the three primary functions into one with some conditions. I've also seen some very elegant server-side solutions using almost entirely regular expressions, such as this one from CodeIgniter (see code below).

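function parse_tweet($tweet)
{
    // Patterns, in order: http:// URLs, @mentions, #hashtags (case-insensitive).
    $search = array('|(http://[^ ]+)|', '/(^|[^a-z0-9_])@([a-z0-9_]+)/i', '/(^|[^a-z0-9_])#([a-z0-9_]+)/i');
    // Replacement templates, one per pattern; $1 and $2 are the captured groups.
    $replace = array('$1', '$1@$2', '$1#$2');
    $tweet = preg_replace($search, $replace, $tweet);

    return $tweet;
}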

How to Make Facebook Like Button on a Website Connect to a Facebook Page

I wanted a simple 'Like' button connected to my Facebook fan page. No stream, no faces, no counts, no nonsense. I couldn't find an easy and obvious way to do it.

I spent more time than I would care to admit trying to figure this out and it turns out to be trivial.

  1. Go to Like Button developer tool.
  2. Set 'URL to Like' to your facebook page (
  3. Set 'Width' to '50'
  4. Uncheck 'Send Button'
  5. Choose 'Button Count' for the Layout Style.
  6. Uncheck 'Show Faces'
  7. Get Code, choose iframe version.

The 50 width should cut off the like count box (you can make it bigger or smaller to make sure it works). The result is a simple, easy Like button connected to a Facebook fan page.

Screenshot below:

Firefox Inspector Bug (10.0.2)

I wanted to report the bug to Mozilla, but it took me 15 minutes to even find their Bugzilla, and then it wanted too much of my time and wanted me to share my personal info (email address). So I will just post it here instead.

So basically the problem is that if you double inspect, you can't escape the inspector and the close box (X) disappears.

Screen shots after the jump.

Click Tracking using JavaScript and Google Analytics - The Good and The Bad

I ran into an interesting problem recently: I didn't want to change my links to some sort of URL forwarding script ( to track outbound clicks, but I still wanted the outbound click data.

I thought it should be possible using JavaScript and the onClick() functionality. My first inclination was to just go to jQuery and bind the click to a function that would simply make an AJAX call to record the click data and then forward the user.

Upon further research I found out Google Analytics has this functionality and tracking already built in.

You simply include another javascript snippet:

<script type="text/javascript">
function recordOutboundLink(link, category, action) {
  try {
    var pageTracker = _gat._getTracker("UA-XXXXX-X");
    pageTracker._trackEvent(category, action);
    // Short delay so the event can be sent before leaving the page.
    setTimeout('document.location = "' + link.href + '"', 100);
  } catch (err) {}
}
</script>

And then for any click you want to track you simply add:

onClick="recordOutboundLink(this, 'Outbound Links', '');return false;"

'Outbound Links' is the event category passed to recordOutboundLink. It all shows up in the Event Tracking reports in Google Analytics. Simple.

Or so it seemed. I tried this method and some users complained that it broke things like middle click (open in new window) functionality. I am also not convinced it actually recorded every click: on the links where I had access to stats on the outbound side, my click volume didn't seem to match.
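
One possible way around the middle-click complaint is to take over only plain left clicks and let the browser handle everything else. The sketch below is an assumption about how that could look, not something tested on the site; `shouldInterceptClick` is a hypothetical helper. It does nothing about the mismatch in click counts.

```javascript
// Decide whether a click should be taken over by the tracking code.
// Only a plain left click qualifies; middle clicks and clicks with
// Ctrl, Cmd, Shift or Alt held are left to the browser so that
// "open in new tab/window" keeps working.
function shouldInterceptClick(event) {
  var plainLeftClick = event.button === 0;
  var modified = Boolean(
    event.ctrlKey || event.metaKey || event.shiftKey || event.altKey
  );
  return plainLeftClick && !modified;
}

// Possible use in the inline handler from the post (browser only):
//   onClick="if (shouldInterceptClick(event)) {
//              recordOutboundLink(this, 'Outbound Links', '');
//              return false;
//            }"
```

Clicks that are not intercepted follow the link normally, which also means they are not recorded, so this trades some tracking coverage for expected browser behavior.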

So I turned it off and canned the idea for now. I guess that's why Facebook, Twitter and everyone else seem to use link tracking scripts like l.php and
