But Ruby is bad for math
August 18, 2020
After building out the client parts of Preferr's API, proving they worked, and making them look pretty, it was time to figure out how to calculate some basic statistics for an A/B test.
Since Preferr is written with a Rails API as the backend, Ruby is the easy choice to do the work necessary to figure out statistical significance.
Right?
For a while, I banged my head on the desk trying to write a nice little Ruby method to calculate a p-value for an A/B test. It was incredibly hard. I'm not a trained statistician, so the added burden of learning the ins and outs of basic stats concepts combined with a programming language not cut out for heavy computation was a recipe for a bad time.
Ruby is notoriously weak at math of even moderate complexity: its standard library ships no statistical distributions or significance tests. It's great for readability and it's easy to understand, but when it comes to the heavy lifting of statistical analyses, Ruby shrinks into the corner like an outmatched featherweight.
The airspeed velocity of an unladen swallow
So, clearly Ruby was the wrong choice here. I could try and work it out in JavaScript in the browser, but that sounded even worse.
"What language is made for science and math?" I wondered.
Python.
Python's namesake comedians famously asked "What is the airspeed velocity of an unladen swallow?" Python the language could no doubt answer that question.
Python is the overwhelming favorite for all sorts of mathematicians, scientists, and other people of the more numeric persuasions. You can start it up and call on its friends numpy and scipy to handle all the hairiness of statistical analysis or computing a definite integral.
Knowing that Python would come to my rescue, I now had to figure out how to run a Python function from a Ruby server.
Serverless functions don't have to be scary
Over the past several months (and years to some extent), there's been a lot of talk about the power of serverless functions. Instead of setting up a server on a service like Heroku or DigitalOcean, you can stick what would be your server-side code into individual functions with their own HTTP endpoints and host them on a cloud service provider.
Sounds great! And also scary.
As a developer deeply entrenched in the ways of server-based applications and architecture, abandoning the comfort of my server-based home was intimidating. I felt a little bit like Frodo Baggins, with the towering Mount Doom that is Amazon and AWS filling me with doubt about my ability to accomplish the task of deploying a serverless function.
But I really didn't have a choice if I wanted to unload the hard work of statistics onto Python.
After some googling and reading about the world of serverless, I found the appropriately named serverless.com. I dug into their documentation and started trying things out.
It took a few days, and I had to wade through some older documentation and call in my Python expert brother to help me out, but I eventually had a function deployed to AWS Lambda through Serverless.
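If you haven't seen one before, a Python function on AWS Lambda behind an HTTP endpoint boils down to a handler that receives the request as an event and returns a response dictionary. Here is a minimal sketch, assuming the API Gateway proxy format (the request body arrives as a JSON string under "body"). The function name handler and the echo response are placeholders for illustration, not Preferr's actual code.

```python
import json


def handler(event, context):
    """Hypothetical Lambda entry point: parse a JSON request, reply with JSON."""
    # With the API Gateway proxy format, the raw request body is a string.
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "request body must be valid JSON"}),
        }

    # Placeholder: a real function would do its work on the payload here.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"received": payload}),
    }
```

From the Rails side, this is just another HTTP request: POST some JSON, parse the JSON that comes back.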
I can ping that function's HTTP endpoint anytime I want, give it some JSON, and have it respond with exactly the data I need. In this case, I wanted a p-value for Preferr's A/B tests, so I used scipy's chi2_contingency and fisher_exact functions to figure out how statistically significant the results of an A/B test are.
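To make that concrete, here is a rough sketch of how those two scipy functions can turn raw A/B counts into a p-value. The function name ab_test_p_value, its parameters, and the rule for picking a test (Fisher's exact test when any expected cell count is below 5, chi-squared otherwise) are my assumptions for illustration. That rule is a common rule of thumb, not necessarily how Preferr decides.

```python
from scipy.stats import chi2_contingency, fisher_exact
from scipy.stats.contingency import expected_freq


def ab_test_p_value(conv_a, total_a, conv_b, total_b):
    """Return (p_value, test_name) for a two-variant A/B test.

    conv_* are conversion counts and total_* are visitor counts
    (hypothetical parameter names).
    """
    # 2x2 contingency table: rows are variants, columns are converted / not.
    table = [
        [conv_a, total_a - conv_a],
        [conv_b, total_b - conv_b],
    ]

    # Assumed rule of thumb: the chi-squared approximation gets shaky when
    # any expected cell count is small, so use the exact test in that case.
    if expected_freq(table).min() < 5:
        _, p = fisher_exact(table)
        return float(p), "fisher_exact"

    _, p, _, _ = chi2_contingency(table)
    return float(p), "chi2_contingency"


if __name__ == "__main__":
    p, test = ab_test_p_value(conv_a=30, total_a=500, conv_b=55, total_b=500)
    print(f"{test}: p = {p:.4f}")
```

A small p-value (0.05 is the conventional cutoff) suggests the difference between the two variants is unlikely to be chance alone.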
serverless.com is like Samwise Gamgee
All things considered, Serverless made it really easy for a total noob to show up and pretty quickly get a serverless function up and running. To quote Samwise,
"Come, Mr. Frodo!" he cried. "I can't carry it for you, but I can carry you and it as well. So up you get!"
If you want to get into the details and learn how I got my Lambda function set up using Serverless, head over here.