Diving into R, one year later

In April, it will be exactly one year since I started to scratch the surface of the well-known (at least, in the data science community) program called “R”. This is a short story about my journey.

Like anything I learn, I dive in with what I dare to call a head-first approach: unorthodox, stimulated by ideas and the urge to solve problems, driven by visually sketched, structured models. Some call it design thinking. I tend to call it “just sketching my thoughts”. No need for all these fancy terms for simple ways to translate thoughts into visual and understandable concepts.

I first started with a course on R on the DataCamp platform. Like any good student, you start by learning the syntax, of course. Right? Well … the course sucked. I gave up that same night.

That same night, I tried two other platforms as well, whose names I can’t remember anymore. I didn’t hang in there either. I managed to get a few pages into the second half of a multi-page course of max 20 pages. Pathetic, actually.

Not sure it’s me. Could be, but something is very off on these online learning platforms! They teach you syntax in a way that isn’t connected to the real-world issues you’ll run into. It’s actually, in my humble opinion, a waste of time to start with syntax.

So… I started sketching the problems I wanted to solve and worked my way backward from there, with the help of some books and code from blog posts handling parts of those problems. Eventually I signed up for a course to learn real-world, best-practice-based machine learning (in Python, also a programming language I didn’t know).

One hell of a year, but I learned a shitload! I learned about time series, causal impact, linear regression, plotting data in different graphs than you get in Google Analytics, working with the analytics API, handling data in new ways, etc. Stuff Google Analytics can’t do. Stuff any aggregated interface/tool can’t do. Not even Excel with exports.

Here are the main things I’ve learned in the past year:

  1. R is hard! At first, dealing with the installation of RStudio, packages, and the errors to get everything running was a pain in the ass. (Thank you, Stack Overflow, for the help.)
  2. Basic data handling stuff is really simple. Calculating means, medians, etc. is very handy & easy to do with R.
  3. Too many packages. Packages often overlap in functionality and it’s hard to pick ‘the best’ package. The number of blog posts that use different packages to tackle the same challenges makes it even more confusing for a beginner in R. I guess you just have to go with the first tutorial that actually works for you, and switch packages when you think you’ll benefit from it, for example when a package isn’t compatible with your version of R or your Java isn’t up to date.
  4. (Too) many statistical conditions. I started my journey with statistical tests like the t-test, p-values, etc. and got lost very quickly. To this day I find it very hard to pick the right test for the right job. I guess it takes practice. Blog posts are often confusing or not backed by a simple, comprehensible example. The entry barrier is pretty high for someone with limited or no statistical education. I’m still working on that and have ordered myself a new book on Amazon about statistical concepts like one-way ANOVA, two-way ANOVA, the p-value, the t-test, the F-test, the chi-squared test, etc.
  5. Machine learning is actually a lot easier to learn. It’s far more straightforward and you can get a primer from blog posts covering machine learning models. The irony of it all is that those blog posts only show you the easy part of machine learning: the models. Getting a model running is fairly easy, but hardly anyone talks about the real-world stuff: how to read the data, what model to pick, how to evaluate the model, how to pimp the model, … and maybe most important of all: HOW THE HELL DO YOU BUILD SOMETHING END-TO-END?? And then there are other questions: should you use ML or build a formula instead? Or just use statistics to solve the issue? Isn’t there another solution or analysis that has more impact or is less complex/time-consuming? Do you know what’s needed before you can even start with machine learning at all, like labeled data, a clear goal, …?
  6. No one talks you through the whole process of getting the data, cleaning the data, preparing the data for your model, evaluating the model, turning the output into a decent outcome you can work with, and eventually translating it into a format that is actionable for the business and comprehensible to work with, meaning: decent advice that has value! In my world: something a marketer can work with and that has a direct impact. Aka: see, think, do.
  7. Documentation is a bitch. Some packages have splendid documentation when it comes to syntax, but few of them have working examples that show their purpose or how the functions work in a real-life scenario.
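
To make points 2 and 6 a bit more concrete, here is a minimal sketch of what “end-to-end” could look like at its very smallest: get data, clean it, fit a model, evaluate it, and turn the output into one sentence of advice. It’s written in Python with only the standard library (not R, purely to keep it dependency-free). Everything in it is an assumption for illustration: the numbers are made up, the function names (`clean`, `fit_line`, `r_squared`, `advice`) are hypothetical, and the 0.8 threshold is arbitrary. It also only shows correlation, not causation.

```python
from statistics import mean, median

# Hypothetical export: (daily ad spend in EUR, sessions). Made-up numbers;
# None stands in for a value that is missing in the export.
raw_rows = [
    (100, 520), (150, 610), (200, None), (250, 830),
    (300, 905), (None, 400), (350, 1010), (400, 1120),
]


def clean(rows):
    """Cleaning step: drop rows where either value is missing."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_line(points):
    """Model step: ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(points, slope, intercept):
    """Evaluation step: share of the variance in y explained by the line."""
    y_bar = mean(y for _, y in points)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - y_bar) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


def advice(slope, r2, threshold=0.8):
    """Translation step: one sentence a marketer can act on.

    The threshold is an arbitrary assumption, not a statistical rule.
    """
    if r2 < threshold:
        return "No clear link between spend and sessions: leave the budget alone."
    if slope > 0:
        return f"Turn ad spend up: each extra euro goes with roughly {slope:.1f} extra sessions."
    return "Turn ad spend down: extra spend goes with fewer sessions."


if __name__ == "__main__":
    points = clean(raw_rows)
    sessions = [y for _, y in points]
    print("mean sessions:", mean(sessions), "median sessions:", median(sessions))
    slope, intercept = fit_line(points)
    print(advice(slope, r_squared(points, slope, intercept)))
```

The part blog posts tend to cover is `fit_line`. The parts around it (what to drop in `clean`, whether the fit is good enough, and how to phrase the outcome) are where the real-world decisions sit.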

I’m probably forgetting a lot of things I’ve noticed along the way while learning to work with R, but these are some of the biggest concerns if the world needs more data-savvy people (with the rise of AI and ML). We still have a long road ahead and everyone needs help. Decent help. Or time... to learn about this stuff in a context that makes sense for the real world you’ll end up in.

Since I’ve been playing around in R, I find myself switching between Google Analytics custom reports and R scripts to come up with several tailor-made analyses. I combine both because this is a very time-consuming, and thus expensive, effort in a marketing agency world when you’re still looking for a properly valuable and above all actionable outcome. Actionable as in, ideally: one sentence of comprehensible advice for the marketer, so he or she knows which knob to twiddle and in what direction.

Which makes me conclude the following:

From what I’ve experienced in the last year, there’s a HUUUUUUUUGE GAP to bridge between the marketer and the analyst. While the marketer is mostly tied to the communication and ideation side of marketing, the analyst is mostly tied to the IT/business and reporting side of things. The marketer needs the analyst to perform better, beyond the traditional and short-sighted online marketing as we know it.

The analyst needs to learn more about marketing to translate business opportunities into creative, practical and achievable marketing advice and/or tactics if the marketer isn’t that resourceful. Both have a role to play, but resourcefulness is the biggest issue: performance often tends to trump marketing creativity (e.g. campaign ideas and storytelling), or at least that’s what marketers perceive. The same goes for the analyst, but in the opposite direction: storytelling and creative outlets trump long-term performance (e.g. LTV).

The gap is huge due to the lack of knowledge on both sides (marketing <> analytics), and the value of an analyst in a marketing agency, or of a marketer in an analytics agency, is easily overlooked and forgotten. Personally, I’m not convinced that bridging it is possible within the same company unless you split the focus of the jobs to be done:

  1. Branding (campaigning & storytelling)
  2. Marketing (short-term and daily efforts: blogging, AdWords, SEO, etc.)
  3. Growth (long-term & monthly efforts: attribution & budget allocation, customer base evolution, etc.)

I’m intentionally not mentioning strategy & product development, as these are more business-related than marketing-related items.