What to Do First: SQL

I have an eclectic background. My bachelor’s degree is in Liberal Arts, specifically English and Psychology (I double majored). After college, I worked in a human performance lab for a few years and started learning the basics of statistics, how to apply what I had learned in class about experimental design, and how to write up an analysis. But I also started to learn what not to do: I would not recommend putting 30+ variables into a regular multiple regression when there are fewer than 90 complete rows of data. I would not recommend using data from your pilot when that pilot included the researchers, who are fundamentally different from your regular sample (older, already familiar with the different tests and how to game them, etc.). I would not make the experiment 4 hours long when the participant is hooked up to four different types of monitoring devices, some of which are uncomfortable after 30 minutes. But I digress. The fact of the matter is that I was interested in the right way to do things.

Years later, I’m realizing that I’ve made a few mistakes along the way. The first is that I let my SQL knowledge fade after starting a job that doesn’t require it. So the first order of business is to start re-learning SQL. I want to keep track of what I do and how I do it here so that if I ever need to remember something, I have an easy way to do so. SQL is one of those languages that is incredibly simple. I first learned it in a data mining class back in 2014 (or was it 2015? I can’t remember). Back then, I was still very much a psychology major and the most I’d coded was in SPSS (until you stop using the GUI, it does not count as coding). I made the mistake of starting the class without knowing a thing about how to code, and the class was mostly in SAS (which I still refuse to use). So I was a bit of an idiot. I really should have practiced more with SQL beyond those class assignments. Granted, I have never really had a job where I needed SQL, but that has more to do with how slowly companies are moving towards data.

So the plan is to re-take the DataCamp SQL classes I took in the past and to create a post that shows how I would pull data in R versus SQL. Since I am a dplyr user, this shouldn’t be too hard, especially since dplyr is built to somewhat mirror SQL, just with more parentheses. Once I finish the SQL classes on DataCamp, I’ll write up what I learned and what I found important, and then start thinking of new pulls to show R versus SQL.
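To give a rough idea of what one of those side-by-side pulls might look like, here is a minimal sketch. It is only an illustration: the `participants` table, its columns, and the values are all made up, and I am running the SQL through Python's built-in `sqlite3` module simply because it needs no database setup. The dplyr version sits in the comment above the query.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (id INTEGER, age_group TEXT, score REAL)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?, ?)",
    [
        (1, "young", 70.0),
        (2, "young", 80.0),
        (3, "older", 90.0),
        (4, "older", 50.0),
        (5, "older", 40.0),
    ],
)

# Roughly equivalent dplyr pipeline (R), for comparison:
#   participants %>%
#     filter(score >= 50) %>%
#     group_by(age_group) %>%
#     summarise(n = n(), mean_score = mean(score)) %>%
#     arrange(age_group)
query = """
SELECT age_group,
       COUNT(*)   AS n,
       AVG(score) AS mean_score
FROM participants
WHERE score >= 50
GROUP BY age_group
ORDER BY age_group
"""

rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

The pieces line up almost one to one: `filter()` is `WHERE`, `group_by()` is `GROUP BY`, `summarise()` is the aggregate functions in `SELECT`, and `arrange()` is `ORDER BY`. The main difference is the order you write them in, since SQL puts `SELECT` first while the dplyr pipeline reads top to bottom in the order the steps happen.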

Thanks for reading!
