10 Similarities Between Academic/Clinical Research and Data Science

J Kelman
Apr 21, 2020

As I enter week 6 of my data science immersive program, I keep noticing similarities between what I am learning now and what I have applied during my time as a clinical research assistant and while doing academic research. So for anyone in the academic or clinical research field thinking about a switch to data science, here are 10 things you didn’t know you’ve already learned!

To illustrate some of the similarities, we will use my senior-year psychopathology capstone research project (for the academic research side) and one of General Assembly’s lab exercises (for the data science side).

1. The Workflow

At their core, research and data science share the same goal: extracting insights from data. The processes and tools each field uses to get there may differ slightly, but the scientific method and the data science workflow are very similar.

Let’s take a look at the scientific method:

[picture from https://towardsdatascience.com/a-data-scientific-method-80caa190dbd4]

Now the data science workflow:

[Picture from General Assembly’s Data Science Immersive pre-work]

Both start from a general question or observation, develop a testable hypothesis or problem statement, collect and analyze data, interpret it to answer the specific question, and present the results to an audience. Those steps are the same whether you are working in a psychology research lab or for a data science consulting firm. And in both cases, the process is iterative: our results will almost always lead us to another, deeper question.

2. Writing a problem statement

Transforming an observation or question into a statement that is answerable with data is a crucial part of research and data science work. In both cases, the statement should be specific and conclusively answerable.

For instance, in our psychopathology academic research, we wanted to investigate “why do college students take prescription medication (such as Adderall) that has not been prescribed to them?” This question is way too broad to be a good hypothesis or problem statement. We delved into the literature to narrow our question and the statement became: “Is there a relationship between distress levels and un-prescribed prescription drug use?”

In our data science exercise, we wanted to know “What is the relationship between personalities and left-handedness?” Again, this question is too broad, so it was refined into: “As the number of push-ups an individual can do increases, how does their chance of being left-handed change?”

We can see that a very similar process was done in both cases; we started from a general question and transformed it into a specific statement answerable with data available to us. It is also interesting to note that both of those problem statements could be investigated using either academic research or data science tools.

3. Data collection

The tools used to collect data and the size of the datasets may differ, but thinking critically about data collection is a skill needed in both research and data science. For instance, accounting for potential biases, for the ways data collection protocols can skew the data, and for ethical considerations such as handling protected or sensitive information is paramount in both fields.

4. Formatting data

Once we have gathered our data, reformatting it may be necessary, especially when it was not collected specifically to answer our problem statement (for instance, when using web scraping or publicly available data). This is a skill that both academic/clinical researchers and data scientists are familiar with.

For instance, for our psychopathology research, we used answers to the American College Health Association-National College Health Assessment (ACHA-NCHA) previously recorded by Franklin & Marshall College. As the ACHA-NCHA questions were not created to answer our specific problem statement, we had to reformat answers to four questions to create our “drug use score”. For each question, the participants were assigned a value of 0 if they answered “no” to such drug use and a value of 1 if they answered “yes.”
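A minimal sketch of that recoding in Python. It assumes, hypothetically, that each of the four answers arrives as a "yes"/"no" string and that the drug use score is the sum of the four 0/1 values; the question keys are placeholders, not the actual ACHA-NCHA item names.

```python
# Hypothetical keys standing in for the four ACHA-NCHA questions (not the real item names)
DRUG_QUESTIONS = ["q1", "q2", "q3", "q4"]


def recode_answer(answer: str) -> int:
    """Map a yes/no answer to 1/0 (assumes answers are 'yes' or 'no', any case)."""
    normalized = answer.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    raise ValueError(f"unexpected answer: {answer!r}")


def drug_use_score(response: dict) -> int:
    """Sum of the recoded answers (assumed scoring rule), ranging from 0 to 4."""
    return sum(recode_answer(response[q]) for q in DRUG_QUESTIONS)


participant = {"q1": "Yes", "q2": "no", "q3": "no", "q4": "yes"}
print(drug_use_score(participant))  # 2
```

Raising an error on unexpected answers, rather than silently scoring them as 0, makes data-entry problems visible before they reach the analysis.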

Similarly, in our data science lab, since we wanted to predict whether or not someone is left-handed, we had to reformat our target variable.
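The post does not say how handedness was coded in the lab dataset, so the sketch below assumes, hypothetically, a numeric column where 1 = right, 2 = left, and 3 = both, and collapses it into a binary target where 1 means left-handed.

```python
# Hypothetical coding of the raw handedness column: 1 = right, 2 = left, 3 = both
LEFT_CODE = 2


def to_binary_target(hand_codes):
    """Return 1 for left-handed respondents and 0 for everyone else."""
    return [1 if code == LEFT_CODE else 0 for code in hand_codes]


raw_hand = [1, 2, 1, 3, 2]
print(to_binary_target(raw_hand))  # [0, 1, 0, 0, 1]
```

Grouping "both" with "not left-handed" is a design choice; dropping those respondents instead is an equally reasonable option, and the two choices can change the results.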

5. Data cleaning

Dealing with nulls, outliers, and incorrectly entered data are tasks that both researchers and data scientists have to handle. While data scientists may spend more time cleaning data than researchers do (because they usually work with much larger datasets from more varied sources), the cleaning skills learned in research are still very much applicable in data science.
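To make those tasks concrete, here is a small plain-Python sketch for one numeric variable, using made-up push-up counts. It drops missing values, discards impossible entries (negative counts, treated here as data-entry errors), and flags outliers with the common 1.5 × IQR rule. The valid range and the IQR rule are illustrative choices, not the lab's method.

```python
import statistics


def clean_counts(values, min_valid=0):
    """Drop missing (None) values and entries below min_valid (assumed data-entry errors)."""
    return [v for v in values if v is not None and v >= min_valid]


def iqr_outliers(values):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


push_ups = [12, 15, None, 20, -3, 18, 14, 95, 16]  # made-up data
cleaned = clean_counts(push_ups)
print(cleaned)                # [12, 15, 20, 18, 14, 95, 16]
print(iqr_outliers(cleaned))  # [95]
```

The outlier is only flagged, not removed: whether 95 push-ups is a typo or a genuinely fit respondent is a judgment call in research and data science alike.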

6. Using descriptive statistics

Statistics such as the mean, mode, and standard deviation summarize our variables and can reveal valuable information at a glance. Understanding those numbers (as well as the math behind them) is an important skill that both academic/clinical researchers and data scientists use.
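A short sketch using Python's built-in statistics module on a hypothetical list of drug use scores (made-up values, not the study's data). The median is included alongside the three statistics named above.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # sample standard deviation (divides by n - 1, like Excel's STDEV.S)
        "stdev": statistics.stdev(values),
    }


scores = [0, 1, 1, 2, 1, 0, 3, 1]  # hypothetical drug use scores
print(describe(scores))
```

Here the mean (1.125) sits slightly above the mode and median (both 1) because of the single score of 3, the kind of detail researchers are already used to reading from a table of descriptives.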

7. Data visualization

Being able to create visuals that communicate information efficiently is an important part of data science work. While data scientists may use Python libraries to create those graphs, researchers have learned to create the same visualizations in Excel. Both fields use the same types of graphs (histograms, bar graphs, boxplots…). Knowing how to interpret those graphs is another paramount skill that researchers already have!

Here’s an example of a graph created with Excel for the psychopathology research work:

Here’s a very similar graph created with Python and matplotlib for the data science lab exercise:
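A minimal sketch of how a graph like this can be built with matplotlib: overlaid histograms of push-up counts for right- and left-handed respondents. The values are made-up placeholders, and the figure is written to an in-memory buffer only so the snippet runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Placeholder push-up counts for the two groups (not real lab data)
right_handed = [10, 12, 15, 15, 18, 20, 22, 25]
left_handed = [8, 11, 14, 16, 19, 21]

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(right_handed, bins=5, alpha=0.6, label="Right-handed")
ax.hist(left_handed, bins=5, alpha=0.6, label="Left-handed")
ax.set_xlabel("Push-ups")
ax.set_ylabel("Number of respondents")
ax.set_title("Push-ups by handedness")
ax.legend()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # pass a path such as "pushups.png" to save to disk instead
plt.close(fig)
```

Semi-transparent bars (alpha=0.6) let the two distributions overlap visibly, which is the same effect researchers often build by hand in Excel with clustered or overlaid charts.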

8. Interpreting statistical tests and models to make decisions and gather insights

Interpreting analysis output is an essential skill in both fields. For instance, both researchers and data scientists need to know how to interpret the p-value of a t-test. In this example, the transferable skill is obvious. In other cases, such as interpreting a regression model’s intercept and coefficients, researchers may not yet be familiar with the specific output, but experience interpreting similar concepts is a huge advantage.
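Two small sketches of that interpretation step, on made-up numbers. The first runs Welch's t-test with SciPy's scipy.stats.ttest_ind and compares the p-value to the conventional 0.05 threshold. The second converts a hypothetical logistic regression coefficient for push-ups into an odds ratio: in logistic regression, a one-unit increase in a feature multiplies the odds of the positive class by e raised to that feature's coefficient.

```python
import math

from scipy import stats

# Made-up distress scores for two groups (not the study's data)
no_use = [12, 15, 11, 14, 13, 16, 12, 15]
use = [17, 19, 15, 20, 18, 16, 21, 19]

result = stats.ttest_ind(use, no_use, equal_var=False)  # Welch's t-test
alpha = 0.05
if result.pvalue < alpha:
    print(f"p = {result.pvalue:.4f}: reject the null hypothesis of equal means")
else:
    print(f"p = {result.pvalue:.4f}: fail to reject the null hypothesis")

# Hypothetical logistic regression coefficient for the push-ups feature
push_up_coef = 0.05
odds_ratio = math.exp(push_up_coef)
pct_change = (odds_ratio - 1) * 100
print(
    f"Each extra push-up multiplies the odds of being left-handed by {odds_ratio:.3f} "
    f"(a {pct_change:.1f}% change), holding the other features constant"
)
```

The reasoning is the same in both halves: turn a raw number (a p-value, a log-odds coefficient) into a plain-language statement about the question being asked.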

Here is an example of the test results and interpretation in the psychopathology study:

Here is an example of logistic regression coefficient interpretation in the data science lab:

9. Presenting your data in a way that people can understand

The format or audience may be different, but the skills required to present your data in a clear way are the same. Choosing the right phrasing and level of complexity based on your audience is something needed in both fields and something researchers have already learned.

10. Working as a team

Research is rarely done in a vacuum, and neither is data science. As a psychology student and clinical research assistant, I worked with a team of researchers, professors, physicians, and nurses. Six weeks into my data science program, I already see how essential the data science community and teamwork are (from my peers to the strangers on Stack Overflow and Medium).

For anyone considering a shift to data science, I hope reading about all the skills you already have that can be transferred to a career in data science will give you the confidence to take the next step!


On a journey to shift from neuroscience to data science.