Data Science Team FAQs

I need statistical support. What is the first step? 

Complete the Qualtrics form: Qualtrics Survey | Qualtrics Experience Management

How long in advance should we request stats help?

As soon as possible! Consider scheduling an appointment with a statistician when you are first planning your project. We can help with power calculations and sample size justifications, initial statistical analysis plans, and grant timelines. These early consultations are critical for making sure your planned analysis is feasible and appropriate for your data and research questions.

How long should we expect analysis to take?

Statistical analysis is typically an iterative process! We will often have a kickoff meeting to talk through your research questions and the statistical analysis plan. Then, we will work on our own to get familiar with the data, ask you and your team questions, confirm any issues with variables/data, and start sending you results. We like to meet regularly with you to discuss results and make sure we are on the same page. This close communication also means we can quickly solve any problems with the data.

Please note: If your data needs extensive cleaning and/or processing, this will add time before we can even start analysis!

How big of a sample do I need to run (some type of slightly advanced analysis)?

It depends! We want to use the simplest, most straightforward methods to adequately answer your research questions. We can help with sample size planning and power calculations during the planning phase of your study. 

Please note: if you come to us with a small sample (fewer than roughly 50 to 100 observations), we will likely be limited in the types of inferential statistics we can run. A fancy model with lots of predictors/covariates is probably not in the cards!
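
To give a feel for what a power calculation involves, here is a minimal sketch in Python (chosen only for illustration) of the common normal-approximation formula for comparing two group means. The function name n_per_group and the defaults (two-sided alpha of 0.05, 80% power) are assumptions for this example, not the team's standard procedure. A real calculation should still be worked out with a statistician.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Illustrative effect sizes only (small, medium, large by Cohen's conventions).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Smaller expected effects need much larger samples, which is why it helps to talk through the expected effect size early. An exact t-based calculation gives slightly larger numbers than this approximation.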

Some general rules of thumb that might be helpful:

  • T-tests and confidence intervals: a sample of 30 or more is generally okay
  • Chi-squared tests: expected cell counts should be at least 5
  • Correlations: n>30 is typically okay assuming you are trying to detect stronger correlation coefficients. 
  • Linear regression: 10-20 observations per independent variable. More predictors => larger sample! 
  • Structural equation modeling: aiming for more than 200 is a good strategy. 
  • Please note: All of this depends on the quality of your data and how representative it is of the population you want to make inferences about. Sometimes, we can do very fancy things with small datasets. Other times, we are limited even with large sample sizes! Again, please come talk to us about your research question so we can help you make an informed decision.
  • As a second note: if your data does not have any variability across your target variables, it will make it hard to fit a model, even if you have a larger sample size. 
  • Descriptive statistics are still statistics! Most often, the simplest analysis is the best one.
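
Two of the rules of thumb above are simple enough to sketch in code. The following Python example (Python and all function names are assumptions for illustration) turns the guideline of 10-20 observations per predictor into a rough range and computes expected cell counts for a chi-squared test of independence. It assumes the cell-count rule refers to expected counts, which is the usual convention, and the 2x2 table is made up.

```python
def regression_sample_range(n_predictors):
    """Rough (low, high) sample size from 10-20 observations per predictor."""
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    return 10 * n_predictors, 20 * n_predictors


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def smallest_expected_count(table):
    """Smallest expected cell count in the table."""
    return min(min(row) for row in expected_counts(table))


# Hypothetical 2x2 table of observed counts (rows = groups, columns = outcomes).
observed = [[12, 3], [9, 6]]

low, high = regression_sample_range(5)
print(f"5 predictors: roughly {low} to {high} observations")

smallest = smallest_expected_count(observed)
if smallest < 5:
    print(f"Smallest expected count is {smallest}: too sparse for the rule of thumb")
else:
    print(f"Smallest expected count is {smallest}: rule of thumb is met")
```

A table that fails this check is a good reason to ask about alternatives, such as combining categories, before running the test.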

What formats of data are acceptable?

.csv is preferred; we can easily read CSV files into many of our software programs.

We need a data dictionary that includes variable names, descriptions of variables (including the range of possible values), and all labels.
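
As one possible layout, a data dictionary can itself be a small CSV with one row per variable. The sketch below, in Python, uses made-up variable names and column headings (variable, description, allowed_values, labels); these are assumptions for illustration, not a required template. It also shows a quick check for data columns that are missing from the dictionary.

```python
import csv
import io

FIELDS = ["variable", "description", "allowed_values", "labels"]

# Hypothetical variables, for illustration only.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "Unique study ID",
     "allowed_values": "1001-1999", "labels": ""},
    {"variable": "age_years", "description": "Age at enrollment, in years",
     "allowed_values": "18-99", "labels": ""},
    {"variable": "smoker", "description": "Current smoking status",
     "allowed_values": "0, 1", "labels": "0 = No; 1 = Yes"},
]


def dictionary_to_csv(rows):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented_columns(data_csv_text, rows):
    """List columns in the data file's header that the dictionary does not describe."""
    header = next(csv.reader(io.StringIO(data_csv_text)))
    documented = {row["variable"] for row in rows}
    return [name for name in header if name not in documented]


# Hypothetical data file containing one column that was never documented.
sample_data = "participant_id,age_years,smoker,bmi\n1001,34,0,27.1\n"

print(dictionary_to_csv(DATA_DICTIONARY))
print("Columns missing from the dictionary:",
      undocumented_columns(sample_data, DATA_DICTIONARY))
```

Running a check like this before sending data makes it easier to catch variables that still need a description or labels.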

What should I bring to a consultation during office hours?

You can come to us at any stage in your research journey: when you have an unfinished idea, are drafting questionnaires, or have questions about the appropriate statistical test or visualization to use.

We love to collaborate! Our team is most helpful when you come with a clear, well-defined research question. Then we can work quickly to operationalize the specific variables and recommend the most appropriate gold-standard quantitative analysis to answer that question. We can also make sure that the aims you have written match the data analysis section directly (which is important for grant reviewers).

When is it NOT appropriate to use ChatGPT (or other AI)?

While AI is an exciting new technology, please don’t use AI to draft the analysis you want us to do. Instead, find similar peer-reviewed papers and send them to us. We find this much more helpful in the long run for making sure we use rigorous, evidence-based, and established methods to answer your research questions.

What software do we use/support?

  • Chloe: R, SPSS, Tableau
  • Sanmi: R
  • Matt: SPSS, SAS, ArcGIS
  • Gina: SAS
  • Please note: we are not technical/IT support. Also, if you are using a software program not listed, we likely cannot help you with your code but are happy to look at your output.

I have a question about implementation science. I want to write an implementation aim into my grant – can you help?

Yes! Dr. Aileen Chou specializes in implementation science and can collaborate with you so you can decide how to integrate these methods and frameworks into your proposal. 

I want to use big data to answer a complicated real-world question using lots of different variables. Can you help with that?

For sure! Dr. Gina McKernan is a data scientist and biostatistician with over 15 years of experience working with all types of datasets and variables including: biological, behavioral, psychosocial, socio-demographic, clinical, EHR, ‘omics, and more!

I have a measurement question. How do I decide what survey to use in my study?

We can help with that too! Dr. Christy Zigler specializes in measurement science and can assist you with the choice of measure(s) for your specific needs. She also can work with you to design studies to develop new measures.