Statistical computing with R for applied biology. 1. Basic and intermediate methods.
Dates and venues: from March 8th, 2023 to some time in June 2023, in room A8 (fifth floor, Building 3B). Tue 9.30-11.30, Thu 9.30-11.30
Course objective:
To provide an introduction to the use of the R environment for graphical and statistical analysis in biology, biotechnology, medicine, and food science and nutrition.
Download the draft brochure for AY 2022-2023 here (.pdf, 631 kB). If you want to know what this course is about, do attend the first lecture (9/3/2023, 9.30-10.30, room A8).
Learning goals:
- knowledge and understanding: an introductory knowledge of the principles of statistical computing for applied biology; working knowledge of basic methods for data wrangling, exploratory data analysis, and statistical and graphical analysis
- applying knowledge and understanding: ability to develop code in R and use it for graphical and statistical analysis
- making judgements: ability to choose the graphical and statistical methods which are most appropriate in a given situation
- communication skills: ability to produce reports for the statistical and graphical analysis of experimental data in a variety of formats
- learning skills: ability to access and peruse literature and technical information on statistical computing
Prerequisites:
A BSc in Agriculture, Food Science, Chemistry, Biology or Biotechnology. At least 5 ECTS credits in Mathematics (some statistics, e.g. 3 ECTS credits in Statistics, may help). Ability to use spreadsheet software packages under Windows, MacOS or Unix/Linux operating systems. Knowledge of technical English (for speakers of English as a second language a B1 or B2 level is suggested).
Enrolment.
You can enrol for this course:
- if you are a student of the PhD program in Agriculture, Food and Forest Sciences at the Università degli Studi della Basilicata
- if you are a student of one of the PhD programs with a standing agreement with the International PhD program in Agriculture, Food and Forest Sciences
- if you are a student of another PhD program at Università degli Studi della Basilicata
To enrol in the course you just need to register for it on the e-learning platform. You should also join the Google group StatCompR22. If you are unable to do so by yourself, you are unlikely to get any juice out of this c(o)urse.
Attendance.
There are two ways you can attend this course:
- in person
- self-teaching using the e-learning platform
At most 12 highly motivated students can attend the course and get their exercises graded. Additional students can be accepted and can attend the lectures, but their exercises and reports will not be graded by an instructor.
Grading.
To obtain full credits (5 ECTS) students must attend the lectures (>75%, in person), complete the exercises on the e-learning platform and turn in a report (in Word, .pdf or .html format, generated from an R Markdown document) within 1 month of the end of the course. The report shall describe in full (including code) the descriptive and inferential statistical analysis and the graphical analysis of one of their own experiments. A suitable dataset from an R package can also be used. Because of lack of time and enthusiasm, I will grade at most 12 reports (those of the students obtaining top marks on the e-learning platform).
Up to 3 ECTS credits can be obtained by attending the lectures and completing the exercises on the distance learning platform in self-teaching mode. An attendance certificate will be provided to all students attending the course in person, in distance learning or in self-teaching mode: you just need to give the secretary of your PhD course the report from the e-learning platform or, if you manage to get your report approved, my evaluation.
Course content:
Lectures (32 h).
1. An introduction to statistical analysis (2 h).
2. The R environment (1 h).
3. Importing data, data structures in R (3 h).
4. Data wrangling, tidying and reshaping (2 h).
5. Data visualisation with base functions and ggplot2 (3 h).
6. Numerical and graphical summaries of data. Generating reports with R Markdown and knitr (3 h).
7. Group comparisons with t-tests and non-parametric tests; one-way ANOVA and multiple mean comparisons; tests of independence and association for contingency tables; power analysis (3 h).
8. Experimental design; ANOVA and ANCOVA (4 h).
9. Covariance, correlation and linear regression (3 h).
Bonus lectures (only if I feel motivated enough).
10. Factorial designs and empirical model building (4 h).
11. Non-linear regression (4 h).
Practicals: 16 h. Writing and running code, generating reports using datasets from R or case studies.
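To give an idea of what "writing and running code using datasets from R" can look like, here is a minimal sketch. It is not taken from the course material: the choice of the built-in PlantGrowth dataset and of a one-way ANOVA followed by Tukey's comparisons (both topics of lecture 7) is an assumption made for illustration only.

```r
# PlantGrowth ships with base R (package 'datasets'): plant weights
# for a control group and two treatment groups.
# The dataset is an illustrative choice, not part of the course material.
data(PlantGrowth)

# Numerical summary: mean weight per group.
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
print(group_means)

# One-way ANOVA: does mean weight differ between groups?
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))

# Multiple mean comparisons (Tukey's honest significant differences).
print(TukeyHSD(fit))
```

The same code could be placed in a chunk of an R Markdown document to obtain a report in .html, .pdf or Word format.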
Venue and timetable: Room A8, building 3B, Tuesdays and Thursdays 9.30-11.30.
Course material: the only source of course material is the e-learning platform.
Suggested readings.
- Gacula, M., Singh, J., Bi, J., Altan, S. 2008. Statistical methods in food and consumer research. Academic Press.
- RStudio team. Finding your way to R.
- http://www.biostathandbook.com
- Grolemund, G., Wickham, H. 2017. R for Data Science.
Hardware and software requirements.
To attend this course you obviously need a personal computer (desktop or laptop) with Windows, Linux or MacOS, and, if at all possible, the most recent versions of R and RStudio. Older versions of R (3.2 or later) might be sufficient for most of the course. An active network connection is needed to access the e-learning platform (you also need an account, see Enrolment). You can also use the cloud version of RStudio.
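If you are not sure which version of R you have, you can check it from the R console. The snippet below is a minimal sketch: the threshold "3.2.0" is simply the minimum mentioned above, and the variable names are chosen for illustration.

```r
# Minimum version taken from the requirements above (3.2).
min_version <- "3.2.0"

# getRversion() returns the version of the running R session
# as an object that can be compared with a version string.
installed <- getRversion()

if (installed >= min_version) {
  message("R ", as.character(installed),
          " might be sufficient for most of the course.")
} else {
  message("R ", as.character(installed), " is older than ",
          min_version, ": please update R.")
}
```

Even if the check passes, updating to the most recent release remains the recommended option.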