Using online and blended learning tools to teach data science skills

By Miranda Prynne, 23 March, 2022
Advice on teaching data science skills using online and blended learning tools and resources, by Philip Leftwich

Transferable skills

In undergraduate classes, classical training in STEM rarely requires deep knowledge of quantitative analysis, especially “big data”; these skills are often developed much later. But understanding data science – broadly mathematics, statistics and coding – is increasingly fundamental for STEM scientists at all levels. It provides a valuable and transferable skill that can be applied across many different industries. What resources and techniques are available to help students master these vital skills?

Blended learning

Learning to code can be intimidating; first attempts can be frustrating and require repetition and feedback to gain confidence. Often learners will move at very different paces, depending on prior experience and confidence. As a result, online or blended learning, combining self-paced workbooks with instructor contact, can be the perfect environment for teaching data science. Luckily, there are many platforms and tools to enable successful blended learning, and I outline these along with some valuable tips for successful learning in data science.

Generating interest in data science

Live analysis demonstrations as early as possible into the course help students connect with the materials and allow instructors to highlight the most important aspects of their class. Data visualisations are a fun and engaging way to demonstrate what learners can achieve with coding skills. Including intentional mistakes and your process for checking and debugging errors is an excellent addition here.

Going through common mistakes can humanise you as the instructor and demonstrate that learners often write code that does not produce the intended results. You can display your techniques for understanding errors and communicate that this takes time and patience. 
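
One way to stage an intentional mistake in a live demonstration is sketched below in Python; the same idea carries over to R. The data values and the function names (`raw_lengths`, `mean_buggy`, `mean_fixed`) are invented for illustration and do not come from any particular course.

```python
# Hypothetical demo data: measurements as they might arrive from a CSV file,
# still stored as text and including one missing value.
raw_lengths = ["5.1", "4.9", "4.7", "n/a", "5.0"]


def mean_buggy(values):
    # Intentional mistake: the values are text, so sum() raises a TypeError.
    return sum(values) / len(values)


def mean_fixed(values):
    # Convert each entry to a number, skipping entries that cannot be parsed.
    numbers = []
    for value in values:
        try:
            numbers.append(float(value))
        except ValueError:
            continue
    return sum(numbers) / len(numbers)


# Step 1: run the broken version and read the error message together.
try:
    mean_buggy(raw_lengths)
except TypeError as err:
    print("Error to discuss with the class:", err)

# Step 2: inspect what the data actually contains.
print("Types found in the data:", {type(v).__name__ for v in raw_lengths})

# Step 3: run the corrected version.
print("Mean of the parsable values:", mean_fixed(raw_lengths))
```

Walking through the three steps in order shows learners that the error message, not guesswork, points to the fix.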

A straightforward narrative is a handy tool for building engagement. As with any practical class, authentic problems work best, and care should be taken to develop projects that feel real to the students and fit well with the rest of their curriculum.

Learning to code

Data visualisations are highly beneficial as a starting point for enabling learners to work with data. Wickham & Grolemund begin their book, R for Data Science, with a chapter on visualisation. This approach allows learners to create something they can share immediately. 

Getting students to write code early and often, with frequent and visual feedback, is key to building engagement and confidence. Frequent low-stakes quizzes and tasks help to sustain both.
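
A low-stakes coding task with instant feedback could be as simple as the following Python sketch. The checker `check_exercise`, the sample task and the student attempt `student_range` are all hypothetical; interactive tutorial tools offer richer versions of the same idea.

```python
def check_exercise(student_function, cases):
    """Run a student's function on (args, expected) pairs and collect feedback."""
    feedback = []
    for args, expected in cases:
        try:
            result = student_function(*args)
        except Exception as err:
            # Report the error rather than stopping, so the learner sees every case.
            feedback.append(f"{args}: raised {type(err).__name__}: {err}")
            continue
        if result == expected:
            feedback.append(f"{args}: correct")
        else:
            feedback.append(f"{args}: got {result!r}, expected {expected!r}")
    return feedback


# Hypothetical task: "write a function that returns the range of a list of numbers".
def student_range(values):
    return max(values) - min(values)


cases = [(([1, 5, 3],), 4), (([2, 2],), 0)]
for line in check_exercise(student_range, cases):
    print(line)
```

Because the checker reports every case instead of stopping at the first failure, learners can see how close they are and try again without penalty.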

Creating effective online learning environments

Barriers to entry include checking system requirements, downloading and installing the required software, and installing associated packages. It is easy to assume that students will be intuitively able to use computer software, but this is often not the case.
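
A short pre-flight script can surface set-up problems before the first session. The sketch below is one possible approach in Python, assuming a course that needs a minimum interpreter version and a list of importable packages; the function name `check_environment` and its default values are hypothetical.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), required=("json", "csv")):
    """Return a list of human-readable problems; an empty list means ready to go."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted} or newer is needed, found {found}")
    for name in required:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Missing package: {name}")
    return problems


issues = check_environment()
if issues:
    print("Please fix the following before the first class:")
    for issue in issues:
        print(" -", issue)
else:
    print("Your set-up looks ready.")
```

Asking students to run a check like this and share the output gives instructors a concrete starting point for troubleshooting.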

Interactive tutorials allow learners to run their analyses immediately and receive instant feedback without navigating a new coding environment. The Binder project is a software project to package and share reproducible interactive environments. A Binder repository can contain all dependencies, tools and data required to launch interactive sessions. Sessions are accessed through a web browser and can be launched regardless of a student’s local computing resources. This set-up has been used successfully to run many “big data” lessons during lockdown periods in the first waves of the pandemic.

The community around the R coding language, in particular, has developed many tools for education and interactivity. There is a dizzying array of resources and interactive worksheets for new learners. The learnr package makes it very straightforward to build interactive tutorials that host video, text, images, quizzes, and code chunks. Students can download these files to run on local computers, or instructors can deploy them with Binder or as a Shiny app. Shiny is an R package that makes it easy to build interactive web apps straight from R.

Eventually, students need to be introduced to a standard graphical user interface (GUI) or command-line interface. Cloud services such as Amazon Web Services, Google Cloud, or RStudio Cloud can host programming environments and make deploying files and dependencies easy while removing local system limits. RStudio Cloud has an education licence available and allows ready deployment of data projects with pre-installed dependencies. Instructors can also remotely access any student project to monitor progress or troubleshoot.

Building and maintaining these courses will have a cost; cloud services restrict the hours and memory available before a subscription is required. While initially free, Binder environments are limited in the amount of data that can be stored and the processing power that can be applied.

Organisation of learning resources

Finally, it is vital to have a well-curated “hub” that reminds learners where to find relevant resources, access course material, and contact their instructors. You may have access to a proprietary learning system such as Blackboard, and this may be the preferred option, especially if learners are familiar with this environment. However, GitHub Pages and GitHub Classroom, or course websites built with bookdown or blogdown (R packages that create websites directly from R), are all viable alternatives and provide the ability to embed interactive tutorials and other HTML widgets.

Learning data science can be challenging, but the basic pedagogical principles apply: remove barriers to access, encourage student autonomy, and nurture confidence. Discussions around responsible data science should be encouraged, for example, concerns about the widely used iris data set, which was first published in the Annals of Eugenics. Now is an excellent time for us to refresh our approach to data science, using these tools to enhance pedagogy for online and blended learning.

Philip Leftwich is a lecturer in genetics and data science at the University of East Anglia.
