Processing and Visualizing Clickstream Data Using R

Learning Analytics and Knowledge Conference 2022
Pre-conference Tutorial

Motivation for Tutorial

Student clickstream data refers to time-stamped records of student click or keystroke events within a Learning Management System (LMS) or digital learning platform (Baker et al., 2020). This data is particularly valuable to learning analytics researchers because it provides a detailed record of, among other things, students' specific click/keystroke locations, number of interactions, sequence of click events, and time spent on a given page or lesson. Below is a snippet of a clickstream data file. Here, the data covers only three students, each with three individual click events; however, clickstream data for a class of around 370 students can generate over 380,000 individual click events (Park et al., 2017).

Chemistry 101 Clickstream Data Example
student_id   url   created_at
1                  2/25/18 1:15
1                  2/25/18 1:16
1                  1/12/18 23:15
2                  1/3/18 20:27
2                  2/5/18 8:51
2                  1/3/18 20:24
3                  1/22/18 14:13
3                  2/26/18 0:05
3                  2/5/18 11:06
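As a preview of the kind of processing this tutorial covers, the snippet above can be loaded and summarized in a few lines of base R. This is a minimal illustrative sketch: the student IDs and timestamps mirror the example table, and the column names (`student_id`, `created_at`) are taken from its header.

```r
# Illustrative clickstream snippet mirroring the example table above
# (url column omitted, as its values are not shown in the snippet).
clicks <- data.frame(
  student_id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  created_at = c("2/25/18 1:15", "2/25/18 1:16", "1/12/18 23:15",
                 "1/3/18 20:27", "2/5/18 8:51", "1/3/18 20:24",
                 "1/22/18 14:13", "2/26/18 0:05", "2/5/18 11:06")
)

# Parse the timestamps (month/day/year hour:minute).
clicks$created_at <- as.POSIXct(clicks$created_at,
                                format = "%m/%d/%y %H:%M", tz = "UTC")

# Number of click events per student.
n_clicks <- aggregate(created_at ~ student_id, data = clicks, FUN = length)
names(n_clicks)[2] <- "n_events"

# Each student's first recorded click event.
first_click <- aggregate(created_at ~ student_id, data = clicks, FUN = min)
```

Even this small summary hints at the questions raised below, such as how often and when students interact with the course platform.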

Student clickstream data helps researchers answer important questions about how student learning takes place in online and digital environments. These questions can be practical in nature (How often do students visit the course's LMS?) or pertain to effective course design (Do short-format lecture videos increase engagement relative to longer-format lecture videos?). Clickstream data has also been used to provide theoretical insights into students' self-regulated learning and other motivational processes (Cicchinelli et al., 2018; Li et al., 2020).

Because clickstream data generates a high volume of individual data points, researchers have used it to develop robust predictive models of learning (Ding et al., 2019; Zhou & Bhat, 2021), data dashboards (Diana et al., 2017), and other rich visualizations of the learning process (Park et al., 2018; Rodriguez et al., 2021). Clickstream data can also be used to detect whether students are engaging in potential academic misconduct, such as exploiting features in the learning platform (Paquette et al., 2014; Trezise et al., 2019), or performing well on exams without engaging with course materials (Alexandron et al., 2016; Sangalli et al., 2020).

While clickstream data is a valuable source for learning analytics research, its inherently complex nature, coupled with limited professional development training, creates a significant barrier for those who wish to conduct learning analytics research using clickstream data but lack access to formalized entry points into the field.

Thus, the goal of this tutorial is to provide training opportunities to researchers interested in developing the necessary skills for working with clickstream data.


References

Alexandron, G., Lee, S., Chen, Z., & Pritchard, D. E. (2016). Detecting cheaters in MOOCs using item response theory and learning analytics. In UMAP (Extended Proceedings).

Cicchinelli, A., Veas, E., Pardo, A., Pammer-Schindler, V., Fessl, A., Barreiros, C., & Lindstädt, S. (2018, March). Finding traces of self-regulated learning in activity streams. In Proceedings of the 8th international conference on learning analytics and knowledge (pp. 191-200).

Diana, N., Eagle, M., Stamper, J., Grover, S., Bienkowski, M., & Basu, S. (2017, March). An instructor dashboard for real-time analytics in interactive programming assignments. In Proceedings of the Seventh International Learning Analytics & Knowledge Conference (pp. 272-279).

Ding, M., Yang, K., Yeung, D. Y., & Pong, T. C. (2019, March). Effective feature learning with unsupervised learning for improving the predictive models in massive open online courses. In Proceedings of the 9th International Conference on Learning Analytics & Knowledge (pp. 135-144).

Li, Q., Baker, R., & Warschauer, M. (2020). Using clickstream data to measure, understand, and support self-regulated learning in online courses. The Internet and Higher Education, 45, 100727.

Paquette, L., de Carvalho, A., Baker, R., & Ocumpaugh, J. (2014, July). Reengineering the feature distillation process: A case study in detection of gaming the system. In Proceedings of the 7th International Conference on Educational Data Mining.

Park, J., Denaro, K., Rodriguez, F., Smyth, P., & Warschauer, M. (2017, March). Detecting changes in student behavior from clickstream data. In Proceedings of the Seventh International Learning Analytics & Knowledge Conference (pp. 21-30).

Park, J., Yu, R., Rodriguez, F., Baker, R., Smyth, P., & Warschauer, M. (2018). Understanding student procrastination via mixture models. In Proceedings of the 11th International Conference on Educational Data Mining. International Educational Data Mining Society.

Rodriguez, F., Lee, H. R., Rutherford, T., Fischer, C., Potma, E., & Warschauer, M. (2021, April). Using clickstream data mining techniques to understand and support first-generation college students in an online chemistry course. In LAK21: 11th International Learning Analytics and Knowledge Conference (pp. 313-322).

Sangalli, V. A., Martinez-Muñoz, G., & Cañabate, E. P. (2020, April). Identifying cheating users in online courses. In 2020 IEEE Global Engineering Education Conference (EDUCON) (pp. 1168-1175). IEEE.

Trezise, K., Ryan, T., de Barba, P., & Kennedy, G. (2019). Detecting academic misconduct using learning analytics. Journal of Learning Analytics, 6(3), 90-104.

Zhou, J., & Bhat, S. (2021, April). Modeling consistency using engagement patterns in online courses. In LAK21: 11th International Learning Analytics and Knowledge Conference (pp. 226-236).