This paper proposes an approach for modeling and mining curriculum big data from real-world education datasets crawled from university websites in Australia. It addresses the scenario of giving a student a study plan for completing a course by accumulating credits on top of the subjects he or she has already completed. One challenge is that subjects with similar titles at different universities can hinder the construction of a reasonable, time-saving learning path, because the student may be unable to distinguish them without intensive research into every subject related to the degree at those universities. We use concept graph-based learning techniques and discuss the data representations and techniques that are better suited to large datasets. We created ground truth for subject relations and subject descriptions, using bag-of-words representations based on natural language processing. The generated ground truth was used to train a model that summarizes a subject network and a concept graph, where the concepts are extracted automatically from the subject descriptions across all the universities. The practical challenges of collecting and extracting the data from the university websites are also discussed. The work was validated on nineteen real-world education datasets crawled from university websites in Australia and showed good performance.
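To make the bag-of-words representation mentioned in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It turns subject descriptions into term-count vectors and compares them with cosine similarity, which is one simple way to tell apart subjects whose titles look alike. The subject names, descriptions, stop-word list, and the function names `bag_of_words` and `cosine_similarity` are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with"}


def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical subject descriptions from two made-up universities.
subjects = {
    "UniA: Data Mining": "Introduction to data mining and machine learning for large datasets",
    "UniB: Data Mining": "Statistical methods for mining data and learning from large datasets",
    "UniB: Databases": "Relational databases and query languages",
}

vectors = {name: bag_of_words(text) for name, text in subjects.items()}

if __name__ == "__main__":
    names = list(vectors)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(vectors[first], vectors[second])
            print(f"{first} vs {second}: {score:.2f}")
```

In this toy data the two data mining subjects share several terms and score well above zero, while the databases subject shares none with them and scores zero. A real curriculum dataset would need fuller preprocessing (stemming, a complete stop-word list, term weighting) than this sketch assumes.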
Citation: Kah Phooi Seng, Fenglu Ge, Li-minn Ang. Mathematical modeling and mining real-world Big education datasets with application to curriculum mapping[J]. Mathematical Biosciences and Engineering, 2021, 18(4): 4450-4460. doi: 10.3934/mbe.2021225