A reflective look at the epistemological context of data analytics
DOI: https://doi.org/10.18634/sophiaj.21v.1i.1440
Keywords: data science, field of research, education and research, epistemology
Abstract
The modern abundance and prominence of data have led to the development of “data science” as a new field of research, along with a body of epistemological reflection on its foundations, methods, and consequences. This article derives from a research exercise on the purposes of education, in which the analysis of knowledge provides a systematic dynamic and a critical review of important problems and open debates in the epistemology of analytics and data science. It proposes dividing the epistemology of data science into five aspects: maximalist and minimalist characterizations, descriptive taxonomies, the knowledge generated by data science, black-box problems, and science in a data-intensive paradigm. Together, these aspects offer a reflective exercise aimed at understanding and addressing essential questions of data interpretation and the discovery of hidden patterns in data, which is the central challenge of analytics.
License: Creative Commons License 4.0

