A reflective look at the epistemological context of data analytics (Una oteada reflexiva hacia el contexto epistemológico de la analítica de datos)

Luis Miguel Mejia Giraldo; Ximena Cifuentes Wchima; Bibiana Vélez Medina; John Edward Herrera; Luis Fernando Restrepo Betancur

doi:10.18634/sophiaj.21v.1i.1440

Authors

Luis Miguel Mejia Giraldo, Universidad La Gran Colombia
Ximena Cifuentes Wchima, Universidad La Gran Colombia
Bibiana Vélez Medina, Universidad La Gran Colombia
John Edward Herrera, Universidad La Gran Colombia
Luis Fernando Restrepo Betancur, Universidad La Gran Colombia


Keywords:

data science, field of research, education and research, epistemology

Abstract

The modern abundance and prominence of data has led to the development of “data science” as a new field of research, along with a body of epistemological reflections on its foundations, methods, and consequences. This article derives from a research exercise on the purposes of education, in which the analysis of knowledge provides a systematic dynamic and a critical review of important problems and open debates in the epistemology of analytics and data science. It proposes dividing the epistemology of data science into five aspects: maximalist and minimalist characterizations, descriptive taxonomies, the knowledge generated by data science, black box problems, and science in a data-intensive paradigm. Together, these aspects support a reflective exercise aimed at understanding and addressing essential aspects of data interpretation and at uncovering the hidden patterns in data, which is the challenge of analytics as such.

Author Biographies

Luis Miguel Mejia Giraldo, Universidad La Gran Colombia

Master's in Sustainable Development and Environment. Associate Professor - Faculty of Engineering, La Gran Colombia University. Armenia, Colombia. Leader of the GIDA Research Group. Email: mejiagluismiguel@miugca.edu.co

Ximena Cifuentes Wchima, Universidad La Gran Colombia

Master's in Sustainable Development and Environment. Dean - Faculty of Engineering, La Gran Colombia University. Armenia, Colombia. Member of the Land Management Research Group. Email: defingenieria@ugca.edu.co

Bibiana Vélez Medina, Universidad La Gran Colombia

Ph.D. in Educational Sciences. M.A. in Education. Acting Rector of La Gran Colombia University. Armenia, Colombia. Leader of the PAIDEIA research group. Email: rectoraugca@ugca.edu.co

John Edward Herrera, Universidad La Gran Colombia

Master's degree in Integrated Quality Management Systems. Associate Professor - Faculty of Engineering, La Gran Colombia University. Armenia, Colombia. Email: herreraquijohn@miugca.edu.co

Luis Fernando Restrepo Betancur, Universidad La Gran Colombia

Master's in Sustainable Development and Environment. Associate Professor - Faculty of Engineering, La Gran Colombia University. Armenia, Colombia. Leader of the GIDA Research Group. Email: mejiagluismiguel@miugca.edu.co

