Data science academic programs in the pre-ChatGPT era in the Midwestern United States: a curated dataset

Published in Mathematics

We often hear announcements of new data science degrees, certificates, and bootcamps. Yet when we try to answer basic questions about these programs, we quickly discover how hard it is to paint a systematic picture of what data science education actually looks like in many regions of the US.

How many data science degrees exist in our region? Are they more computing‑, statistics‑, or business‑focused? How are community colleges, regional universities, and research universities each approaching data science? At the same time, we are watching generative AI move from research papers to everyday tools, and this will trigger another wave of curriculum changes. To understand how data science education responds to technological shocks like large language models, we first need a solid baseline of what exists just before such a shift.​

This is the motivation behind our curated dataset: to capture a reproducible snapshot of data science-related academic programs in the Midwestern United States as of 2023, right before generative AI tools became truly mainstream in teaching and practice.​

The dataset documents 404 distinct academic programs whose names include the word Data, offered by 225 school systems across 12 Midwestern states. Each row captures a single program, not just a school, and includes the state, institution, campus city, program name, level (undergraduate or graduate), program type (for example bachelor’s, master’s, certificate, minor), whether it is a major or minor (or both), the delivery format (online, campus, hybrid, or not specified), and a classification into one of four categories, along with notes and links to program webpages. All data and code are openly available via Harvard Dataverse at https://doi.org/10.7910/DVN/0H4ZHG

Very quickly we ran into the core problem: although the phrase data science was used consistently, its meaning was not. We encountered bachelor’s degrees called Data Science that were primarily statistics with a light programming component, Data Analytics programs housed in business schools with little or no computing or mathematics, minors called Data Science that consisted of a few applied statistics courses, and programs run jointly by mathematics, computer science, and other departments. If we had simply scraped program names, we would have propagated this ambiguity rather than clarified it.

To move beyond labels, we designed a typology and a transparent classification process that considered both who directs the program and what students actually study. We ended up with four mutually exclusive categories: i) Data Science (DS) programs are led by mathematics, statistics, computer science, or data‑science‑related departments and require both substantial mathematical or statistical content and computer science or computing content, without requiring a secondary major or minor; ii) Interdisciplinary Data Science (IDS) programs meet similar curricular criteria but either require a minor or second major or are directed jointly with departments outside mathematics or computing; iii) Data Science as a Concentration (DSC) programs are data‑science tracks offered only as concentrations within another major; and iv) Data Analytics (DA) programs include data in the title but either lack the combined math‑plus‑CS requirement or are overseen by departments outside mathematics and computing, regardless of curriculum.

To make these decisions reproducible, we created flowcharts that walk a user step by step through classification: first by department leadership and the presence of both mathematics and computing in the curriculum and then, when departmental leadership is unclear, by course prefixes and content. These flowcharts are part of the released materials so that others can apply or adapt the typology to new regions or later years.
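
To make the typology concrete, here is a minimal Python sketch of how the four category definitions could be turned into a decision function. It is not a transcription of our released flowcharts: the field names, the boolean simplification of "substantial" content, and the order of the checks (concentration first, then curriculum, then leadership) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    DS = "Data Science"
    IDS = "Interdisciplinary Data Science"
    DSC = "Data Science as a Concentration"
    DA = "Data Analytics"


@dataclass(frozen=True)
class Program:
    # All field names are hypothetical; they are not columns of the dataset.
    led_by_math_or_computing: bool        # math, stats, CS, or data-science dept involved in leading
    jointly_directed_with_outside_dept: bool
    has_math_or_stats_core: bool          # "substantial" content, judged by a human reviewer
    has_computing_core: bool
    requires_minor_or_second_major: bool
    concentration_only: bool              # offered only as a track within another major


def classify(p: Program) -> Category:
    """Assign one of the four categories (check order is an assumption)."""
    if p.concentration_only:
        return Category.DSC
    # DA: lacks the combined math-plus-computing requirement ...
    if not (p.has_math_or_stats_core and p.has_computing_core):
        return Category.DA
    # ... or is overseen by departments outside mathematics and computing.
    if not p.led_by_math_or_computing:
        return Category.DA
    # IDS: same curricular criteria, but joint direction or a required minor/second major.
    if p.jointly_directed_with_outside_dept or p.requires_minor_or_second_major:
        return Category.IDS
    return Category.DS
```

For example, a program led by a statistics department with both cores and no required minor comes out as DS, while the same curriculum overseen only by a business school comes out as DA.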

Between February and December 2023, we systematically worked through lists of institutions obtained from CollegeBoard’s College Search tool, filtered to the 12 Midwestern states. For each institution, we searched its academic offerings for titles including Data and used the search function on the institution’s site to look for Data Science and Data Analytics, then manually reviewed program pages to identify responsible departments, required courses, and delivery format. Every program that met our inclusion criteria (name contains Data and fits one of the four classifications) was entered into a shared spreadsheet, with notes justifying non-Data Science classifications.
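
The name-based half of the inclusion criterion can be sketched as a small filter. This is only an illustration: our review was manual, and the whole-word, case-insensitive matching below is an assumption rather than a documented rule.

```python
import re

# Assumption: "Data" must appear as a whole word, in any letter case.
DATA_WORD = re.compile(r"\bdata\b", re.IGNORECASE)


def name_includes_data(program_name: str) -> bool:
    """Return True if the program name contains the standalone word 'Data'."""
    return DATA_WORD.search(program_name) is not None
```

Under this assumption, "Data Science" and "Business Data Analytics" pass the filter, while a name such as "Database Administration" does not, because "Data" is not a standalone word there.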

A few practical issues shaped our decisions. While we were cleaning the data, several institutions updated or completely revamped their programs. We chose not to chase these updates, to preserve temporal consistency: the dataset is explicitly a snapshot as of 2023, not a living catalog. For university systems where the same program appeared at multiple regional campuses, often with online delivery, we combined these into a single program entry with a Multiple Campuses indicator and listed all relevant cities. In cases where departmental leadership was opaque, we relied on course prefixes and curriculum details via our second flowchart or excluded the program if it could not be classified reliably.​
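
The multi-campus rule can be sketched as a simple grouping step. The dictionary keys used here (`system`, `program`, `city`, `multiple_campuses`) are hypothetical names chosen for the example, not the column names of the released spreadsheet, and the sketch assumes that matching on system and program name is enough to identify "the same program".

```python
from collections import defaultdict


def merge_campuses(rows):
    """Collapse rows that share a school system and program name.

    `rows` is a list of dicts with the hypothetical keys
    'system', 'program', and 'city'.
    """
    cities_by_program = defaultdict(list)
    for row in rows:
        key = (row["system"], row["program"])
        # Keep each city once, in first-seen order.
        if row["city"] not in cities_by_program[key]:
            cities_by_program[key].append(row["city"])

    return [
        {
            "system": system,
            "program": program,
            "cities": cities,
            "multiple_campuses": len(cities) > 1,
        }
        for (system, program), cities in cities_by_program.items()
    ]
```

Each merged entry keeps every campus city and carries a flag that plays the role of the Multiple Campuses indicator.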

When we summarized the data, several patterns emerged. Of the 404 programs, 221 (about 55%) are classified as Data Science, 137 (about 34%) as Data Analytics, 20 (about 5%) as Interdisciplinary Data Science, and 26 (about 6%) as Data Science as a Concentration. Universities account for most programs, with Data Science offerings dominating but Data Analytics remaining substantial; community colleges and technical or engineering institutions lean heavily toward Data Analytics, often with shorter credentials. Institutions in the Other colleges category more frequently brand their offerings as Data Science rather than Data Analytics, though in smaller absolute numbers.
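
The category shares follow directly from the counts above. The short check below recomputes them; the variable names are only for illustration.

```python
# Program counts per category, as reported in the text.
counts = {
    "Data Science": 221,
    "Data Analytics": 137,
    "Interdisciplinary Data Science": 20,
    "Data Science as a Concentration": 26,
}

total = sum(counts.values())  # should equal the 404 programs in the dataset

# Percentage share of each category, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} of {total} ({share}%)")
```

The four counts sum to 404, and the shares round to the whole-number percentages quoted in the text.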

Although our dataset focuses on 12 Midwestern states, the issues it surfaces are broader. Without program‑level, curriculum‑aware datasets, it is difficult to know whether we are preparing enough students with strong technical foundations, how community‑college pathways in data analytics relate to four‑year degrees, or how closely programs align with regional labor‑market needs. By making both the data and the classification protocol public, we hope others will replicate the approach in other regions, link programs to labor‑market outcomes, and revisit these same institutions in 5, 10, or 25 years to quantify how curricula change as AI and large language models become part of everyday data work.​
