
I Did a Boo Boo

Last night, I looked at a chart that had been tweeted out by Marco Learning, a terrific source for information about The College Board's AP Program.  It showed the percentage of all scores graded 4 and 5 over time by subject, and some points were glaring: big increases in certain subjects that didn't seem to make sense.  Turns out, their data was correct.

Wanting to dive down a little deeper, I went to the College Board website to look at the data myself, and to "download" it for some additional analysis.  I put the word download in quotation marks on purpose.

I have a history with College Board, of course.  I used to download the very rich AP data by state, exam, and ethnicity they'd post on their site and put it into an interactive format that pulled out insight better than the large, text-exclusive spreadsheets they'd post.  Then--despite the organization's oft-cited commitment to transparency--they stopped.  

In an example of Newspeak worthy of the novel 1984--one they might want to use on a future AP English Literature Exam--College Board said they were going to implement a "streamlined" reporting protocol for the data.  Less data, and less insight, in other words, was better. (They also pulled down their "Landscape" product while claiming they were making it more transparent, by the way, and no high school person has access to it today.)

Anyway, this chart shows incorrect data for AP Psych, suggesting that the percentage of 4 and 5 scores increased by 42 percentage points between 2022 and 2024.  Let me explain how it found its way into my tweet, and the larger issues it points out.


You can still download summary data at the subject level (but not more detailed than that) on the College Board website, but it comes in a messy format that makes one think they don't really want you to do any analysis on it.  It has hidden rows, hidden columns, merged cells, and different formats by row that make anything other than tedious manual extraction almost impossible. It looks like this; the data are clearly intended for casual users who want a quick answer, not for anyone who wants to study them in depth.



So, after getting frustrated wrangling this and admitting I'd been foiled by the data people on Vesey Street, I settled not for raw data, but for summaries on their website, on pages like this for 2024 and this for 2022.  I manually copied all the tables, pasted them into Excel, and then set about cleaning them up.  Even that was frustrating:  In some years, College Board calls its exam "AP English Language & Composition," while in other years, it's "AP English Language and Composition."  Similarly, it's either "AP 2D Art & Design" or "AP 2-D Art and Design." Some years, data are rounded to the nearest whole number; in others, to one decimal point. These are insignificant differences to human readers, but they're a big deal for computers.
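To give you a sense of what fixing this looks like, here's a minimal sketch of the kind of name normalization the data requires.  The function name is mine, and the substitution rules are just the ones implied by the examples above; a real cleanup pass would need more:

```python
import re

def normalize_exam_name(name: str) -> str:
    """Collapse the cosmetic differences in exam names from year to year."""
    s = name.strip()
    s = s.replace("&", "and")                         # "& Composition" -> "and Composition"
    s = s.replace("2-D", "2D").replace("3-D", "3D")   # "2-D Art" -> "2D Art"
    s = re.sub(r"\s+", " ", s)                        # collapse doubled spaces
    return s

# Both spellings now map to the same key, so tables from different
# years can be joined on exam name.
assert normalize_exam_name("AP 2-D Art & Design") == normalize_exam_name("AP 2D Art and Design")
print(normalize_exam_name("AP English Language & Composition"))
# -> AP English Language and Composition
```

The rounding inconsistency is the same kind of problem: you either carry everything at full precision from raw counts, or you accept that year-to-year comparisons inherit whatever rounding the source applied.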

All seemed to be going well, although the year-to-year changes in nomenclature and formatting seemed capricious and undisciplined from a data standpoint, especially for an organization that prides itself on its research and analysis capabilities.

And, finally, on the 2024 link, above, guess what? AP Psych is listed twice: First under "History and Social Sciences" 


and then again under "Sciences."  So, AP Psych in 2024 (but not the other years) got counted twice.

Had I been successful in just downloading and cleaning the numbers, this would not have happened, because I calculate percentages from the totals of the raw numbers.  But because I had to scrape this off a website, the error showed up.  I should have checked this a couple of ways before posting, but I didn't, and that's my fault.

This would normally be where I'd call on College Board to make their data more accessible to the general public in the interest of transparency, but a) they don't listen, b) they don't give a crap about the members, and c) they just wait for people to forget how bad they are at the simplest things and keep paying their executives multi-million dollar salaries.

And these are the people, I'd remind you, who are being asked to fix the FAFSA, and despite the massive conflict of interest it creates, gleefully and arrogantly agree to do so. 

All is good.  Carry on.  I'll post the complete data soon after I do more auditing.

