I Did a Boo Boo

Last night, I looked at a chart that had been tweeted out by Marco Learning, a terrific source for information about The College Board's AP Program.  It showed the percentage of all scores graded 4 and 5 over time by subject, and there were some glaring points: Lots of big increases in certain subjects that didn't seem to make sense.  Turns out, their data was correct.

Wanting to dive down a little deeper, I went to the College Board website to look at the data myself, and to "download" it for some additional analysis.  I put the word download in quotation marks on purpose.

I have a history with College Board, of course.  I used to download the very rich AP data by state, exam, and ethnicity they'd post on their site and put it into an interactive format that pulled out insight better than the large, text-exclusive spreadsheets they'd post.  Then--despite the organization's oft-cited commitment to transparency--they stopped.  

In an example of Newspeak worthy of the novel 1984 that they might want to use in a future AP English Literature Exam, College Board said they were going to implement a "streamlined" reporting protocol for the data.  Less data, and less insight, in other words, was better. (They also announced that their "Landscape" product was being pulled down while they were saying they were making it more transparent, by the way, and no high school person has access to it today.)

Anyway, this chart shows incorrect data for AP Psych, suggesting that the percentage of 4 and 5 scores increased by 42 percentage points between 2022 and 2024.  Let me explain how it found its way into my tweet, and the larger issues it points out.


You can still download summary data at the subject level (but not more detailed than that) on the College Board website, but it comes in a messy format that makes one think they don't really want you to do any analysis on it.  It has hidden rows, hidden columns, merged cells, and different formats by row that make anything other than tedious manual extraction almost impossible. It looks like this; the file is clearly laid out for casual users who want a quick answer, not for anyone who wants to study the data in depth.



So, frustrated from wrangling this and admitting I'd been foiled by the data people on Vesey Street, I settled not for raw data, but for summaries on their website, on pages like this for 2024 and this for 2022.  I manually copied all the tables, pasted them into Excel, and then set about cleaning them up.  Even that was frustrating: In some years, College Board calls its exam "AP English Language & Composition," while in other years, it's "AP English Language and Composition."  Similarly, it's either "AP 2D Art & Design" or "AP 2-D Art and Design." Some years, data are rounded to the nearest whole number; in others, to one decimal place. These are insignificant differences to human readers, but they're a big deal for computers.
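To make the point about computers concrete, here is a minimal Python sketch of the kind of normalization that lets the same exam match across years. The function name `normalize_exam_name` is hypothetical, and the rules only cover the two inconsistencies mentioned above ("&" versus "and", "2-D" versus "2D"); real data would likely need more.

```python
import re


def normalize_exam_name(name: str) -> str:
    """Collapse cosmetic differences so one exam gets one key across years."""
    cleaned = name.strip()
    # "&" and "and" are used interchangeably from year to year.
    cleaned = cleaned.replace("&", "and")
    # "2-D" and "2D" refer to the same exam; drop the hyphen.
    cleaned = re.sub(r"\b(\d)-D\b", r"\1D", cleaned)
    # Pasted web tables often carry doubled or non-standard spacing.
    cleaned = re.sub(r"\s+", " ", cleaned)
    # Compare case-insensitively.
    return cleaned.casefold()


# The two spellings from different years now produce the same key.
same_english = (
    normalize_exam_name("AP English Language & Composition")
    == normalize_exam_name("AP English Language and Composition")
)
same_art = (
    normalize_exam_name("AP 2D Art & Design")
    == normalize_exam_name("AP 2-D Art and Design")
)
print(same_english, same_art)
```

Joining years on the normalized key instead of the raw label keeps a renamed exam from showing up as two different subjects.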

All seemed to be going well, although the year-to-year changes in nomenclature and formatting seem capricious and undisciplined from a data standpoint, especially for an organization that prides itself on its research and analysis capabilities.

And, finally, on the 2024 link, above, guess what? AP Psych is listed twice: First under "History and Social Sciences" and then again under "Sciences."  So, AP Psych in 2024 (but not the other years) got counted twice.

Had I been able to download and clean the raw numbers, this would not have happened, because I calculate percentages from the raw totals.  But because I had to scrape this off a website, this error showed up.  I should have checked this a couple of ways before posting, but I didn't, and that's my fault.
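One simple check of that kind is to confirm that each exam appears only once per year before doing any math. The Python below is a minimal sketch, not the process actually used for the chart; the row layout, the field names, and the helper `find_duplicate_keys` are assumptions for illustration, and the percentages are placeholders, not real AP results.

```python
from collections import Counter


def find_duplicate_keys(rows):
    """Return every (year, exam) pair that appears more than once."""
    counts = Counter((row["year"], row["exam"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Placeholder rows shaped like tables pasted from a web page.
# The pct_4_5 values are invented for illustration only.
rows = [
    {"year": 2024, "exam": "AP Psychology",
     "category": "History and Social Sciences", "pct_4_5": 10.0},
    {"year": 2024, "exam": "AP Psychology",
     "category": "Sciences", "pct_4_5": 10.0},
    {"year": 2022, "exam": "AP Psychology",
     "category": "History and Social Sciences", "pct_4_5": 10.0},
]

duplicates = find_duplicate_keys(rows)
print(duplicates)  # any pair listed here would be counted twice
```

Keying on year and exam, and ignoring the category, is what catches this case: the same exam listed under two headings looks like two distinct rows unless the category is left out of the comparison.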

This would normally be where I'd call on College Board to make their data more accessible to the general public in the interest of transparency, but a) they don't listen, b) they don't give a crap about the members, and c) they just wait for people to forget how bad they are at the simplest things and keep paying their executives multimillion-dollar salaries.

And these are the people, I'd remind you, who are being asked to fix the FAFSA, and despite the massive conflict of interest it creates, gleefully and arrogantly agree to do so. 

All is good.  Carry on.  I'll post the complete data soon after I do more auditing.

