Big Fat Duh

There’s a great deal I know only a little about, still more about which I know nothing, and a terrifyingly small number of things I know quite a lot about. [/Donald Rumsfeld]

But one of the things which fall into the last category is statistical sampling, because my very first real job was in the Statistics Department of what was then the largest marketing research company in the world (the Great Big Research Company, or A.C. Nielsen). And my specific area of expertise was sample selection: the methodology of creating a sample, the data drawn from which would accurately represent reality. A single anecdote will suffice.

One of our major clients was the yogurt-producing subsidiary of a large dairy corporation (think: Yoplait).  Our data was always being questioned by this company, because in some cases we would show their market share as being too small (the sales numbers didn’t jibe with their actual deliveries to stores, for example —  a known quantity), or else, paradoxically, far too large, for exactly the same reason:  all dependent on which geographical area we were reporting on.

My job was to investigate this phenomenon, and some months later I discovered the reason. The various smaller dairies’ yogurts were not being delivered to all the stores in the area, but in stores where they did have fridge space, they sold extremely well. A simple picture shows the problem:

Our sample of stores may have been representative of, say, total grocery sales in the area (and it was), but when yogurt sales were carved out, the sample simply sucked because of how the dairies’ distribution worked.
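To put rough numbers on that effect, here is a minimal Python sketch. Everything in it is hypothetical (the store counts, the sales figures, the 10% distribution rate, and the function name `simulate_sample_error`); none of it comes from the actual data. It draws repeated random samples of stores from an imaginary market and compares how far off each sample is for total grocery sales versus the big brand's yogurt share, when a small dairy's yogurt sits in only a few stores but sells well there.

```python
import random
import statistics


def simulate_sample_error(num_stores=1000, sample_size=50,
                          stocked_fraction=0.1, trials=2000, seed=1):
    """Return (mean relative error of grocery estimate,
               mean relative error of big-brand yogurt share)."""
    rng = random.Random(seed)

    # Hypothetical market: every store sells groceries and the big brand's
    # yogurt, but the small dairy's yogurt is stocked in only some stores,
    # where it outsells the big brand.
    stores = []
    for _ in range(num_stores):
        grocery = rng.uniform(80, 120)      # total grocery sales per store
        big_brand = rng.uniform(8, 12)      # big-brand yogurt, sold everywhere
        stocked = rng.random() < stocked_fraction
        small_dairy = rng.uniform(20, 30) if stocked else 0.0
        stores.append((grocery, big_brand, small_dairy))

    # "Reality": the figures for the whole market.
    true_grocery = statistics.mean(s[0] for s in stores)
    total_big = sum(s[1] for s in stores)
    total_small = sum(s[2] for s in stores)
    true_share = total_big / (total_big + total_small)

    grocery_errors = []
    share_errors = []
    for _ in range(trials):
        sample = rng.sample(stores, sample_size)
        est_grocery = statistics.mean(s[0] for s in sample)
        big = sum(s[1] for s in sample)
        small = sum(s[2] for s in sample)
        est_share = big / (big + small)
        grocery_errors.append(abs(est_grocery - true_grocery) / true_grocery)
        share_errors.append(abs(est_share - true_share) / true_share)

    return statistics.mean(grocery_errors), statistics.mean(share_errors)


if __name__ == "__main__":
    grocery_err, share_err = simulate_sample_error()
    print(f"average error, total grocery sales: {grocery_err:.1%}")
    print(f"average error, big-brand yogurt share: {share_err:.1%}")
```

With these made-up inputs, the grocery estimate should stay close to the truth while the yogurt share swings much more from sample to sample, because the result hinges on how many of the few stocked stores happen to land in the sample.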

It’s a very complex problem, and it applies to just about any sample selection.  In this case, there was no solution other than to broaden the sample, which would have cost too much.  So unless the client was prepared to pay a much higher fee to get better data, they’d either have to live with suspect data or cancel their account altogether.  (The end result was that they stopped looking at specific markets, and only bought data at the national level, which was acceptably accurate, but less useful to the local sales teams.)

I told you all that so I could talk about this.

Harris’ So-Called ‘Surge’ Is Thanks To Oversampling: Pollsters

In the meta data from the call centers college educated Dems are 3-4x more likely to answer than non-college. While weighting can help minimize the bias if done correctly it won’t totally eliminate the problem.
— Mark Davin Harris (@markdharris) August 16, 2024

Critics point out that many polls have been sampling a disproportionately smaller share of Republican voters compared to exit poll data from the 2020 presidential election. The result, they say, is a misleading “phantom advantage” for Ms. Harris. According to them, this skewed sampling could be a strategic move to boost enthusiasm and fundraising for Ms. Harris’ campaign.
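For anyone unfamiliar with how weighting to a benchmark works, here is a minimal Python sketch of simple post-stratification. All of it is assumed for illustration: the two groups, the 40/60 population split, the candidates "A" and "B", the respondent counts, and the function names. None of these figures comes from any actual poll.

```python
from collections import Counter

# Hypothetical benchmark: each group's share of the real electorate.
POPULATION_SHARES = {"college": 0.40, "non_college": 0.60}

# Hypothetical respondents as (group, choice). College respondents are
# overrepresented: 70 of 100 answers versus 40% of the population.
RESPONDENTS = (
    [("college", "A")] * 42 + [("college", "B")] * 28
    + [("non_college", "A")] * 12 + [("non_college", "B")] * 18
)


def raw_share(respondents, candidate):
    """Unweighted share of respondents choosing the candidate."""
    hits = sum(1 for _, choice in respondents if choice == candidate)
    return hits / len(respondents)


def weighted_share(respondents, population_shares, candidate):
    """Share after weighting each group back to its population share."""
    n = len(respondents)
    group_counts = Counter(group for group, _ in respondents)
    total_weight = 0.0
    candidate_weight = 0.0
    for group, choice in respondents:
        # weight = population share / sample share of the respondent's group
        weight = population_shares[group] / (group_counts[group] / n)
        total_weight += weight
        if choice == candidate:
            candidate_weight += weight
    return candidate_weight / total_weight


if __name__ == "__main__":
    print(f"raw share for A: {raw_share(RESPONDENTS, 'A'):.2f}")
    print(f"weighted share for A: "
          f"{weighted_share(RESPONDENTS, POPULATION_SHARES, 'A'):.2f}")
```

In this invented example the raw figure for A is 54% and the weighted figure is 48%. Weighting only rebalances the groups, though. If the people who pick up the phone within a group differ from those who don't, no weight on that group will fix it, which is the point made in the quoted tweet.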

Usually, when I talk about situations like this, I use a shorthand expression like:  “They must have drawn their sample from the Harvard Faculty Lounge.”

Unscrupulous polling companies can (and do) draw their samples to show exactly what the clients want to see — tailoring the samples to produce the desired results.  We used to call this the “K factor”:  that number which when applied to the data will provide the result most favorable to the client.  It’s more commonly known within the research community as “bullshit”, but it’s bullshit that will generate headlines — so ten guesses as to whether the mainstream media will accept such data uncritically, either because it favors their own bias/opinion or because they are completely incapable of analyzing the data properly.  (If you answered “or both” to the above, go to the head of the class.)

So is the “Kamala Surge” real, or not?  Given all the players in this particular piece of theater… oh please, it’s patent bullshit.

20 comments

  1. Of course it’s bullshit. These people are political operatives, in other words, professional liars. I can’t get too exercised about that, as it’s all a show and I expect nothing less. And why are you still voting, anyway?

    What I do find irritating is that my local store (the mighty HEB) has a giant, 10-foot-wide cooler with their store brand yogurt (you brought up yogurt). On any given day, most of the cooler has some of each flavor, except for the little section on the end that is supposed to contain my wife’s favorite yogurt. If you get there within an hour of their restocking, you can have some, otherwise not. All the other types of yogurt, like low-fat, pumpkin-spice, asparagus, are readily available. I was under the impression that these stores have ultra-sophisticated, per store, stock-keeping systems that prevent precisely this problem … sell more = stock more. Judging by my sampling, they could sell 5x my wife’s yogurt if they only kept it in stock. Are they idiots? Unlike the elections, this seems like a problem that could actually be solved.

    1. I’ve noticed that as well. As I understand it, part of the problem is the way the supermarkets allocate shelf space. In order to accommodate all the vendors, they each get X inches of shelf space. More flavors of yogurt = more shelf space. Sell out of Pumpkin Spice Banana in the first hour, and it’s supposed to be restocked (but stocking/restocking is often the responsibility of the distributor and not the store), so the shelf remains empty. The other problem is at the cashier end. Properly trained cashiers know they are supposed to scan each individual item. But some are either lazy or poorly trained and see 10 yogurt containers, so they enter 10 @ $x, scan the one passion fruit flavor, and just push the other flavors down to the bagger, resulting in bad data and sales that don’t match the stock.

      So yes – Garbage in / Garbage out. Always has been. As installers of enterprise data systems for manufacturing, we always stressed the need for accurate data entry, automated as much as possible. Manual data entry by the machine operators is hopelessly inaccurate.

  2. Funny, I’ve been having a running conversation about this on Signal with a few friends. I don’t have your level of expertise, but from my college days in Economics, I have some experience with the importance of quality sampling. Of course it’s total bullshit.

    On literally every issue (illegal immigration, crime, censorship, inflation, entanglement in foreign wars), Trump polls much higher than Kamala. I heard a recent poll on X puts it Trump over Kamala at 68%-32%. Since Elon took over, I suspect that poll is oversampling Trump supporters, but my point is, when you look at Trump rally attendance vs Kamalatoe’s, and the issues polling, in a fair election (paper ballots, Real ID, proof of citizenship) any fool knows Trump would win this thing walking away. Possibly in the neighborhood of 60/40, old cat ladies notwithstanding.

    We’ve also been ruminating about how the deep state will play this, and they are telegraphing it, with Jamie Raskin saying the quiet part out loud, Merrick Garland telegraphing how the DOJ will deal with “disinformation” around the results of the election, etc.

    In AZ, they’ve admitted the voting machines are open to manipulation by hackers but won’t address it until after the election. In the Detroit county in Michigan, poll watchers are 93% Democrat. My predictive skills are often not that good, but I’d put money on this scenario.

    They keep Kamala in the basement because she’s a train wreck every time she’s in front of a microphone. They don’t need her to do anything because the fix is in, the steal this time will be epic, and they don’t care. As soon as Kamala is declared the winner, the FBI is going to start rounding up the big names with 3:00 a.m. full SWAT team raids, people like Elon, Tucker, Sean, Mark Levin, will all be marched out of their homes in their pajamas with coordinated full CNN coverage to send a message to the rest of us.

    As for the rest of us who gripe too loudly on X, FarceBook, or Instagram, the local Stasi in blue areas will start to round us up. This time, instead of 1,500 charged and/or incarcerated, it will be in the tens of thousands. Sheriffs in blue areas will tell the Stasi to FRO and I have no idea how that will play out, but it won’t be pretty.

    It’s going to be an interesting Holiday season this year, folks. We all know they are not going to go quietly. Be ready.

    JC

    1. “Sheriffs in blue areas will tell the Stasi to FRO and I have no idea how that will play out, but it won’t be pretty.”

      Seeing as how ‘FRO’ means “back” or “away”, did you mean Sheriffs in red areas?

        1. Maybe in YOUR red area. In mine the county sheriff’s office will bend over backward to fellate the feds.

          1. I hear that. Our current “Republican” Sheriff actually campaigned on the enforcement of Red Flag laws saying “you can enforce an unconstitutional law in a constitutional way”.

  3. My brother & I were discussing this and came to the conclusion that they were deliberately pushing “Kamala’s high numbers” so when the cheat occurs, it can be justified. Look, she won by 81 million votes reflecting her popularity. Of course we’ll know it in no way reflects reality but just like in 2020, it will be pushed as truth. Any attempt to counter the cheat will be derided as “conspiracy theory” and the person silenced, again like in 2020.

  4. “In the meta data from the call centers college educated Dems are 3-4x more likely to answer …”
    Among my conservative friends and acquaintances, they are 3-4x more likely to answer “None of your business; shove your push poll up your lying ass.”

  5. I stopped believing polls when it was clear that there was something to gain by its result.

    Considering the violence we saw with the demise of various denizens, Trayvon Martin, George Floyd etc., I’m careful about broadcasting certain information. Pollsters typically get hung up on

  6. Two books come to mind:

    ‘Term Limits’, by Vince Flynn.
    ‘Unintended Consequences’, by John Ross.

    The latter is available in a free PDF format.

    1. “Unintended Consequences” by Ross is a very entertaining book.

      Check out Matt Bracken’s Enemies Foreign and Domestic trilogy. They are worth reading.

  7. Trump has firmly established an image as a hater. Not everyone wants to vote for someone filled with hate. Harris’s campaign is I’m Lovable. That’s a lot easier to sell.

  8. ‘Unintended Consequences’, by John Ross.
    The latter is available in a free PDF format.

    The free PDF is not produced by the actual publisher, who received nothing and continues to receive nothing for what is a pirated copy. In addition, it contains some errors (nothing major).
    Do yourself and the Ross estate a favour and purchase an actual bound copy from Accurate Press (https://stlccw.com/product-category/books/). Don’t try Amazon: the prices are outrageously high.

    Given the way things are going, ‘Unintended Consequences’ is a not-too-unbelievable outcome. The FBI picking up ‘disinformation’ spreaders is all too believable. However, FBI agents might become ‘vaccinated’ if they are individually informed that ‘we know where you live, and where your children go to school’.

  9. The mention of the John Ross book put me in mind of The Battle of Athens, TN. It has been mentioned here before, I bet, but worth a read.

    https://en.m.wikipedia.org/wiki/Battle_of_Athens_(1946)

    I’ve read other novels like Ross’, so I likely won’t read his, but take a few minutes with your morning coffee to learn about this little piece of history. I wish we had seen some of this after the 2020 election, but perhaps 2024?

    1. I’ve been to Athens, TN. Interesting little stop. There are several murals depicting the events of 1946.

      What other authors do you recommend besides Ross for the dystopian novels?

  10. In 1936, The Literary Digest conducted a poll which showed Republican Alf Landon winning big. But Roosevelt won by a landslide. TLD had polled its readers, people listed in telephone directories, and automobile owners. In that era, these groups were overwhelmingly middle and upper class households.

    Their result was wrong by 39%. The debacle killed TLD by 1938.
