Pur Autre Vie

I'm not wrong, I'm just an asshole

Tuesday, June 04, 2013

Beer, Models, and the Looming Dystopia

The title of this post is a bit of an exaggeration, but I am genuinely worried about where our society is going.  In short, I think our inability to understand or criticize the "science" that we increasingly rely on is a major problem.  I'll use a somewhat silly example to explain what I mean.

Consider this (not yet available) app that will produce a "map" of beers.  I won't try to give a detailed explanation, which I would likely mangle.  But the point is that it takes user-generated data from a beer review website and then uses it to determine the degree of similarity between beers.  The app then creates a map such that geographical proximity indicates similarity.
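
For the curious, here is a rough sketch of the general idea. I don't know how the actual app works, so everything here is an assumption: the toy ratings are made up, and cosine similarity plus classical multidimensional scaling is just one plausible way to turn reviews into a map where nearby points mean similar beers.

```python
import numpy as np

# Hypothetical data: rows are reviewers, columns are beers.
# Every reviewer has rated every beer on a 1-5 scale.
beers = ["IPA A", "IPA B", "Stout A", "Stout B"]
ratings = np.array([
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
], dtype=float)


def cosine_similarity(columns):
    """Cosine similarity between every pair of columns (beers)."""
    unit = columns / np.linalg.norm(columns, axis=0, keepdims=True)
    return unit.T @ unit


def classical_mds(distances, dims=2):
    """Place points in `dims` dimensions so that map distance roughly
    tracks the given pairwise distances (classical MDS)."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    top = np.argsort(eigenvalues)[::-1][:dims]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigenvalues[top], 0, None))
    return eigenvectors[:, top] * scale


similarity = cosine_similarity(ratings)
distances = 1.0 - similarity  # similar beers get small distances
coords = classical_mds(distances)

for name, (x, y) in zip(beers, coords):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

With these made-up numbers the two IPAs land near each other and far from the two stouts. Note that nothing in the code knows what an IPA is; the clusters come entirely from who rated what.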

Now, the app might "work" in several different ways.  One way it might work is for the beers to cluster into "regions" that correspond to style categories.  This seems to have happened to a pretty large degree.  Another way for it to work is for it to generate successful predictions of what a beer drinker will like, based on the answers to a few questions.  This is apparently the intended purpose of the app, and I expect it will do fairly well.  As I said, the app isn't yet available, so I can't test its success in this area.

Now I should say that I think it is a cool app, and I will probably buy it if/when it becomes available (assuming I have a device that can run it).  But what strikes me about this endeavor is that it doesn't really have much in the way of theoretical underpinnings.  I don't mean the math—the math looks very impressive (though how would I know).  I mean that if it works, we won't know precisely why it works.  We have a general idea, of course, but in a way it would be serendipitous for the underlying data to yield useful predictions.  If it turns out that the neutrino travels faster than the speed of light, then physicists will have some fucking 'splaining to do.  But if the beer app can't generate a ranked list of 5 stouts I will like based on a ranked list of 5 IPAs I like, then it doesn't really have any broader implications.  It's just not as useful as we might have hoped.

And there's more.  The app might "work" for essentially bad reasons.  You can imagine that people might use a word to describe a particular "style" of beer (whether or not it is meaningfully distinct from other "styles" in any fundamental way), and that you could then get "false positives" in the software.  That is, it might indicate that two beers are similar to each other even though their similarity is purely a social phenomenon.  (You can also imagine that identical beers with different labels might end up in very different "locations" on the map.)  Of course, the program might still faithfully report the way people experience beer, so it would be a success in a way.  But the point is that it is very hard to tell what success means.  Do the style clusters on the map mean anything?  I suspect so, but I don't think the app can tell us that.

And moreover, the app might "work" one day and not the next, or "work" for one person and not another, and we wouldn't know why.  It is essentially a black box.  We know the algorithm, but we don't know why it works sometimes and not others.

It seems to me that a lot of knowledge is like this.  I think there is a temptation to call this kind of thing "scientific" or "empirical," and in a sense it is.  It is certainly data-driven.  But it seems like a very provisional kind of knowledge.  It is deeply qualified and we don't know what the qualifications are.  It might be the best we can do, and we might be willing to trust it quite a bit (for instance, to allocate marketing resources for a brewing company).  But it rests on an unproven foundation and it is very hard to identify the appropriate scope for applying the knowledge.

I recognize that this is sort of a general Humean problem, but I feel it is a bit more of a real-world concern in cases like this than in cases like whether the sun will rise tomorrow.  As I noted, there is a temptation to call these fancy models "scientific" or "empirical" and to discount criticism of them as backwards and ignorant.  But it is dangerous to put more and more of our ideas into black boxes, with all dissent labelled as anti-scientific.  We risk being far too confident in our "knowledge," and allowing that confidence to bestow unearned political victories on particular ideologies.

To give a real-world example (because who really cares if a beer-mapping app doesn't have theoretical underpinnings?), Esther Duflo has been canonized as a saint of the modern Gladwell-inflected technocratic movement.  These are TED people, the kind of people who believe that democracy is dirty and corrupt, and that much better outcomes would be achievable if only we could put "the smart people" in charge.

Now I am sure Esther Duflo's work is very good, but it suffers from some of the same fundamental limitations as the beer app, and I don't think this is appreciated widely enough.  (Here is a paper by Angus Deaton (PDF) making the case.)  Duflo has a lot of influence that stems from what you might call the rhetorically advantageous position of her work.  She may use the influence for good, but that would be somewhat serendipitous.

But having brought her up, let us somewhat abruptly put Duflo aside.  Even in the worst case her work is useful, although it may be given too enthusiastic a reception by the technocratic "elite."  It is the mindless enthusiasm of that reception that scares me.  I am afraid that we are entering a time when all sorts of policies will be "black boxed," derived from models that are not subject to meaningful inspection.  This is roughly what I think cost-benefit analysis is, at least in theory (in practice I don't think it amounts to much beyond a threat of litigation).  Let's say the black box tells us that we should tear down a poor neighborhood and build a highway or something.  Traditionally this would have been controversial, a political and constitutional issue.  But increasingly I think we will be tempted to "put it beyond politics" by running the numbers to determine whether it is the "optimal policy."  Dissenters will be called anti-scientific (just as people who believe that blacks and whites are equal are branded as anti-scientific), and they will be ridiculed.  After all, they just don't get quantitative science (or genetic science as the case may be).

This is a worry for me partly because I think we will be justified in relying heavily on quantitative models that don't have theoretical underpinnings that can be articulated or assessed.  We don't have a choice, really.  (And I am open to the possibility that all knowledge is like this, though again, I don't think you have to go full Hume to see my point.)  But I suspect we will be very bad at drawing lines and maintaining open minds as we go down this path.  Already I think far too many people think of democracy as a hindrance, a sort of necessary evil that hopefully won't get in the way of the "smart people."  This is of course a deeply flawed idea, but I don't think that will slow it down much.

4 Comments:

Blogger Sarang said...

I think the issue with cost-benefit analysis is more the dodgy normative assumptions than anything else; I agree that these tend to get swept under the rug, and that this is the fatal flaw of technocracy. I also agree that theoretically understood models are superior to pure empirical relationships and that this fact tends to get buried under all the big data hype. But I don't really see that the two things have much to do with each other.

Evolution seems to me a simple counterexample to the conflation. Most people who do believe it _don't_ understand it -- they don't, for instance, realize that natural selection wouldn't work at all with blending instead of particulate inheritance -- and believe it because of arguments from authority; this is inevitable because "math is hard." However, the theory is about as well-specified as any other, and it is certainly not a black box. In this case, skeptics really really are ignorant, though no more so than believers.

10:19 AM  
Blogger James said...

Yeah I guess it's not a perfect analogy, but my point is that cost-benefit analysis operates much like an empirical conclusion that has no theoretical foundation. That is, it is perceived to be rigorous and "scientific" when in fact it smuggles in a bunch of subjective modeling choices and, as you say, dodgy normative assumptions. Its results are essentially imposed on people without meaningful understanding or consideration of the relevant issues. It is treated as though it has more validity than it really does.

I am not sure what to make of your evolution example. I think it is appropriate to shut down debate (in a sense) in cases when the science is clear, and I don't particularly care about the subjective beliefs of the participants in the discussion (many people will be confused about almost any complex topic). My point is just that it is illegitimate to shut down discussion on the basis of scientific validity where that validity is questionable, and I claim that is the case with "big data" type models. The problem is not the models - I like beer maps, I like Duflo - the problem is the mismatch between what the models actually establish and what they are perceived to have established.

11:52 AM  
Blogger James said...

Oh I guess I should also mention I am scared that more and more policy will "literally" be made in a black-box process. For instance, suppose the government approves driverless-car software, and that software results in increased traffic through poor neighborhoods, in turn leading to more deaths in those neighborhoods. The problem is that even if you could prove the statistical fact (itself a difficult task), you will have a hard time "finding it in the code." Defenders of the algorithm will call you anti-science etc. etc.

12:02 PM  
Blogger Grobstein said...

The "beer app" case study is reminiscent of some remarks Noam Chomsky made recently. Peter Norvig responded. (Have not read either closely.) See first few entries here: http://www.chomsky.info/debates.htm

10:18 PM  
