Once you invest a few years (or maybe N posts) into blogging, you realize the most idiotic pre-emptive self-criticism is the “I haven’t written much lately so this next post needs to be EPIC.” This is a decidedly non-epic post; another stopgap measure as my less frequent blogging schedule promises to persist for at least another few weeks. Those of you who care about me as a person will be relieved to know that this is for excellent, work-related and personal reasons, so don’t cry for me, Argentina. I’ve got a job I love and I’m doing it.

The functions of sound. One of the obvious things that sound does is trigger memory, whether in the form of a specific event sequence (“I danced to this song with X, at my Junior Prom”), or a non-specific memory of an era (“This reminds me of childhood.”), or a person (“This song always reminds me of Aunt Mary.”). It stands to reason that sounds, like other attributes of history, can be forgotten, and that is the premise behind this blog entry: 11 Sounds That Your Kids Have Probably Never Heard. Complete with sound files!

I’ve been thinking about the cultural process of forgetting because I attended Jean-Baptiste Michel’s talk at the Columbia sociology department. (Here is, perhaps, a better link to his projects and such.) He presented some of the work he’s done on digital archives of English literature–Google Books, as you may know them. Basically, he has been able to capture information on the use of single words (or strings of words) in books, by year, for as long as books in English have been produced (assuming that they have been saved, archived in one of many university libraries and then digitized by Google). In other words, he can show you how the use of “penguin” has become more or less common over time. Or, as he described today, when verbs with irregular forms transition into those with regular forms. This is the topic of his 2007 article in Nature.

One of the more interesting results he reported is that there is a characteristic statistical form to the expiration of year mentions. That is, when you look at a count of the mentions of “1910” in books over the succeeding years, this curve looks very much like the one for mentions of “1950” or any other year. [I should note that the question of “special” years like “1492” was raised, but not answered.] This is an interesting result not because we should care about how many times we mention particular years in books, but because the curve is maintained despite massive changes inside and outside the publishing industry over the time horizon of hundreds of years that he presents. That digitization, changes in copyright, the expansion of higher education, the consolidation of the book industry–that none of these impact the overall shape of the distribution is an extremely interesting result that suggests there is some macro-level cultural phenomenon at work, either at the level of production (people stop writing about topics at equivalent rates over time and across topics), or at the level of text (all years are really alike). Since Michel is not a sociologist, he doesn’t have a sociological account of these phenomena. In conversation, he was willing to grant that one exists, and is needed, but basically that he would have to learn more in order to be equipped to provide it.

As it happens, the work of Michel’s team on the Google n-gram project gets a mention in a guest post by Trey Causey on Code and Culture today. The only thing I would add is a clarification–because it was not clear in the post–that the n-gram algorithm does not use the “bag-of-words” approach in which “the order and context of the words is assumed to be unimportant.” The very idea of “N” in “n-grams” is that you can specify a sequence of words, and a match is reported in the results only when those words appear in a text in that exact sequence. And the matching is case-sensitive. The example given today in the talk was “The United States are”–which transitioned after the Civil War to a more dominant use of “The United States is”–but you would not capture the same set of documents if you didn’t include the upper-case letters. I can’t tell if Trey, the author at Code and Culture, is making a mistake of meaning, or placed a discussion of n-grams in the paragraph on “bag of words” for some other reason. [And on the substantive topic of “seven inch heels,” I assume all involved know this particular phrasing is an idiomatic expression for “tall” and not (always) meant as a literal description.]
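
For the code-minded, the difference between the two approaches can be sketched in a few lines of Python. This is only a toy illustration, not how the Google n-gram tools are actually built: the function names (`ngrams`, `bag_of_words`) and the example sentences are made up for the purpose, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive tokens, keeping order and case.

    Tokenizing on whitespace is a simplification for this sketch.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text):
    """Count lowercased tokens; word order is thrown away."""
    return Counter(text.lower().split())


# Hypothetical example sentences, chosen to echo the talk's example.
old = "The United States are a union"
new = "The United States is a union"
scrambled = "are States a The union United"

target = "The United States are"

# The 4-gram matches only the exact, case-sensitive sequence.
count_old = Counter(ngrams(old, 4))[target]
count_new = Counter(ngrams(new, 4))[target]
count_lower = Counter(ngrams(old.lower(), 4))[target]

# A bag of words cannot tell the original sentence from a scrambled one.
same_bag = bag_of_words(old) == bag_of_words(scrambled)

print(count_old, count_new, count_lower, same_bag)
```

The exact phrase is found once in the first sentence and not at all in the second or in the lowercased version, while the bag-of-words counts for the original and the scrambled sentence are identical. That is the sense in which order and case matter for n-grams and not for a bag of words.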

Finally, and only related via the thin thread of “digitization” is the announcement from ASA of the new policy (recommendations?) for online posting of journal articles. The ASA recommends that authors include this long and bulky text when posting research in the form of a working paper on a personal website or similar:

This paper may not be quoted, reproduced, distributed, transmitted or retransmitted, performed, displayed, downloaded, or adapted in any medium for any purpose, including, without limitation, teaching purposes, without the Author’s express written permission. Permission requests should be directed to [author’s e-mail address].

And if you are posting a working paper, you should include:

Copyright [year]. Name of author. All rights reserved. This paper is for the reader’s personal use only.

I wonder: does anyone have enough current knowledge about copyright to know if this statement would ever matter in any kind of dispute? My understanding was that copyright requires one to complete and submit some paperwork to the government. I was completely unaware that we had a system like the mythical way to find an undercover cop: just ask.



6 responses to “Culturonomics or whatever.”

  1. Copyright does not require completing and sending something to the government. It’s automatic when you produce a text, though there are many things you can do to make that enforcement easier (and I’m no expert on any of them). That being said, why is ASA promoting moves that make it harder to share our research? Why not, instead, apply a Creative Commons license to your work, and explicitly grant other creators the right to reuse your work (with attribution and not for commercial purposes, if you want)? I guess allowing folks to keep working papers up of articles published in ASA journals is a progressive move, but it just contrasts so much with (say) the stance put forward by our President in re: Wikipedia, the promotion of community and all that jazz.

  2. Trey

    Thanks for the mention of my guest post at Code and Culture. You’re right in that the referenced paragraph does not make clear that n-grams are not part of the “bag of words” approach. Editing should have placed that at the end of the paragraph (note that I state that some are uncomfortable with the “bag-of-words” assumptions and then discuss n-grams). Sorry for the confusion!

    However, as far as the use of seven-inch heels as a hyperbolic reference to large heels, this speaks exactly to my point, rather than undercutting it. The findings are presented as if the actual height of heels varies with economic conditions — the PI of the study wrote in the comments that this is the median height mentioned in their conversation data. Given that it seems both he and I are unaware of the implicit meaning in this phrase, it underscores the need for substantive interpretation of the findings.

  3. Jenn Lena

    @Trey: You won’t find me clarifying the 7 inch heel issue as a way to “undercut” your argument. That’s not on the page. Substantive interpretations of findings are important.

    • Trey

      I’m glad we agree; it seemed as if you were saying that part of the argument was a non-starter as “seven inch heels” isn’t literal. Uncharitable interpretation on my part.

  4. Jenn,
    Breaking Bad clips and hidden subtexts of fashion slang? I really don’t tell you often enough how much I adore you.

  5. Jenn Lena

    I’m thankful for you too, Gabriel! Hope you’re enjoying the holiday.

