Christopher Andrew Bail (Duke University)


From 09:00 to 10:30

Course description
How do art, music, literature and other cultural forms travel across the world? Cultural globalization has fascinated sociologists, anthropologists, and historians for some time. Yet until recently, studying this important process was extremely challenging. Most research examines the spread of a single cultural form across a handful of countries, usually those in North America or Western Europe. Such studies not only ignore much of the world, but also ignore the presumably much larger group of cultural forms that never cross national boundaries, producing an incomplete picture of the process of cultural globalization. The digital revolution provides an unprecedented opportunity to study the spread of cultural forms in remarkable detail. Data from Google, Facebook, and Twitter can now be used to examine how cultural forms spread across the globe. This course will introduce students to the study of cultural globalization as well as the emerging field of computational social science, which leverages new digital data sources to study human behavior. Students will not only read studies that trace the spread of art, music, and culture using new digital sources, but also learn the skills necessary to collect such data and analyze them with social network analysis and automated text analysis in R. No prior knowledge of computer programming is required to take this course. It is specifically designed for people who have no background in this area but would like to explore it at a basic level.

Course goals
This course requires no prior knowledge of computer programming or social science. Students will obtain basic skills that will enable them to automate collection of data about cultural globalization from social media sites, classify these highly unstructured data into discrete variables that can be analyzed using conventional social science models, and analyze them using a combination of techniques that includes screen-scraping, natural language processing and machine learning. We will also discuss the complex ethical and legal issues that arise when working with these novel sources of data.

Formal requirements
This class alternates between discussions of assigned readings and “labs” where you will learn how to write code for computational social science. You must complete the assigned reading before each discussion class. Lab assignments, by contrast, are completed after each lab class. Note that there is no separate lab meeting outside the regular class hours; rather, every other class meeting is a lab.
The required readings for this course are relatively short. You are responsible for understanding the readings. Make use of your fellow students, the Internet, a dictionary, and me to ensure that you understand the readings. Discussion sections will be used for substantive discussion and further exploration of the implications of the course readings, not for grasping basic concepts. Remember that this syllabus is a “living document.” By this I mean I reserve the right to change the reading assignments in response to your feedback as well as my own sense of our group achievement. No changes will be made without at least one week’s notice.
Your participation grade will be calculated on a continuous scale from 0 to 100 in order to reflect the quality of your contribution to classroom discussions. Once again, classroom discussions are not intended to clarify key concepts. Instead, we will discuss the pros and cons of each author’s arguments, or extensions thereof. Your participation grade therefore assesses the extent to which you have thoughtfully engaged with the reading material.
Lab Assignments
After each “lab” class, you will have a take-home assignment that will be graded on the following scale: 100, 90, 80, 0. Lab assignments will require you to submit your code as an HTML file (I will explain how to do this in detail well before the first assignment is due).
Final Paper
The bulk of your grade is determined by a 10-15 page final paper that will present an original research project that collects some type of social media data or other form of digital data in order to study how to help a non-profit group of your choosing call attention to their cause. This paper must include at least three visualizations that present analyses of the data you have collected as well as a summary that explains a) the importance of your research question; b) the theories you are using to address the social problem; c) the methods you used to collect and analyze the data; d) the meaning of your visualizations/results; and e) the implications of your research.

Course Evaluation
Your course grade will be calculated as follows:
Participation 20%
Lab Assignments 30%
Final Paper 50%

Readings and resources
Required Readings
Ken Auletta. 2019. Frenemies: The Epic Disruption of the Ad Business (and Everything Else). Penguin
P.W. Singer and Emerson T. Brooking. 2018. LikeWar: The Weaponization of Social Media, Houghton-Mifflin.
Damon Centola. 2018. How Behavior Spreads: The Science of Complex Contagions. Princeton University Press
Annotated Computer Code
At the end of each class, I will upload the code we write together in order to help you complete the lab assignments.
Stack Overflow
The field of computational social science is moving so rapidly that none of the resources I give you will remain at the cutting edge for long. You will almost certainly encounter issues unique to the data you collect for your final paper, incompatibilities between software packages, or problems specific to your computer. Stack Overflow is a website where computer programmers help each other solve such problems. Individuals ask questions, and others earn “reputation points” for solving their problems. These reputation points are awarded by the person who asks the question as well as by other site users who vote on the elegance and efficiency of each solution.
Many of the most important advances in computational social science appear first on Twitter or blogs. I therefore encourage you to open a Twitter account, if you don’t already have one, and follow the authors we read, or check out the people I follow. Of the many blogs that you might read, I recommend R Bloggers, which provides a concise overview of new functions in R and solutions to common problems faced by computational social scientists and researchers in other fields.

Course Schedule
Week 1.1: LikeWar, Chapters 1-3 (pgs. 1-82)
Week 1.2: Lab #1 (Writing your first line of code)
Week 2.1: LikeWar, Chapters 4-6 (pgs. 83-147)
Week 2.2: Lab #2 (Mining Data from Twitter Part 1)
Week 3.1: LikeWar, Chapters 7-9 (pgs. 181-257)
Week 3.2: Lab #3 (Basic Data Structures)
Week 4.1: Frenemies, Introduction and Chapters 1-3 (pgs. 1-50)
Week 4.2: Lab #4 (Data Wrangling Part 1)
Week 5.1: Frenemies, Chapters 4-6 (pgs. 51-118)
Week 5.2: Lab #5 (Data Wrangling Part 2)
Week 6.1: Frenemies, Chapters 7-9 (pgs. 119-154)
Week 6.2: Lab #6 (Basic Programming Part 1)
Week 8.1: Frenemies, Chapters 10-12 (pgs. 171-222)
Week 8.2: Lab #7 (Basic Programming Part 2)
Week 9.1: Frenemies, Chapters 14-19 (pgs. 239-316)
Week 9.2: Lab #8 (Data Visualization Part 1)
Week 10.1: How Behavior Spreads, Chapters 1-4 (pgs. 1-85)
Week 10.2: Lab #9 (Data Visualization Part 2)
Week 11.1: How Behavior Spreads, Chapters 5-7 (pgs. 85-134)
Week 11.2: Lab #10 (Text Analysis Part 1)
Week 12.1: How Behavior Spreads, Chapters 8-10 (pgs. 135-178)
Week 12.2: Lab #11 (Text Analysis Part 2)
Week 13.1: TBD
Week 13.2: Lab #12 (Open Session: Help with Final Projects)


Isola di San Servolo
30133 Venice,
