Content Sharing in a Social Broadcasting Environment:
Evidence from Twitter
*
Zhan Shi
Huaxia Rui
Andrew Whinston
§
Abstract
The rise of social broadcasting technologies has greatly facilitated open access to
information worldwide, not only by powering decentralized information production and
consumption, but also by expediting information diffusion through social interactions
like content sharing. We study users’ voluntary information sharing in the context of
Twitter, the predominant social broadcasting site, by modeling both the technology
and user behavior. We collect a detailed dataset about the official content-sharing
function on Twitter, called retweet, and document the statistical relationships between
the users’ social network characteristics and their retweeting acts. We then estimate
a more structural model using the conditional maximum likelihood estimation (MLE)
method. The empirical results convincingly support our hypothesis that weak ties are
more likely to engage in the social exchange process of content sharing. Specifically, we
find that after an author posts a median quality tweet (as defined in the sample), the
likelihood that a unidirectional follower will retweet is 3.1% higher than the likelihood
that a bidirectional follower will.
Keywords: Information Diffusion, Content Sharing, Social Broadcasting, Twitter,
Weak Tie
*
We thank Rion Snow from Twitter, our editors, the four anonymous reviewers, as well as the attendees
of our panel at SXSW Interactive 2011 and our session at WISE 2010, for useful comments and suggestions.
The University of Texas at Austin, Department of Economics, Austin, TX 78712, email: zs@utexas.edu
The University of Texas at Austin, Department of Information Systems, Risk and Operations Manage-
ment, McCombs School of Business, Austin, TX 78712, email: huaxia@utexas.edu
§
The University of Texas at Austin, Department of Information Systems, Risk and Operations Manage-
ment, McCombs School of Business, Austin, TX 78712, email: abw@uts.cc.utexas.edu
1 Introduction
At 10:24 p.m. EST, May 1, 2011, one hour and eleven minutes before the formal announce-
ment of Osama Bin Laden’s death by U.S. President Barack Obama, the following message
was posted on Twitter by Mr. Keith Urbahn,
1
So I’m told by a reputable person they have killed Osama Bin Laden...
The post quickly attracted attention and got forwarded by Mr. Urbahn’s subscribers on
Twitter, and within two minutes, there were already more than 300 reactions to it. In the
following hour, tens of thousands more users in the Twitter world were passing this message,
and the final number of people who got exposed to the information before the formal White
House announcement was even higher.
This example not only shows the sheer power of Twitter as a fast-growing social medium,
but also demonstrates that emerging social media can beat even their mainstream competitors
in terms of speed, flexibility, and reach, especially in tracking events as they unfold
in real time.
2
The unique advantage of websites like Twitter in disseminating news comes
from their distinctive technological infrastructure. Although Twitter and a number of other
similar online services, such as Tumblr and Sina Weibo, are usually referred to as microblogging
or social networking sites, these labels fail to capture their essence: each of
these websites is simultaneously a broadcasting service and a social network. Like
content from most traditional mass media, user-generated content on these sites is accessible
by the public and is broadcast through directed subscription. The subscription relationships,
as the only kind of user relationship, constitute the accompanying social network. The
coexistence of a broadcasting service and a social network makes each facet
easily distinguishable from its standalone peers. On the one hand, the
broadcasting service differs from traditional mass media like TV or radio in its decentralized
structure and its social ingredient; it represents the full spectrum of communications, from
headline news to personal and private communications (Wu et al. 2011). On the other hand,
the social network, derived from content-subscription relationships, also significantly differs
from traditional online social networks, which typically map real-world friendships or con-
nections. For example, the social network on Twitter is quite open and loose compared to
the social network on Facebook because the follower–following relationship on Twitter can
1
@keithurbahn, http://twitter.com/keithurbahn.
2
Indeed, this capability has been proven again and again during events such as the 2009 Iran election,
the 2011 Middle East Revolution, and the 2012 Chinese political scandal.
be established unilaterally and usually cuts across long (real-world) social distances. This
combination gives these technologies unique advantages in facilitating information diffusion
and justifies assigning them to a new category, which we call social broadcasting networks.
This view is also explicitly or implicitly shared by many computer and information scientists.
For example, Kwak et al. (2010) suggested that Twitter more closely resembles an information
sharing site than a traditional social network. Bakshy et al. (2011) noted that “unlike
other user-declared networks, Twitter is expressly devoted to disseminating information.”
Social broadcasting networks have blurred the traditional boundary between social networks
and news media by adding the “social” ingredient into the cycle of information production,
exchange, and consumption (Kwak et al. 2010, Wu et al. 2011, Socialflow 2011).
As exemplified by the Bin Laden case, information diffusion in social broadcasting networks
critically relies on social interactions, such as content sharing. Indeed, without the
voluntary relaying of Mr. Urbahn’s message by numerous Twitter users, that single post
might never have triggered an avalanche of reactions and reached an audience far beyond
Mr. Urbahn’s own subscribers.
3
Content sharing is a critical mechanism of information dif-
fusion in social broadcasting networks and is vital to a network’s proper functioning and
thriving. When interesting or important information does not get passed on, the social
broadcasting network fails to reach its full potential as a news medium; meanwhile, excess
transmission of redundant or trivial information creates information overload and lowers the
value of a social broadcasting network to the users. Understanding the information relaying
process is thus both interesting and important. The objective of this paper is to take an
early step in this direction by examining the sharing decision-making process at the individual
level. As suggested, one defining feature of social broadcasting networks is that they possess
a large volume of weak interpersonal relationships. Thus, our central goal in this article is
to address the following research question:
Research Question. How does the strength of the interpersonal tie moderate people’s voluntary
content sharing behavior in a social broadcasting network?
Exploring the question might further reveal people’s motivation in passing on information.
Users’ voluntary content sharing is a social exchange process (Blau 1964) that involves
the content’s creator, the sharer, and the sharer’s subscribers. To develop and test a theoret-
ical model explaining how tie strength moderates people’s decisions to engage in the social
3
According to social media company SocialFlow, Keith Urbahn wasn’t the first to speculate about Bin Laden’s
death after the news was released about the presidential address. However, Keith Urbahn’s tweet proved to
be a watershed in people’s discussion on Twitter regarding the presidential address.
exchange, we draw on two streams of prior research: the literature on tie strength and the
literature on people’s pro-social behavior.
A large literature has examined the implications of tie strength in a variety of social or
economic settings. For example, Granovetter (1973) did the pioneering work on the role that
weak ties play when people search for jobs, the result of which is famously summarized as
the strength of weak ties (SWT). The arguments of SWT suggest the importance of weak ties
(i.e., ties with acquaintances, rather than close friends) in enabling novel information to flow
across two densely knit groups of close friends. Levin and Cross (2004) proposed and tested
a model of dyadic knowledge exchange taking into account trust and tie strength between
the two parties. Their results also suggested that weak ties provide access to nonredundant
information. Bapna et al. (2012) studied the link between strength of social ties and trust
in an online social network using data from a Facebook application. They found that, for
the average user, social tie strength, as measured by active interaction with someone else, is
positively linked to trust.
Researchers have also extensively studied people’s motivation for sharing knowledge in
online environments where explicit financial compensation is often absent (Wasko and Faraj
2005, Bock et al. 2005, Chiu et al. 2006, Olivera et al. 2008). However, most of the pre-
vious studies focus on sharing behavior in the form of helping others (often strangers) solve
problems by contributing one’s own knowledge. Bock et al. (2005) surveyed 154 managers
from 27 Korean organizations and found that anticipated reciprocal relationships affect individuals’
attitudes toward knowledge sharing. Chiu et al. (2006) also found that social
interaction ties, reciprocity, and identification increased individuals’ quantity of knowledge
sharing by surveying 310 members of one professional virtual community in Taiwan. Olivera
et al. (2008) developed a framework for understanding contribution behaviors and delineated
three mediating mechanisms: awareness, searching and matching, and formulation and delivery.
The sharing behavior we study is people’s voluntary information relaying decision,
which is a quite different type of contribution. Wasko and Faraj (2005) applied theories of collective
action to examine how individual motivations and social capital influence knowledge
contribution in electronic networks. Using survey data and archival data from one electronic
network supporting a professional legal association, they found that people contribute their
knowledge when they perceive that it enhances their professional reputations, when they
have the experience to share, and when they are structurally embedded in the network.
The current paper can be viewed as an extension of Wasko and Faraj (2005) in the sense
that we are also examining people’s contribution behavior on an electronic network. However,
this paper departs from previous IS literature in two important ways. In terms of data and
method, we use micro-level data and a two-stage discrete choice model to study a relatively
new form of sharing behavior (relaying information contributed by others) on a social broadcasting
network, which is also a new form of virtual community. In terms of theory, we
integrate SWT with the general framework of social exchange to develop a new theoretical
model to examine the relationship between network characteristics and retweeting behavior.
Our theoretical model posits that one’s motivation for engaging in the social exchange
process of content sharing is the latent benefit of perceived reputation enhancement resulting
from consumption of the shared content by one’s subscribers. The major part of the latent
benefit comes from the subscribers and thus is positively associated with the perceived
novelty of the content to the sharer’s subscribers, which in turn is negatively associated
with the strength of the social tie between the content’s creator and the sharer. Empirically
testing our theory in a real-world social broadcasting network is complicated both by the
challenge of collecting micro-level data from the Internet and by the specifics of the actual
technological environment in which data are produced. To overcome these problems, we
deployed 20 servers over a 140-day period to collect a detailed dataset containing information
on both the content-sharing activity and social relationships from Twitter, and we developed a
two-stage “consumption-sharing” model to help us better understand the machine-mediated
human decision-making process. We then estimate the empirical model using the conditional
maximum likelihood estimation (MLE) method, the results of which convincingly support
our theory.
The remainder of this paper proceeds as follows. In Section 2, we briefly introduce
Twitter as an example of social broadcasting networks and describe the technology-mediated
information-sharing mechanism on Twitter. Drawing on social and behavioral theories, we
develop our hypothesis in Section 3. After describing our dataset in Section 4, we conduct
a series of empirical analyses to test our model in Section 5, and we discuss the managerial
implications of our findings in Section 6. Finally, we conclude and discuss future research
directions in Section 7.
2 Twitter and Retweeting
Designed to be the “Short Message Service of the Internet” at start-up, Twitter was launched
in July 2006. During the 2007 South by Southwest (SxSW) festival in Austin, TX, a showcase
of Twitter impressed the highly tech-savvy attendees. Since then, Twitter has entered a phase
of rapid growth and gained popularity far beyond the technology industry insiders. As of
March 2011, Twitter had more than 200 million registered users worldwide, who in total
post an average of 150 million updates a day.
4
Twitter is now one of the most vibrant online
communities in the world.
Twitter: A Social Broadcasting Technology
Twitter is an example of a social broadcasting site, where a broadcasting service and a social
network organically constitute the technological infrastructure. On top of that, Twitter users
produce and consume informational content by authoring and reading tweets,
5
which are
text-based updates/messages of up to 140 characters. Like content on most traditional mass
media, tweets are by default open to the public, and there is no restriction on consumption.
Powered by its service, every Twitter user can be a content broadcaster and/or a content
consumer.
Twitter users are networked to each other through a following-follower relationship. A
user’s followers are those who subscribe to receive his or her tweets, and a user’s followings are
the users whose tweets he or she subscribes to receive.
6
This following-follower relationship
is the sole interpersonal link in the Twitter network. It is not only the pathway through
which broadcasted content traverses the Twittersphere but also the channel of person-to-
person communications, such as public reply and direct message. This relationship differs
from friendship on Facebook or some other social network site in two respects: (1) the
following-follower relationship on Twitter is relatively open in the sense that A’s following B
does not require B’s consent, and such relationships usually do not map to real-world friendships as
those on Facebook do;
7
and (2) perhaps more importantly, the following-follower relationship
is directed (A’s following B does not imply B’s following A) while friendship is undirected
(A’s being a friend of B implies B’s being a friend of A). The existence of a large volume of
(loose and directed) subscription relationships is thus a distinctive characteristic of a social
broadcasting network.
4
See http://blog.twitter.com/2011/03/numbers.html and http://en.wikipedia.org/wiki/Twitter for more
statistics.
5
Tweet can also be used as a verb, meaning to post. So “to tweet a tweet” means “to post an update.”
6
A user A does not have to follow B to consume B’s tweets. A can access B’s Twitter webpage at any
time to consume B’s tweets, which, like everyone else’s, are always publicly available. But if A follows B,
B’s tweets will be “pushed” to A in real time.
7
The fact that users who are connected in a social broadcasting site are usually neither friends nor even
acquaintances in the real world allows us to narrow our focus just to the online context in studying their
interactions. For instance, we do not have to worry that a favor A does for B online would be reciprocated
offline.
Retweeting: Content Sharing on Twitter
Content sharing is an integral part of the Twitter experience. In addition to composing and
posting tweets themselves, Twitter users can also rebroadcast (or retweet,
8
in Twitter’s terminology) other users’ (most likely their followings’) tweets that they find to be of particular
(informational, entertaining, etc.) value.
9
Retweeting spreads information by exposing
new audiences to the content. Meanwhile, retweeting is a special kind of sharing because a
retweet is simply a copy of the original tweet, and thus the author, content, and format of the
shared information stay exactly the same as the original tweet. Retweeting can also display
a “chain effect”: not only a tweet’s author’s followers, but also sharers’ followers, and so on,
can further retweet, spreading the content onto their respective networks and amplifying the
audience of the content to a potentially massive scale (Socialflow 2011). Thus, retweeting is
evidently a critical mechanism of information diffusion on Twitter. Since it was introduced,
retweeting has been extremely popular on Twitter because of the straightforward idea and
the easy-to-use official retweet button.
10
Therefore, we use retweeting in the Twittersphere
as the primary real-world example of content sharing activity.
11
The mechanism of retweeting is graphically illustrated in Figure 1. Hereafter, we call
the user who writes the original tweet the author, and the author is denoted R in the figure.
The other nodes represent other users who are linked to each other via the following-follower
relationship, together forming a tiny community inside the Twitter world. If two users
mutually follow each other, the edge between them is drawn in solid (e.g., R and A, and we
call A a bidirectional follower of R). Otherwise, if only one of them follows the other, the
edge between them is a dashed line, with an arrow pointing to the user followed (e.g., B
follows R but R doesn’t follow B, so that we call B a unidirectional follower of R). After R
posts an update, if no one retweets it, only R’s followers A, B, C, D, and E would receive it.
8
Retweet is both a verb and a noun, just as tweet is. When user A retweets a tweet t, we call the reposted
copy of t a retweet and call A a retweeter of t.
9
Posting others’ tweets simply by copying and pasting their tweets without mentioning the original author
is technologically possible but is not considered retweeting. Rather, it is a highly criticized misbehavior in
the Twitter community.
10
The official retweet function is built into most mobile applications, as well as Twitter’s official website.
There is no publicly available statistic on the popularity of retweeting vs. other ways of information sharing.
For example, another widely adopted way is to quote a tweet and add “RT” in front. An off-the-record
interview with a Twitter employee confirmed that the official retweeting button had been the more popular
mode of sharing.
11
In addition to Twitter’s dominance in the social broadcasting domain, another important reason we
focus on it is that the openness of Twitter allows us to collect a detailed, micro-level dataset to complete our
study. Section 4 describes our data collection in detail.
[Figure 1 diagram: the author R and users A through L drawn as nodes connected by following-follower edges]
Figure 1: An Illustration of Retweeting
But now assume that after reading the message, users A, D, and E retweet (retweeters are
shown in filled circles), thereby making F, G, H, and K, who are not immediate followers
of R, receive a copy of the tweet. Then the new receivers could also retweet (as G and H
do in the Figure 1 example), circulating the information more broadly around the network.
One thing to note is that a retweet is also a content broadcast; because of the technology, a
sharer cannot select a subgroup of his or her followers and retweet only to this subgroup.
12
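To make the propagation mechanism concrete, the following is a minimal Python sketch of how a tweet's audience grows as retweets occur. It is an illustration only, not the code used in this study. The function name `audience`, the `FOLLOWERS` dictionary, and in particular the second-order edges (who follows A, D, E, G, and H) are assumptions made for the example; only R's followers and the sets of retweeters are taken from the description of Figure 1.

```python
from collections import deque


def audience(followers, author, retweeters):
    """Return the set of users who receive a tweet written by `author`.

    followers:  dict mapping a user to the set of users who follow him or her.
    retweeters: set of users who rebroadcast the tweet once it reaches them.
    A retweet is a broadcast: it goes to ALL followers of the retweeter.
    """
    received = set()
    has_broadcast = {author}          # users whose copy has been sent out
    queue = deque([author])           # broadcasts still to be delivered
    while queue:
        sender = queue.popleft()
        for user in followers.get(sender, set()):
            if user == author:
                continue              # the author is not part of the audience
            received.add(user)
            if user in retweeters and user not in has_broadcast:
                has_broadcast.add(user)
                queue.append(user)
    return received


# Hypothetical edges: R's followers follow the text; the rest are assumed.
FOLLOWERS = {
    "R": {"A", "B", "C", "D", "E"},
    "A": {"F", "G"},
    "D": {"H"},
    "E": {"K"},
    "G": {"I"},
    "H": {"J"},
}

print(sorted(audience(FOLLOWERS, "R", set())))
print(sorted(audience(FOLLOWERS, "R", {"A", "D", "E"})))
print(sorted(audience(FOLLOWERS, "R", {"A", "D", "E", "G", "H"})))
```

Under these assumed edges, the audience is R's five followers when nobody retweets, grows to include F, G, H, and K once A, D, and E retweet, and reaches I and J when G and H retweet in turn.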
Using the graphic example in Figure 1 as the context, we emphasize a few things related
to our research question. First, we do not consider network dynamics (the formation and
destruction of personal relationships among the users). In this research, we take a snapshot
of the network structure, consider it as fixed and exogenous, and study user behavior on top
of it. Second, in later econometric analyses, we model potential retweeters only in the first
order (i.e., R’s immediate followers A, B, C, D, and E), but not those in the second and
higher orders (i.e., F, G, H, I, J, and K). As we explain in the data section, the reason is
that we do not have the network graph data for higher order potential sharers. Third, the
variation of user behavior we exploit is different users’ different reactions to a single tweet
(e.g., A, B, C, D, and E’s reactions to a tweet authored by R), rather than one single user’s
different reactions to different tweets (e.g., B’s reactions to different tweets authored by R,
H, and L).
12
In non-broadcasting social networks, such as Facebook, users can typically post messages to only a chosen
subgroup of their “friends.”
3 Theoretical Model
In this section, we develop the hypothesis on how the strength of the interpersonal tie moderates
people’s decisions to relay others’ messages. Although we often refer to Twitter as we
develop our hypothesis, our theoretical arguments are applicable to other social broadcasting
networks as well.
Content sharing is a social exchange process (Homans 1958; Blau 1964) that involves
three parties: the sharer, the content’s creator, and the group of individuals to whom the
content is shared. By choosing to relay the information, the sharer incurs the cost of sharing
13
without being rewarded in any explicit way. However, the other two parties explicitly benefit:
The subscribers can consume the shared information, and the content’s creator reaches a
larger audience.
Social exchange theory posits that people engage in social exchange in expectation of
getting returns. When no explicit material or financial gains are received, the latent benefit
of a social exchange process can be emotional comforts or social rewards (e.g., reputation).
Indeed, “people’s positive sentiments toward and evaluations of others, such as affection,
approval, and respect, are rewards worth a price that enter into exchange transactions”
(Blau 1964, p. 112). Certain acts conducted by members of a community, such as sharing
knowledge, benefit the collective but do not generate any immediate financial returns to the
actors. Such behaviors are often referred to as “pro-social,” because social rewards have
been identified as an important incentive. For example, perceived reputation enhancement
is identified as an important factor in motivating sharing in the information system and
management literature (Wasko and Faraj 2005).
These earlier studies suggest that the latent benefit for the sharer to engage in
the social exchange process might come from the perception that participation in sharing
information enhances his or her reputation either as a connected person in the network or
as a person that has the capability to filter large amounts of content and dig out valuable
pieces.
How large the latent benefit can be, or the extent to which a user’s reputation can be
enhanced by sharing a message, is determined by two factors: the number of subscribers
who would receive the shared content and the extent to which the subscribers value that
piece of content. The subscribers’ valuation depends partly on the intrinsic quality of the
shared information: The higher the quality is, the more the audience values the content, and
13
The cost could be interpreted as the opportunity cost of choosing not to share.
hence the greater the latent benefit of sharing.
14
Moreover, different audiences’ valuations
of the same content (quality) should also differ because they have different preferences and
different knowledge sets. For instance, the early tweet about the death of Osama bin Laden
should indeed have high informational value to most ordinary Twitter users. However, for
anyone inside the White House Situation Room on May 1, 2011, that tweet simply repeated
a story he or she already knew and thus was of little additional value. This case shows that
information consumers with different backgrounds could attach unequal value to the same
piece of content, and, in particular, the novelty of information should affect a particular
consumer’s valuation.
Earlier works in sociology studied the importance of weak ties in enabling the flow of
novel information in a social structure. For example, Granovetter (1973) theorized the re-
lationship between the novelty of information and the strength of the social tie through
which the information is transmitted in the context of people finding jobs. Granovetter’s
results suggested that weak ties (those personal connections linking distant acquaintances)
were more likely to provide nonredundant information because strong ties link closely related
persons, such as family and friends, who often possess knowledge sets similar to the job
seeker’s. Following Granovetter’s seminal work, subsequent research further demonstrated
that, in both real organizations and virtual communities, weak ties are instrumental in con-
necting diverse groups and enabling a person to access heterogeneous and thus more valuable
opinions (see, e.g., Granovetter 1982; Constant et al. 1996; Hansen 1999; Levin and Cross
2004). Adopting this view in the context of information sharing in a social broadcasting en-
vironment, we hypothesize that the strength of the social tie between the content creator and
a potential sharer mediates the sharer’s latent benefit of sharing. Specifically, on average,
the weaker the tie is, the higher a potential sharer believes the subscribers would value the
information and hence the higher the expected reputation enhancement is. The implication
of this line of argument is the following hypothesized relationship between content-sharing
probability and tie strength.
Hypothesis. In social broadcasting networks, the latent benefit of sharing content is nega-
tively associated with the strength of the social tie between a potential sharer and the content
creator. Thus, given a piece of content, a weak-tie subscriber is more likely to share than a
strong-tie subscriber, everything else being equal.
This hypothesis might look counter-intuitive at first glance for readers who anticipate
that, for example in the Twitter world, a Twitter user is more likely to retweet tweets from
14
Because of this quality effect, we cluster our observations based on each tweet in our analysis.
those who are strongly tied to her.
15
However, as we argued, information sharing in a social
broadcasting environment is mainly a social exchange with one’s followers. SWT suggests
that the followers of a weak-tie follower of the content’s creator should on average attach a
higher value to the content, which, we argue, serves as a larger incentive for participating
in the social exchange of forwarding information. Moreover, although our hypothesis is
consistent with SWT, it is not a simple repetition of it. SWT states only that information
obtained from one’s weak-tie connections is expected to be more valuable; it does not say
that weak ties actually promote information dissemination in anticipation of the higher
value from the information receivers. In this sense, our hypothesis extends the original
SWT findings within the social exchange theoretical framework by arguing that in social
broadcasting networks, weak ties, in expectation of higher social exchange returns, are more
likely to provide the path by which information is relayed. We quote the following paragraph
from Friedkin (1980):
Granovetter’s theory, to the extent that it is a powerful theory, rests on the
assumption that local bridges and weak ties not only represent opportunities for
the occurrence of cohesive phenomena ... but that they actually do promote
the occurrence of these phenomena. A major empirical effort in the field of
social network analysis will be required to support this aspect of Granovetter’s
theoretical approach ... It is one thing to argue that when information travels
by means of these ties it is usually novel, and perhaps, important information to
the groups concerned. It is another thing to argue that local bridges and weak
ties promote the regular flow of novel and important information in differentiated
structures. One may agree with the former and disagree with the latter.
Our hypothesis suggests that the two things Friedkin tried to disentangle conceptually might
after all be indistinguishable practically because people’s quest for reputation enhancement
motivates them to facilitate the penetration of novel information into the social network
through weak ties.
User relationships in the Twitter environment are apparently not exactly the same as
the real-world personal relationships Granovetter initially focused on to study the strength
15
Such intuition might have its root in the balance theory in psychology (Heider 1958). Blau (1964, p. 26)
argued that a strain toward imbalance, as well as toward reciprocity, arises in social associations. If we think
of the action of retweeting as an endorsement or a favor to the content creator, then a user’s retweeting a
tweet from someone who does not follow that user represents a greater imbalance than if that tweet were
from someone who follows that user. In other words, from the perspective of the social exchange between
the sharer and the content creator, a strong tie entails a stronger sense of obligation.
of weak ties. Hence, to adapt our hypothesis to the Twitter world and test it with data,
we need to empirically operationalize the strength of social ties in the Twitter network. We
do this based on the observed relationship types and assume that reciprocal relationships
are on average stronger than nonreciprocal ones. This leads to the following
assumption, which is key to our subsequent empirical analysis:
Assumption. A unidirectional link between two Twitter users is expected to be weaker than
a bidirectional one, in the sense of “tie strength” established by Granovetter (1973).
For instance, in the Figure 1 example, ties like D-R are expected to be weaker than those
like C-R.
Our measure of tie strength looks natural, but it nonetheless needs to be supported by
convincing theoretical arguments and empirical evidence. We provide the supporting argument
for our assumption in Appendix I for interested readers. Meanwhile, we note here that
the emphasis on reciprocity is consistent with a long tradition in the sociology literature.
Davis (1970) suggests that mutual choices indicate a strong tie while asymmetric pairs in-
dicate weak ties.
16
Granovetter also pointed out that the strength of a tie is a combination
of several factors, including mutual confiding and reciprocal services (Granovetter 1973).
Friedkin (1980) measured tie strength among faculty members in seven biological science
departments of a single university based on whether a discussion about current research is
reciprocated or not reciprocated.
Based on the assumption, our hypothesis, adapted to the Twitter world, becomes an
empirically testable one:
Hypothesis. In expectation, a unidirectional follower is more likely to retweet than a
bidirectional follower.
For instance, in Figure 1, ex ante we expect D is more likely to retweet R’s tweet than
C is. We develop our econometric model based on both these theoretical discussions and the
technological specifics of the Twitter environment. Before discussing the model, we describe
our data in Section 4.
16
Davis measured interpersonal relations on a three-point ordinal scale: mutual positives are the most
positive, mutual negatives are least positive, and asymmetric pairs are intermediate. In sociometry, these
correspond to mutual choices (i chooses j and j chooses i), mutual nonchoices (i does not choose j, and j
does not choose i), and unreciprocated (i chooses j but j does not choose i, or j chooses i but i does not
choose j).
[Figure 2: Data Collection Workflow. Components shown: the “pick-tweet” program (picks 1 tweet
per day), the “fetch-retweeters” program (runs once per hour), and the “fetch-graph” program
(crawls network data); the data sources http://www.twitter.com/toptweets and the Twitter API;
and the Tweets Database, the Retweets Database, and the Net-Graph Database.]
4 Data
We deployed 20 servers to collect data by querying Twitter’s application programming in-
terface (API).
17
Data Collection
Figure 2 illustrates the data collection workflow, which we describe in detail in the
following paragraphs.
From July 22, 2010 to December 2, 2010, at 0:05 each day, our “pick-tweet” program fetched
Twitter’s toptweets webpage, which usually showed 17 to 18 popular tweets in the Twitter-
sphere at the visiting time.
18
Sorting these tweets into chronological order, our program
then checked, one by one, the number of followers a tweet’s author had and inserted into
our tweets database the first one it found whose author had fewer than 1,500 followers; the
17
http://dev.twitter.com
18
Top Tweets is an official Twitter account, which is a “new algorithm that finds tweets that are catching the
attention of other users.” The algorithm is proprietary, so we cannot give a definition for a “popular tweet.”
Twitter’s Chief Scientist, Abdur Chowdhury, explained, “the algorithm looks at all kinds of interactions
with tweets, including retweets, favorites, and more to identify the tweets with the highest velocity beyond
expectations.”
rest were discarded. If every author had 1,500 or more followers, the program did not
insert any tweet on that day. In other words, our program picked either 1 tweet or 0 tweets
every day over this period of time.
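The selection rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the original “pick-tweet” program: the `Tweet` record, its field names, and the function name are hypothetical, and we assume that “chronological order” means earliest tweet first.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

FOLLOWER_CAP = 1500  # threshold stated in the text


@dataclass
class Tweet:
    """Hypothetical record for one entry on the toptweets page."""
    tweet_id: int
    posted_at: float       # posting time, e.g. a Unix timestamp
    author_followers: int  # number of followers of the tweet's author


def pick_tweet(top_tweets: Sequence[Tweet]) -> Optional[Tweet]:
    """Return the first tweet, in chronological order, whose author has
    fewer than FOLLOWER_CAP followers, or None if no tweet qualifies."""
    # Assumption: earliest tweet first.
    for tweet in sorted(top_tweets, key=lambda tw: tw.posted_at):
        if tweet.author_followers < FOLLOWER_CAP:
            return tweet
    return None  # no tweet is inserted on that day
```

Returning `None` corresponds to a day on which no tweet enters the database, so the rule yields either 1 tweet or 0 tweets per day.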
19
After a tweet entered our tweets database, another “fetch-retweeters” program began to
track and fetch its retweeting data and would do so constantly during the subsequent five
days.
20
At 10 minutes past each clock hour over the 5 days, the program queried the Twitter
API to get the user IDs of the retweeters (those in filled circles in the Figure 1 example).
The retweeter IDs were obtained in the order of the time at which the user retweeted.
21
As retweeting data came in, another “fetch-graph” program worked on collecting relevant
network graph information. Specifically, for each tweet, we were interested in its author (R
in Figure 1), the author’s followers (A, B, C, D, E in Figure 1), and the tweet’s (first-order)
retweeters (A, D, E in Figure 1); we called this set of Twitter users our focal set. For each
user in the focal set, our program collected the IDs of both the followings and followers and
stored the data in our network graph database. For some users in the focal set, access to their
following-IDs and follower-IDs was restricted because they explicitly disallowed third-party
access to their data. We used a “protected” flag to indicate this privacy protection status,
with the flag = 1 meaning no public data access. With the retweeting data and network
graph data in hand, we produced a real-world analog of Figure 1 (see Figure 10 in Appendix
II). The figure shows the spread of the first tweet in our database.
We designed our data collection strategy around one important binding constraint: The Twitter
API allowed only 150 visits/queries per IP per hour,
22
and our computational capacity
was limited. One API visit would return only a limited amount of information, so to finish
one “job” (e.g., getting the entire set of a user’s following-IDs) could require a number of
19
The “pick-tweet” program did not run properly on a few days during our data collection period because
technical problems (e.g., server failure) occurred on either the Twitter side or our side. On those days, no
tweets were added to our database.
20
The decision to track retweeting activities for five days was made on the basis of our judgment about
how long a retweeting process of one tweet could stay active. The log file written by the “fetch-retweeters”
program showed that most retweeting activities of a tweet happened within just one or two days of when
it was first posted. Tracking for five days thus seemed conservative enough to ensure that any truncated
sample problem (a large number of retweets occurring after our tracking period) was unlikely.
21
One important technical constraint was that Twitter API provided IDs for only the 800 most recent
retweeters, so that if more than 800 users retweeted a tweet between two queries, our program was not able
to get the complete set of retweeters. In addition, we found no publicly available way to verify the number
of retweeters our program had missed. We took a conservative approach to deal with this situation: Unless
we were sure we had fetched the complete set of retweeters for a tweet, we discarded that tweet from our
database.
22
This REST API rate limit was as of the second half of 2010: https://dev.twitter.com/docs/rate-limiting.
Tweet level
  t            index of tweets/authors
  $n_t$        the number of followers of author t
               the number of observations for tweet t
  $\nu_t$      the total number of retweeters of tweet t
Follower level
  ti           index of author t’s followers, $i \in \{1, 2, \ldots, n_t\}$
  $y_{ti}$     binary outcome, = 1 if follower ti retweeted tweet t
  $w_{ti}$     binary variable, = 1 if follower ti is a unidirectional follower of t (weak tie)
  $V_{ti}$     the number of ti’s followings
  $W_{ti}$     the number of ti’s followers
  $m_{ti}$     the number of times ti’s followings retweeted tweet t (before ti did if $y_{ti} = 1$)
Table 1: Notations
queries (e.g., the actual number of visits required would depend on the number of followings
the user had). As discussed in the previous paragraph, we had to collect all following-IDs
and follower-IDs for all users in the focal set; moreover, we had to finish collecting the data
as quickly as possible to avoid potential significant changes in their following-follower
relationships. This 150-visit limit was why we decided to select only one tweet per
day, select only tweets whose authors had fewer than 1,500 followers, and track retweeting
activity only once per hour, and why we decided not to collect network graph data of
followers’ followers (G in Figure 1).
23
Deciding otherwise would have prevented us from finishing
the workload for one tweet before the next tweet came into our database.
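The arithmetic behind this constraint can be illustrated with a short Python sketch. Only the limit of 150 queries per IP per hour comes from the text; the number of IDs returned per query (`ids_per_call`) and the number of IPs available (`n_ips`) are hypothetical parameters, and the function names are ours.

```python
import math
from typing import Sequence

RATE_LIMIT_PER_IP_PER_HOUR = 150  # limit stated in the text (second half of 2010)


def calls_for_id_list(n_ids: int, ids_per_call: int) -> int:
    """Queries needed to page through one list of n_ids IDs.

    ids_per_call is a hypothetical page size; at least one query is
    needed even when the list turns out to be empty.
    """
    return max(1, math.ceil(n_ids / ids_per_call))


def hours_for_focal_set(followings: Sequence[int], followers: Sequence[int],
                        ids_per_call: int, n_ips: int) -> float:
    """Lower bound on the hours needed to fetch both ID lists for every
    user in a focal set, given per-user counts of followings and followers."""
    calls = sum(calls_for_id_list(n, ids_per_call) for n in followings)
    calls += sum(calls_for_id_list(n, ids_per_call) for n in followers)
    return calls / (RATE_LIMIT_PER_IP_PER_HOUR * n_ips)
```

For example, a focal set of 300 users whose lists each fit in a single query needs 600 queries, or 4 hours on one IP, before any hourly retweeter polling is counted.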
Data Description and Statistics
We provide a list of notations in Table 1.
Tweets, authors, and the number of observations
By the end of the 140-day data collection course, we had successfully completed data
collection for 65 tweets. We index the tweets in order of posting time by an integer, t, ranging
from 1 to 65. The tweets were all authored by different users, so we also denote the author
of tweet t as author t, for simplicity of notation.
24
23
As a result, we do not have the “second-order” retweeters’ network characteristics and we do not include
the “second-order” retweeters in later econometric analyses. Studying their retweeting decisions can be a
future research topic.
24
Among the 65 tweets, 3 are in Spanish, 1 is in Italian, 1 is in Portuguese, and the remaining are in English.
None of the authors is a celebrity, partly because of our 1,500-follower constraint. The textual contents range
from breaking news and comments on news to political jokes and witty quotes.
$n_t$            min    max    mean   median
total            87     1497   457    370
non-protected    54     1189   375    324
Table 2: Number of Observations per Tweet
The two plots in Figure 3 show the distributions of the tweets by month of post and
by hour of post, respectively. The sample frequency of tweets by hour of post is roughly
consistent with the distribution of total volume of tweets posted in each clock hour in the
entire Twitter world. The left subplot of Figure 4 shows the distribution of the number of
followers an author had ($n_t$) and the distribution of the total number of retweeters a tweet
gained ($\nu_t$). Note that for a tweet, $\nu_t$ could be larger than $n_t$ because retweeters’
followers who were not immediate followers of the author could also have retweeted. The right
subplot of Figure 4 is a scatter plot of the 65 tweets on the $n_t$-$\nu_t$ plane. Somewhat
surprisingly, our sample shows no positive correlation between the number of followers an
author had and the total number of retweeters her tweet gained (a fitted linear trend line has
a weakly negative slope). However, this simple result is actually consistent with Bakshy et al.
(2011), which also finds that the number of an author’s followers is in general a poor predictor
of the size of the retweet cascade.
Because our objective is to model a follower’s binary decision of whether to retweet, $n_t$,
the number of followers that author t had, is also the number of observations in cluster t.
From this point onward, we exclude users for whom we could not collect following/follower
IDs (flag “protected” = 1) and users with zero followings/followers (assuming they were
either new registrants or inactive members). As a result, the total number of observations
($N = \sum_{t=1}^{65} n_t$) in our sample declined from 29,681 to 24,403, a decrease of 17.78%.
Table 2 gives the basic descriptive statistics of $n_t$ before and after dropping the
observations, and Figure 5 shows the number of pre-dropping vs. post-dropping observations in
more detail.
Variables
We now summarize the key variables used in the econometric model. For a tweet t, we
use $y_{ti}$, $i \in \{1, 2, \ldots, n_t\}$, to indicate whether each of its observations (i.e.,
author t’s followers) retweeted tweet t. The definitions of the key variables can be found in
Table 1. These variables are either directly observed or constructed from observed ones. We
provide the descriptive statistics of these variables in Table 3 and the correlations between
them in Table 4.

[Figure 3: Distributions of Tweets by Month of Post and by Hour of Post. Panels: (a) Tweets
Distribution - Month (count of tweets, Jul to Dec); (b) Tweets Distribution - Hour (count of
tweets, hours 0 to 23).]

[Figure 4: Distribution of Number of Author’s Followers and Number of Retweeters. Panels:
(a) Distribution of Tweets over # of Followers (unshaded) and # of Retweeters (shaded), with
ranges 0-200, 200-400, 400-600, 600-800, 800-1000, and 1000-1500; (b) # of Followers/Retweeters
Scatter Plot (horizontal axis: # of Followers; vertical axis: # of Retweeters; points labeled
by tweet index 1 to 65).]

[Figure 5: Number of Observations per Tweet. $n_t$: # of observations for each tweet, split
into protected and non-protected.]

                                    mean     std      5%    15%   50%   85%    95%
$y_{ti}$  retweet dummy             0.0427   0.2022   -     -     -     -      -
$w_{ti}$  unidirectional dummy      0.7598   0.4272   -     -     -     -      -
$V_{ti}$  # of followings           1574     9046     25    69    347   1714   3297
$W_{ti}$  # of followers            3304     73124    5     22    190   1117   4970
$m_{ti}$  # of repetition           3.2845   7.5216   1     1     1     4      11
Table 3: Descriptive Statistics

                                    $y_{ti}$   $w_{ti}$   $V_{ti}$   $W_{ti}$   $m_{ti}$
$y_{ti}$  retweet dummy             1.0000
$w_{ti}$  unidirectional dummy      0.0072     1.0000
$V_{ti}$  # of followings           -0.0225    -0.0921    1.0000
$W_{ti}$  # of followers            -0.0065    -0.0338    0.4436     1.0000
$m_{ti}$  # of repetition           0.0493     -0.1508    0.2002     0.1400     1.0000
Table 4: Correlations
Let $y_t = \sum_{i=1}^{n_t} y_{ti}$ be the number of retweeters among author t’s followers (note
that $y_t \neq \nu_t$), and $yr_t = y_t / n_t$ could then be naturally interpreted as the
retweeting rate of t. Figure 6 shows the retweeting rate across the tweets with a 95% error bar.
That the rate varies quite a lot is not surprising given the significant heterogeneity across
the tweets (i.e., the intrinsic quality). Hence, we should consider tweet-specific effects when
modeling retweeting behavior. Over the whole sample (i.e., tweets pooled together), the
retweeting rate is 0.0427, and the 95% confidence interval is (0.0402, 0.0452).
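As a quick check, the pooled interval can be reproduced with the usual normal approximation for a proportion. The sketch below assumes that this standard formula is the one behind the reported interval and uses the post-exclusion sample size N = 24,403; the function name is ours.

```python
import math


def proportion_ci(p_hat: float, n: int, z: float = 1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_ci(0.0427, 24403)
print(f"({low:.4f}, {high:.4f})")  # prints (0.0402, 0.0452)
```

The half-width is about 0.0025, which matches the reported interval to four decimal places.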
25
$w_{ti}$ is the binary indicator of unidirectional relationship, which is also our main
operationalization of a weak tie in the econometric analysis. The simple correlation of
$y_{ti}$ and $w_{ti}$ is positive. $w_t = \sum_{i=1}^{n_t} w_{ti}$ is the number of author t’s
followers who were not followed back by t. $wr_t = w_t / n_t$ is thus the fraction of t’s
unidirectional followers. We plot $wr_t$ in Figure 7, which shows that for most of the tweets
in our sample, $wr_t$ is in the range (0.5, 0.9). Over the whole sample, the fraction is
wr = 0.7598, and its 95% confidence interval is (0.7545, 0.7652).
26
25
Because we selected popular tweets, this retweeting rate does not generalize to the entire tweet space.
26
We also compute the fraction of unidirectional links among all 110,583,366 relationships observed in our
database (not only those between authors and their followers); the percentage is 75.2%, which is surprisingly
close to wr. In other words, this finding says that, on average, one out of four edges in the Twitter world
is bidirectional. Kwak et al. (2010) crawled the entire Twitter network in July 2009 and computed this rate to
be 77.9%; thus we see more bidirectional links one year after their research. This increment might be an
interesting metric for researchers who study network formation.

[Figure 6: Retweeting Rate Across Tweets. Retweeting rate $yr_t$ (ratio) for each tweet, with a
95% error bar.]

Some basic descriptive statistics of the number of followings ($V_{ti}$) and the number of
followers ($W_{ti}$) can be found in Table 3. The median values of both $V_{ti}$ and $W_{ti}$ are
much smaller than their respective mean values, so both distributions are positively skewed and
have long right tails (i.e., the majority of the users had tens or hundreds of followings and
followers, but a handful of them might have had up to hundreds of thousands of followings
or even millions of followers). Similar statistics can be found in Kwak et al. (2010) and Wu
et al. (2011), but the median numbers are much bigger in our study than in their articles
because we exclude observations with zero followings/followers. The Pearson correlation of
$V_{ti}$ and $W_{ti}$ is 0.4436, as shown in Table 4, and both are negatively correlated with
$y_{ti}$.
$m_{ti}$ is the number of times someone among ti’s followings (re)tweeted t (including author
t’s original tweet). $m_{ti}$ also has a heavily positively skewed distribution: More than half
of the observations received the tweet just once (i.e., none of their followings retweeted).
Over the whole sample, the mean is equal to 3.28, and the standard deviation is equal to 7.52.
We observe that $m_{ti}$ is positively correlated with $V_{ti}$, the number of followings a user
has, because $m_{ti}$ is by definition the size of a subset of followings. $m_{ti}$ is negatively
correlated with $w_{ti}$, meaning bidirectional followers are likely to receive more retweets
than unidirectional ones.
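To make the definition of the repetition count concrete, the following Python sketch computes it for one follower from the ordered list of retweeters and the follower’s set of followings. It is an illustration under our reading of Table 1 and the text (count the original tweet plus every retweet by a following that precedes the follower’s own retweet, if any); the function and argument names are hypothetical.

```python
from typing import Sequence, Set


def repetition_count(followings: Set[int], author_id: int,
                     retweeters_in_order: Sequence[int],
                     follower_id: int) -> int:
    """Number of times tweet t entered the follower's timeline, counted
    up to the follower's own retweet when the follower retweeted."""
    # The original tweet counts once, since the follower follows the author.
    count = 1 if author_id in followings else 0
    for user_id in retweeters_in_order:
        if user_id == follower_id:
            break  # later retweets arrive after the follower's decision
        if user_id in followings:
            count += 1
    return count
```

A follower whose followings never retweet gets a count of 1, which is the case for more than half of the observations according to the text.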
[Figure 7: Weak-Tie Rate Across Tweets. Weak-tie rate $wr_t$ (ratio) for each tweet, with a 95%
error bar.]
5 Empirical Model and Results
In this section we use our retweet dataset to perform empirical tests on our hypothesis.
Instead of using standard reduced-form econometric methods for binary response (e.g., probit
or logit), we take a more structural approach, modeling both the user behavior and special
features of social broadcasting technology. We then use the MLE technique to estimate the
empirical model and present the results.
5.1 Conditional MLE
We model a two-stage, consumption-retweeting process, in which consumption is the nec-
essary first step for retweeting. We first describe the two stages and derive the likelihood
function that would be used in our final conditional MLE analysis. We then show the results
and discuss our findings.
Stage One: Consumption
The first stage models whether a follower of author t, say, ti, after receiving a tweet, actually
consumes it. Figure 8 illustrates the technological aspect of this stage. The horizontal line
stands for ti’s home timeline or Twitter feed, which is a stream of received tweets for ti to
consume (read), including retweets, listed in chronological order. Note that not only the
original tweet t but also ti’s followings’ retweets of it, if any, appear in ti’s timeline. The
downward pointing arrows show the times at which a total of five (re)tweets of t enter the
feed. Between these five (re)tweets, other tweets are also posted by ti’s followings.

[Figure 8: (Re)Tweets Entering a Twitter User’s Timeline. A horizontal timeline with five
(re)tweets of t, labeled rt 0 to rt 4, two feed-checking times $\tau_1$ and $\tau_2$, and a
period of attention of length L at each checking time.]
In reality, few Twitter users can or will monitor their Twitter feed continuously. We
assume every time they start reading their feeds, they consume only a limited number of
tweets. In the example shown in Figure 8, the upward pointing arrows indicate the times,
$\tau_1$ and $\tau_2$, when user ti launches her Twitter application. Because the tweets are
listed in chronological order, tweets posted at times close to the $\tau$s are more likely to be
consumed. For simplicity, we use a thick horizontal segment to indicate a “period of attention”
of length L, inside which tweets posted are consumed. In doing so, we implicitly assume that
users do not discriminate between tweets authored by different people. The only factor
determining whether a tweet catches the user’s attention is whether it enters the timeline
during a certain period preceding the time a user checks tweets.
27
Therefore, the cognitive limit prevents a user ti from reading every single tweet she
receives. In Figure 8, tweets that enter the timeline in the interval $(\tau_1, \tau_2 - L)$ are
outside any of the periods of attention and would not be consumed by ti. When a tweet t
gets retweeted by ti’s followings, it enters the timeline multiple times, thus increasing the
likelihood that t falls into one of the periods of attention (e.g., rt3 in the figure). If neither
the original tweet t nor any of the retweets falls into some period of attention, then it is not
consumed and hence would not be retweeted by ti.
Unfortunately, whether tweet t is actually consumed by ti is unobserved. Our task
for this stage is to build a probabilistic model to capture the likelihood that ti consumes
t, conditional on observed variables. Based on previous discussions about the technology,
whether ti consumes tweet t is determined by three factors: (1) $m_{ti}$, the number of times
t appears in ti’s timeline; (2) the frequency with which ti checks her Twitter feed; and (3)
L, which is determined by the number of tweets ti can read in each consumption and the
number of tweets ti receives per unit of time, which we assume to be a linear function of
$V_{ti}$ (i.e., the more people a user follows, on average, the more tweets she receives over a
fixed time span). Therefore, we propose that the condition for ti to consume t be the following
equation:

$$\frac{m_{ti}}{b V_{ti}} > a_{ti}, \tag{1}$$

where b is a positive constant and $1/(bV_{ti})$ measures L.
28
The unobserved variable $a_{ti}$ can
be interpreted as an inverse measure of the frequency with which ti checks her Twitter feed,
and is assumed to be independent of both $V_{ti}$ and $m_{ti}$. The left side of equation (1) can
be seen as the scaled frequency with which t appears in the timeline, and the right side as
a user-specific threshold. If a user does not check her feed very often, so that she gets a
high draw of $a_{ti}$, then the scaled frequency needs to be high for the tweet to be consumed,
and vice versa. To derive the likelihood function, we further assume that $a_{ti}$ is
log-normally distributed in the population:

$$\log a_{ti} \mid t \sim \log a_{ti} \sim N(a, \sigma_a^2). \tag{2}$$

So we can rewrite equation (1) as

$$-\log b + \log m_{ti} - \log V_{ti} > \log a_{ti}$$
$$\Longleftrightarrow \quad -\frac{a + \log b}{\sigma_a} + \frac{1}{\sigma_a} \log m_{ti} - \frac{1}{\sigma_a} \log V_{ti} > \frac{\log a_{ti} - a}{\sigma_a},$$

where the term on the right side follows a standard normal distribution. Thus, the ex ante
probability that ti consumes tweet t, conditional on receipt, is

$$
\begin{aligned}
p_1 &= p\left( -\frac{a + \log b}{\sigma_a} + \frac{1}{\sigma_a} \log m_{ti} - \frac{1}{\sigma_a} \log V_{ti} > \frac{\log a_{ti} - a}{\sigma_a} \right) \\
    &= \Phi\left( -\frac{a + \log b}{\sigma_a} + \frac{1}{\sigma_a} \log m_{ti} - \frac{1}{\sigma_a} \log V_{ti} \right),
\end{aligned}
\tag{3}
$$

where Φ is the cumulative distribution function (CDF) of the standard normal distribution.
The outcome of this stage is unobserved, so we cannot estimate the parameters in which we
are interested just on the basis of equation (3).
27
This “random-reading” modeling assumption is only a rough approximation of the real consumption
stage. In reality, great variation exists in how people use Twitter and read their Twitter feed. However,
because most people receive a large number of tweets, of which they are “able” to consume only a portion, we
believe that without detailed data on individual Twitter usage, “random-reading” is an appropriate modeling
approximation for us to use.
28
Or more generally, we can assume $L = z_{ti}/(bV_{ti})$, where $z_{ti}$ is the number of tweets ti can read in each
consumption and $bV_{ti}$, $b > 0$, is the number of tweets received by ti per unit of time. We can still get (1) by
dividing both sides by $z_{ti}$ and absorbing the unobserved $z_{ti}$ into $a_{ti}$.
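The first-stage probability in equation (3) can be evaluated numerically as follows. This is a minimal sketch: the function and argument names are ours, and any parameter values passed to it are purely illustrative rather than estimates from the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def consumption_probability(m: float, V: float, a_mean: float,
                            sigma_a: float, b: float) -> float:
    """Equation (3): probability that m / (b * V) exceeds a log-normal
    threshold whose logarithm has mean a_mean and std. dev. sigma_a."""
    index = (-(a_mean + math.log(b)) + math.log(m) - math.log(V)) / sigma_a
    return std_normal_cdf(index)
```

With these definitions the probability rises with the repetition count m and falls with the number of followings V, and it equals 0.5 exactly when m / (bV) equals exp(a_mean), the median of the threshold.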
Stage Two: Retweeting
Recall that a follower ti retweets only if ti consumes the tweet himself. If a user’s first stage
outcome is a failure (he does not consume t), then his final outcome would automatically
be not retweeting, y
ti
= 0. In other words, y
ti
= 1 implies success at both stages. Unlike
the first stage, where success is determined by the broadcasting technology and chance, the
second stage outcome depends on the decision made by the user.
At the second stage, the users who have consumed the tweets each decide whether
to retweet. The decision is made on the basis of a subjective cost-benefit analysis. As
discussed in Section 3, the latent benefit of retweeting depends on both the number of
followers the content is retweeted to, $W_{ti}$, and the mean valuation the followers attach to
the tweet, which we denote $\alpha_{ti}$. Thus, we write the latent benefit as
$\alpha_{ti} W_{ti}$. We expect ti’s followers’ mean valuation, $\alpha_{ti}$, to be moderated by
the strength of the social tie connecting author t and potential retweeter ti. Finally, for the
retweeting act to happen, the latent benefit should exceed the user-specific reservation
utility or cost, denoted $c_{ti}$. Therefore, after using logarithmic transformation, the
necessary and sufficient condition of retweeting upon consumption can be written (with a slight
abuse of the notation α and c):

$$\alpha_t + \delta w_{ti} + \beta \log W_{ti} > c_{ti}, \tag{4}$$

where $c_{ti}$, like $a_{ti}$, is unobserved, and α, sub-indexed by t, is allowed to differ across
the tweets, capturing tweet-specific effect.
29
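Technically, we further assume $c_{ti}$ is distributed normally among the population. We
also allow the unobservables at the two stages to be correlated:

$$c_{ti} \mid t \sim c_{ti} \sim N(c, \sigma_c^2), \qquad \mathrm{Cor}(c_{ti}, a_{ti}) = \rho. \tag{5}$$

We can rewrite equation (4) as

$$-\frac{c}{\sigma_c} + \frac{\alpha_t}{\sigma_c} + \frac{\delta}{\sigma_c} w_{ti} + \frac{\beta}{\sigma_c} \log W_{ti} > \frac{c_{ti} - c}{\sigma_c},$$

where the right side follows a standard normal distribution. Therefore, the conditional
probability of retweeting can be written as follows:

$$
\begin{aligned}
p_2 = p\Big( & -\frac{c}{\sigma_c} + \frac{\alpha_t}{\sigma_c} + \frac{\delta}{\sigma_c} w_{ti} + \frac{\beta}{\sigma_c} \log W_{ti} > \frac{c_{ti} - c}{\sigma_c} \;\Big|\; \\
             & -\frac{a + \log b}{\sigma_a} + \frac{1}{\sigma_a} \log m_{ti} - \frac{1}{\sigma_a} \log V_{ti} > \frac{\log a_{ti} - a}{\sigma_a} \Big).
\end{aligned}
\tag{6}
$$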
29
$\alpha_t$ also includes the author-specific effect, since in our sample the tweets are all by different authors.
Two-Stage Model For MLE
At this point, we put the two stages together. Equations (2), (3), (5), and (6) represent
all the necessary elements for conducting the MLE analysis. The likelihood of observing
outcome $y_{ti} = 1$ for tweet t and follower ti is the product of $p_1$ and $p_2$, and the
likelihood of observing $y_{ti} = 0$ is $1 - p(y_{ti} = 1)$. In terms of econometrics, not all the
structural parameters are identified. For example, we can identify $\delta/\sigma_c$, but not δ
and $\sigma_c$ separately. Fortunately, for our research purpose, we care most about the signs of
the parameters rather than their absolute values. In the example, $\delta/\sigma_c$ has the same
sign as δ; thus, identifying the ratio is good enough for understanding w’s partial effect.
Therefore, for simplicity of notation, we rearrange the terms, rescale the parameters following
the standard practices in probit and logit models, and obtain our benchmark specification:

$$
\begin{aligned}
p(y_{ti} = 1) &= p_1 p_2 \\
p_1 &= p(e + b_1 \log m_{ti} + b_2 \log V_{ti} > a_{ti}) \\
p_2 &= p(\alpha_t + \delta w_{ti} + \beta \log W_{ti} > c_{ti} \mid e + b_1 \log m_{ti} + b_2 \log V_{ti} > a_{ti}) \\
a_{ti}, c_{ti} &\sim N(0, 1) \\
\mathrm{Cor}(a_{ti}, c_{ti}) &= \rho \\
\theta &= \{e, b_1, b_2, \alpha_1, \alpha_2, \ldots, \alpha_T, \delta, \beta, \rho\},
\end{aligned}
\tag{7}
$$

where θ is a vector of parameters to estimate. $\alpha_t$ with t ranging from 1 to T absorbs
the constant term and captures the tweet-specific effects. δ is the coefficient of the weak-tie
indicator, which is of our primary interest. $b_1$, $b_2$, and β determine the partial effects of
the other social network characteristic variables.
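Under specification (7), the product $p_1 p_2$ equals the joint probability that both standard normal unobservables fall below their respective indexes, that is, a bivariate normal CDF with correlation ρ. The Python sketch below evaluates this probability and the resulting log-likelihood. It is an illustration rather than the estimation code used for Table 5: the dictionary layout, the function names, and the choice of Simpson's rule for the integral are our assumptions, and the maximization over θ is not shown.

```python
import math


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def joint_prob(u: float, v: float, rho: float,
               lower: float = -8.0, steps: int = 2000) -> float:
    """P(a < u, c < v) for standard normal a, c with correlation rho, |rho| < 1.

    Integrates pdf(x) * cdf((v - rho * x) / sqrt(1 - rho^2)) over [lower, u]
    with composite Simpson's rule (steps must be even).
    """
    if u <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (u - lower) / steps

    def integrand(x: float) -> float:
        return normal_pdf(x) * normal_cdf((v - rho * x) / scale)

    total = integrand(lower) + integrand(u)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(lower + k * h)
    return total * h / 3.0


def retweet_probability(obs: dict, params: dict) -> float:
    """p(y = 1) = p1 * p2 in specification (7)."""
    u = params["e"] + params["b1"] * math.log(obs["m"]) + params["b2"] * math.log(obs["V"])
    v = params["alpha"][obs["t"]] + params["delta"] * obs["w"] + params["beta"] * math.log(obs["W"])
    return joint_prob(u, v, params["rho"])


def log_likelihood(observations: list, params: dict) -> float:
    total = 0.0
    for obs in observations:
        p = retweet_probability(obs, params)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard the logarithms
        total += math.log(p) if obs["y"] == 1 else math.log(1.0 - p)
    return total
```

Setting ρ = 0 reduces the joint probability to the product of two univariate normal CDFs, which corresponds to the restriction that the two unobservables are uncorrelated.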
Results
With equation (7) in hand, we estimate the parameters using the conditional MLE method.
We report the results in Table 5.
30
We estimate a total of six different specifications, the first
five of which are described in detail in the following paragraphs. The last one is discussed in
the next subsection. In all specifications, we use dummy variables to capture tweet-specific
effects,
31
the $\alpha_t$s, and we do not report these fixed effects because they are less important in our
analysis.
32
All standard errors are computed to be robust to tweet clustering.
30
*, **, and *** indicate 0.1%, 1% and 5% significance levels, respectively.
31
Technically, we can directly use dummy variables to control for fixed effects without appealing to more
sophisticated econometric specifications because we have a large number of observations for every tweet. See
Figure 5.
Model 1 is a simple probit of $y_{ti}$ on the four key variables: $w_{ti}$, $m_{ti}$, $V_{ti}$,
and $W_{ti}$. Model 2 corresponds to equation (7), with an additional restriction that
$a_{ti} \perp c_{ti}$, which implies ρ = 0. Model 3 strictly follows the benchmark equation (7),
allowing correlation between $a_{ti}$ and $c_{ti}$. Models 4 and 5 slightly modify model 3:
Model 4 includes the interaction term of $w_{ti}$ and $W_{ti}$ in the retweeting equation;
model 5 includes $w_{ti}$ in the consumption equation.
We observe that the fitted likelihood increases from model 1 to model 2, to model 3, and then
to models 4 and 5, as we gradually relax the model restrictions by adding richer structures and
more variables. Across the five columns, we find consistent support for a positive $m_{ti}$
coefficient (repetition of retweets) and a negative $V_{ti}$ coefficient (the number of
followings). All estimates are significant at the 99.9% confidence level. Therefore, the results
are consistent with the model prediction described in Section 5.1, and in particular with
equation (3).
The unidirectional-relationship/weak-tie indicator is found to have a significantly positive
effect on the (conditional) retweeting probability. In the benchmark model (model 3),
its coefficient is positive at the 0.1% significance level. The $w_{ti}$ coefficient becomes less
significant, but is still positive at the 5% significance level, when we allow an interaction
effect of tie-strength and the number of followers (model 4) or when we put the weak-tie
indicator into both the consumption and retweeting equations (model 5). These results show that,
in the retweeting equation, the positive sign of the weak-tie coefficient is robust; thus, they
support our hypothesis: Weak ties are more likely than strong ties to relay information to
their social network neighbors.
In model 5, where we include the weak-tie dummy $w_{ti}$ in both the consumption and
retweeting equations, we find that, although its effect on retweeting probability is positive
and significant, its effect on consumption probability is negative but insignificant. This result
suggests that messages generated from stronger ties might be more likely to be read than those
from weaker ties. However, the difference in likelihood is not statistically significant. It
supports our assumption that users generally do not discriminate between tweets received
from strong ties and tweets received from weak ties. We believe the separation of the different
effects that weak ties have on the two probabilities, as model 5 reveals, shows the merit of our
two-stage econometric model. It indeed uncovers more structure in the retweeting process
than a reduced-form probit regression.
In all models, the number of followers has a significantly positive coefficient. This finding
emerges only from our econometric models because, as shown in Table 4, the simple
correlation between $y_{ti}$ and $W_{ti}$ is negative. This result thus supports our argument in
the theory section that the number of subscribers is positively associated with the latent
benefit of retweeting.
32
We do not control for follower fixed effects because, for each tweet, all followers/observations are by
definition distinct, and when we pool tweets together, among all the 24,403 observations, 24,002 are unique.
5.2 Theoretical Model Revisited
From model 1 to model 5, we consistently find that, conditional on the consumption of a
piece of information, weak-tie users are more likely to share information with their social
network neighbors. In the theory section, we argued the reason is that a weak-tie follower’s
followers would on average value the information more than a strong-tie follower’s followers;
thus, the latent benefit from the social exchange of content sharing is greater for a weak-tie
follower than for a strong-tie follower, everything else being equal.
In a social broadcasting environment, two possible explanations remain for the higher
mean valuation of the shared content from a weak-tie follower’s followers:
1. New audience effect: Because of the social broadcasting technology (in which whatever
is posted or shared is broadcast to all followers), the possibility exists that the infor-
mation has already been circulated to more of a strong-tie follower’s followers than
to a weak-tie follower’s followers.
33
Holding the total number of a potential sharer’s
followers constant, the expected number of followers who are new to the information
is larger for a weak-tie follower. Therefore, a weak-tie follower can reach a larger new
audience, and hence the sharing gives a greater social exchange benefit.
2. Informational value effect: The information to be shared is intrinsically more valuable
to a weak-tie follower’s followers than to a strong-tie follower’s followers. Therefore,
a weak-tie follower is more willing to share it because the sharing is expected to yield
a higher social exchange benefit.
3. A third possibility is that both of these two effects exist.
³³ One important observation is that a strong-tie follower’s followers are more likely to be simultaneously
following the author than a weak-tie follower’s followers. Readers can refer to Appendix I for an empirical
test.
We test the three possibilities in model 6 by adding two empirically constructed followers-overlap
measures into the second-stage retweeting equation. Mathematically, we define two
versions of an overlap index of followers:
$$OI^{W1}_{ti} = \frac{\bar{W}_{ti}}{\sqrt{W_t W_{ti}}}, \qquad OI^{W2}_{ti} = \frac{\bar{W}_{ti}}{\min\{W_t, W_{ti}\}},$$
where $\bar{W}_{ti}$, $W_t$, and $W_{ti}$ are the number of mutual followers author t and user ti shared, the
number of followers author t had, and the number of followers ti had, respectively. $OI^{W1}_{ti}$
and $OI^{W2}_{ti}$ measure how “similar” user ti’s followers and author t’s followers are:
The larger the index is, the more similar the two sets of followers are. The indexes are
also used in Appendix I, where we test whether unidirectional relationships are weaker than
bidirectional ones. Readers can refer to Appendix I for more discussion of the indexes.
We include $OI^{W1}_{ti}$ and $OI^{W2}_{ti}$ to capture the new audience effect, the first explanation. If
it is indeed a driver of the result, we expect $OI^{W1}_{ti}$ and $OI^{W2}_{ti}$ collectively to have a negative
effect on retweeting probability: If a user has a large number of followers who also follow
the author, then he or she should be less willing to share the information. Moreover, if the
new audience effect is the sole driver, then the weak-tie indicator $w_{ti}$ should have no effect
on retweeting probability once we include the two indexes.³⁴ If we find the two indexes have
negative coefficients and the weak-tie indicator still has a positive coefficient, then we should
conclude that both the informational value effect and the new audience effect exist.
The result of model 6 shows that the coefficients of the two indexes are indeed negative.
Although the second version of the overlap index, $OI^{W2}_{ti}$, is insignificant on its own, the two
indexes are jointly significant at the 0.1% level. The magnitude of the coefficient of
$w_{ti}$ decreases relative to model 3, but it remains positive and significant at the 0.1% level. These two
findings together support the third possibility: Both the informational value effect and the
new audience effect exist.
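As a rough check on the individual significance statements, the z-values reported for model 6 in Table 5 can be converted to two-sided p-values under a standard normal reference distribution. The sketch below is illustrative only: the function and variable names are ours, and it does not reproduce the joint test of the two indexes, which would require the estimated covariance matrix.

```python
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic under the standard normal."""
    return 2.0 * NormalDist().cdf(-abs(z))


# z-values for model 6 as reported in Table 5
model6_z = {
    "weak tie (w_ti)": 4.11,
    "overlap index I (OI^W1)": -2.44,
    "overlap index II (OI^W2)": -1.33,
}

p_values = {name: two_sided_p(z) for name, z in model6_z.items()}

for name, p in p_values.items():
    print(f"{name}: z = {model6_z[name]:+.2f}, two-sided p = {p:.4f}")
```

Under this normal approximation, the weak-tie coefficient clears the 0.1% threshold, the first overlap index is significant at the 5% level, and the second is not significant at conventional levels, in line with the stars shown in Table 5.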
³⁴ Assume the two indexes have perfectly captured the new audience effect.
Probability of Retweeting                     Model 1     Model 2     Model 3     Model 4     Model 5     Model 6
                                              Coeff.      Coeff.      Coeff.      Coeff.      Coeff.      Coeff.
                                              (z-value)   (z-value)   (z-value)   (z-value)   (z-value)   (z-value)
p_1: Probability of Consumption upon Receipt
log m_ti (# of Repetitions)                   0.174***    0.494***    0.340***    0.340***    0.338***    0.434***
                                              (8.21)      (4.81)      (4.37)      (4.37)      (4.47)      (5.86)
log V_ti (# of Followings)                    -0.170***   -0.639***   -0.472***   -0.473***   -0.475***   -0.566***
                                              (-9.19)     (-10.38)    (-5.14)     (-5.05)     (-5.02)     (-7.23)
w_ti (Weak tie)                               -0.076 (-0.46)
p_2: Probability of Retweeting upon Consumption
w_ti (Weak tie)                               0.218***    0.284***    0.220***    0.237*      0.249**     0.175***
                                              (5.13)      (5.57)      (5.52)      (2.14)      (3.22)      (4.11)
log W_ti (# of Followers)                     0.087***    0.115***    0.101***    0.103***    0.102***    0.131***
                                              (5.41)      (6.32)      (7.46)      (4.95)      (7.21)      (6.83)
w_ti log W_ti (Weak tie × # of Followers)     -0.003 (-0.17)
OI^{W1}_ti (Overlap Index of Followers I)                                                                 -2.131*
                                                                                                          (-2.44)
OI^{W2}_ti (Overlap Index of Followers II)                                                                -0.429
                                                                                                          (-1.33)
ρ (Correlation; p-value in parentheses)       -, -0.836*** (0.000), -0.835*** (0.000), -0.834*** (0.000), -0.606* (0.034)
# of Observations                             24,403      24,403      24,403      24,403      24,403      24,403
Pseudo Log-Likelihood                         -3,953.823  -3,921.876  -3,913.125  -3,913.112  -3,913.010  -3,892.148
Note: Rows listing fewer than six entries give them in the order reported; the two overlap indexes enter model 6 only.
Table 5: Result of Maximum Likelihood Estimation
6 Managerial Implications
Impression vs. Consumption
Internet display advertising is often priced based on cost per impression or cost per
action (e.g., cost per purchase, cost per click). However, it is important to realize that an
ad being displayed is not equivalent to an ad being consumed. In other words, between
the stage of impression and the stage of action lies a stage of consumption, which does not necessarily
occur after an impression because Internet users are often overloaded with information. The
popularization of social broadcasting technologies, or social media as a whole, has greatly
facilitated decentralized information production, which in turn has led to an explosion of
user-generated content.³⁵ The question then arises: Of the content being produced, how much
is actually being consumed? An answer to this practical and important question would open the
possibility of another way of pricing display advertising: cost per ad consumption.
This approach has largely been ignored in the literature because ascertaining whether an
Internet user actually reads or watches an ad is difficult. Indeed, neither the content creator
nor any third party can observe whether an individual has consumed a piece of content
supplied to him or her.
Our empirical model contributes here because the estimated first-stage equation
$$p_1 = \Phi(e + b_1 \log m + b_2 \log V)$$
provides a simple but useful way to approximate the probability that an average Twitter user
with given social network characteristics consumes a piece of content. Essentially,
our model addresses this observability problem in the Twitter context by exploiting the fact that observed
interaction with content can be used to infer unobserved consumption. Given that the
number of impressions is usually known or can be easily obtained, one can then calculate
the expected number of consumptions for a piece of published content.
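The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the estimation code: b_1 and b_2 are set to the model 3 estimates from Table 5, the intercept e is not reported in that table and is given a purely hypothetical value, natural logarithms are assumed, and the impression records are invented for the example.

```python
from math import log
from statistics import NormalDist

B1_LOG_REPETITIONS = 0.340   # model 3 estimate for log m (Table 5)
B2_LOG_FOLLOWINGS = -0.472   # model 3 estimate for log V (Table 5)
E_INTERCEPT = -1.0           # HYPOTHETICAL: the intercept e is not reported in Table 5


def consumption_probability(m: int, v: int, e: float = E_INTERCEPT) -> float:
    """p_1 = Phi(e + b1*log(m) + b2*log(V)); natural logs are an assumption."""
    index = e + B1_LOG_REPETITIONS * log(m) + B2_LOG_FOLLOWINGS * log(v)
    return NormalDist().cdf(index)


def expected_consumptions(impressions) -> float:
    """Sum of p_1 over impressions, each given as (repetitions m, followings V)."""
    return sum(consumption_probability(m, v) for m, v in impressions)


# Invented example: three followers who each received the tweet.
impressions = [(1, 50), (1, 500), (3, 500)]
total = expected_consumptions(impressions)
print(f"{total:.3f} expected consumptions out of {len(impressions)} impressions")
```

The signs of the two coefficients carry the intuition: a follower who follows many accounts (large V) is less likely to consume any given tweet, while receiving the same content repeatedly (large m) raises the chance of consumption.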
³⁵ Taking Twitter as an example, as of May 2011, the average volume of tweets posted per day had reached
150 million (i.e., more than 1,700 tweets per second).
Influence
Measuring a user’s social influence in an online community is of great interest to managers
who want to leverage the power of social media. On Twitter, a user is often regarded as
being influential when many people retweet her tweets. Indeed, the depth of penetration
and breadth of reach of one’s words in an online community are important aspects of social
influence. Our model measures the role that social network characteristics play in the information
diffusion process. Combining it with the probability of consumption, we can compute
the expected total number of consumers of a user’s tweet based on his or her social network
characteristics, which may serve as a starting point for measuring his or her social influence.
[Figure: p_2, the probability of retweeting upon consumption, plotted against W_i, the number of
followers (0 to 5,000), for w = 0 and w = 1, for a median quality tweet in the sample. Labeled points:
W = 5, p = 0.027 (w = 0) and p = 0.045 (w = 1); W = 190, p = 0.060 (w = 0) and p = 0.091 (w = 1);
W = 4970, p = 0.111 (w = 0) and p = 0.158 (w = 1).]
Figure 9: The Probability of Retweeting a Tweet upon Consumption
One important implication of our study is that having more followers does not directly
translate into greater social influence. In particular, the strength of social ties between a user
and her followers should play an important moderating role, because it can greatly affect the
followers’ willingness to forward her messages. To see this more intuitively, we plot the fitted
retweeting probabilities in Figure 9 for w = 0 (solid curve) and w = 1 (dashed curve), fixing
$\alpha_t$ at the median value in our sample. The difference between the conditional probabilities
of retweeting for a unidirectional follower and for a bidirectional follower is significant. For
example, when W, the number of followers, equals 190, the median number in our sample,
the conditional likelihoods of retweeting are 6.0% for a bidirectional follower and 9.1% for a
unidirectional follower. The latter is 3.1 percentage points higher, or more than 50% higher in relative terms.
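The labeled points in Figure 9 can be approximately reproduced from Table 5. The sketch below assumes, by analogy with the first-stage equation, that the second-stage probability takes the probit form p_2 = Φ(α_t + c_1 w + c_2 log W) with natural logarithms; this functional form, the names c_1 and c_2, and the choice of the model 3 coefficients are our assumptions rather than something stated in this section. The median α_t is not reported, so it is backed out from the labeled point (W = 190, w = 0, p = 0.060).

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()
C1_WEAK_TIE = 0.220       # model 3 estimate for w (Table 5)
C2_LOG_FOLLOWERS = 0.101  # model 3 estimate for log W (Table 5)

# Back out the unreported median alpha_t from one labeled point in Figure 9.
ALPHA_MEDIAN = STD_NORMAL.inv_cdf(0.060) - C2_LOG_FOLLOWERS * log(190)


def retweet_probability(w: int, followers: int) -> float:
    """Assumed form: p_2 = Phi(alpha_t + c1*w + c2*log(W))."""
    index = ALPHA_MEDIAN + C1_WEAK_TIE * w + C2_LOG_FOLLOWERS * log(followers)
    return STD_NORMAL.cdf(index)


for followers in (5, 190, 4970):
    p_strong = retweet_probability(0, followers)  # bidirectional follower
    p_weak = retweet_probability(1, followers)    # unidirectional follower
    print(f"W={followers}: w=0 {p_strong:.3f}, w=1 {p_weak:.3f}, "
          f"gap {p_weak - p_strong:.3f}, ratio {p_weak / p_strong:.2f}")
```

Under these assumptions the computed probabilities fall close to the values labeled in Figure 9, and the script prints both the absolute gap and the ratio, which keeps the distinction between percentage points and relative change explicit.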
7 Conclusion
An important question in the field of information systems is how information or knowledge is
disseminated in an online community (with or without an organizational form). Large-scale
empirical studies to address this question have traditionally been challenging because of the
difficulty of obtaining detailed micro-level data. To the best of our knowledge, this paper
is the first such study in the information systems field, where publicly available data from
Twitter is used to explore people’s voluntary information relay process.
Using a carefully designed data collection process and a series of econometric analyses,
we find that information is more likely to be retweeted through weak ties on Twitter. This
result is complementary to Granovetter’s finding, which advocates for the important role of
weak ties in carrying novel information (Granovetter 1973). The implications of our finding
are far-reaching. On the one hand, our theory, which is based on two highly influential
sociological theories (social exchange theory and the strength of weak ties theory)
and is supported by the latest data from one of today’s largest online social networks,
reveals the important role that weak ties play in facilitating information dissemination in
the social network through people’s voluntary information relay behavior. On the other
hand, the interesting connection between tie strength and retweeting behavior indicates the
importance of incorporating tie strength when measuring personal influence on Twitter,
which is a question of fundamental importance to both researchers and practitioners.
As one of the first in the information systems field to bring together the huge amount
of public data on Twitter with sociological theories to study information diffusion in so-
cial broadcasting networks, the paper is not without its limitations. First, the tweets in
our dataset were not randomly sampled. By using this dataset to study the effect of tie
strength in information sharing, we implicitly assumed that tweet “quality” changes every-
one’s retweeting probability only uniformly. Relaxing this assumption requires additional
work (including obtaining a new dataset) to test whether our results hold when the quality
of tweets is moderate or low. Second, we measured tie strength using a binary variable
based on whether a link is unidirectional or bidirectional. Measuring tie strength based on
the amount of conversation between two Twitter users would be an alternative approach.
Third, we used only an author’s immediate followers and omitted higher-order potential
followers in empirical analyses. As we discussed in the data section, this was due to the
difficulty of collecting network graph data for all higher-order potential retweeters. In fu-
ture research one could try to overcome the difficulty by, possibly, sampling these users.
Fourth, we observed only one snapshot of the social network and thus modeled it as fixed
and exogenous. Future research can examine the interplay of user behavior and the dynamics
of underlying network structure. Another possibility for extending the current study is to
include more user-specific variables (e.g., demographic information) and tweet-specific vari-
ables (e.g., constructed from natural language processing) into the econometric model. Of
course, these extensions pose new challenges in terms of data collection and data processing.
Nevertheless, they are certainly interesting directions to pursue in the future.
References
[1] Bakshy, E., J. Hofman, W. Mason, D. Watts 2011, “Everyone’s an Influencer: Quanti-
fying Influence on Twitter,” Proceedings of the Fourth ACM International Conference
on Web Search and Data Mining.
[2] Bapna, R., A. Gupta, S. Rice, A. Sundararajan 2012, “Trust, Reciprocity and the
Strength of Social Ties: An Online Social Network based Field Experiment,” working
paper.
[3] Blau, P. 1964, Exchange and Power in Social Life, Transaction Publishers.
[4] Bock, G., R. Zmud, Y. Kim, J. Lee 2005, “Behavioral Intention Formation in Knowledge
Sharing: Examining the Roles of Extrinsic Motivators, Social-Psychological Forces, and
Organizational Climate,” MIS Quarterly 29, 87-111.
[5] Chiu, C., M. Hsu, E. Wang 2006, “Understanding Knowledge Sharing in Virtual Com-
munities: An Integration of Social Capital and Social Cognitive Theories,” Decision
Support Systems 42, 1872-1888.
[6] Constant, D., L. Sproull, S. Kiesler 1996, “The Kindness of Strangers: The Usefulness
of Electronic Weak Ties for Technical Advice,” Organization Science 7, 119-135.
[7] Davis, J. 1970, “Clustering and Hierarchy in Interpersonal Relations: Testing Two
Graph Theoretical Models on 742 Sociomatrices,” American Sociological Review 35,
843-851.
[8] Friedkin, N. 1980, “A Test of Structural Features of Granovetter’s Strength of Weak
Ties Theory,” Social Networks 2, 411-422.
[9] Friedkin, N. 1982, “Information Flow Through Strong and Weak Ties in Intraorganiza-
tional Social Networks,” Social Networks 3, 273-285.
[10] Granovetter, M. 1973, “The Strength of Weak Ties,” The American Journal of Sociology
78, 1360-1380.
[11] Granovetter, M. 1983, “The Strength of Weak Ties: A Network Theory Revisited,”
Sociological Theory 1, 201-233.
[12] Hansen, M. 1999, “The Search-transfer Problem: The Role of Weak Ties in Sharing
Knowledge across Organization Subunits,” Administrative Science Quarterly 44, 82-111.
[13] Heider, F. 1958, The Psychology of Interpersonal Relations, John Wiley & Sons.
[14] Homans, G. 1958, “Social Behavior as Exchange,” The American Journal of Sociology
63, 597-606.
[15] Kwak, H., C. Lee, H. Park, S. Moon 2010, “What is Twitter, a Social Network or a
News Media?” Proceedings of the 19th International Conference Companion on World
Wide Web.
[16] Levin, D., R. Cross 2004, “The Strength of Weak Ties You Can Trust: The Mediating
Role of Trust in Effective Knowledge Transfer,” Management Science 50(11), 1477-1490.
[17] Marlow, C., L. Byron, T. Lento, I. Rosenn 2009, “Maintained Relationships on
Facebook,” online at http://overstated.net/2009/03/09/maintained-relationships-on-
facebook.
[18] Olivera, F., P. Goodman, S. Tan 2008, “Contribution Behaviors in Distributed Envi-
ronments,” MIS Quarterly 32, 23-42.
[19] Onnela, J., J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz,
A.-L. Barabasi 2007, “Structure and Tie Strengths in Mobile Communication Networks,”
Proceedings of the National Academy of Sciences USA 104, 7332-7336.
[20] Socialflow 2011, “Breaking Bin Laden: Visualizing the Power of a Single Tweet,”
available online at http://blog.socialflow.com/post/5246404319/breaking-bin-laden-
visualizing-the-power-of-a-single.
[21] Wasko, M., S. Faraj 2005, “Why Should I Share? Examining Social Capital and Knowl-
edge Contribution in Electronic Networks of Practice,” MIS Quarterly 29, 35-57.
[22] Wu, S., J. Hofman, W. Mason, D. Watts 2011, “Who Says What to Whom on Twitter,”
Proceedings of the 20th International Conference Companion on World Wide Web.
Appendix I
In this appendix, we discuss our operationalization of weak ties used in our empirical anal-
yses. We define tie strength based on the following-follower relationships observed in the
Twitter network, and specifically, we claim that unidirectional relationships are on average
weaker than bidirectional ones. We want to stress a few points regarding this assumption.
First, we are not claiming that a bidirectional relationship in the Twitter world is a strong
tie in the absolute sense. Twitter users, even if they are mutually connected online, often
barely know each other in the real world, so to a certain extent, the claim that almost all
ties on Twitter are weak is a fair one to make. The hypothesis only emphasizes the ordinal
strength of the two tie types, and the comparison is carried out in the sense of probabilistic
expectation. The reason why reciprocity makes a difference is that frequent learning or reg-
ular interaction is more likely to happen when a reciprocal relationship exists. By reading
each other’s posts, a pair of users can more easily develop mutual understanding about each
other’s topics of interest and expertise, and sometimes even about detailed aspects of each
one’s personal life. Over time, even though the pair are unknown to each other in the real
world, they could probably become very familiar with each other’s activities and habits in
the online community. Of course, reciprocal following does not guarantee such relationship
development (which is why we emphasize the probabilistic nature of the hypothesis). How-
ever, without it, the relationship development is unlikely. Moreover, our operationalization
is consistent with the previous sociological literature. Granovetter (1973) pointed out the
importance of reciprocity by defining that “the strength of a tie is a (probably linear)
combination of the amount of time, the emotional intensity, the intimacy (mutual confiding),
and the reciprocal services which characterize the tie.” In Friedkin (1982), asymmetrical
contact between college professors was classified as a weak tie, and a reciprocal connection was
classified as a strong tie. Marlow et al. (2009) also applied similar definitions in analyzing
friendships on Facebook.
We perform an empirical test of the hypothesis, using the network graph data we collected.
Note that we know not only the number of followings (followers) a user has, but also
who the followings (followers) are (i.e., we observe the IDs of the user’s immediate social
neighbors in our database). This information gives us more knowledge about a user’s network
characteristics and, at the same time, the ability to build important metrics of them. In
particular, knowing the IDs of two users’ social neighbors, we can compare how “similar”
their social neighborhoods are. In deriving his theory, Granovetter (1973) claimed
that the stronger the social tie between two persons, the larger the overlap of their friendship
circles. Applying this statement to the Twittersphere, under our assumption, we would
expect that two users who mutually follow each other, on average, have a larger overlap in
their followings (followers) than two who don’t. Our test is based on this prediction.
            OI^{V1}       OI^{V2}       OI^{W1}       OI^{W2}
w_ti        -0.042***     -0.069***     -0.034***     -0.064***
F           (2322.21)     (1476.43)     (3837.34)     (2158.65)
p-value     0.00          0.00          0.00          0.00
Table 6: Results of ANOVA Tests
Operationally, we verify empirically whether $w_{ti} = 0$ is associated with a higher
“similarity” between user ti’s and author t’s followings (followers). We measure “similarity”
by computing two overlap indexes of followings (followers) of author t and user ti:
$$OI^{V1}_{ti} = \frac{\bar{V}_{ti}}{\sqrt{V_t V_{ti}}}, \qquad OI^{V2}_{ti} = \frac{\bar{V}_{ti}}{\min\{V_t, V_{ti}\}}, \tag{8}$$
where $\bar{V}_{ti}$, $V_t$, and $V_{ti}$ are the number of mutual followings author t and user ti shared, the
number of followings author t had, and the number of followings ti had, respectively (Onnela
et al. 2007 defined a similar “neighborhood overlap”). Similarly, we can define and compute
overlap indexes of followers ($OI^{W1}_{ti}$, $OI^{W2}_{ti}$) by changing V to W in equation (8). Note that
the two numerators in equation (8) are the same: $\bar{V}_{ti}$. The difference between $OI^{V1}_{ti}$ and
$OI^{V2}_{ti}$ is in the denominators, that is, in the way we scale down $\bar{V}_{ti}$ based on the number
of followings ti has. Both indexes are in the range [0, 1] because $\bar{V}_{ti} \le \min\{V_t, V_{ti}\}$. The
larger the indexes are, the more “similar” the two sets of followings are. When t and
ti share no mutual followings, both indexes equal 0. When t and ti have exactly the
same sets of followings, $OI^{V1}_{ti} = 1$. When ti’s followings are a subset or superset of t’s
followings, $OI^{V2}_{ti} = 1$.
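The two indexes are straightforward to compute once the neighbor IDs are available as sets. The following sketch is an illustration under stated assumptions rather than the code used for the paper: the function name and the example IDs are hypothetical, and the first index uses the geometric mean of the two set sizes as its denominator, which is the scaling consistent with the property that identical sets give an index of 1.

```python
import math


def overlap_indexes(neighbors_t, neighbors_ti):
    """Return (OI1, OI2) for two collections of neighbor IDs.

    OI1 = |mutual| / sqrt(|A| * |B|)   (geometric-mean scaling, assumed)
    OI2 = |mutual| / min(|A|, |B|)
    Both are defined as 0 when either set is empty.
    """
    a, b = set(neighbors_t), set(neighbors_ti)
    if not a or not b:
        return 0.0, 0.0
    mutual = len(a & b)
    oi1 = mutual / math.sqrt(len(a) * len(b))
    oi2 = mutual / min(len(a), len(b))
    return oi1, oi2


# Hypothetical neighbor IDs for an author t and a follower ti.
author_followings = {101, 102, 103, 104}
follower_followings = {103, 104, 105}

oi1, oi2 = overlap_indexes(author_followings, follower_followings)
print(f"OI1 = {oi1:.3f}, OI2 = {oi2:.3f}")
```

The same function applies to followers by passing the two follower sets instead of the two following sets.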
We investigate whether different $w_{ti}$ values lead to significantly different overlap indexes
by running a series of ANOVA tests, the results of which are given in Table 6. In all four
tests, we control for tweet-specific effects. As the regression coefficients in the first row show, we
find that a unidirectional relationship ($w_{ti} = 1$) is indeed associated with a smaller overlap
in social neighborhoods. The F statistics and p-values indicate that this difference is significant
at the 0.1% level, no matter which index we use. Therefore, bidirectional relationships are
associated with higher transitivity in social neighborhoods. The results thus support our
hypothesis that unidirectional relationships are, on average, weaker than bidirectional ones.
Appendix II
[Figure with numbered nodes (labels 0 through 674).]
Figure 10: The Spread of a Single Tweet (idx=1) in Our Sample