Category Archives: Data sources

Phil 4.26.18

Too much stuff posted yesterday, so I’m putting Kate Starbird’s new paper here:

  • Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains
    • This research examines the competing narratives about the role and function of Syria Civil Defence, a volunteer humanitarian organization popularly known as the White Helmets, working in war-torn Syria. Using a mixed-method approach based on seed data collected from Twitter, and then extending out to the websites cited in that data, we examine content sharing practices across distinct media domains that functioned to construct, shape, and propagate these narratives. We articulate a predominantly alternative media “echo-system” of websites that repeatedly share content about the White Helmets. Among other findings, our work reveals a small set of websites and authors generating content that is spread across diverse sites, drawing audiences from distinct communities into a shared narrative. This analysis also reveals the integration of government funded media and geopolitical think tanks as source content for anti-White Helmets narratives. More broadly, the analysis demonstrates the role of alternative newswire-like services in providing content for alternative media websites. Though additional work is needed to understand these patterns over time and across topics, this paper provides insight into the dynamics of this multi-layered media ecosystem.

7:00 – 5:00 ASRC MKT

  • Referencing for Aanton at 5:00
  • Call Charlestown about getting last two years of payments
  • Benjamin D. Horne, Sara Khedr, and Sibel Adali. “Sampling the News Producers: A Large News and Feature Data Set for the Study of the Complex Media Landscape” ICWSM 2018
  • Continuing From I to We: Group Formation and Linguistic Adaption in an Online Xenophobic Forum
  • Anchor-Free Correlated Topic Modeling
    • In topic modeling, identifiability of the topics is an essential issue. Many topic modeling approaches have been developed under the premise that each topic has an anchor word, which may be fragile in practice, because words and terms have multiple uses; yet it is commonly adopted because it enables identifiability guarantees. Remedies in the literature include using three- or higher-order word co-occurence statistics to come up with tensor factorization models, but identifiability still hinges on additional assumptions. In this work, we propose a new topic identification criterion using second order statistics of the words. The criterion is theoretically guaranteed to identify the underlying topics even when the anchor-word assumption is grossly violated. An algorithm based on alternating optimization, and an efficient primal-dual algorithm are proposed to handle the resulting identification problem. The former exhibits high performance and is completely parameter-free; the latter affords up to 200 times speedup relative to the former, but requires step-size tuning and a slight sacrifice in accuracy. A variety of real text copora are employed to showcase the effectiveness of the approach, where the proposed anchor-free method demonstrates substantial improvements compared to a number of anchor-word based approaches under various evaluation metrics.
  • Cleaning up the Angular/PHP example. Put on GitHub?