Phil 5.6.20

#COVID

I looked at the COVID-19-TweetIDs GitHub project, and it does in fact consist of lists of tweet IDs:

1219755883690774529
1219755875407224832
1219755707001659393
1219755610494861312
1219755586272813057
1219755378428338181
1219755293397012480
1219755288988798981
1219755197645279233
1219755157438828545

Each ID resolves to a tweet when appended to the string “twitter.com/anyuser/status/”, like this: twitter.com/anyuser/status/1219755883690774529
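
A minimal sketch of that URL trick, assuming the IDs arrive one per line as in the listing above. The helper names (read_ids, tweet_url) and the "https://" prefix are my own additions for illustration, not anything from the TweetIDs project:

```python
def read_ids(lines):
    """Keep only lines that look like numeric tweet IDs."""
    return [line.strip() for line in lines if line.strip().isdigit()]


def tweet_url(tweet_id):
    """Build a viewable URL by appending the ID to the status path."""
    # "anyuser" is a placeholder user name, as in the example above
    return f"https://twitter.com/anyuser/status/{tweet_id}"


if __name__ == "__main__":
    sample = ["1219755883690774529\n", "1219755875407224832\n", "\n"]
    for tid in read_ids(sample):
        print(tweet_url(tid))
```

The IDs are kept as strings here, since all that is needed is concatenation.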

The way to get the text in Python appears to be tweepy. This snippet from Stack Overflow appears to show how to do it, but I haven’t verified it yet.

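import tweepy

# Twitter API credentials (placeholders; substitute real values)
consumer_key = "xxxx"
consumer_secret = "xxxx"
access_token = "xxxx"
access_token_secret = "xxxx"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

# id_list is the list of tweet ids; the lookup endpoint takes at most 100 per call
id_list = [1219755883690774529, 1219755875407224832]
# statuses_lookup is the tweepy 3.x name (tweepy 4 renamed it lookup_statuses)
tweets = api.statuses_lookup(id_list)
tweet_txt = [tweet.text for tweet in tweets]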
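
Since the ID files are far longer than the 100 IDs the lookup endpoint accepts per request, the call will probably need to be batched. Here is a rough sketch of how that could look. The names batched and hydrate are hypothetical, and the lookup argument stands in for something like api.statuses_lookup so the batching logic can be tried without credentials:

```python
def batched(ids, size=100):
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(ids, lookup, size=100):
    """Call lookup(batch) for each batch and collect the .text of each result.

    `lookup` is assumed to return an iterable of objects with a .text
    attribute, which is what the snippet above expects from tweepy.
    """
    texts = []
    for batch in batched(ids, size):
        texts.extend(status.text for status in lookup(batch))
    return texts
```

With a real client this would be called as hydrate(id_list, api.statuses_lookup). Rate limiting and tweets that have been deleted are not handled here.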

GPT-2 Agents

Continuing with PGNtoEnglish
- Figuring out how to parse the moves text, using the wonderful regex101 site
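
For the moves text, a first-pass regex might look like the sketch below. It is an illustration rather than the pattern PGNtoEnglish actually uses, and it assumes plain movetext such as "1. e4 e5 2. Nf3 Nc6" with no comments, variations, or annotation glyphs. MOVE_RE, RESULTS, and parse_moves are names made up for the example:

```python
import re

# move number, white's move, and an optional black move that is not
# itself the start of the next numbered move
MOVE_RE = re.compile(r"(\d+)\.\s*(\S+)(?:\s+(?!\d+\.)(\S+))?")

# game-termination markers that can trail the movetext
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_moves(movetext):
    """Return a list of (move_number, white_move, black_move_or_None)."""
    moves = []
    for num, white, black in MOVE_RE.findall(movetext):
        if white in RESULTS:
            continue
        if black in RESULTS or black == "":
            black = None
        moves.append((int(num), white, black))
    return moves


if __name__ == "__main__":
    for move in parse_moves("1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1-0"):
        print(move)
```

The negative lookahead keeps a following move number from being read as black's reply when white made the last move of the game.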
4:30 meeting
- We set up an Overleaf project with the goal to submit to the Harvard/Kennedy Misinformation Review
- We talked about using GPT-2 as a way of clustering tweets. We're going to try fine-tuning it on some Arabic novels first to see whether it can work in that language

GOES

Continuing with the MLP sequence-to-sequence NN
- Getting the data to fit into nice, rectangular arrays, which is not straightforward, since the time window of the query can return a varying number of results. So I have to run the query, then trim the arrays down so that they are all the length of the shortest. Here are the results:
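
As a side note on the trimming step described above, here is a minimal sketch of the idea using NumPy. It shows the technique only, not the GOES code or its output; trim_to_shortest and the sample values are invented for the example:

```python
import numpy as np


def trim_to_shortest(sequences):
    """Cut every sequence to the length of the shortest and stack them.

    Returns a 2-D array of shape (number of sequences, shortest length).
    Raises ValueError if `sequences` is empty.
    """
    min_len = min(len(seq) for seq in sequences)
    return np.array([np.asarray(seq)[:min_len] for seq in sequences])


if __name__ == "__main__":
    # three query results with different numbers of samples
    ragged = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    print(trim_to_shortest(ragged).shape)
```

Trimming from the end keeps the start of each window aligned. Whether that is the right end to cut depends on how the query windows are anchored.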