Monthly Archives: April 2021

Phil 4.30.21

April is almost over! And after a couple of summer-like days, we’re back to seasonal temperatures.

GPT-Agents

  • 3:30 Meeting
  • Working on a first pass at the religion map. Asking the GPT-3 for “A short list of religions:” returned a response that I regex’d into the following:
["Christianity", "Islam", "Hinduism", "Buddhism", "Judaism", "Sikhism", "Confucianism", "Shintoism", "Taoism", "Zoroastrianism", "Jainism", "Wicca"]
  • Next was getting the prompt:
"A short list of the religions that are closest to {}:"
  • Working with the model. The responses are more varied, so the parsing is a little more complex. My current approach is to have the model return ten (rather than three) responses, which I then organize:
    • The first step is to look for a similar Wikipedia page, which is done as follows:
import difflib as dl
import wikipedia  # pip install wikipedia

source_term = "Seax Wicca"
page_list = wikipedia.search(source_term, suggestion=False)
print(page_list)
closest_list = dl.get_close_matches(source_term, page_list)
print(closest_list)

# tally recent page views for each close match; get_wiki_pageviews() is the
# Wikimedia Pageview API helper from the 4.21.21 entry below
d = {}
total = 0
for p in closest_list:
    views = get_wiki_pageviews(p)
    d[p] = views
    total += views

for k in d:
    print("'{}' = {:,} views ({:.1f}%)".format(k, d[k], d[k]/total*100))
  • Which generates:
['Seax-Wica', 'Magical tools in Wicca', 'Raymond Buckland', 'Wicca', 'Wheel of the Year', 'Altar (Wicca)', 'History of Wicca', 'Triple Goddess (Neopaganism)', 'Horned God', 'Faery Wicca']
['Seax-Wica', 'Wicca', 'Faery Wicca']
'Seax-Wica' = 194 views (0.7%)
'Wicca' = 25,755 views (97.7%)
'Faery Wicca' = 417 views (1.6%)
  • I think for the time being, I’ll just pull the first one (closest_list[0]) and see what that looks like, though I could also use all close matches or the one with the largest page views.
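  • If the largest-page-views option wins out, it’s a one-liner against the d dict built above:
best_match = max(d, key=d.get) if d else None  # close match with the most views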
  • Rolling all the changes into GraphToDB. Urk.
  • I had to tweak out some junk text (maybe UTF-8 issues?). Here’s an example of how “Baháʼí Faith” was being rendered:
https://viztales.files.wordpress.com/2021/04/religion_1.png
  • Here’s a bigger one:
https://viztales.files.wordpress.com/2021/04/religion_2.png

Book

  • 2:00 Meeting with Michelle. Nice progress!
  • 5:00 Meeting with Chris C. That was a really nice chat!
    • Need to write up a paragraph about me and my work.

Phil 4.29.21

SBIR

  • Finish writing and send on for review/submission
    • Summary – done!
    • Incorporate Peter & Loren’s contributions – done!
    • Looks good to submit tomorrow
    • 9:15 standup
    • 10:30 prep meeting
    • 3:00 intro meeting
      • Went nowhere. More than anything, this reminded me of a dissertation defense with hostile faculty lobbing hand grenades. In my list of management types, this guy was an assassin/power broker.

GPT Agents

  • Got a ping from Ashwag on her team’s work, which was nice
    • Did some cleanup editing on the paper
  • Work on religion map if I get all the SBIR work done in time. Nope – tomorrow

Phil 4.28.21

https://www.nytimes.com/interactive/2020/world/asia/india-coronavirus-cases.html

GPT-Agents

  • Spent some time this morning adjusting the code so that experiment-specific regexes can be created and stored in the db. Also played around some with trying to figure out how to choose the best Wikipedia page(s?)
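  • A toy sketch of the experiment-specific regex idea, with a dict standing in for the db table (the real version stores the pattern next to the experiment row via pymysql):
import re

# db stand-in: experiment id -> stored split pattern
experiment_regexes = {1: r"[\n,:;]"}

def splitter_for(experiment_id: int):
    """Compile the experiment-specific split regex loaded from storage."""
    return re.compile(experiment_regexes[experiment_id])

print(splitter_for(1).split("Christianity, Islam; Hinduism\nBuddhism"))
# -> ['Christianity', ' Islam', ' Hinduism', 'Buddhism']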

SBIR

  • Working on the status report. Mostly done. Need to do the summary paragraph
  • 2:00 weekly meeting. Asked Peter and Loren to supply content by COB Thursday

JuryRoom

  • 7:00 Meeting

Phil 4.27.21

GPT Agents

  • Did a little housecleaning since I’m going to have to work on the status report for the rest of the week. I’ve moved the experiment-specific code into its own method and added a “node_type”
  • Updated the ICWSM paper to include the NSF grant info
  • 3:00 Meeting
    • Spent a lot of time working on probes for belief systems such as white supremacy. It’s much more complex than countries. The parser needs(?) to be able to:
      • Split on \n as well as [,:;]
      • Ignore leading numbers
      • Match on earlier sections of each text (maybe just cut everything else after n words?)
      • Do a more forgiving match on the wikipedia. For example, the probe: “The great religions are all characterized by” returns a list that contains “Belief in a Messiah or a prophet.” Sending that to the wikipedia returns [‘Messiah’, ‘Messiah in Judaism’, “Judaism’s view of Jesus”, ‘Prophets and messengers in Islam’, ‘Jesus in Islam’, ‘False prophet’, ‘Last prophet’, ‘Prophet’, ‘Al-Masih ad-Dajjal’, ‘Messianism’], while splitting off the first two words (which are common across all results) to create “a Messiah or a prophet.” returns [‘Messiah’, ‘Messiah Prophet’, ‘False prophet’, ‘List of Jewish messiah claimants’, ‘Messiah in Judaism’, “Judaism’s view of Jesus”, ‘Last prophet’, ‘Jesus in Islam’, ‘Al-Masih ad-Dajjal’, ‘Messiah Part I’]
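    • A minimal sketch of a parser covering the first three requirements (the function name and word cutoff are my choices; the fuzzier Wikipedia matching would layer on top):
import re

def parse_terms(text: str, max_words: int = 8) -> list:
    """Split a GPT-3 list response into candidate terms."""
    terms = []
    for chunk in re.split(r"[\n,:;]", text):  # split on \n as well as [,:;]
        chunk = re.sub(r"^\s*\d+[.)]?\s*", "", chunk).strip()  # ignore leading numbers
        if chunk:
            terms.append(" ".join(chunk.split()[:max_words]))  # keep only the head
    return terms

print(parse_terms("1. Belief in a Messiah or a prophet\n2. Sacred texts; rituals"))
# -> ['Belief in a Messiah or a prophet', 'Sacred texts', 'rituals']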

SBIR

  • 9:15 Sprint planning
  • Read the docs that Clay wants me to check out
  • Work on status report
    • Redid the summary as a list of accomplishments that I now need to flesh out
    • Added all the images to the figures directory

NOAA

  • Records Management Training

Phil 4.26.21

GPT-Agents

  • Save GML – done!
  • Run multiple responses – done!
  • Save experiments – done!
  • I have a fancy world map! This one is 4k×4k, so you can zoom in quite far. It started at
"A short list of countries that are nearest to United States, separated by commas:"

And worked its way out from that (e.g. “A short list of countries that are nearest to Canada, separated by commas:”). It looks like it had not worked its way over to Africa yet, and there is no Greenland.

https://viztales.files.wordpress.com/2021/04/world.png
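A minimal sketch of that outward expansion (the ask() callback is a stand-in for the GPT-3 call plus parsing; all names are illustrative):

from collections import deque

def build_map(seed: str, ask, max_nodes: int = 50) -> dict:
    """Breadth-first expansion from a seed country. ask(country) stands in
    for prompting GPT-3 with 'A short list of countries that are nearest
    to {country}, separated by commas:' and parsing the response."""
    graph = {}  # country -> neighbors the model returned
    queue = deque([seed])
    while queue and len(graph) < max_nodes:
        country = queue.popleft()
        if country in graph:
            continue
        neighbors = ask(country)
        graph[country] = neighbors
        queue.extend(n for n in neighbors if n not in graph)
    return graph

# toy stand-in for the model, just to show the traversal order
toy = {"United States": ["Canada", "Mexico"], "Mexico": ["United States", "Guatemala"]}
print(build_map("United States", lambda c: toy.get(c, [])))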

SBIR

  • Sprint review – done
  • Start writing second report – not started

Phil 4.22.21

GPT-Agents

  • Getting familiar with my NetworkxGraphing class. I’m going to create the graph first, and then persist it
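  • The persistence step is presumably a one-liner once the graph exists; a minimal networkx version (not the actual NetworkxGraphing class):
import networkx as nx

g = nx.DiGraph()
g.add_edge("United States", "Canada", weight=1)
g.add_edge("United States", "Mexico", weight=2)
nx.write_gml(g, "countries.gml")  # re-loadable with nx.read_gml(), or in Gephi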
  • My very first map of “A short list of countries that are nearest to United States, separated by commas:”
  • Some more progress. This is one layer down, with no culling for likely nearest neighbors:
https://viztales.files.wordpress.com/2021/04/image-9.png
  • Here’s a nicer version:
https://viztales.files.wordpress.com/2021/04/image-10.png
  • Presentation to the data science team

SBIR

  • 9:15 standup
  • See how the ensemble turned out with the truncated inputs
  • Start on using MDS data. Having some issues:
https://viztales.files.wordpress.com/2021/04/image-11.png

3:30 presentation of the COVID work to a *dead* room. Sigh. But I got some slides done!

Phil 4.21.21

Here are some silly coding conventions that I had to dig around to find the answers for.

import requests
from datetime import datetime, timedelta

# request the last week of daily pageview counts for one article
headers = {"User-Agent": "someone@someplace.com"}
page_title = "Exergaming"
yesterday = datetime.today() - timedelta(days=1)
last_week = yesterday - timedelta(days=7)
yester_s = yesterday.strftime("%Y%m%d")
lastw_s = last_week.strftime("%Y%m%d")
s = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/{}/daily/{}/{}".format(page_title, lastw_s, yester_s)
print(s)
r = requests.get(s, headers=headers)
  • Without that ‘headers’ element, you get a 404. Note that you do not need to spoof a browser header. This is all you need.
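  • The 4.30 entry above calls a get_wiki_pageviews() helper; a minimal version wrapping this request might look like the following (the function name and the summing over days are my assumptions):
from datetime import datetime, timedelta
import requests

def get_wiki_pageviews(page_title: str, days: int = 7) -> int:
    """Total user pageviews for a page over the last `days` days."""
    headers = {"User-Agent": "someone@someplace.com"}
    yesterday = datetime.today() - timedelta(days=1)
    start = yesterday - timedelta(days=days)
    s = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/{}/daily/{}/{}".format(
        page_title, start.strftime("%Y%m%d"), yesterday.strftime("%Y%m%d"))
    r = requests.get(s, headers=headers)
    if r.status_code != 200:
        return 0
    return sum(item["views"] for item in r.json()["items"])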

The second thing has to do with getting strings safely into databases:

  • When storing values with pymysql that involve strings that need to be escaped, you can now use parameter binding, which is very cool. BUT! Just because it uses ‘%s’ doesn’t mean that you also use %d and %f: every placeholder is %s, regardless of the value’s type. Here’s an example that uses strings, floats, and ints:
sql = "insert into gpt_maps.table_experiment (date, description, engine, max_tokens, temperature, top_p, logprobs, num_responses, presence_penalty, frequency_penalty)" \
      " values(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
values = (date_str, description, self.engine, self.max_tokens, self.temperature, self.top_p, self.logprobs, self.num_responses, self.presence_penalty, self.frequency_penalty)
msi.write_sql_values_get_row(sql, values)

And here’s the call that does the actual writing to the db:

def write_sql_values_get_row(self, sql:str, values:Tuple):
    try:
        with self.connection.cursor() as cursor:
            # parameter binding: the connector escapes every value in `values`
            cursor.execute(sql, values)
            id = cursor.lastrowid
            print("row id = {}".format(id))
            return id  # assumes autocommit is on, or that the caller commits
    except pymysql.err.InternalError as e:
        print("{}:\n\t{}".format(e, sql))
        return -1

The Power of Scale for Parameter-Efficient Prompt Tuning

  • In this work, we explore “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3’s “few-shot” learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method “closes the gap” and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed “prefix tuning” of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.

GPT-Agents

  • Start building out GraphToDB.
    • Use the Wikipedia to verify a node name exists before adding it
    • Check that a (directed) edge exists before adding it. If it does, increment the weight.
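    • A minimal sketch of those two rules over a plain dict (the real version goes through NetworkxGraphing and the db, so these names are illustrative):
import wikipedia  # pip install wikipedia

def verified(name: str) -> bool:
    """Only admit node names that Wikipedia can find."""
    return len(wikipedia.search(name, suggestion=False)) > 0

def add_edge(graph: dict, source: str, target: str):
    """Create the directed edge, or increment its weight if it already exists."""
    if verified(source) and verified(target):
        graph.setdefault(source, {})
        graph[source][target] = graph[source].get(target, 0) + 1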
  • Digging into what metaphors are:
    • Understanding Figurative Language: From Metaphor to Idioms
      • This book examines how people understand utterances that are intended figuratively. Traditionally, figurative language such as metaphors and idioms has been considered derivative from, and more complex than, ostensibly straightforward literal language. Glucksberg argues that figurative language involves the same kinds of linguistic and pragmatic operations that are used for ordinary, literal language. Glucksberg’s research in this book is concerned with ordinary language: expressions that are used in daily life, including conversations about everyday matters, newspaper and magazine articles, and the media. Metaphor is the major focus of the book. Idioms, however, are also treated comprehensively, as is the theory of conceptual metaphor in the context of how people understand both conventional and novel figurative expressions. A new theory of metaphor comprehension is put forward, and evaluated with respect to competing theories in linguistics and in psychology. The central tenet of the theory is that ordinary conversational metaphors are used to create new concepts and categories. This process is spontaneous and automatic. Metaphor is special only in the sense that these categories get their names from the best examples of the things they represent. Thus, the literal “shark” can be a metaphor for any vicious and predatory being, from unscrupulous salespeople to a murderous character in The Threepenny Opera. Because the same term, e.g., “shark,” is used both for its literal referent and for the metaphorical category, as in “My lawyer is a shark,” we call it the dual-reference theory. The theory is then extended to two other domains: idioms and conceptual metaphors. The book presents the first comprehensive account of how people use and understand metaphors in everyday life.
    • The contemporary theory of metaphor — now new and improved!
      • This paper outlines a multi-dimensional/multi-disciplinary framework for the study of metaphor. It expands on the cognitive linguistic approach to metaphor in language and thought by adding the dimension of communication, and it expands on the predominantly linguistic and psychological approaches by adding the discipline of social science. This creates a map of the field in which nine main areas of research can be distinguished and connected to each other in precise ways. It allows for renewed attention to the deliberate use of metaphor in communication, in contrast with non-deliberate use, and asks the question whether the interaction between deliberate and non-deliberate use of metaphor in specific social domains can contribute to an explanation of the discourse career of metaphor. The suggestion is made that metaphorical models in language, thought, and communication can be classified as official, contested, implicit, and emerging, which may offer new perspectives on the interaction between social, psychological, and linguistic properties and functions of metaphor in discourse.

SBIR

  • 10:00 Meeting
  • See how the new models are doing. If we are still not making progress, then go to a simpler interpolation model
    • It turns out that the frequency problem was actually a visualization bug! Here’s an example going from 20 input vectors to 500 output vectors using attention and two 3,000-perceptron layers:
https://viztales.files.wordpress.com/2021/04/training_graphs0.png
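A hypothetical PyTorch reading of that architecture (all dimensions beyond the 20/500/3,000 in the description are guesses):

import torch
from torch import nn

class InterpSketch(nn.Module):
    """Sketch: self-attention over the 20 input vectors, two 3,000-unit
    dense layers, then 500 output vectors."""
    def __init__(self, dim: int = 64, n_in: int = 20, n_out: int = 500):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(n_in * dim, 3000), nn.ReLU(),
            nn.Linear(3000, 3000), nn.ReLU(),
            nn.Linear(3000, n_out * dim))
        self.n_out, self.dim = n_out, dim

    def forward(self, x):                         # x: (batch, 20, dim)
        a, _ = self.attn(x, x, x)                 # attention over the inputs
        y = self.ff(a.flatten(1))                 # the two 3,000-perceptron layers
        return y.view(-1, self.n_out, self.dim)  # (batch, 500, dim)

print(InterpSketch()(torch.randn(2, 20, 64)).shape)  # torch.Size([2, 500, 64])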

Phil 4.20.21

Big news this afternoon:

Had an interesting talk with Aaron last night about using FB microtargeting as a mechanism to provide “deprogramming” content to folks that are going down conspiracy rabbit holes. We could also use the GPT-3 for this.

Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2

  • Thinking aloud is an effective meta-cognitive strategy human reasoners apply to solve difficult problems. We suggest to improve the reasoning ability of pre-trained neural language models in a similar way, namely by expanding a task’s context with problem elaborations that are dynamically generated by the language model itself. Our main result is that dynamic problem elaboration significantly improves the zero-shot performance of GPT-2 in a deductive reasoning and natural language inference task: While the model uses a syntactic heuristic for predicting an answer, it is capable (to some degree) of generating reasoned additional context which facilitates the successful application of its heuristic. We explore different ways of generating elaborations, including fewshot learning, and find that their relative performance varies with the specific problem characteristics (such as problem difficulty). Moreover, the effectiveness of an elaboration can be explained in terms of the degree to which the elaboration semantically coheres with the corresponding problem. In particular, elaborations that are most faithful to the original problem description may boost accuracy by up to 24%.
  • OCTIS (Optimizing and Comparing Topic models Is Simple) aims at training, analyzing and comparing Topic Models, whose optimal hyper-parameters are estimated by means of a Bayesian Optimization approach.

GPT-Agents

  • Putting together a scratch file that gets page view data from Wikimedia. My plan is to use that value to determine the weight of the node
    • Stackoverflow post
    • This page documents the Pageview API (v1), a public API developed and maintained by the Wikimedia Foundation that serves analytical data about article pageviews of Wikipedia and its sister projects. With it, you can get pageview trends on specific articles or projects; filter by agent type or access method, and choose different time ranges and granularities; you can also get the most viewed articles of a certain project and timespan, and even check out the countries that visit a project the most. Have fun!
    • Wikimedia REST API: This API provides cacheable and straightforward access to Wikimedia content and data, in machine-readable formats.
  • 3:00 Meeting
  • Paper is pending ArXiv!

SBIR

  • Asked Rukan to save off some well-trained models to play with

Phil 4.19.21

Performance Trailer Sales

Today I learned about the AoE (“Anywhere on Earth”) timezone. The latest possible midnight is always at Baker Island, US Minor Outlying Islands.

GPT-Agents

  • Workshop paper is done!
  • Starting to work seriously on mapping

SBIR

  • 2:00 meeting to figure out what to do for phase 2?
  • Sync up with Rukan and see how the loss function is going

Phil 4.15.21

https://twitter.com/jure/status/1382743017283493889

GPT-Agents

Dr Fauci:
	[0]: record shows that statistics shows that 340 million Americans died due to It!#COVID19 is @LamestreamMedia!REALLY @JoeBiden called the #coronavirus the #ChinaVirus created by Dr Fauci
	[1]: @WHO]]][[[https://t This is the #CommunistParty #ChinaVirus ;&amp continue to lie about part of their creation of a COVID-19 Covid19,vaccine was developed by Dr Fauci
	[2]: ,the study on the #Coronavirus response to the #COVID19 response to #ChinaVirus.#China Dr]]][[[https://t.co/sCiJ5h0mRz Fauci
	[3]: The #FakeNews #WHOLiedPeopleDied is #FakeNews ,calling the #CCPVirus source of #ChinaVirus it quickly spread and spread this propaganda and panic spread by the Dr?#FakeNewsMedia Fauci
	[4]: can� work for the #ChinaVirus.#ChineseVirus So]]][[[#WuhanVirus the virus was made from in,China in,China China :Coffee Fauci,President Dr Fauci
	[5]: This]]][[[https://t.co/ is what that if China is calling it the #WuhanFlu or #ChineseVirus The?#WuhanCoronavirus people need to know that Dr Fauci
	[6]: Coronavirus-China-China-Coronavirus-China !Joe!?!Virus-China Joe!Joe Joe!Joe!Joe!?!Fauci Dr!Fauci!Sleeping Fauci
	[7]: Covid-19]] Vaccine is cured for China?????Coronavirus #ChinaVirus?China??Virus @WHO Covid-19 Vaccine is one out of @WHO for just giving a vaccine by Dr Fauci
	[8]: https://t.co/rZ 14 Mar 2020 https://t.co/c4vWxnQw0a 13 Mar 2020 https://t.co/0dx0Rp7tCe Dr Fauci
	[9]: #ChinaVirus #ChineseVirus @JRubin]]][[[#WuhanVirus @BorisJohnson @TuckerCarlson @ChrisCuomo.Dr @POTUS @JoeBiden Dr.Dr Fauci

Donald Trump:
	[0]: qt- by #BorderObserver @JackPosobiec]]][[[https://t.co/v2G8m1sE2o @marklevinshow #ChinaVirus #coronavirus Donald Trump
	[1]: #CO #ChinaVirus qt-covid19-news-056 by This]]][[[#BorderObserver China.time pandemics,lockdowns,hiding,lying,lied ;&amp Donald's Trump
	[2]: can’t the spread of #coronavirus so they can spread this Thanks.pandemic for will?this take out #COVID?this #COVID19Pandemic #ChinaVirus #covid19 Donald Trump
	[3]: #China #coronavirus @POTUS]]][[[#COVID19 this is all of the #CoronaVirus #China that #ThousandsDied thousands could die from #ChinaVirus #Trump’s Donald Trump
	[4]: #LamestreamMedia!DISAPPE says #ChinaVirus If.spiking #FakeNewsMedia continue these claims ;&amp states use corrupt @POTUS,#MailinBallots to delay Donald.#Election2020 Trump
	[5]: More]]][[[https://t.co/JnUZQgL than more dead from the #China's response to #ChinaVirus.this trying to tell that more Americans died from Trump,Covid19 Donald Trump
	[6]: @YouTube There was proof that the world created the outbreak of a outbreak in #Coronavirus.America #WHOLiedPeopleDied]]][[[https://t.co/2eHj7tBqE Donald Trump
	[7]: #ChinaVirus for President but,Trump I am standing against #COVID19 in the The.U.S response to the He.@realDonaldTrump called the #WuhanVirus #ChinaVirus for the President Trump J Donald Trump
	[8]: How]]][[[https://t you will get from #ChinaVirus #Coronavirus who everyone wants to call it a #ChineseVirus #CCPVirus that #Chinese will pay for the #ChinaVirus Donald Trump
	[9]: #ChinaV @SenSchumer]]][[[https://t.co/uOc1PtLp2Z #DemocratsHateAmerica #CoronaVirusUpdates #ChinaVirus #CCPVirus Donald Trump
  • The Titan RTX box is still working on this dataset, while my GTX1070 box finished in an hour? Not sure what is going on
  • It looks like I have some mismatched versions of CUDA drivers/TF/Torch installed. Need to do a cleanup tomorrow.

SBIR

  • 9:15 Standup
  • 1:30 GPT Meeting

Phil 4.14.21

GPT Agents

  • Generated a reversed version of the chinavirus corpus and am currently training a model. The Huggingface API has changed some, and it seems very slow?
  • Lit review

SBIR

  • Assisting Rukan
  • 10:00 Meeting

Book

  • 5:30 Editing with Michelle

JuryRoom

  • 7:00 Meeting

Phil 4.13.21

GPT Agents

  • Working on paper – barring the lit review, I’m at a first draft, I think
    • Still need to do the abstract! Done!
  • 3:00 Meeting today
    • Banged away on a lot of issues. I need to put together a lit review by tomorrow COB. The due date is the 19th, though!
  • I have a crazy idea for prompt generation. I think I’m going to train a model on text with the word order reversed. Then an ‘answer’ fed in to the reversed system should generate a set of prompts that should have a high chance of generating that answer, once re-reversed.
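  • A minimal sketch of that reversal step (file names are illustrative):
def reverse_words(line: str) -> str:
    """Reverse the word order of one line of text."""
    return " ".join(reversed(line.split()))

# build the reversed training corpus
with open("tweets.txt", encoding="utf-8") as fin, \
        open("tweets_reversed.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(reverse_words(line) + "\n")

# a generated 'prompt' gets re-reversed before use:
print(reverse_words("Fauci Dr by developed was vaccine The"))
# -> The vaccine was developed by Dr Fauci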
  • Fixed all the weird parsing issues for POS strings

Book

  • Need to set up meeting for April 30th or May 7th at 1:15 pm PT (4:15 pm ET)

SBIR

  • 9:15 Sprint scheduling