Category Archives: Phil

Phil 5.12.21

SBIR

9:30 Meeting with Rukan to see what our results are from the overnight runs
10:00 Group meeting. Need to discuss proposal and share Overleaf template

GPT Agents

Still filling up the Yelp db. Currently at around 500,000 reviews
Language map – Send a copy to Andreea when done. This one is based on the same repeated prompt, because I screwed up the template code

https://viztales.com/wp-content/uploads/2021/05/image-2.png

Language map using seeds of English, Chinese, and Samoan

https://viztales.com/wp-content/uploads/2021/05/language_3.png

Philosophy Map using seeds of Utilitarianism and Hedonism

https://viztales.com/wp-content/uploads/2021/05/philosophy_1.png

Food Map using seeds of Pasta, Hamburger, Lettuce, Avocado and Cheese

https://viztales.com/wp-content/uploads/2021/05/food_3-1.png

Conspiracy theories seeded with “vaccines cause autism”

https://viztales.com/wp-content/uploads/2021/05/conspiracy_1-2.png

JuryRoom

7:00 Meeting

Phil 5.11.21

Deep Learning applications for COVID-19

This survey explores how Deep Learning has battled the COVID-19 pandemic and provides directions for future research on COVID-19. We cover Deep Learning applications in Natural Language Processing, Computer Vision, Life Sciences, and Epidemiology. We describe how each of these applications vary with the availability of big data and how learning tasks are constructed. We begin by evaluating the current state of Deep Learning and conclude with key limitations of Deep Learning for COVID-19 applications. These limitations include Interpretability, Generalization Metrics, Learning from Limited Labeled Data, and Data Privacy. Natural Language Processing applications include mining COVID-19 research for Information Retrieval and Question Answering, as well as Misinformation Detection, and Public Sentiment Analysis. Computer Vision applications cover Medical Image Analysis, Ambient Intelligence, and Vision-based Robotics. Within Life Sciences, our survey looks at how Deep Learning can be applied to Precision Diagnostics, Protein Structure Prediction, and Drug Repurposing. Deep Learning has additionally been utilized in Spread Forecasting for Epidemiology. Our literature review has found many examples of Deep Learning systems to fight COVID-19. We hope that this survey will help accelerate the use of Deep Learning for COVID-19 research.

Word embeddings quantify 100 years of gender and ethnic stereotypes

Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts—e.g., the women’s movement in the 1960s and Asian immigration into the United States—and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection between machine learning and quantitative social science

How to make a racist AI without really trying

SBIR

Sprint planning – I’m going to be busy
More work with Rukan. We’re going to focus on some simple spikes
- The simple spikes look great. We’re going to do a sensitivity analysis on the MDS data now
Got my fancy query working

create or replace view view_combined as
    select distinct e.id, e.name, e.description, s1.value as dimension_size, s2.value as layers,
                    r1.value as avg_cos_loss, r2.value as avg_l1_loss from
        table_experiment e
            join table_settings s1 on e.id = s1.experiment_id and s1.name = 'dimension_size'
            join table_settings s2 on e.id = s2.experiment_id and s2.name = 'layers'
            join table_results r1 on e.id = r1.experiment_id and r1.name = 'avg cosine loss'
            join table_results r2 on e.id = r2.experiment_id and r2.name = 'avg l1 loss';
select * from view_combined where id = 100;

GPT-Agents

Parsing Yelp
Pew Center: Search Results For: covid
3:00 Meeting

Phil 5.10.21

3:00 Dentist

GPT-Agents

Yelp parser
Try maps of food, fashion(!), movies, books, politicians, etc?
4:30 meeting with Andreea

SBIR

Make slides for sprint review
Sprint review

Phil 5.6.21

Get trailer!

https://twitter.com/__kolesnikov__/status/1390006566796107777

GPT-Agents

Posted some pix to the OpenAI slack channel. Let’s see if there is any response. I should also post to r/dataisbeautiful

SBIR

9:15 standup
Talk to Rukan about this and this
GELUs full form is GAUSSIAN ERROR LINEAR UNIT

1:30 Data Science tag-up

Book

More editing

Phil 5.5.21

GPT-Agents

Update and submit paper (ArXiv and SocialSens) – done!

SBIR

Phase 2 proposal kickoff
Weekly tagup
AI/ML tagup (mention paper acceptance)

Book

Continue rolling in changes

JuryRoom

Worked on the intro to Pryvank’s paper
7:00 Meeting

Phil 5.4.21

Amazing Animated Star Wars Fighter Ships - Best Animations — May the Fourth be with you and all that

See if I can get this trailer – done!

SBIR

9:15 status meeting. It looks like I’ll be working on the phase 2 proposal for the rest of the week?
8:45 pre-standup with Rukan to see how things are going
- Looks like we are going to improve our experiment pipeline since we seem to be loosing data. Rukan is looking into what it takes to get MySql installed on his instance

GPT Agents

3:00 Meeting
- https://ai.stanford.edu/~amaas/data/sentiment/ train a model and get the distributions
- https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/ppc.pdf
- https://www.yelp.com/dataset
- Working to identify bias in the data and mitigate bias in the system
- A list of countries that share a border with {}, separated by commas
I still haven’t entirely fixed my UTF 8 problem
Start writing up something about the belief maps to add to the chess paper, and maybe as an overall article
- Country counts (150 vs 195 with no false positives, excluding six prompt countries, 76% coverage) Missing countries include Guadalupe, Guyana, Israel, Jordan, Lebanon, Madagascar, Liberia, Micronesia, Niger, Paraguay, Senegal, Sri Lanka, Tunisia, Uruguay, Venezuela, and Yemen
- Religion counts?
- New favorite map:

https://viztales.com/wp-content/uploads/2021/05/world_4.png

Central America insert

Compared with actual map

Credit: By Cacahuate, amendments by Joelf – Own work based on the blank world map, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=22746265

Book

Start working on edits
Send Chris email. Done!

Phil 5.3.21

Call about trailer! Sold, dammit

GPT-Agents

10:00 Meeting with Antonio. Nice discussion on moving forward. He suggests using the mapper to create a meta-knowledge graphing tool that works along the lines of the Third Author approach, where an expert can influence and interactively edit the creation of the maps
Worked on my UTF-8 problem, but it’s still not fixed
New Religion Map

https://viztales.com/wp-content/uploads/2021/05/religion_3.png

New World Map

https://viztales.com/wp-content/uploads/2021/05/world_3.png

Good meeting with Andreea about metaphors. It was interrupted by some kind of alert, so we’ll finish next week.

SBIR

Went over Rukan’s progress, which is pretty nice, particularly for ensembles:

https://viztales.com/wp-content/uploads/2021/05/image.png

He’s going to do a big run tonight to see how more training helps
We also talked about adding multihead attention to the middle layers. We may do an experiment on that

Phil 4.30.21

April is almost over! And after a couple of summer-like days, we’re back to seasonal

GPT-Agents

3:30 Meeting
Working on a first pass at the religion map. Asking the GPT-3 for “A short list of religions:” returned a response that I regex’d into the following:

["Christianity", "Islam", "Hinduism", "Buddhism", "Judaism", "Sikhism", "Confucianism", "Shintoism", "Taoism", "Zoroastrianism", "Jainism", "Wicca"]

Next was getting the prompt:

"A short list of the religions that are closest to {}:"

working with the model. There are more varied responses, so the parsing is a little more complex. The way that I’m currently working is by having the model return ten (rather than 3) responses that I then organize:
- The first element is to look for a similar Wikipedia page, which is done as follows:

source_term = "Seax Wicca"
page_list = wikipedia.search(source_term, suggestion=False)
print(page_list)
closest_list = dl.get_close_matches(source_term, page_list)
print(closest_list)

d = {}
total = 0
for p in closest_list:
    views = get_wiki_pageviews(p)
    d[p] = views
    total += views

for k in d:
    print("'{}' = {:,} views ({:.1f}%)".format(k, d[k], d[k]/total*100))

Which generates:

['Seax-Wica', 'Magical tools in Wicca', 'Raymond Buckland', 'Wicca', 'Wheel of the Year', 'Altar (Wicca)', 'History of Wicca', 'Triple Goddess (Neopaganism)', 'Horned God', 'Faery Wicca']
['Seax-Wica', 'Wicca', 'Faery Wicca']
'Seax-Wica' = 194 views (0.7%)
'Wicca' = 25,755 views (97.7%)
'Faery Wicca' = 417 views (1.6%)

I think for the time being, I’ll just pull the first one (closest_list[0]) and see what that looks like, though I could also use all close matches or the one with the largest page views
Rolling all the changes into GraphToDB. Urk.
I had to tweak out some junk text (maybe UTF-8 issues?) Here’s an example: = “Baháʼí Faith” is being rendered as

It looks like this could be a solution: stackoverflow.com/questions/4299675/python-script-to-convert-from-utf-8-to-ascii
But here’s a religion map!

https://viztales.com/wp-content/uploads/2021/04/religion_1.png

Here’s a bigger one:

https://viztales.com/wp-content/uploads/2021/04/religion_2.png

Book

2:00 Meeting with Michelle. Nice progress!
5:00 Meeting With Chris C. That was a really nice chat!
- Need to write up a paragraph about me and my work.

Phil 4.29.21

SBIR

Finish writing and send on for review/submission
- Summary – done!
- Incorporate Peter & Loren’s contributions – done!
- Looks good to submit tomorrow
- 9:15 standup
- 10:30 prep meeting
- 3:00 intro meeting
  - Went nowhere. More than anything, this reminded me of a Defense with a hostile faculty lobbing hand grenades. In my list of management types, this guy was an assassin/power broker

GPT Agents

Got a ping from Ashwag on her team’s work, which was nice
- Did some cleanup editing on the paper
Work on religion map if I get all the SBIR work done in time. Nope – tomorrow

Phil 4.28.21

https://www.nytimes.com/interactive/2020/world/asia/india-coronavirus-cases.html?action=click&module=Top%20Stories&pgtype=Homepage

GPT-Agents

Spent some time this morning adjusting the code so that experiment-specific regexes can be created and stored in the db. Also played around some with trying to figure out how to choose the best Wikipedia page(s?)

SBIR

Working on the status report. Mostly done. Need to do the summary paragraph
2:00 weekly meeting. Asked Peter and Loren to supply content by COB Thursday

JuryRoom

7:00 Meeting

Phil 4.27.21

GPT Agents

Did a little housecleaning since I’m going to have to work on the status report for the rest of the week. I’ve moved the experiment-specific code into its own method and added a “node_type”
Updated the ICWSM paper to include the NSF grant info
3:00 Meeting
- Spent a lot of time working on probes for belief systems such as white supremacy. It’s much more complex than countries. The parser needs(?) to be able to:
  - Split on \n as well as [,:;]
  - Ignore leading numbers
  - Match on earlier sections of each text (maybe just cut everything else after n words?)
  - Do a more forgiving match on the wikipedia. For example, the probe: “The great religions are all characterized by” returns a list that contains “Belief in a Messiah or a prophet.” Sending that to the wikipedia returns [‘Messiah’, ‘Messiah in Judaism’, “Judaism’s view of Jesus”, ‘Prophets and messengers in Islam’, ‘Jesus in Islam’, ‘False prophet’, ‘Last prophet’, ‘Prophet’, ‘Al-Masih ad-Dajjal’, ‘Messianism’], while splitting off the first two words (which are common across all results) to create “a Messiah or a prophet.” returns [‘Messiah’, ‘Messiah Prophet’, ‘False prophet’, ‘List of Jewish messiah claimants’, ‘Messiah in Judaism’, “Judaism’s view of Jesus”, ‘Last prophet’, ‘Jesus in Islam’, ‘Al-Masih ad-Dajjal’, ‘Messiah Part I’]

SBIR

9:15 Sprint planning
Read the docs that Clay wants me to check out
Work on status report
- Redid the summary as a list of accomplishments that I now need to flesh out
- Added all the images to the figures directory

NOAA

Records Management Training

Phil4.26.21

GPT-Agents

Save GML – done!
Run multiple responses – done!
Save experiments – done!
I have a fancy world map! This one is 4kx4x so you can zoom in quite far. It started at

"A short list of countries that are nearest to United States, separated by commas:"

And worked its way out from that (e.g. “A short list of countries that are nearest to Canada, separated by commas:”). It looks like it had not worked its way over to Africa yet, and there is no Greenland.

https://viztales.com/wp-content/uploads/2021/04/world.png

SBIR

Sprint review – done
Start writing second report – not started

Phil 4.23.21

10:00 Chat with Wajanat

Meeting with Antonio?

SBIR

Sent Rukan a plan for next steps

Book

2:00 Meeting with Michelle

GPT-Agents

3:30 Meeting
Maps are progressing nicely!

https://viztales.com/wp-content/uploads/2021/04/image-12.png

3:30 Meeting

Talk about soft prompts

Phil 4.22.21

GPT-Agents

Getting familiar with my NetworkxGraphing class. I’m going to create the graph first, and then persist it
My very first map of “A short list of countries that are nearest to United States, separated by commas:”

Some more progress. This is one layer down, with no culling for likely nearest neighbors:

https://viztales.com/wp-content/uploads/2021/04/image-9.png

Here’s a nicer version:

https://viztales.com/wp-content/uploads/2021/04/image-10.png

Presentation to the data science team

SBIR

9:15 standup
See how the ensemble turned out with the truncated inputs
Start on using MDS data. Having some issues:

https://viztales.com/wp-content/uploads/2021/04/image-11.png

3:30 presentation of the COVID work to a *dead* room. Sigh. But I got some slides done!

Phil 4.21.21

Here are some silly coding conventions that I had to dig to find the answers for.

First, when making a call to https://wikimedia.org/api/rest_v1 using requests, you need to set the call like this:

headers = {"User-Agent": "someone@someplace.com"}
page_title = "Exergaming"
yesterday = datetime.today() - timedelta(days=1)
last_week = yesterday - timedelta(days=7)
yester_s = yesterday.strftime("%Y%m%d")
lastw_s = last_week.strftime("%Y%m%d")
s = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/{}/daily/{}/{}".format(page_title, lastw_s, yester_s)
print(s)
r = requests.get(s, headers=headers)

Without that ‘headers’ element, you get a 404. Note that you do not need to spoof a browser header. This is all you need.

The second thing has to do with getting strings safely into databases

, when storing values with pymysql that involves strings that need to be escaped, you can now use parameter binding, which is very cool. BUT! Just because it uses ‘%s’, doesn’t mean that you use %d and %f. Here’s an example that uses strings, floats, and ints:

sql = "insert into gpt_maps.table_experiment (date, description, engine, max_tokens, temperature, top_p, logprobs, num_responses, presence_penalty, frequency_penalty)" \
      " values(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
values = (date_str, description, self.engine, self.max_tokens, self.temperature, self.top_p, self.logprobs, self.num_responses, self.presence_penalty, self.frequency_penalty)
msi.write_sql_values_get_row(sql, values)

And here’s the call that does the actual writing to the db:

def write_sql_values_get_row(self, sql:str, values:Tuple):
    try:
        with self.connection.cursor() as cursor:
            cursor.execute(sql, values)
            id = cursor.lastrowid
            print("row id = {}".format(id))
            return id
    except pymysql.err.InternalError as e:
        print("{}:\n\t{}".format(e, sql))
        return -1

The Power of Scale for Parameter-Efficient Prompt Tuning

In this work, we explore “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3’s “few-shot” learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method “closes the gap” and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed “prefix tuning” of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.

GPT-Agents

Start building out GraphToDB.
- Use the Wikipedia to verify a node name exists before adding it
- Check that a (directed) edge exists before adding it. If it does, increment the weight.
Digging into what metaphors are:
- Understanding Figurative Language: From Metaphor to Idioms
  - This book examines how people understand utterances that are intended figuratively. Traditionally, figurative language such as metaphors and idioms has been considered derivative from more complex than ostensibly straightforward literal language. Glucksberg argues that figurative language involves the same kinds of linguistic and pragmatic operations that are used for ordinary, literal language. Glucksberg’s research in this book is concerned with ordinary language: expressions that are used in daily life, including conversations about everyday matters, newspaper and magazine articles, and the media. Metaphor is the major focus of the book. Idioms, however, are also treated comprehensively, as is the theory of conceptual metaphor in the context of how people understand both conventional and novel figurative expressions. A new theory of metaphor comprehension is put forward, and evaluated with respect to competing theories in linguistics and in psychology. The central tenet of the theory is that ordinary conversational metaphors are used to create new concepts and categories. This process is spontaneous and automatic. Metaphor is special only in the sense that these categories get their names from the best examples of the things they represent, and that these categories get their names from the best examples of those categories. Thus, the literal “shark” can be a metaphor for any vicious and predatory being, from unscrupulous salespeople to a murderous character in The Threepenny Opera. Because the same term, e.g.,”shark,” is used both for its literal referent and for the metaphorical category, as in “My lawyer is a shark,” we call it the dual-reference theory. The theory is then extended to two other domains: idioms and conceptual metaphors. The book presents the first comprehensive account of how people use and understand metaphors in everyday life
- The contemporary theory of metaphor — now new and improved!
  - This paper outlines a multi-dimensional/multi-disciplinary framework for the study of metaphor. It expands on the cognitive linguistic approach to metaphor in language and thought by adding the dimension of communication, and it expands on the predominantly linguistic and psychological approaches by adding the discipline of social science. This creates a map of the field in which nine main areas of research can be distinguished and connected to each other in precise ways. It allows for renewed attention to the deliberate use of metaphor in communication, in contrast with non-deliberate use, and asks the question whether the interaction between deliberate and non-deliberate use of metaphor in specific social domains can contribute to an explanation of the discourse career of metaphor. The suggestion is made that metaphorical models in language, thought, and communication can be classified as official, contested, implicit, and emerging, which may offer new perspectives on the interaction between social, psychological, and linguistic properties and functions of metaphor in discourse.

SBIR

10:00 Meeting
See how the new models are doing. If we are still not making progress, then go to a simpler interpolation model
- It turns out that the frequency problem was actually a visualization bug! Here’s an example going from 20 input vectors to 500 output vectors using attention and 2 3,000 perceptron layers: