Monthly Archives: January 2022

Phil 1.31.2022

Sharpened Cosine Similarity (CosSim) is an alternative to Convolution for building features in neural networks. It performs as well as ConvNets that have 10x-100x more parameters.
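A rough sketch of the idea, as I understand it: take the cosine similarity between a signal patch and a kernel, raise its magnitude to a power p, and keep the sign. The function name, the defaults for p and q, and the use of plain lists instead of tensors are all assumptions for illustration, not the reference implementation.

```python
import math

def sharpened_cosine_similarity(signal, kernel, p=2.0, q=1e-3):
    """Sharpened cosine similarity between two equal-length vectors.

    p is the sharpening exponent and q is a small floor added to each norm;
    both defaults here are illustrative assumptions.
    """
    dot = sum(s * k for s, k in zip(signal, kernel))
    norm_s = math.sqrt(sum(s * s for s in signal))
    norm_k = math.sqrt(sum(k * k for k in kernel))
    # q keeps the denominator away from zero for near-empty patches
    cos = dot / ((norm_s + q) * (norm_k + q))
    # Sharpen the magnitude but preserve the sign of the match
    return math.copysign(abs(cos) ** p, cos)

print(sharpened_cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(sharpened_cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```

With p greater than 1, partial matches are pushed toward zero while near-perfect matches stay close to 1 or -1, which is where the "sharpened" comes from.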

Current Stereotypes: A Little Fading, a Little Faking

Examined the possibility that social-desirability-tainted responses emerge in the study of stereotypes. 60 white male undergraduates were randomly assigned to 1 of 4 experimental conditions. Ss were asked to indicate how characteristic each of 22 adjective traits was of either “Americans” or “Negroes.” 1/2 the Ss responded in a rating situation in which they were presumably free to distort their responses. The remaining Ss responded under “bogus pipeline” conditions; i.e., they were led to believe that the experimenter had an accurate, distortion-free physiological measure of their attitudes, and were asked to predict that measure. Results support the expectation that the stereotype ascribed to Negroes would be more favorable under rating than under bogus pipeline conditions. Americans were more favorably stereotyped under bogus pipeline than under rating conditions. A number of explanations for these results are discussed, and consideration is given to the relationship between verbally expressed attitudes and other, overt, behavior.

Social physics

Recent decades have seen a rise in the use of physics methods to study different societal phenomena. This development has been due to physicists venturing outside of their traditional domains of interest, but also due to scientists from other disciplines taking from physics the methods that have proven so successful throughout the 19th and the 20th century. Here we characterise the field with the term ‘social physics’ and pay our respect to intellectual mavericks who nurtured it to maturity. We do so by reviewing the current state of the art. Starting with a set of topics that are at the heart of modern human societies, we review research dedicated to urban development and traffic, the functioning of financial markets, cooperation as the basis for our evolutionary success, the structure of social networks, and the integration of intelligent machines into these networks. We then shift our attention to a set of topics that explore potential threats to society. These include criminal behaviour, large-scale migration, epidemics, environmental challenges, and climate change. We end the coverage of each topic with promising directions for future research. Based on this, we conclude that the future for social physics is bright. Physicists studying societal phenomena are no longer a curiosity, but rather a force to be reckoned with. Notwithstanding, it remains of the utmost importance that we continue to foster constructive dialogue and mutual respect at the interfaces of different scientific disciplines.

This is really clever: How does fake news spread? Understanding pathways of disinformation spread through APIs

What are the pathways for spreading disinformation on social media platforms? This article addresses this question by collecting, categorizing, and situating an extensive body of research on how application programming interfaces (APIs) provided by social media platforms facilitate the spread of disinformation. We first examine the landscape of official social media APIs, then perform quantitative research on the open‐source code repositories GitHub and GitLab to understand the usage patterns of these APIs. By inspecting the code repositories, we classify developers’ usage of the APIs as official and unofficial, and further develop a four‐stage framework characterizing pathways for spreading disinformation on social media platforms. We further highlight how the stages in the framework were activated during the 2016 US Presidential Elections, before providing policy recommendations for issues relating to access to APIs, algorithmic content, advertisements, and suggest rapid response to coordinate campaigns, development of collaborative, and participatory approaches as well as government stewardship in the regulation of social media platforms.

The Wikipedia folks have produced a very clear Precision/Recall diagram!

https://en.wikipedia.org/wiki/F-score#/media/File:Precisionrecall.svg
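For reference, the quantities in that diagram reduce to a few lines. This is a minimal sketch; the function name and the counts in the example are made up.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw counts."""
    predicted = true_positives + false_positives   # everything the model selected
    actual = true_positives + false_negatives      # everything that was relevant
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical counts: 8 correct hits, 2 false alarms, 8 misses
print(precision_recall_f1(8, 2, 8))
```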

SBIRs

Slides for demo
9:00 Demos
2:00 Meeting with Rukan
Natural Language Processing with Transformers Book
- Train transformers from scratch and learn how to scale to multiple GPUs and distributed environments

Book

More work on intro

GPT Agents

Work on Twitter queries

Phil 1.28.2022

https://twitter.com/Nils_Reimers/status/1487014195568775173

SBIRs

Had a nice chat with Rukan about how to partition models. Which made me think about RCS again. Maybe make a pyRCS library? I have the code written already, just need to pull it out of the PyBullet project
Finished the LAIC roadmap paper
Still need to do some more work on the book’s new chapter one

Phil 1.27.2022

Dump and shop before 4:00

America Has Split, and It’s Now in ‘Very Dangerous Territory’

Polarization has become a force that feeds on itself, gaining strength from the hostility it generates, finding sustenance on both the left and the right. A series of recent analyses reveals the destructive power of polarization across the American political system.

GPT Agents

I think OpenAI’s embeddings may have gone public – Yes!
Here’s an example of calling the embedding

oai = OpenAIComms()
result = oai.get_embedding('hello, world', 'text-similarity-ada-001')
print(result)

Here’s the result

[0.012463847, 0.02531687, -0.0059246803, 0.022367332, 0.037196957, 0.013784995, 0.019438276, -0.0075837956, 0.012187328, -0.014604311, 0.013569924, -0.022551678, 0.025398802, -0.015515801, -0.005586712, -0.04231768, -0.046905853, -0.025583148, 0.006472598, -0.0036203535, 0.036684882, 

--------------- Lines removed because we don't need to see every embedding ----

-0.015310971, 0.00073034357, -0.013856685, -0.00026291728, -0.0049056555, 0.024436105, -0.0086181825, -0.023248097, 0.008290456, 0.012443365, -0.020278076, 0.024169827, -0.012361433, -0.057515997, 0.045103356, 0.04752034, -0.008510647, -0.05014215, 0.012279501, 0.013979582, 0.05182175, 0.03209671, -0.008920305]
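Embeddings like this are usually compared with cosine similarity. Here is a minimal sketch of that comparison; the short vectors are made-up stand-ins (real embeddings are far longer), and the helper name is hypothetical rather than part of OpenAIComms.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 4-element vectors standing in for real embeddings
hello = [0.012, 0.025, -0.006, 0.022]
greeting = [0.011, 0.027, -0.004, 0.020]
unrelated = [-0.030, 0.002, 0.041, -0.015]

print(cosine_similarity(hello, greeting))    # close to 1: similar direction
print(cosine_similarity(hello, unrelated))   # negative: pointing elsewhere
```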

Also, you can store a JSON object in MySQL/MariaDB as text. Here’s an example table:

Here’s how you insert, create a view, and then select an element from that view

insert into table_json (embedding) values ('{"foo":12, "bar":2}');

create or replace view view_json as
    select id, json_value(embedding, '$.foo') as foo, json_value(embedding, '$.bar') as bar from table_json;

select id, foo from view_json;

Which results in this:
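As a separate sketch of the same store-JSON-as-text idea from Python: this uses the standard-library sqlite3 module as a stand-in for MariaDB, the column types are assumptions, and the JSON is parsed in Python rather than with json_value.

```python
import json
import sqlite3

# In-memory SQLite database standing in for MariaDB (assumption for the sketch)
conn = sqlite3.connect(":memory:")
conn.execute("create table table_json (id integer primary key, embedding text)")

# Serialize the object to text on the way in
conn.execute(
    "insert into table_json (embedding) values (?)",
    (json.dumps({"foo": 12, "bar": 2}),),
)

# Parse the text back into a dict on the way out
rows = [
    (row_id, json.loads(text))
    for row_id, text in conn.execute("select id, embedding from table_json")
]
for row_id, obj in rows:
    print(row_id, obj["foo"])
conn.close()
```

The same pattern should work for an embedding: json.dumps the list of floats into the text column and json.loads it when reading.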

SBIRs

Need to create a story for GSAW deck
9:15 standup
LAIC prep
11:00 LAIC discussion
Aligning Language Models to Follow Instructions
- We’ve trained language models that are much better at following user intentions than GPT-3 while also making them more truthful and less toxic, using techniques developed through our alignment research. These InstructGPT models, which are trained with humans in the loop, are now deployed as the default language models on our API.

Book?

Phil 1.26.2022

GPT Agents

Had a long and winding talk about quality in Twitter data and whether using threads is a way to increase that. Shimei’s thought is that it will bias the data towards a different population. I think that’s reasonable, but I’m not sure that matters as long as you specify what population you’re polling.
Got the recent conversation search working
Working on historical queries
- Getting historical Tweets using the v2 full-archive search endpoint
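A minimal sketch of what a full-archive request looks like, assuming the bearer token lives in the BEARER_TOKEN environment variable. It only builds the URL and headers and sends nothing; the function name, the example query, and the chosen tweet.fields are illustrative assumptions.

```python
import os
from urllib.parse import urlencode

SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"

def build_search_request(query, start_time, end_time, max_results=100):
    """Return (url, headers) for a v2 full-archive search; nothing is sent."""
    token = os.environ.get("BEARER_TOKEN", "")
    params = {
        "query": query,
        "start_time": start_time,   # ISO 8601, e.g. 2020-01-01T00:00:00Z
        "end_time": end_time,
        "max_results": max_results,
        "tweet.fields": "conversation_id,created_at,author_id",
    }
    url = SEARCH_ALL_URL + "?" + urlencode(params)
    headers = {"Authorization": "Bearer " + token}
    return url, headers

url, headers = build_search_request(
    "vaccine lang:en", "2020-01-01T00:00:00Z", "2020-02-01T00:00:00Z"
)
print(url)
```

Passing the url and headers to an HTTP client's GET call would complete the request. Asking for conversation_id in tweet.fields is what makes it possible to group the results into threads afterwards.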

SBIRs

Slides for GSAW

Phil 1.25.2022

The First Workshop on Intelligent and Interactive Writing Assistants

We invite submissions from the NLP and HCI communities as well as industry practitioners and professional writers on the topic of intelligent writing assistants: those that discuss innovations in building, improving, and evaluating intelligent and interactive writing assistants.
Specific topics include, but are not limited to:
- Combining NLP techniques (e.g. style transfer, text planning, controllability) with interaction paradigms between users and writing assistants (e.g. interfaces, iterative processes, feedback), such as a formality style transfer system for revising professional communications
- Assistance on different stages of the writing process (e.g. planning, revising), different types of writing (e.g. expository, persuasive), and different applications (e.g. journalism, fiction)
- Evaluation methodologies for writing assistants, writing process, and resultant text
- Addressing underrepresentation of languages, types of writers (e.g. vernacular variations), and writing tasks for targeted writing assistance (note that for non-English systems, we request that the figures and examples be translated into English prior to review)
- Writing assistant ownership issues, including legal issues with copyright and psychological sense of ownership
- Practical challenges for building real-world systems such as Grammarly and WordTune (e.g. latency, near-perfect quality, personalization, and evolution of language)
- User studies or ethnographic studies of writers who use writing assistants
- Demonstration of simple prototypes of intelligent interfaces or design sketches

Book

Rewriting the first chapter around the concept that “belief is a place”

SBIRs

9:15 Stand up
Helped Aaron set up his DB, more today
Meeting with Rukan
Do RoE map. Add nodes
- The Enemy (“The enemy is”)
- Fire Back (“If someone shoots at you”)
- Masculine (“Be tough”)
- Lawless (“Whatever it takes”)
- Self Protect (“First, defend yourself”)
- Kill the Enemy (“Don’t be complicated”)
- Tactics (“Have a plan and execute it”)
- Proportional (“Don’t escalate”)
- Responsible (“Do the right thing”)
- Independence (“Don’t just follow orders”)
- Civilians (“What to do with non-combatants”)
- Careful (“Don’t get into trouble”)
- Our Guys (“We come first”)
- Hold Fire (“Do not fire unless absolutely necessary”)
- Ethical (“What is the right thing to do?”)
- Duty (“What must we do?”)
- Fire First (“Shoot first, dammit”)
Pretty happy with this:

GPT Agents

3:30 Meeting

Phil 1.24.2022

Cinematic epistemology. It’s a perfect term

https://twitter.com/normative/status/1485313085321719813

SBIRs

Redo the rules of engagement map. Nope. Too much contractor noise to concentrate

GPT Agents

Still can’t access anything on Twitter. Tried setting the os variable BEARER_TOKEN, which didn’t help
It works! I had to:
1. Enable OAuth 2.0
2. Regenerate my bearer token
And now we can get

https://twitter.com/philfeld/status/1484131517878181891

Book

Trying to fix dense, overly complicated text

Phil 1.21.2022

Every morning, I get up, put something mellow on the stereo and do my morning things. So, like every morning I push the power button on the amp, there’s a slight scratching sound, and… nothing. Blown fuse. Sigh. Nothing like looking for a 5mf 8a 125v glass fuse at 5:00am

Spanish

SBIRs

Update the script player
- Hurray! I haven’t broken it!
- Got the basics working again. Need to think about what to do for the charts
Help Aaron with the clustering?

GPT Agents

Start poking at the Twitter API
developer.twitter.com/en/docs/tutorials/analyze-past-conversations
developer.twitter.com/en/docs/tutorials/getting-historical-tweets-using-the-full-archive-search-endpoint
No luck with anything! Just get “unauthorized”:

Book

Read through and see what’s good, bad, missing, awful

Phil 1.20.2022

Tasks

Spanish!

SBIRs

Aim to finish commenting code today
Spent too much time making a better placeholder callback:

import inspect
import re

def implement_me(self):
    """
    A callback to point to when you don't have a method ready.
    Prints "implement me!" to the output and
    an abbreviated version of the call stack to the console
    :return:
    """
    #self.dprint("Implement me!")
    self.dp.dprint("Implement me! (see console for call stack)")
    fi: inspect.FrameInfo
    # Walk the stack from this frame outward, printing one line per caller
    for count, fi in enumerate(inspect.stack()):
        # Split on either path separator so only the bare file name is shown
        filename = re.split(r"[/\\]", fi.filename)
        print("Call stack[{}] = {}() (line {} in {})".format(count, fi.function, fi.lineno, filename[-1]))

9:15 Standup
Aaron has topic clustering working:

GPT Agents

Put together a doc with existing and new prompts (“Vaccines are”, “Vaccines are a”, “I/We/<other groups> think that vaccines are”)
I need to research whether the API can really do this, but I’d like to build the new corpus from threaded tweets that are pulled because they mention general terms like “COVID”, “VIRUS”, and “VACCINE”, then train the models and drill down. I still like the idea of training monthly models from Nov 2019 to the present.
It does look like threads have more engagement and would be harder for bots to generate.
- Change in Threads on Twitter Regarding Influenza, Vaccines, and Vaccination During the COVID-19 Pandemic: Artificial Intelligence–Based Infodemiology Study
- Automatically Identifying Fake News in Popular Twitter Threads
I learned how to make a thread so I can look for it:

Phil 1.19.2022

Todo:

Doctor! Couldn’t connect a call, so requested an appt via the form, which is kind of lame because there is no way to specify the reason for the visit
Spanish!
Order light – done

SBIRs

Putting in comments and cleaning code

GPT-Agents

Put together a list of prompts in a Google Doc and distribute
Add embeddings to the GPT comms class. Doesn’t seem to be available? Applied for access
Tweaked OpenAIComms to have one response with higher penalties and temp

Book

Finish chapter?

Phil 1.18.2022

Ran out of space on the C drive. Seeing if moving the cache will help: docs.microsoft.com/en-us/troubleshoot/windows-client/networking/change-csc-folder-location-with-cachelocation-registry

GPT Agents

Got the paper submitted last Saturday! April 7 is when we’ll find out
4:30 Meeting. We worked on what to do next. We are going to look at the monthly models from 2020 and see how their responses move with respect to embedding space and the same prompt. The first step is to collect the prompts we used from the paper and see if we want to add any new ones

SBIRs

9:15 Sprint planning. Need to write up some stories
Cybersecurity training – done!
Looking for related corpora at Gutenberg

Book

Continue to work on conspiracy chapter

Phil 1.14.2022

GPT Agents

For the three star rating category:
- Get the total and add it to the Dict – done
- compute an error metric (L1 difference) for the estimated proportion of positive reviews for “gray bars” (GPT with the reviews containing the keywords held out) vs the ground truth “blue bars”. Report this error metric in a table (performance of our method). – done
- simulate the empirical count baseline method in the low data scenario: draw a small number of reviews containing the keyword (let’s say 6 of them). Compute the error metric (L1 difference) for the empirical counts baseline, computed on this subset, vs the ground truth “blue bars”. Repeat this many times (say, 10,000 times). Report the average error metric in a table (performance of the baseline method). – done
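The baseline simulation in the last bullet can be sketched roughly like this. The labels are synthetic stand-ins (1 = positive review, 0 = not), and the function names, the seed, and the 70-of-200 split are assumptions for illustration, not the data or code used for the paper.

```python
import random

def l1_error(estimated, truth):
    """L1 difference between an estimated and a ground-truth proportion."""
    return abs(estimated - truth)

def baseline_average_error(labels, sample_size=6, trials=10_000, seed=0):
    """Average L1 error of the empirical-count estimate over many small draws."""
    rng = random.Random(seed)
    truth = sum(labels) / len(labels)      # ground-truth proportion ("blue bars")
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(labels, sample_size)   # small draw, no replacement
        total += l1_error(sum(sample) / sample_size, truth)
    return total / trials

# Synthetic stand-in for reviews containing a keyword: 70 of 200 are positive
labels = [1] * 70 + [0] * 130
print(baseline_average_error(labels))
```

The error for the model-based estimate is a single call to l1_error against the same ground truth, so the two numbers can sit side by side in the table.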
Finished the data extraction. Now I have to make spreadsheets and charts.
Very happy with this:

Fix the TODOs – Done
The last thing to do is fill out the ethics form and submit

SBIRs

Add story for paper and clone for Aaron – Done

Phil 1.13.2022

Spanish!

Tim!

SBIRs

Standup was cancelled for today

GPT Agents

Made good progress yesterday
Continue on interpolation section. Set up the pretrained average stars in a table and drop the figure. Show the bar chart and Pearson’s
Add comparison of GPT and GPT(v). Chart? Table? And show Pearson’s
1:00 – 2:30 Meeting
- Good progress. I need to do for the three star rating category:
  - compute an error metric (L1 difference) for the estimated proportion of positive reviews for “gray bars” (GPT with the reviews containing the keywords held out) vs the ground truth “blue bars”. Report this error metric in a table (performance of our method).
  - simulate the empirical count baseline method in the low data scenario: draw a small number of reviews containing the keyword (let’s say 6 of them). Compute the error metric (L1 difference) for the empirical counts baseline, computed on this subset, vs the ground truth “blue bars”. Repeat this many times (say, 10,000 times). Report the average error metric in a table (performance of the baseline method).
- Finished the data extraction. Now I have to make spreadsheets and charts

Phil 1.12.2022

Spanish!

GPT Agents

I think I want to put the results into three sections: 1) Memorization, or the learning of the meta-wrapper, 2) Interpolation, or how the model re-creates correct reviews, and 3) Extrapolation, or how the model creates new (zero shot) reviews
Add a section to the beginning of the methods section stating that all finetuning was done on the Huggingface GPT-2 117M parameter model.
- For speed (easier to produce a model for comparison)
- For the environment
- To show that state-of-the-art insight into TLMs does not require building large models

Phil 1.11.2022

Dentist!

Book

Did a little more tweaking on the visualization to make screenshots more legible:

I also can make the philosophy map this way, so that part of the chapter can set up this technique:

JuryRoom

Discussion with Panos and Aaron yesterday. Not sure where things are going, but definitely nearing the end of this stage (Huri Whakato)

SBIRs

9:15 Standup

GPT Agents

Continue working on paper
I’m really liking the Computer-Assisted Keyword and Document Set Discovery paper:
- Our algorithm is human-led and computer-assisted rather than fully automated; it is related to semi supervised learning (Zhu and Goldberg 2009).
Found this in the Scholar list of citing papers: Separating the wheat from the chaff: A topic and keyword-based procedure for identifying research-relevant text. It’s in Poetics, which actually seems like a potential venue for this paper if it gets rejected
I think that a good way to show how this matters is to use TTestIndPower to calculate the minimum sample size to determine if the populations are different as described in this tutorial
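A rough sketch of the minimum-sample-size calculation. This is not TTestIndPower itself: it uses the normal approximation with only the standard library, which tends to give a slightly smaller n than the t-based solver. The function name and the effect size of 0.8 are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def min_sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = NormalDist().inv_cdf(power)           # quantile for the desired power
    return 2 * ((z_alpha + z_power) / effect_size) ** 2

n = min_sample_size_per_group(0.8)
print(math.ceil(n))
```

Smaller effect sizes drive n up quickly (it scales with 1 / effect_size squared), which is the point: with only a handful of reviews per keyword, the empirical counts can't distinguish populations that differ only modestly.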

Phil 1.10.2022

SBIRs

Write up the points from the discussion with Aaron last Friday. I think it will make a much better direction than trying to figure out how to automate the current manual approach
Continue code cleanup. There is still something that makes the radius of a MoveableNode grow wrong when the item count is incremented (maybe fixed? Line 34 in MapData.Maptopic.adjust_force_node())

GPT Agents

Start writing the paper and see what shakes out
Ask to reschedule Tuesday’s meeting – done

Book

I got a visualization that I like. Now I need to rewrite the chapter a bit around it. I think just the main visualization should be ok, with maybe different perspectives?

Maybe also generate a “philosophy” terrain? I’d need to have some code to handle a rollover for the z value

viztales

Dimension reduction, State, Orientation, and Speed
