Category Archives: Phil

Phil 1.24.2022

Cinematic epistemology. It’s a perfect term

https://twitter.com/normative/status/1485313085321719813

SBIRs

Redo the rules of engagement map. Nope. Too much contractor noise to concentrate

GPT Agents

Still can’t access anything on Twitter. Tried setting the os variable BEARER_TOKEN, which didn’t help
It works! I had to:
1. Enable OAuth 2.0
2. Regenerate my bearer token
And now we can get

https://twitter.com/philfeld/status/1484131517878181891

Book

Trying to fix dense, overly complicated text

Phil 1.21.2022

Every morning, I get up, put something mellow on the stereo and do my morning things. So, like every morning I push the power button on the amp, there’s a slight scratching sound, and… nothing. Blown fuse. Sigh. Nothing like looking for a 5mf 8a 125v glass fuse at 5:00am

Spanish

SBIRs

Update the script player
- Hurray! I haven’t broken it!
- Got the basics working again. Need to think about what to do for the charts
Help Aaron with the clustering?

GPT Agents

Start poking at the Twitter API
developer.twitter.com/en/docs/tutorials/analyze-past-conversations
developer.twitter.com/en/docs/tutorials/getting-historical-tweets-using-the-full-archive-search-endpoint
No luck with anything! Just get “unauthorized”:

Book

Read through and see what’s good, bad, missing, awful

Phil 1.20.2022

Tasks

Spanish!

SBIRs

Aim to finish commenting code today
Spent too much time making a better placeholder callback:

def implement_me(self):
    """
    A callback to point to when you you don't have a method ready. 
    Prints "implement me!" to the output and
    an abbreviated version of the call stack to the console
    :return:
    """
    #self.dprint("Implement me!")
    self.dp.dprint("Implement me! (see console for call stack)")
    fi:inspect.FrameInfo
    count = 0
    for fi in inspect.stack():
        filename = re.split(r"(/)|(\\)", fi.filename)
        print("Call stack[{}] = {}() (line {} in {})".format(count, fi.function, fi.lineno, filename[-1]))
        count += 1

9:15 Standup
Aaron has topic clustering working:

GPT Agents

Put together a doc with existing and new prompts (“Vaccines are”, “Vaccines are a”, “I/We/<other groups> think that vaccines are”)
I need to do some research on if the API can really do this, but I’d like to make the new corpus of threaded tweets that are pulled because they mention general terms like “COVID”, “VIRUS”, and “VACCINE”, then train the models and drill down. I still like the idea of training monthly models starting in Nov 2019 to present.
It does look like threads have more engagement and would be harder for bots to generate.
- Change in Threads on Twitter Regarding Influenza, Vaccines, and Vaccination During the COVID-19 Pandemic: Artificial Intelligence–Based Infodemiology Study
- Automatically Identifying Fake News in Popular Twitter Threads
I learned how to make a thread so I can look for it:

Phil 1.19.2021

Todo:

Doctor! Couldn’t connect a call, so requested an appt via the form, which is kind of lame because there is no way to specify the reason for the visit
Spanish!
Order light – done

SBIRs

Putting in comments and cleaning code

GPT-Agents

Put together a list of prompts in a Google Doc and distribute
Add embeddings to the GPT comms class. Doesn’t seem to be available? Applied for access
Tweaked OpenAIComms to have one response with higher penalties and temp

Book

Finish chapter?

Phil 1.18.2022

Ran out of space on the C drive. Seeing if moving the cache will help: docs.microsoft.com/en-us/troubleshoot/windows-client/networking/change-csc-folder-location-with-cachelocation-registry

GPT Agents

Got the paper submitted last Saturday! April 7 is when we’ll find out
4:30 Meeting. We worked on what to do next. We are going to look at the monthly models from 2020 and see how their responses move with respect to embedding space and the same prompt. The first step is to collect the prompts we used from the paper and see if we want to add any new ones

SBIRs

9:15 Sprint planning. Need to write up some stories
Cybersecurity training – done!
Looking for related corpora at Gutenberg

Book

Continue to work on conspiracy chapter

Phil 1.14.2022

GPT Agents

For the three star rating category: :
- Get the total and add it to the Dict – done
- compute an error metric (L1 difference) for the estimated proportion of positive reviews for “gray bars” (GPT with the reviews containing the keywords held out) vs the ground truth “blue bars” . Report this error metric in a table (performance of our method). – done
- simulate the empirical count baseline method in the low data scenario: draw a small number of reviews containing the keyword, let’s say 6 of them). Compute the error metric (L1 difference) for the empirical counts baseline, computed on this subset, vs the ground truth “blue bars”. Repeat this many times (say, 10,000 times). Report the average error metric in a table (performance of the baseline method). – done
Finished the data extraction. Now I have to make spreadsheets and charts.
Very happy with this:

Fix the TODOs – Done
The last thing to do is fill out the ethics form and submit

SBIRs

Add story for paper and clone for Aaron – Done

Phil 1.13.2022

Spanish!

Tim!

SBIRs

Standup was cancelled for today

GPT Agents

Made good progress yesterday
Continue on interpolation section. Set up the pretrained average stars in a table and drop the figure. Show the bar chart and Pearson’s
Add comparison of GPT and GPT(v). Chart? Table? And show Pearson’s
1:00 – 2:30 Meeting
- Good progress. I need to do for the three star rating category:
  - compute an error metric (L1 difference) for the estimated proportion of positive reviews for “gray bars” (GPT with the reviews containing the keywords held out) vs the ground truth “blue bars” . Report this error metric in a table (performance of our method).
  - simulate the empirical count baseline method in the low data scenario: draw a small number of reviews containing the keyword, let’s say 6 of them). Compute the error metric (L1 difference) for the empirical counts baseline, computed on this subset, vs the ground truth “blue bars”. Repeat this many times (say, 10,000 times). Report the average error metric in a table (performance of the baseline method).
- Finished the data extraction. Now I have to make spreadsheets and charts

Phil 1.12.2022

Spanish!

GPT Agents

I think I want to put the results into three sections: 1) Memorization, or the learning of the meta-wrapper, 2) Interpolation, or how the model re-creates correct reviews 3) Extrapolation, how the model creates new (zero shot) reviews
Add a section to the beginning of the methods section stating that all finetuning was done on the Huggingface GPT-2 117M parameter model.
- For speed (easier to produce a model for comparison)
- For the environment
- To show that state-of-the art insight into TLMs does not require building large models

Phil 1.11.2022

Dentist!

Book

Did a little more tweaking on the visualization to make screenshots more legible:

I also can make the philosophy map this way, so that part of the chapter can set up this technique:

JuryRoom

Discussion with Panos and Aaron yesterday. Not sure where things are going, but definitely nearing the end of this stage (Huri Whakato)

SBIRs

9:15 Standup

GPT Agents

Continue working on paper
I’m really liking the Computer-Assisted Keyword and Document Set Discovery paper:
- Our algorithm is human-led and computer-assisted rather than fully automated; it is related to semi supervised learning (Zhu and Goldberg 2009).
Found this in the Scholar list of citing papers: Separating the wheat from the chaff: A topic and keyword-based procedure for identifying research-relevant text. It’s in Poetics, which actually seems like a potential venue for this paper if it gets rejected
I think that a good way to show how this matters is to use TTestIndPower to calculate the minimum sample size to determine if the populations are different as described in this tutorial

Phil 1.10.2022

SBIRs

Write up the points from the discussion with Aaron last Friday. I think it will make a much better direction than trying to figure out how to automate the current manual approach
Continue code cleanup. There is still something that makes the radius of a MoveableNode grow wrong when the item count is incremented (maybe fixed? Line 34 of in MapData.Maptopic.adjust_force_node())

GPT Agents

Start writing the paper and see what shakes out
Ask to reschedule Tuesday’s meeting – done

Book

I got a visualization that I like. Now I need to rewrite the chapter a bit around it. I think just the main visualization should be ok, with maybe different perspectives?

Maybe also generate a “philosophy” terrain? I’d need to have some code to handle a rollover for the z value

Phil 1.7.2022

It snowed again! I think that’s more snow in one week than the past two years

GPT Agents

Need to compare the performance of each model for each probe and compare to ground truth. One thing to point out is how little data there is to sample:

SBIRs

Fixing the “find matching”
Make node size log-based

Book

Put together some more data. Need to change the maps a bit

Phil 1.6.2022

One year ago things were pretty crazy here

GPT Agents

From On the Reliability and Validity of Detecting Approval of Political Actors in Tweets (Section 4.2) as an example of keyword SOTA :
- We evaluate OTS and custom methods on the following datasets. While some of these datasets have common targets, for example, Trump is present in four of them, they are all collected in different periods of time, with different keywords (c.f Appendix B). All datasets have stance labels of ‘favor’, ‘against’, and ‘none’ towards the targets. (EMNLP)
Finished with generating the new data, now we get to see if it works!
It’s pretty good. Here’s the two GPT models, one trained on the first 50k reviews of the American dataset (iso) and the other trained on the first 50k of the American dataset that do not contain the string “vegetarian options”. The probes are:
- no vegetarian options
- some vegetarian options
- several vegetarian options
- many vegetarian options

Basically identical
Now I need to compare the response vs the ground truth for each of the probes

Phil 1.5.2022

Jamie Raskin just released a book that apparently has some overlap with my work? Trying to track it down. Here’s something from CBS

GPT Agents

Creating unistar models from the corpora that have ‘vegetarian options’ removed. As they are trained, I’m also generating responses to the vegetarian prompts that I’ll do the star and unigram compares with. Then put that in a table and write the paper around it. Also, add the Floober part or something fanciful.
Models are all created. Finished running the first two and am now adding sentiment to them

SBIRs

Continue code cleanup and documenting. I managed to remove a good deal of code that had to do with handing raw text selection of topics, since that seems to be broken in tk
- Finished commenting QueryFrame. Now I need to fix that listing problem in on_link_existing_clicked()
Set up meeting to discuss LAIC dev plans – done

Phil 1.4.2022

It got really cold last night and I had forgotten to turn the water off to the outside and lost the faucet on the deck. Could have been worse. At least the pipes didn’t burst

Thinking about submitting a writeup on Sanhedrin 17a (Section 10.4 of the dissertation. Mostly) for the We Robot conference

Abstracts due: March 7
Decisions: May 9
Final papers due: August 8

Book

Playing around with negative scalars to see how that works. This resulted in some code cleanup and a better color gradient. Not sure if it looks better though:

Still like this better:

SBIRs

Sprint planning
Working on code cleanup for MabBuilder. First, adding comments!
Fixed the exit condition that happened when clicking the ‘X’ close icon in the text compare popup
Next, check through all the button behavior in QueryFrame
- Set Group
- Add Topic/Seed
- Add Topic
- Add Seed
- Find Closest (and dialog)
- Add Group
- Next Seed
- Rerun Seed
- Get Topic Details
- Direct Prompt
- Wikipedia
- Link Existing (make this work with descending length topics)

GPT Agents

3:30 Meeting. Going to make some models that explicitly are missing the phrase ‘vegetarian options’ from the training corpora. I’ll then run those as to compare to ‘vegetarian options’ in the ground truth by star and the other GPT models

Phil 1.3.2022

This looks interesting: www.oreilly.com/library/view/natural-language-processing/9781098103231/

Book

After a few false starts, I have the terrain extended:

I still need to:
- add a ‘lit’ and ‘unlit’ node for terrain and labels – done
- add a height scalar – done
- toggle grids and axis – done
- Shift keys to move the lights the other direction, plus lambda functions for the parameters – done
- Maybe add fog? docs.panda3d.org/1.10/python/programming/render-attributes/fog – nope, can’t get the fog to be relative to the terrain center

Today’s progress:

GPT Agents

Get the number of POSITIVE and NEGATIVE sentiment for each isolated model and compare to ground truth. Make a chart and add to the draft. This is the part that shows that creating models for a population captures that population’s patterns, and that this method is more accurate and reliable than assuming that one general model has all the information needed in an accessible way. Done

viztales

Dimension reduction, State, Orientation, and Speed

Category Archives: Phil

Phil 1.24.2022

Phil 1.21.2022

Phil 1.20.2022

Phil 1.19.2021

Phil 1.18.2022

Phil 1.14.2022

Phil 1.13.2022

Phil 1.12.2022

Phil 1.11.2022

Phil 1.10.2022

Phil 1.7.2022

Phil 1.6.2022

Phil 1.5.2022

Phil 1.4.2022

Phil 1.3.2022