The overarching goal of the IDeaS Center at Carnegie Mellon University is to enhance social cyber-security to preserve and support an informed democratic society. The challenge today is that disinformation, hate speech, information warfare, and propaganda are amplified by cyber-technology like social media. We must remain informed, thoughtful members of our communities and countries, despite online and informational challenges.
The science to characterize, understand, and forecast cyber-mediated changes in human behavior and in social, cultural, and political outcomes;
The science and engineering to build the cyber-infrastructure needed for society to persist in its essential character in a cyber-mediated information environment under changing conditions and actual or imminent social cyber-threats.
Glamorous Toolkit is the moldable development environment. It is a live notebook. It is a flexible search interface. It is a fancy code editor. It is a software analysis platform. It is a data visualization engine. All in one.
This looks pretty interesting. Need to spend some time digging deeper. Friday?
“The antibody response to the virus has been shown to be transient and these antibodies start to wane after 3 to 4 months,” he said, adding that at 6 months they are “mostly undetectable” in many people who were infected early on in the epidemic. (via Reuters)
MORS
11:30 – 12:10 New Adaptive Sampling Techniques to Optimally Augment Complex Simulation Models
12:10 – 12:50 Influence Planning Product
12:50 – 1:30 Risk Management for Irregular Events
2:00 – 3:00 Social Cybersecurity
3:40 – 4:20 Community Resilience Indicator Analysis
With the rise of online platforms where individuals could gather and spread information came the rise of online cybercrimes aimed at taking advantage of not just single individuals but collectives. In response, researchers and practitioners began trying to understand this digital playground and the way in which individuals who were socially and digitally embedded could be manipulated. What is emerging is a new scientific and engineering discipline—social cybersecurity. This paper defines this emerging area, provides case examples of the research issues and types of tools needed, and lays out a program of research in this area.
In today’s high-tech world, beliefs, opinions, and attitudes are shaped as people engage with others in social media and through the internet. Stories from credible news sources and findings from science are challenged by actors who are actively engaged in influence operations on the internet. Lone wolves and large propaganda machines both disrupt civil discourse, sow discord, and spread disinformation. Bots, cyborgs, trolls, sock-puppets, deep fakes, and memes are just a few of the technologies used in social engineering aimed at undermining civil society and supporting adversarial or business agendas. How can social discourse without undue influence persist in such an environment? What are the types of tools and theories needed to support such open discourse?
Today scientists from a large number of disciplines are working collaboratively to develop these new tools and theories. Their work has led to the emergence of a new area of science—social cybersecurity. Herein, this emerging scientific area is described. Illustrative case studies are used to showcase the types of tools and theories needed. New theories and methods are also described.
MORS
Email to Dr. Carley – done!
Really nice talk by Dr. Michiel Deskevich at OptTek:
Information Warfare panel. Started with Gerasimov, which is pretty cool
Need to set up a meeting with Sim to tag-team together a cosine similarity for the GPT embedding.
I think it can be lazy, and calculate the CS as it goes.
Save the current distance matrix out as a csv, and read it in the next time, so that it continues to grow
Can use the training corpora to create a set of words as a baseline matrix
For words that have more than one embedding, specify subsequent entries in the matrix as “foo”, “foo1”, … “fooN”. That lets distance calculations be performed between the variants, and also points back at the correct usage easily (a sketch of this is below).
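A minimal sketch of the lazy cosine-similarity cache described above. The details (numpy vectors keyed by word, pandas for the growing CSV matrix, class and file names) are my assumptions, not settled design:

```python
# Minimal sketch of the lazy cosine-similarity cache. Assumptions (mine):
# embeddings arrive as numpy vectors keyed by word, matrix persisted via pandas.
import os

import numpy as np
import pandas as pd


class LazyCosineCache:
    def __init__(self, csv_path="distance_matrix.csv"):
        self.csv_path = csv_path
        self.vectors = {}  # entry name -> np.ndarray
        # Read in the matrix from the last run so it continues to grow
        if os.path.exists(csv_path):
            self.matrix = pd.read_csv(csv_path, index_col=0)
        else:
            self.matrix = pd.DataFrame()

    def add(self, word, vec):
        # Words with more than one embedding get variant names: foo, foo1, ... fooN
        name, n = word, 1
        while name in self.vectors:
            name = f"{word}{n}"
            n += 1
        self.vectors[name] = np.asarray(vec, dtype=float)
        return name

    def similarity(self, a, b):
        # Lazily compute the cosine similarity and cache it in the matrix
        if a in self.matrix.index and b in self.matrix.columns:
            cached = self.matrix.loc[a, b]
            if not pd.isna(cached):
                return float(cached)
        va, vb = self.vectors[a], self.vectors[b]
        cs = float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))
        self.matrix.loc[a, b] = cs
        self.matrix.loc[b, a] = cs
        return cs

    def save(self):
        # Save the current distance matrix out as a csv for the next run
        self.matrix.to_csv(self.csv_path)
```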
I looked into the Association of Computational Linguistics as a possible venue for the chess paper. Aside from being a bit shorter (8 pages), the difference between the ACL papers that I looked at and mine seems to be mostly the amount of explicit math in the description of the algorithm. Here are some examples from 2020 that I think are in roughly the same area:
Adjusting the citations to include some ACL papers (like the last one) should be straightforward. The page count will have to be evaluated once the template is made public. Here’s the 2020 template: http://aacl2020.org/calls/papers/#paper-submission-and-templates
MORS
An amygdala hijack refers to a personal, emotional response that is immediate, overwhelming, and out of measure with the actual stimulus because it has triggered a much more significant emotional threat.[1] The term was coined by Daniel Goleman in his 1996 book Emotional Intelligence: Why It Can Matter More Than IQ
Welcome to the CASOS data page. On this page you can find a wide variety of data for use with our software. This data includes social networks, communication networks, semantic networks, dynamic networks and geo-spatial networks.
UMBC now has a subscription to PolicyMap, a GIS tool that allows users to create maps, tables, and reports from a variety of datasets ranging from demographics, income, health, education, and more. Maps can be created as single sheets or with multiple layers from the zip code / block level to worldwide.
Users can create individual accounts to save, share, and print work. A suite of tutorials is available to help both new and experienced users work with the tool effectively.
It’s been a year since we heard about COVID-19 for the first time. Let’s see how things are going. First, the selection of countries that I’ve been tracking:
Ouch. Germany and Finland seem to be doing well in Europe, but the rest… It looks like it’s going to be a bad winter. I think it is interesting how countries like France, Italy and Switzerland that seemed to have things under control are now at USA levels of deaths per million.
The hard-hit eastern states still look a lot like the parts of Europe that are still on top of the spread. Georgia, Mississippi, and the Dakotas look very bad. Washington and California, which were hit early, are still experiencing very low rates. I guess we’ll see how this chart looks in January. If there is a Thanksgiving-related surge, we should see it by then.
Book
Work on attention
GOES
10:00 Meeting with Vadim. Pymoo is much better to install than Pyomo. Its API seems more straightforward, too. Vadim is working on figuring out the examples
The generated HTML file to make that chart is huge, btw. It’s 2.9MB.
And it’s slooooooow if you just use fig.show(). fig.write_html('file_name.html', auto_open=True) is much faster. Ok. That means the figures can be saved as dynamic pages, which is kind of cool.
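For reference, a minimal sketch of the write_html() pattern, using one of plotly's built-in example datasets rather than my charts:

```python
# Save an interactive figure as a standalone dynamic HTML page instead of
# calling fig.show(); uses plotly's built-in iris sample data as a stand-in.
import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")

# Writes a self-contained page and opens it in the browser
fig.write_html("file_name.html", auto_open=True)
```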
Got dash running, which sets up a server for your interactive graphs. Not really sure which one is better, though I’m guessing that data can be live inside dash graphs. I don’t think this will matter too much with the embedding charts, but it’s good to know
Hot-reloading is cool, and works with data or text changes. And the management of the html is nice. It appears to be based on a React engine and it’s nice to not have to care!
CSS-type styling works! If you make an error in the code, the program bails with an error message
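A minimal sketch of the kind of Dash app this describes, assuming Dash 2.x-style imports; the layout and figure here are placeholders, not the embedding charts:

```python
# Tiny Dash app sketch: hot-reloading via debug=True, CSS-type styling via
# the style dict, and a plotly figure served by the built-in server.
from dash import Dash, dcc, html
import plotly.express as px

app = Dash(__name__)

fig = px.scatter(px.data.iris(), x="sepal_width", y="sepal_length",
                 color="species")

app.layout = html.Div(
    style={"fontFamily": "sans-serif", "margin": "2em"},  # CSS-type styling
    children=[
        html.H1("Embedding chart demo"),
        dcc.Graph(id="embedding-chart", figure=fig),
    ],
)

if __name__ == "__main__":
    # debug=True turns on hot-reloading for code, data, and text changes
    app.run_server(debug=True)
```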
More on cults, probably. Just need to get started writing again after the break – made a lot of progress!
GPT-2
Look at libraries for plotting embeddings interactively. The OpenGL developer in me is digging VisPy
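A quick, hedged sketch of what a VisPy 3D scatter of embedding points might look like, based on VisPy's scene/Markers API; random points stand in for real embeddings:

```python
# Interactive 3D scatter with VisPy; random points stand in for embeddings.
import numpy as np
from vispy import app, scene

canvas = scene.SceneCanvas(keys="interactive", show=True)
view = canvas.central_widget.add_view()
view.camera = "turntable"  # mouse-driven rotation/zoom

pos = np.random.normal(size=(1000, 3)).astype(np.float32)  # placeholder points
markers = scene.visuals.Markers()
markers.set_data(pos, size=5)
view.add(markers)

if __name__ == "__main__":
    app.run()
```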
GOES
SATERN training
Register for MORS!!! – Done!
1:30 meeting with Vadim
Went over the Pyomo API. The package is very complicated to install. It works, but getting the solvers to work in the API call framework requires all kinds of additional work.
Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents. To what extent do deep models capture information at these scales, and can we force them to better capture structure across this hierarchy? We approach this question by focusing on individual neurons, analyzing the behavior of their activations at different timescales. We show that signal processing provides a natural framework for separating structure across scales, enabling us to 1) disentangle scale-specific information in existing embeddings and 2) train models to learn more about particular scales. Concretely, we apply spectral filters to the activations of a neuron across an input, producing filtered embeddings that perform well on part-of-speech tagging (word-level), dialog speech acts classification (utterance-level), or topic classification (document-level), while performing poorly on the other tasks. We also present a prism layer for training models, which uses spectral filters to constrain different neurons to model structure at different scales. Our proposed BERT + Prism model can better predict masked tokens using long-range context and produces multiscale representations that perform better at utterance- and document-level tasks. Our methods are general and readily applicable to other domains besides language, such as images, audio, and video.
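To make the idea concrete for myself, here's a hedged illustration of a spectral (low-pass) filter over a single neuron's activation sequence. This is not the paper's code, just the general technique sketched with numpy:

```python
# Low-pass spectral filter over one neuron's activations across an input.
# Keeps only slow-varying (document/utterance-scale) structure.
import numpy as np

def lowpass_filter(activations: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
    """Zero out all but the lowest keep_fraction of frequencies in a 1-D
    activation sequence (one neuron across the tokens of an input)."""
    spectrum = np.fft.rfft(activations)
    cutoff = max(1, int(len(spectrum) * keep_fraction))
    spectrum[cutoff:] = 0.0  # discard high-frequency (word-scale) variation
    return np.fft.irfft(spectrum, n=len(activations))
```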
The researchers find that companies expecting higher levels of machine readership prepare their disclosures in ways that are more readable by this audience. “Machine readability” is measured in terms of how easily the information can be processed and parsed, with a one standard deviation increase in expected machine downloads corresponding to a 0.24 standard deviation increase in machine readability. For example, a table in a disclosure document might receive a low readability score because its formatting makes it difficult for a machine to recognize it as a table. A table in a disclosure document would receive a high readability score if it made effective use of tagging so that a machine could easily identify and analyze the content.
GPT-2 Agents
I want to create a database for generated output. There are two tables (a schema sketch follows the list):
table_experiment – done!
Contains the experiment details:
id (key)
Date
Probe list
all hyperparameters
table_output – done!
id
experiment_id
root_id
tag (e.g. “raw”, “date”, “location”, “tweet”)
depth (this is the index of each piece of content. Raw is 0, then each parsed out section increases depth by 1)
content
regexes
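Here's a minimal sketch of that schema, using sqlite3 as a stand-in; the real database and column types may differ:

```python
# Sketch of the two-table schema for generated output, using sqlite3.
import sqlite3

conn = sqlite3.connect("gpt_experiments.db")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS table_experiment (
        id INTEGER PRIMARY KEY,
        date TEXT,
        probe_list TEXT,
        hyperparameters TEXT        -- all hyperparameters, serialized
    )""")

cur.execute("""
    CREATE TABLE IF NOT EXISTS table_output (
        id INTEGER PRIMARY KEY,
        experiment_id INTEGER REFERENCES table_experiment(id),
        root_id INTEGER,
        tag TEXT,                   -- e.g. "raw", "date", "location", "tweet"
        depth INTEGER,              -- raw is 0, each parsed-out section adds 1
        content TEXT,
        regexes TEXT
    )""")

conn.commit()
conn.close()
```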
Created a gpt_experiments database. I need to make sure that I can read from one db and write to another
Good results on the test. Need to try something at a larger scale to test the embeddings:
I think I want to put together a small command-line app that allows a discussion with the language model. All text from the ongoing conversation is saved and used as the input for the next pass. A nice touch would be to have some small number of responses to choose from, and the conversation follows that branch.
Come to think of it, that could be a cool artificial JuryRoom/Eliza
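A rough sketch of what that command-line loop could look like, assuming the HuggingFace transformers GPT-2 model (a recent version that supports max_new_tokens); the branch selection here is just a numbered menu:

```python
# Command-line "discussion" with GPT-2: the whole conversation so far is the
# prompt, and the user picks which generated branch the conversation follows.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

conversation = ""
while True:
    user_text = input("> ")
    if user_text.strip().lower() in ("quit", "exit"):
        break
    conversation += user_text + "\n"
    inputs = tokenizer(conversation, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        num_return_sequences=3,              # small number of branches to pick from
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [tokenizer.decode(o[prompt_len:], skip_special_tokens=True)
                  for o in outputs]
    for i, c in enumerate(candidates):
        print(f"[{i}] {c}")
    choice = int(input("choose a branch: "))
    conversation += candidates[choice] + "\n"
```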
Generate compact text for Sim to try training
Look into W2V 3D embedding of outputs, and mapping to adjacent outputs (The wo/man walked into the room). We know that there should be some level of alignment
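A hedged sketch of the kind of 3D projection check I have in mind, assuming gensim 4 and a trivial toy corpus; real model outputs would replace the sentences:

```python
# Train a tiny word2vec model and project a few vectors to 3D with PCA,
# as a first look at whether related outputs land near each other.
from gensim.models import Word2Vec
from sklearn.decomposition import PCA

sentences = [["the", "man", "walked", "into", "the", "room"],
             ["the", "woman", "walked", "into", "the", "room"]]
model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=50)

words = ["man", "woman", "walked", "room"]
vecs = [model.wv[w] for w in words]
coords = PCA(n_components=3).fit_transform(vecs)
for w, c in zip(words, coords):
    print(w, c)
```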
Graphs are one of the fundamental data structures in machine learning applications. Specifically, graph-embedding methods are a form of unsupervised learning, in that they learn representations of nodes using the native graph structure. Training data in mainstream scenarios such as social media predictions, internet of things (IoT) pattern detection or drug-sequence modeling are naturally represented using graph structures. Any one of those scenarios can easily produce graphs with billions of interconnected nodes. While the richness and intrinsic navigation capabilities of graph structures is a great playground for machine learning models, their complexity poses massive scalability challenges. Not surprisingly, the support for large-scale graph data structures in modern deep learning frameworks is still quite limited. Recently, Facebook unveiled PyTorch BigGraph, a new framework that makes it much faster and easier to produce graph embeddings for extremely large graphs in PyTorch models.
GOES
Add composite rotation vector to ddict output. It’s kind of doing what it’s supposed to
Think about an NN to find optimal contributions? Or simultaneous solution of the scalars to produce the best approximation of the line? I think this is the way to go. I found pymoo: Multi-objective Optimization in Python
Our framework offers state of the art single- and multi-objective optimization algorithms and many more features related to multi-objective optimization such as visualization and decision making. Going to ask Vadim to see if it can be used for our needs
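For reference, the basic pymoo usage pattern looks something like this (assuming a recent pymoo with 0.6-style imports, and the built-in ZDT1 test problem rather than our wheel-contribution problem):

```python
# Minimal pymoo run: NSGA-II on the built-in ZDT1 multi-objective test problem.
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.problems import get_problem
from pymoo.optimize import minimize

problem = get_problem("zdt1")          # stand-in for our actual problem
algorithm = NSGA2(pop_size=50)

res = minimize(problem, algorithm, ("n_gen", 100), seed=1, verbose=False)
print(res.F[:5])                       # first few points on the Pareto front
```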
Incorporating context into word embeddings – as exemplified by BERT, ELMo, and GPT-2 – has proven to be a watershed idea in NLP. Replacing static vectors (e.g., word2vec) with contextualized word representations has led to significant improvements on virtually every NLP task. But just how contextual are these contextualized representations?
With the increasing ubiquity of natural language processing (NLP) algorithms, interacting with “conversational artificial agents” such as speaking robots, chatbots, and personal assistants will be an everyday occurrence for most people. In a rather innocuous sense, we can perform a variety of speech acts with them, from asking a question to telling a joke, as they respond to our input just as any other agent would.
Book
Write some of the “Attention + Dominance” paper/chapter outline for Antonio. It’s important to mention that these are monolithic models. It could be a nice place for the Sanhedrin 17a discussion too.
GOES
Rework primary_axis_rotations.py to use least-squares. It’s looking pretty good!
It’s still not right, dammit! I’m beginning to wonder if the rwheels are correct? Wheels 1 and 4 are behaving oddly, and maybe 3. It’s like they may be spinning the wrong way?
Nope, it looks like it is the way the reaction wheel contributions are being calculated?
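For my own notes, here's an illustrative least-squares rotation fit (Kabsch via SVD) of the kind I'm trying to use. This is not the actual primary_axis_rotations.py code, just the core idea, and it assumes both vector sets are already centered:

```python
# Least-squares best-fit rotation between two (N, 3) vector sets (Kabsch/SVD).
import numpy as np

def best_fit_rotation(reference: np.ndarray, vehicle: np.ndarray) -> np.ndarray:
    """Return the 3x3 rotation that best maps reference vectors onto vehicle
    vectors in the least-squares sense."""
    h = reference.T @ vehicle                  # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

# Quick self-check: recover a known 90-degree rotation about z
if __name__ == "__main__":
    rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    ref = np.random.normal(size=(20, 3))
    veh = ref @ rz.T
    print(np.allclose(best_fit_rotation(ref, veh), rz))
```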
With these challenges in mind, we built and open-sourced the Language Interpretability Tool (LIT), an interactive platform for NLP model understanding. LIT builds upon the lessons learned from the What-If Tool with greatly expanded capabilities, which cover a wide range of NLP tasks including sequence generation, span labeling, classification and regression, along with customizable and extensible visualizations and model analysis. (GitHub)
Read and annotate Michelle’s outline, and add something about attention. That’s also the core of my response to Antonio
More cults
2:00 Meeting
Thinking about how design must address American Gnosticism, and the dangers and opportunities of online “research”, and also how things like maps and diversity injection can potentially make profound impacts
GOES
Update test code to use least squares/quaternion technique
Looks like we are getting close to ingesting all the new data
Had a meeting with Ashwag last night (Note – we need to move the time), and the lack of ‘story-ness’ in the training set is really coming out in the model. The meta information works perfectly, but it’s wrapped around stochastic tweets, since there is no threading. I think there needs to be some topic structure in the meta information that allows similar topics to be grouped sequentially in the training set.
3:30 Meeting
GOES
9:30 meeting
Update code with new limits on how small a step can be. Done, but I’m still having normal problems. It could be because I’m normalizing the contributions?
Need to fix the angle rollover in vehicle (and reference?) frames. I don’t think that it will fix anything though. I just don’t get why the satellite drifts after 70-ish degrees:
There is something not right in the normal calculation?
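For the angle rollover mentioned above, the fix I have in mind is just a wrap helper like this (illustrative only):

```python
# Wrap an angle in degrees into [-180, 180) so frame-to-frame differences
# don't jump at the rollover point.
def wrap_angle(deg: float) -> float:
    return (deg + 180.0) % 360.0 - 180.0

# e.g. wrap_angle(190.0) -> -170.0, wrap_angle(-190.0) -> 170.0
```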