7:00 – 3:00 ASRC
3rd Annual DoD AI Industry Day
From Stuart Russell, via BBC Business Daily and the AI Alignment podcast:
Although people have argued that this creates a filter bubble or a little echo chamber where you only see stuff that you like and don’t see anything outside of your comfort zone. That’s true; it might tend to cause your interests to become narrower, but actually that isn’t really what happened, and that’s not what the algorithms are doing. The algorithms are not trying to show you the stuff you like. They’re trying to turn you into predictable clickers. They seem to have figured out that they can do that by gradually modifying your preferences, and they can do that by feeding you material that, if you think of a spectrum of preferences, is to one side or the other, because they want to drive you to an extreme. At the extremes of the political spectrum, or the ecological spectrum, or whatever axis you want to look at, you’re apparently a more predictable clicker, and so they can monetize you more effectively.
So this is just a consequence of reinforcement learning algorithms that optimize click-through. And in retrospect, we now understand that optimizing click-through was a mistake. That was the wrong objective. But you know, it’s kind of too late, and in fact it’s still going on and we can’t undo it. We can’t switch off these systems, because they’re so tied into our everyday lives and there’s so much economic incentive to keep them going.
So I want people in general to kind of understand what is the effect of operating these narrow optimizing systems that pursue these fixed and incorrect objectives. The effect of those on our world is already pretty big. Some people argue that corporations pursuing the maximization of profit have the same property. They’re kind of like AI systems. They’re kind of superintelligent, because they think over long time scales, they have massive information, resources and so on. They happen to have human components, but when you put a couple of hundred thousand humans together into one of these corporations, they kind of have these superintelligent understanding and manipulation capabilities and so on.
- Predicting human decisions with behavioral theories and machine learning
- Behavioral decision theories aim to explain human behavior. Can they help predict it? An open tournament for prediction of human choices in fundamental economic decision tasks is presented. The results suggest that integration of certain behavioral theories as features in machine learning systems provides the best predictions. Surprisingly, the most useful theories for prediction build on basic properties of human and animal learning and are very different from mainstream decision theories that focus on deviations from rational choice. Moreover, we find that theoretical features should be based not only on qualitative behavioral insights (e.g. loss aversion), but also on quantitative behavioral foresights generated by functional descriptive models (e.g. Prospect Theory). Our analysis prescribes a recipe for derivation of explainable, useful predictions of human decisions.
- Adversarial Policies: Attacking Deep Reinforcement Learning
- Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent’s observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent. Videos are available at this http URL.
“Everything that we see is a shadow cast by that which we do not see.” – Dr. King
ASRC GOES 7:00 – 4:30
- Dissertation – more human study. Pretty smooth progress right now!
- Cleaning up the sim code for tomorrow – done. All the prediction and manipulation to change the position data for the RWs and the vehicle are done in the inference section, while the updates to the drawing nodes are separated.
- I think this is the code to generate GPT-2 Agents?: github.com/huggingface/transformers/blob/master/examples/run_generation.py
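The inference/drawing split from the sim cleanup above could look something like this minimal sketch (all class and field names here are hypothetical illustrations, not the actual sim code): the inference section owns all prediction and manipulation of RW and vehicle position data, while the drawing nodes only read state.

```python
class InferenceSection:
    """Owns all prediction/manipulation of position data for the
    reaction wheels (RWs) and the vehicle (illustrative names)."""
    def __init__(self):
        self.state = {"rw_pos": [0.0, 0.0, 0.0], "vehicle_pos": 0.0}

    def step(self, rw_vel, vehicle_vel, dt):
        # Simple forward prediction over one timestep
        self.state["rw_pos"] = [p + v * dt
                                for p, v in zip(self.state["rw_pos"], rw_vel)]
        self.state["vehicle_pos"] += vehicle_vel * dt


class DrawingNode:
    """Reads state to update its display transform; never mutates it."""
    def __init__(self, key, index=None):
        self.key, self.index = key, index
        self.transform = 0.0

    def update(self, state):
        val = state[self.key]
        self.transform = val[self.index] if self.index is not None else val


# One inference step updates all positions; nodes just mirror the state
inf = InferenceSection()
nodes = [DrawingNode("rw_pos", i) for i in range(3)] + [DrawingNode("vehicle_pos")]
inf.step(rw_vel=[1.0, 2.0, 3.0], vehicle_vel=0.5, dt=0.1)
for n in nodes:
    n.update(inf.state)
```

Keeping mutation in one place makes it easy to later swap the predicted positions for inferred ones without touching the drawing code.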
Listening to the On Being interview with angel Kyodo williams
We are in this amazing moment of evolving, where the values of some of us are evolving at rates that are faster than can be taken in and integrated for peoples that are oriented by place and the work that they’ve inherited as a result of where they are.
This really makes me think of the Wundt curve (fMRI analysis here?), and how misalignment arises between a bourgeois class (think elites) and a proletarian class. Without day-to-day existence constraints, it’s possible for elites to move individually and in small groups through less-traveled belief spaces. Proletarian concerns have more “red queen” elements, so you need larger workers’ movements to make progress.
7:00 – 5:00 GOES
- Dissertation – finish up the maps chapter – done!
- Try writing up more expensive-information thoughts (added to discussion section as well)
- Game theory comes from an age of incomplete information. Now we have access to mostly complete, but potentially expensive information
- Expense in time – throwing the breakers on high-frequency trading
- Expense in $$ – Buying the information you need from available resources
- Expensive in resources – developing the hardware and software to obtain the information (Operation Hummingbird to TPU/DNN development)
- By handing the information management to machines, we create a human-machine social structure, governed by the rules of dense/sparse, stiff/slack networks
- AI combat is a very good example of an extremely stiff network (varies in density) and the associated time expense. Combat has to happen as fast as possible, due to OODA loop constraints. But if the system does not have designed-in capacity to negotiate a ceasefire (on both/all sides!), there may be no way to introduce it in human time scales, even though the information that one side is losing is readily apparent.
- Online advertising is a case where existing information is hidden from the target of the advertising, but available to the platform and, to a lesser degree, the client. Because of this information asymmetry, the user’s behavior/beliefs are more likely to be exploited in a way that denies the user agency, while granting maximum agency to the platform and clients.
- Deepfakes, spam and the costs of identifying deliberate misinformation
- Call to action: the creation of an information environment impact body that can examine these issues and determine costs. This is too complex a process for the creators to do on their own, and there would be rampant conflict of interest anyway. But an EPA-like structure, where experts in this topic act as a counterbalance to unconstrained development and exploitation of the information ecosystem, could work.
- The Knowledge, Analytics, Cognitive and Cloud Computing (KnACC) lab in the Information Systems department at UMBC aims to address challenging issues at the intersection of Data Science and Cloud Computing. We are located in ITE 415.
- Start creating NN that takes pitch/roll/yaw star tracker deltas and tries to calculate reaction wheel efficiency
- input vector is dp, dr, dy. Assume a fixed timestep
- output vector is effp, effr, effy
- once everything trains up, try running the inferencer on the running sim and display “inferred RW efficiency” for each RW
- Broke out the base class parts of TF2OptimizerTest. I just need to generate the test/train data for now, no sim needed
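A minimal numpy sketch of what the test/train data generation could look like, assuming a deliberately simple toy model in which each axis delta is the commanded rate scaled by that wheel’s efficiency over the fixed timestep, plus star-tracker noise (all names and constants are illustrative, not the actual sim’s). A least-squares fit serves as a sanity check that the (dp, dr, dy) → (effp, effr, effy) mapping is learnable before building the NN:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 2000
dt, cmd_rate = 1.0, 1.0  # fixed timestep and commanded rate (assumed)

# Labels: per-sample wheel efficiencies (effp, effr, effy) in [0.5, 1.0]
eff = rng.uniform(0.5, 1.0, size=(n_samples, 3))

# Inputs: pitch/roll/yaw deltas (dp, dr, dy) -- efficiency-scaled motion
# over one timestep, plus a little sensor noise
deltas = cmd_rate * dt * eff + rng.normal(0.0, 0.001, size=(n_samples, 3))

# Train/test split
split = int(0.8 * n_samples)
X_train, X_test = deltas[:split], deltas[split:]
y_train, y_test = eff[:split], eff[split:]

# Least-squares baseline: solve for W in X @ W ~= y (bias column appended)
Xb = np.hstack([X_train, np.ones((split, 1))])
W, *_ = np.linalg.lstsq(Xb, y_train, rcond=None)

Xb_test = np.hstack([X_test, np.ones((n_samples - split, 1))])
rmse = np.sqrt(np.mean((Xb_test @ W - y_test) ** 2))
print(f"test RMSE: {rmse:.4f}")  # should sit near the injected noise floor
```

If the linear baseline recovers the efficiencies this cleanly, the NN for TF2OptimizerTest mainly needs to match or beat its RMSE once the real sim data is less linear.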
big ending news for the day