Category Archives: Phil

Phil 10.7.19

AmeriSpeak® is a research panel where members get rewards when they share their opinions. Members represent their communities when they answer surveys online, through the AmeriSpeak App, or by phone. Membership on the AmeriSpeak Panel is by invitation only. Policymakers and business leaders use AmeriSpeak survey results to make decisions that impact our lives. Tell us your thoughts on issues important to you, your everyday life, and topics in the news such as health care, finance, education, technology, and society.

Commission launches call to create the European Digital Media Observatory – The European Commission has published a call for tenders to create the first core service of a digital platform to help fight disinformation in Europe. The European Digital Media Observatory will serve as a hub for fact-checkers, academics and researchers to collaborate with each other, actively link with media organisations and media literacy experts, and provide support to policy makers. The call for tenders opened on 1 October and will run until 16 December 2019.

ASRC GOES 7:00 – 7:00

  • Expense Report!
  • Call Erikson!
  • Dissertation
    • Change safe to low risk
    • Tweaking the Research Design chapter
  • Evolver
    • See if the run broke or completed this weekend – IT restarted the machine. Restarted and let it cook. I seem to have fixed the GPU bug, since it’s been running all day. It’s 10,000 models!
    • Look into splitting up and running on AWS
    • Rather than explicitly gathering ten runs each time for each genome, I could hash the runs by the genome parameters. More successful genomes will be run more often.
    • Implies BaseEvolver, LazyEvolver, and RigorousEvolver classes
  • Neural Network Based Optimal Control: Resilience to Missed Thrust Events for Long Duration Transfers
    • (pdf) A growing number of spacecraft are adopting new and more efficient forms of in-space propulsion. One shared characteristic of these high efficiency propulsion techniques is their limited thrust capabilities. This requires the spacecraft to thrust continuously for long periods of time, making them susceptible to potential missed thrust events. This work demonstrates how neural networks can autonomously correct for missed thrust events during a long duration low-thrust transfer trajectory. The research applies and tests the developed method to autonomously correct a Mars return trajectory. Additionally, methods for improving the response of neural networks to missed thrust events are presented and further investigated.
  • Ping Will for Thursday rather than Wednesday – done. It seems to be a case where the first entry is being duplicated
  • Arpita’s presentation:
    • Information Extraction from unstructured text
    • logfile analysis
    • Why is the F1 score so low on open coding with human tagging?
    • Annotation generation slide is not clear
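A sketch of the run-hashing idea from the Evolver notes above – keying accumulated fitness runs by a stable hash of the genome parameters, so more successful genomes simply collect more samples rather than being gathered in fixed batches of ten. The class and helper names here are assumptions, not project code:

```python
import hashlib
import json
from collections import defaultdict

def genome_key(params: dict) -> str:
    """Stable, order-independent hash of a genome's parameter dict (hypothetical helper)."""
    return hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()

class RunCache:
    """Accumulates fitness samples per genome instead of re-running fixed batches."""
    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, params: dict, fitness: float):
        # each run of a genome just appends to that genome's sample list
        self.samples[genome_key(params)].append(fitness)

    def mean_fitness(self, params: dict) -> float:
        runs = self.samples[genome_key(params)]
        return sum(runs) / len(runs)
```

Because the key is derived from the sorted parameter dict, two genomes with the same parameters in different order land in the same bucket.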

Phil 10.4.19

7:00 – 2:00 ASRC GOES

10.3.19

7:00 – 10:00 ASRC FAA

  • Dissertation – working on evolved vs. designed
  • Meeting in Boston – I think it was a “making friends” process. Silas says that you need to meet f2f seven times before business really starts to happen. Surprising traction for simulation, though

Phil 10.2.19

ASRC GOES 7:00 – 5:00

  • Dissertation. Working on the differences between designed and evolved systems
  • Status report
  • Add statistical tests to the evolver.
    • Based on this post, starting with scikit-learn’s resample(). Here are the important bits (with the imports the snippet needs):
      from sklearn.utils import resample
      import pandas as pd
      import scipy.stats as st

      def calc_fitness_stats(self, resample_size:int = 100):
          # bootstrap resample of the population's fitness values
          boot = resample(self.population, replace=True, n_samples=resample_size, random_state=1)
          s = pd.Series(boot)
          # 95% confidence interval from the t distribution
          conf = st.t.interval(0.95, len(boot)-1, loc=s.mean(), scale=st.sem(boot))
          self.meta_info = {'mean':s.mean(), '5_conf':conf[0], '95_conf':conf[1], 'max':s.max(), 'min':s.min()}
          self.fitness = s.mean()
    • And the convergence on the test landscape looks good:

(image: Stats)

  • Added a check that the same genome doesn’t get re-run, since each genome will be run n times to produce a distribution:
    # randomly breed new genomes with a chance of mutation
    while len(self.current_genome_list) < self.num_genomes:
        g1i = random.randrange(len(self.best_genome_list))
        g2i = random.randrange(len(self.best_genome_list))
        g1 = self.best_genome_list[g1i]
        g2 = self.best_genome_list[g2i]
        g = self.breed_genomes(g1, g2, crossover_rate, mutation_rate)
        match = False
        for gtest in self.all_genomes_list:
            if g.chromosome_dict == gtest.chromosome_dict:
                match = True
                break
        if not match:
            self.current_genome_list.append(g)
            self.all_genomes_list.append(g)
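The linear scan over all_genomes_list works, but a set of hashed chromosomes would make the duplicate check O(1). A sketch, assuming the chromosome_dict values are hashable (the helper names are hypothetical):

```python
def chromosome_key(chromosome_dict: dict) -> frozenset:
    """Order-independent, hashable key for a chromosome dict."""
    return frozenset(chromosome_dict.items())

def is_new(chromosome_dict: dict, seen: set) -> bool:
    """True the first time a chromosome is offered; records it so duplicates return False."""
    key = chromosome_key(chromosome_dict)
    if key in seen:
        return False
    seen.add(key)
    return True
```

seen would be built once from all_genomes_list, and the breeding loop would then call is_new(g.chromosome_dict, seen) in place of the inner for loop.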

Phil 10.1.19

Crap – It’s October already!

7:00 – 4:00 ASRC GOES

  • Unsupervised Thinking – a podcast about neuroscience, artificial intelligence and science more broadly
  • Dissertation
    • Cleanup of most of the sections up to and through the terms part of the Theory section.
  • Fix problem with the fitness values? Also, always save the best chromosome. Fixed, I think. Testing.
    • So there’s a big problem, which I kind of knew about. The random initialization of weights makes a HUGE difference in the performance of the model. I discovered this while looking at the results of the evolver, which records the best of each generation and writes them out to a spreadsheet:

(image: variance)

    • If you look at row 8, you see a lovely fitness of 0.9, or 90%. Which was the best value from the evolver runs. However, after sorting on the parameters so that they were grouped, it became obvious that there is a HUGE variance in the results. The lowest fitness is 30%, and the average fitness for those values is actually 50%. I tried running the parameters on multiple trained models and got results that agree. These values are all over the place (the following images are 20%, 30%, 60%, and 80% accuracy, and all using the same parameters):

 

    • To address this, I need to be able to run a population and get the distribution stats (number of runs, mean, min, max, variance) and add that to the spreadsheet. Started by adding some randomness to the demo data generating function, which should do the trick. I’ll start on the rest tomorrow.
  • Yikes! I’m going to try installing the release version of TF. It should be just pip install tensorflow-gpu. Done! Didn’t break anything 🙂
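The distribution stats mentioned above (number of runs, mean, min, max, variance) could be collected with a small helper; a sketch using the stdlib statistics module, where run_fn is a hypothetical stand-in for one training/evaluation run of a parameter set:

```python
import statistics

def run_distribution_stats(run_fn, n_runs: int = 10) -> dict:
    """Run a stochastic fitness function n_runs times and summarize the spread.

    run_fn is a zero-argument callable returning one fitness value; the
    resulting dict maps straight onto spreadsheet columns.
    """
    runs = [run_fn() for _ in range(n_runs)]
    return {
        'runs': len(runs),
        'mean': statistics.mean(runs),
        'min': min(runs),
        'max': max(runs),
        'variance': statistics.variance(runs),
    }
```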

Phil 9.30.19

7:00 – 7:00 ASRC GOES

  • Dissertation
  • Evolutionary hyperparameter tuning. It’s working (60% is better than my efforts), but there’s a problem with the fitness values? Also, I want to save the best chromosome always

(image: Training)

  • Reread weapons paper
  • Meeting with Aaron M – going to try to rework the paper a bit for ICSE 2020. Deadline is Oct 29.
    • Some interesting discussion on how review systems should work
    • Also some thoughts about how military AI in a hyperkinetic environment would have to negotiate cease-fires, sue for peace, etc.

RANDOM.ORG offers true random numbers to anyone on the Internet. The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs. People use RANDOM.ORG for holding drawings, lotteries and sweepstakes, to drive online games, for scientific applications and for art and music. The service has existed since 1998 and was built by Dr Mads Haahr of the School of Computer Science and Statistics at Trinity College, Dublin in Ireland. Today, RANDOM.ORG is operated by Randomness and Integrity Services Ltd.

Phil 9.22.19

Getting ready for a fun trip: VA

12th International Conference on Agents and Artificial Intelligence – Dammit, the papers are due October 4th. This would be a perfect venue for the GPT2 agents

Novelist Cormac McCarthy’s tips on how to write a great science paper

Unveiling the relation between herding and liquidity with trader lead-lag networks

  • We propose a method to infer lead-lag networks of traders from the observation of their trade record as well as to reconstruct their state of supply and demand when they do not trade. The method relies on the Kinetic Ising model to describe how information propagates among traders, assigning a positive or negative “opinion” to all agents about whether the traded asset price will go up or down. This opinion is reflected by their trading behavior, but whenever the trader is not active in a given time window, a missing value will arise. Using a recently developed inference algorithm, we are able to reconstruct a lead-lag network and to estimate the unobserved opinions, giving a clearer picture about the state of supply and demand in the market at all times.
    We apply our method to a dataset of clients of a major dealer in the Foreign Exchange market at the 5 minutes time scale. We identify leading players in the market and define a herding measure based on the observed and inferred opinions. We show the causal link between herding and liquidity in the inter-dealer market used by dealers to rebalance their inventories.

Phil 9.20.19

7:00 – 5:00 ASRC GOES

  • Maryland Anatomy Board (Juan Ortega – Acting Director Anatomical Services Division) Dept of vital records 410 764 2922 – Hopefully done on this
  • Dissertation
    • Rewrote abstract
    • Tweaked games and finished maps?
  • Create TimeSeriesML2 for TF2 – done, and everything is working!
    • Copy project
    • Rename
    • Point to new
  • Write pitch for ASRC funding NZ trip – done
  • Got my linux box working for vacation

Phil 9.19.19

ASRC AI Workshop 8:00 – 3:00

  • Maryland Anatomy Board Dept of vital records 410 764 2922
  • I remember this! Seeking New Physics
  • Dissertation? Some progress on the game section
  • Working on integrating the test DNN into the EO.
    • Need to add a few columns for the output that have the step and set membership.
    • Need to not run genomes that have already been run? Or maybe use an average? More output to spreadsheets for now, but I have to think about this more
    • Ok, I was expecting this:

(image: path_error)

Phil 9.18.19

7:00 – 5:00 ASRC GOES

  • Dept of vital records 410 764 2922 maryland.gov
  • Dissertation
  • EvolutionaryOptimizer
    • Work on getting code to work with perceptron model
    • Need to record accuracy and fitness to determine a fitness value. Something like -time*(1 – efficiency) – time. We want a short, accurate run to win over a long, accurate one. I’ll need to play around in Excel.
    • A new penalty-based wrapper fitness function for feature subset selection with evolutionary algorithms
      • Feature subset selection is an important preprocessing task for any real life data mining or pattern recognition problem. Evolutionary computational (EC) algorithms are popular as a search algorithm for feature subset selection. With the classification accuracy as the fitness function, the EC algorithms end up with feature subsets having considerably high recognition accuracy but the number of residual features also remain quite high. For high dimensional data, reduction of number of features is also very important to minimize computational cost of overall classification process. In this work, a wrapper fitness function composed of classification accuracy with another penalty term which penalizes for large number of features has been proposed. The proposed wrapper fitness function is used for feature subset evaluation and subsequent selection of optimal feature subset with several EC algorithms. The simulation experiments are done with several benchmark data sets having small to large number of features. The simulation results show that the proposed wrapper fitness function is efficient in reducing the number of features in the final selected feature subset without significant reduction of classification accuracy. The proposed fitness function has been shown to perform well for high-dimensional data sets with dimension up to 10,000.
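A minimal sketch of the penalty-based wrapper fitness idea from the abstract – classification accuracy minus a term that grows with the size of the feature subset. The function name and the alpha weight are assumptions, not the paper’s exact formulation:

```python
def wrapper_fitness(accuracy: float, n_selected: int, n_total: int,
                    alpha: float = 0.1) -> float:
    """Accuracy penalized by the fraction of features kept.

    alpha trades accuracy against subset size: larger alpha pushes the
    search toward smaller feature subsets.
    """
    return accuracy - alpha * (n_selected / n_total)
```

The same shape works for the time-vs-accuracy trade-off above, with elapsed time standing in for the feature-count penalty.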

Phil 9.17.19

7:00 – 6:00 ASRC GOES

  • Dept of vital records 410 764 2922 maryland.gov
  • Working from home today, waiting for a delivery
  • Meet with Will at 3:00. Went smoothly this time.
  • Send Aaron a note that I’ll miss next week and maybe the week after
  • Dissertation – slow going. Wrote a few paragraphs on lists and stories. Need to put the section together on games
  • EvolutionaryOptimizer – done? It’s working nicely on the test set. It doesn’t always come up with the best answer, but it’s always close and often much faster.
  • Need to write the fitness function that builds and evaluates the model
  • Worked on getting TF 2.0 installed using my instructions, but the TF 2.0 build is broken? Ah, I see that we are now at RC1. Changing the instructions.
  • Everything works now, but my day is done. Need to update my install at work tomorrow.

 

Phil 9.16.19

7:00 – 8:00 ASRC GOES

This makes me happy. Older, but not slower. Yet.

(image: Strava)

  • Maryland Anatomy Board Dept of vital records 410 764 2922 – Never got called back
  • Ping Antonio about virtual crowdsourcing of opinion
  • Dissertation – write up dissertation house one-pager
  • Optimizer
    • Generating chromosome sequences.
    • Created a fitness landscape to evaluate

(image: FitnessLandscape)

  • Working on breeding and mutation
  • ML Seminar
    • Status, and a few more Andrew Ng segments. How to debug gradient descent
  • Meeting With Aaron M
    • Nice chat
    • GARY MARCUS is a scientist, best-selling author, and entrepreneur. He is Founder and CEO of Robust.AI.
    • His newest book, co-authored with Ernest Davis, Rebooting AI: Building Machines We Can Trust aims to shake up the field of artificial intelligence.
    • Don’t put the transformer research in the dissertation
  • Evolution of Representations in the Transformer (nice looking blog post of deeper paper)
    • We look at the evolution of representations of individual tokens in Transformers trained with different training objectives (MT, LM, MLM – BERT-style) from the Information Bottleneck perspective and show, that:
      • LMs gradually forget past when forming future;
      • for MLMs, the evolution has the two stages of context encoding and token reconstruction;
      • MT representations get refined with context, but less processing is happening.
  • Different Spirals of Sameness: A Study of Content Sharing in Mainstream and Alternative Media
    • In this paper, we analyze content sharing between news sources in the alternative and mainstream media using a dataset of 713K articles and 194 sources. We find that content sharing happens in tightly formed communities, and these communities represent relatively homogeneous portions of the media landscape. Through a mixed-method analysis, we find several primary content sharing behaviors. First, we find that the vast majority of shared articles are only shared with similar news sources (i.e. same community). Second, we find that despite these echo-chambers of sharing, specific sources, such as The Drudge Report, mix content from both mainstream and conspiracy communities. Third, we show that while these differing communities do not always share news articles, they do report on the same events, but often with competing and counter-narratives. Overall, we find that the news is homogeneous within communities and diverse in between, creating different spirals of sameness.

Phil 9.14.19

(image: FBMisinfo)

This document describes the Facebook Privacy-Protected URLs-light release, resulting from a collaboration between Facebook and Social Science One. It was originally prepared for Social Science One grantees and describes the dataset’s scope, structure, and fields.

As part of this project, we are pleased to announce that we are making data from the URLs service available to the broader academic community for projects concerning the effect of social media on elections and democracy. This unprecedented dataset consists of web page addresses (URLs) that have been shared on Facebook starting January 1, 2017 through to and including February 19, 2019. URLs are included if shared by more than on average 100 unique accounts with public privacy settings. Read the complete Request for Proposals for more information.