Category Archives: Tensorflow

Phil 10.15.19

7:00 – ASRC GOES

  • Well, I’m pretty sure I missed the filing deadline for a defense in February. Looks like April 20 now?
  • Dissertation – More simulation. Nope, worked on making sure that I actually have all the paperwork done that will let me defend in February.
  • Evolver? Test? Done! It seems to be working. Here’s what I’ve got
  • Ground Truth: Because the MLP is trained on a set of mathematical functions, I have a definitive ground truth that I can extend infinitely. It’s simply a set of ten sin(x) waves of varying frequency:

GroundTruth

  • All Predictions: If you read back through my posts, I’ve discovered how variable a neural network can be even when it has the same architecture and training parameters. This variation is due solely to the different random initialization of the weights between layers.
  • I’ve put together a genetic-algorithm-based evolver to determine the best hyperparameters, but because of the variation due to initialization, I have to train an ensemble of models and do a statistical analysis just to see if one set of hyperparameters is truly better than another. The reason is easy to see in the following image. What you are looking at is the input vector being run through ten models that are used to calculate the statistical values of the ensemble. You can see that most values are pretty good, some are a bit off, and some are pretty bonkers.

All_predictions

  • Ensemble Average: On the whole though, if you take the average of the whole ensemble, you get a pretty nice result. And, unlike the single-shot method of training, the likelihood that another ensemble produced with the same architecture will behave the same way is much higher. (A rough sketch of how the ensemble is built and averaged follows the chart below.)

Ensemble_average
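  • As a minimal sketch of the ensemble idea (not the actual evolver code; build_and_train() and the data matrices here are stand-ins, and the layer sizes are borrowed from the perceptron experiment later in this archive):

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_and_train(train_mat: np.ndarray, test_mat: np.ndarray, seq_len: int) -> tf.keras.Model:
    # Same architecture and training parameters every time; only the random
    # weight initialization differs between ensemble members.
    model = tf.keras.Sequential([
        layers.Dense(seq_len, activation='relu', input_shape=(seq_len,)),
        layers.Dense(200, activation='relu'),
        layers.Dense(seq_len)])
    model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
                  loss=tf.keras.losses.MeanSquaredError(),
                  metrics=['accuracy'])
    model.fit(train_mat, test_mat, epochs=10, batch_size=2, verbose=0)
    return model

def ensemble_predict(models: list, input_mat: np.ndarray) -> np.ndarray:
    # Stack the per-model predictions ("All Predictions") and average them
    # element-wise to get the "Ensemble Average" curve.
    preds = np.stack([m.predict(input_mat) for m in models], axis=0)
    return preds.mean(axis=0)

# assuming train_mat / test_mat come from the sin-wave generator
# (see the 9.12.19 walkthrough further down this archive)
models = [build_and_train(train_mat, test_mat, 100) for _ in range(10)]
ensemble_avg = ensemble_predict(models, test_mat)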

  • This is not to say that the model is perfect. The orange curve at the top of the last chart is too low. This model had a mean accuracy of 67%. I’ve just kicked off a much longer run to see if I can find a better architecture using the evolver over 50 generations rather than just 2.
  • Ok, it’s now tomorrow, and I have the full run of 50 generations. Things did get better. We end with a higher mean, but we also have a higher variance. This means that it’s possible that the architecture around generation 23 might actually be better:

50_generations

  • Because all the values are saved in the spreadsheet, I can try that scenario, but let’s see what the best mean looks like as an ensemble when compared to the early run:

Best_all_predictions

  • Wow, that is a lot better. All the models are much closer to each other, and appear to be clustered around the right places. I am genuinely surprised at how tidy the clustering is, given the earlier “All Predictions” plot towards the top of this post. On to the ensemble average:

Best_ensemble_average

  • That is extremely close to the “Ground Truth” chart. The orange line is in the right place, for example. The only error that I can see with a cursory visual inspection is that the olive line is a little lower than it should be.
  • Now, I am concerned that there may be two peaks in this fitness landscape that we’re trying to climb. The one that we are looking for is a generalized model that can fit approximate curves. The other case is that the network has simply memorized the curves and will blow up when it sees something different. Let’s test that.
  • First, let’s revisit the training set. This model was trained with extremely clean data. The input is a sin function with varying frequencies, and the evaluation data is the same sin function, picking up where we cut off the training data. Here’s an example of the clean data that was used to train the model:

Clean_input

  • Now let’s try noising that up, so that the model has to figure out what to do based on data that the model has never seen before (a sketch of the noising step follows the chart below):

Noisy_input
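  • Roughly, the noising step is just additive Gaussian noise on top of the clean input (a sketch; the real generator handles noise inside timeseriesML’s float_functions, and the 0.5 scale here is only illustrative):

import numpy as np

def add_noise(clean_mat: np.ndarray, scale: float = 0.5) -> np.ndarray:
    # Zero-mean Gaussian noise on every sample, so the ensemble has to predict
    # from inputs it never saw during training.
    return clean_mat + np.random.normal(0.0, scale, clean_mat.shape)

# assuming clean_input_mat is the "Clean_input" matrix above and
# ensemble_predict() is the averaging sketch from earlier in this post
noisy_input = add_noise(clean_input_mat)
noisy_avg = ensemble_predict(models, noisy_input)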

  • Let’s see what happened! First, let’s look at all the predictions from the ensemble:

Noisy_predictions

  • The first thing that I notice is that it didn’t blow up. Although the paths from each model are somewhat different, each one got all the paths approximately right, and there is no wild deviation. The worst behavior (as usual?) is the orange band, and possibly the green band. But this looks like it should average well. Let’s take a look:

Noisy_average

  • That seems pretty good. And the orange / green lines are in the right place. It’s the blue, olive, and grey lines that are a little low. Still, pretty happy with this.
  • So, ensembles seem to work very well, and make for resilient, predictable behavior in NN architectures. The cost is the much greater time it takes to run many, many models through the system.
  • Work on AI paper
    • Good chat with Aaron – the span of approaches to the “model brittleness problem” can be described using three scenarios:
      • Military: Models used in training and at the start of a conflict may not be worth much during hostilities
      • Waste, Fraud, and Abuse: Clever criminals can figure out how not to get caught. If they know the models being used, they may be able to thwart them better
      • Facial recognition and protest. Currently, protesters in cultures that support large-scale ML-based surveillance try to disguise their identity to the facial recognizers. Developing patterns that are likely to cause errors in recognizers and classifiers may support civil disobedience.
  • Solving Rubik’s Cube with a Robot Hand (openAI)
    • To overcome this, we developed a new method called Automatic Domain Randomization (ADR), which endlessly generates progressively more difficult environments in simulation. This frees us from having an accurate model of the real world, and enables the transfer of neural networks learned in simulation to be applied to the real world.

Phil 10.11.19

7:00 – 5:00 ASRC

  • Got Will’s extractor working last night
  • A thought about how Trump’s popularity appears to be crumbling. Primordial jumps don’t have the same sort of sunk cost that stylistic change has. If one big jump doesn’t work out, try something else drastic. It’s the same or less effort than hillclimbing out of a sinking region
  • Dissertation. Fix the proposal sections as per yesterday’s notes
  • Evolver
    • Write out model files as eval_0 … eval_n. If the new fitness is better than the old fitness, replace best_0 … best_n
    • Which is turning out to be tricky. Had to add a save function so the models get written out at the right point in the eval loop, roughly along the lines of the sketch below
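    • In outline (names here are hypothetical; the real code lives in the evolver classes):

import os

def save_ensemble(models: list, prefix: str, directory: str = "./models"):
    # Write each model in the ensemble out as <prefix>_0.h5 ... <prefix>_n.h5
    os.makedirs(directory, exist_ok=True)
    for i, model in enumerate(models):
        model.save(os.path.join(directory, "{}_{}.h5".format(prefix, i)))

# in the eval loop, after the ensemble for the current genome has been scored
save_ensemble(models, "eval")
if new_fitness > best_fitness:
    best_fitness = new_fitness
    save_ensemble(models, "best")  # replace best_0 ... best_n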

Phil 10.10.19

7:00 – 4:00 ASRC GOES

  • The Daily has an episode on how to detach from environmental reality and create a social reality stampede
  • Dissertation, working on finishing up the “unexpected findings” piece of the research plan
    • Tie together explore/exploit, the Three Patterns, and M&R’s three behaviors.
    • Also, set up the notion that it was initially explore OR exploit, with no thought given to the middle ground. M&R foreshadowed that there would be, though
  • Registered for Navy AI conference Oct 22
  • Get together with Vadim to see how the physics are going on Tuesday?
  • More evolver
    • installed the new timeseriesML2
    • The test run blew up with a tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cwise_ops_common.cc:82 error. Can’t find any direct help, though maybe try this?
      • Reduce the batch size of datagen.flow (the default is 32, so try 8/16/24)
    • Figured it out – I’m saving models in memory. Need to write them out instead.
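    • The fix is probably something like this (a sketch: write each trained model out as an .h5 file and clear the Keras session so the GPU graph gets released, then reload from disk when the ensemble is needed; model and model_index here are stand-ins):

import tensorflow as tf

# instead of appending trained models to an in-memory list (which keeps their
# graphs and weights resident on the GPU), write each one to disk and release it
model.save("./models/eval_{}.h5".format(model_index))
del model
tf.keras.backend.clear_session()  # free the graph/GPU memory for the next genome

# later, when the ensemble is needed again
model = tf.keras.models.load_model("./models/eval_{}.h5".format(model_index))
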
  • Swing by campus and check on Will

Phil 10.8.19

7:00 – 5:00 ASRC GOES

  • Had a really good discussion in seminar about weight randomness and hyperparameter tuning
  • Got Will to show me the issue he’s having with the data. The first element of an item is being REPLACED INTO twice, and we’re not seeing the last one
  • Chat with Aaron about the AI/ML weapons paper.
    • He gave me The ethics of algorithms: Mapping the debate to read
      • In information societies, operations, decisions and choices previously left to humans are increasingly delegated to algorithms, which may advise, if not decide, about how data should be interpreted and what actions should be taken as a result. More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper makes three contributions to clarify the ethical importance of algorithmic mediation. It provides a prescriptive map to organise the debate. It reviews the current discussion of ethical aspects of algorithms. And it assesses the available literature in order to identify areas requiring further work to develop the ethics of algorithms.
    • An issue that we’re working through is when an inert object like a hammer becomes something that has a level of (for lack of a better term) agency imbued by the creator, which creates a mismatch in the user’s head as to what should happen. The more intelligent the system, the greater the opportunity for mismatch. My thinking was that Dourish, in  Where the Action Is had some insight (pg 109):
      • This aspect of Heidegger’s phenomenology is already known in HCI. It was one of the elements on which Winograd and Flores (1986) based their analysis of computational theories of cognition. In particular, they were concerned with Heidegger’s distinction between “ready-to-hand” (zuhanden) and “present-at-hand” (vorhanden). These are ways, Heidegger explains, that we encounter the world and act through it. As an example, consider the mouse connected to my computer. Much of the time, I act through the mouse; the mouse is an extension of my hand as I select objects, operate menus, and so forth. The mouse is, in Heidegger’s terms, ready-to-hand. Sometimes, however, such as when I reach the edge of the mousepad and cannot move the mouse further, my orientation toward the mouse changes. Now, I become conscious of the mouse mediating my action, precisely because of the fact that it has been interrupted. The mouse becomes the object of my attention as I pick it up and move it back to the center of the mousepad. When I act on the mouse in this way, being mindful of it as an object of my activity, the mouse is present-at-hand.
  • Dissertation – working on Research Design. Turns out that I had done the pix but only had placeholder text.
  • Left the evolver cooking last night. Hopefully results today, then break up the class and build the lazy version. Arrgh! Misspelled variable. Trying a short run to verify.
  • That seems to work nicely:

Evolver

  • The mean improves from 57% to 68%, so that’s really nice. But notice also that the range from min to max on line 5 runs all the way from 20% to 100%. Wow.
  • Here’s 50 generations. I need to record steps and best models. That’s next:

Evolver50
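  • Recording those per-generation steps could be as simple as accumulating a row per generation and dumping the whole history to a spreadsheet (a sketch; run_generation() and the gen/fitness/genome names are placeholders):

import pandas as pd

# one row per generation, so earlier architectures (e.g. generation 23) can be
# pulled back out of the spreadsheet and re-tested later
history = []
for gen in range(num_generations):
    best_genome, best_fitness, mean_fitness = run_generation()  # placeholder
    history.append({"generation": gen,
                    "best_fitness": best_fitness,
                    "mean_fitness": mean_fitness,
                    "genome": str(best_genome)})

pd.DataFrame(history).to_excel("evolver_history.xlsx", index=False)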

  • Waikato meeting tonight. Chris is pretty much done. Suggested using word clouds to show group discussion markers

Phil 10.7.19

AmeriSpeak® is a research panel where members get rewards when they share their opinions. Members represent their communities when they answer surveys online, through the AmeriSpeak App, or by phone. Membership on the AmeriSpeak Panel is by invitation only. Policymakers and business leaders use AmeriSpeak survey results to make decisions that impact our lives. Tell us your thoughts on issues important to you, your everyday life, and topics in the news such as health care, finance, education, technology, and society.

Commission launches call to create the European Digital Media Observatory The European Commission has published a call for tenders to create the first core service of a digital platform to help fighting disinformation in Europe. The European Digital Media Observatory will serve as a hub for fact-checkers, academics and researchers to collaborate with each other and actively link with media organisations and media literacy experts, and provide support to policy makers. The call for tenders opened on 1 October and will run until 16 December 2019.

ASRC GOES 7:00 – 7:00

  • Expense Report!
  • Call Erikson!
  •  Dissertation
    • Change safe to low risk
    • Tweaking the Research Design chapter
  • Evolver
    • See if the run broke or completed this weekend – IT restarted the machine. Restarted and let it cook. I seem to have fixed the GPU bug, since it’s been running all day. It’s 10,000 models!
    • Look into splitting up and running on AWS
    • Rather than explicitly gathering ten runs each time for each genome, I could hash the runs by the genome parameters. More successful genomes will be run more often.
    • Implies a BaseEvolver, LazyEvolver, and RigorousEvolver class (a rough sketch of the lazy idea is below)
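    • Sketch of the hashing idea (hypothetical names; nothing here is the real evolver code):

# key the accumulated fitness samples by a hash of the genome's parameters, so a
# repeat visit to a genome adds one more sample instead of re-running ten models
run_cache = {}  # genome hash -> list of fitness samples

def genome_key(genome: dict) -> int:
    return hash(tuple(sorted(genome.items())))

def record_run(genome: dict, fitness: float):
    run_cache.setdefault(genome_key(genome), []).append(fitness)

def mean_fitness(genome: dict) -> float:
    samples = run_cache.get(genome_key(genome), [])
    return sum(samples) / len(samples) if samples else 0.0
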
  • Neural Network Based Optimal Control: Resilience to Missed Thrust Events for Long Duration Transfers
    • (pdf) A growing number of spacecraft are adopting new and more efficient forms of in-space propulsion. One shared characteristic of these high efficiency propulsion techniques is their limited thrust capabilities. This requires the spacecraft to thrust continuously for long periods of time, making them susceptible to potential missed thrust events. This work demonstrates how neural networks can autonomously correct for missed thrust events during a long duration low-thrust transfer trajectory. The research applies and tests the developed method to autonomously correct a Mars return trajectory. Additionally, methods for improving the response of neural networks to missed thrust events are presented and further investigated.
  • Ping Will for Thursday rather than Wednesday – done. It seems to be a case where the first entry is being duplicated
  • Arpita’s presentation:
    • Information Extraction from unstructured text
    • logfile analysis
    • Why is the F1 score so low on open coding with human tagging?
    • Annotation generation slide is not clear

Phil 9.30.19

7:00 – 7:00 ASRC GOES

  • Dissertation
  • Evolutionary hyperparameter tuning. It’s working (60% is better than my hand-tuned efforts), but there’s a problem with the fitness values? Also, I always want to save the best chromosome (a sketch of that follows the chart below)

Training
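  • Keeping the best chromosome is basic elitism; a minimal sketch (evaluate(), breed(), and the population structure are placeholders, not the actual optimizer code):

best_chromosome = None
best_fitness = float("-inf")

for generation in range(num_generations):
    for chromosome in population:
        fitness = evaluate(chromosome)          # e.g. train an ensemble, return mean accuracy
        if fitness > best_fitness:
            best_fitness = fitness
            best_chromosome = dict(chromosome)  # copy so later mutation can't change it
    population = breed(population)              # selection / crossover / mutation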

  • Reread weapons paper
  • Meeting with Aaron M – going to try to rework the paper a bit for ICSE 2020. Deadline is Oct 29.
    • Some interesting discussion on how review systems should work
    • Also some thoughts about how military AI in a hyperkinetic environment would have to negotiate cease-fires, sue for peace, etc.

RANDOM.ORG offers true random numbers to anyone on the Internet. The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs. People use RANDOM.ORG for holding drawings, lotteries and sweepstakes, to drive online games, for scientific applications and for art and music. The service has existed since 1998 and was built by Dr Mads Haahr of the School of Computer Science and Statistics at Trinity College, Dublin in Ireland. Today, RANDOM.ORG is operated by Randomness and Integrity Services Ltd.

Phil 9.20.19

7:00 – 5:00 ASRC GOES

  • Maryland Anatomy Board (Juan Ortega – Acting Director Anatomical Services Division) Dept of vital records 410 764 2922 – Hopefully done on this
  • Dissertation
    • Rewrote abstract
    • Tweaked games and finished maps?
  • Create TimeSeriesML2 for TF2 – done, and everything is working!
    • Copy project
    • Rename
    • Point to new
  • Write pitch for ASRC funding NZ trip – done
  • Got my linux box working for vacation

Phil 9.19.19

ASRC AI Workshop 8:00 – 3:00

  • Maryland Anatomy Board Dept of vital records 410 764 2922
  • I remember this! Seeking New Physics
  • Dissertation? Some progress on the game section
  • Working on integrating the test DNN into the EO.
    • Need to add a few columns for the output that have the step and set membership.
    • Need to not run genomes that have already been run? Or maybe use an average? More output to spreadsheets for now, but I have to think about this more
    • Ok, I was expecting this:

path_error

Phil 9.18.19

7:00 – 5:00 ASRC GOES

  • Dept of vital records 410 764 2922 maryland.gov
  • Dissertation
  • EvolutionaryOptimizer
    • Work on getting code to work with perceptron model
    • Need to record accuracy and training time to determine a fitness value. Something like -time*(1 – efficiency) – time. We want a short, accurate model to win over a long, accurate one. I’ll need to play around in Excel (a simplified sketch of the idea follows this list).
    • A new penalty-based wrapper fitness function for feature subset selection with evolutionary algorithms
      • Feature subset selection is an important preprocessing task for any real life data mining or pattern recognition problem. Evolutionary computational (EC) algorithms are popular as a search algorithm for feature subset selection. With the classification accuracy as the fitness function, the EC algorithms end up with feature subsets having considerably high recognition accuracy but the number of residual features also remain quite high. For high dimensional data, reduction of number of features is also very important to minimize computational cost of overall classification process. In this work, a wrapper fitness function composed of classification accuracy with another penalty term which penalizes for large number of features has been proposed. The proposed wrapper fitness function is used for feature subset evaluation and subsequent selection of optimal feature subset with several EC algorithms. The simulation experiments are done with several benchmark data sets having small to large number of features. The simulation results show that the proposed wrapper fitness function is efficient in reducing the number of features in the final selected feature subset without significant reduction of classification accuracy. The proposed fitness function has been shown to perform well for high-dimensional data sets with dimension up to 10,000.
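    • A first cut at that kind of time-penalized fitness might look like the sketch below. This is a simplified variant of the idea, not the exact expression above, and the weight is a placeholder to be tuned:

def fitness(accuracy: float, train_time: float, time_weight: float = 0.01) -> float:
    # Higher accuracy raises fitness, longer training time lowers it, so a
    # short, accurate model beats a long, accurate one.
    return accuracy - time_weight * train_time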

Phil 9.12.19

7:00 – 4:30 ASRC GOES

  • FractalNet: Ultra-Deep Neural Networks without Residuals
    • We introduce a design strategy for neural network macro-architecture based on self-similarity. Repeated application of a simple expansion rule generates deep networks whose structural layouts are precisely truncated fractals. These networks contain interacting subpaths of different lengths, but do not include any pass-through or residual connections; every internal signal is transformed by a filter and nonlinearity before being seen by subsequent layers. In experiments, fractal networks match the excellent performance of standard residual networks on both CIFAR and ImageNet classification tasks, thereby demonstrating that residual representations may not be fundamental to the success of extremely deep convolutional neural networks. Rather, the key may be the ability to transition, during training, from effectively shallow to deep. We note similarities with student-teacher behavior and develop drop-path, a natural extension of dropout, to regularize co-adaptation of subpaths in fractal architectures. Such regularization allows extraction of high-performance fixed-depth subnetworks. Additionally, fractal networks exhibit an anytime property: shallow subnetworks provide a quick answer, while deeper subnetworks, with higher latency, provide a more accurate answer.
  • Structural diversity in social contagion
    • The concept of contagion has steadily expanded from its original grounding in epidemic disease to describe a vast array of processes that spread across networks, notably social phenomena such as fads, political opinions, the adoption of new technologies, and financial decisions. Traditional models of social contagion have been based on physical analogies with biological contagion, in which the probability that an individual is affected by the contagion grows monotonically with the size of his or her “contact neighborhood”—the number of affected individuals with whom he or she is in contact. Whereas this contact neighborhood hypothesis has formed the underpinning of essentially all current models, it has been challenging to evaluate it due to the difficulty in obtaining detailed data on individual network neighborhoods during the course of a large-scale contagion process. Here we study this question by analyzing the growth of Facebook, a rare example of a social process with genuinely global adoption. We find that the probability of contagion is tightly controlled by the number of connected components in an individual’s contact neighborhood, rather than by the actual size of the neighborhood. Surprisingly, once this “structural diversity” is controlled for, the size of the contact neighborhood is in fact generally a negative predictor of contagion. More broadly, our analysis shows how data at the size and resolution of the Facebook network make possible the identification of subtle structural signals that go undetected at smaller scales yet hold pivotal predictive roles for the outcomes of social processes.
    • Add this to the discussion section – done
  • Dissertation
    • Started on the theory section, then realized the background section didn’t set it up well. So worked on the background instead. I put in a good deal on how individuals and groups interact with the environment differently and how social interaction amplifies individual contribution through networking.
  • Quick meetings with Don and Aaron
  • Time prediction (sequence to sequence) with Keras perceptrons
  • This was surprisingly straightforward
    • There was some initial trickiness in getting the IDE to work with the TF2.0 RC0 package:
      import tensorflow as tf
      from tensorflow import keras
      from tensorflow_core.python.keras import layers

      The first coding step was to generate the data. In this case I’m building a numpy matrix that has ten variations on math.sin(), using our timeseriesML utils code. There is a loop that creates a new frequency for each variation, which is sent off to get back a pandas DataFrame that in this case has 10 sequence rows of 200 samples each (later split into 100-sample input and output halves). First, we set the global sequence_length:

      sequence_length = 100

      then we create the function that will build and concatenate our numpy matrices:

      def generate_train_test(num_functions, rows_per_function, noise=0.1) -> (np.ndarray, np.ndarray, np.ndarray):
          ff = FF.float_functions(rows_per_function, 2*sequence_length)
          npa = None
          for i in range(num_functions):
              mathstr = "math.sin(xx*{})".format(0.005*(i+1))
              #mathstr = "math.sin(xx)"
              df2 = ff.generateDataFrame(mathstr, noise=noise)
              npa2 = df2.to_numpy()
              if npa is None:
                  npa = npa2
              else:
                  ta = np.append(npa, npa2, axis=0)
                  npa = ta
      
          split = np.hsplit(npa, 2)
          return npa, split[0], split[1]

      Now, we build the model. We’re using keras from the TF 2.0 RC0 build, so things look slightly different:

      model = tf.keras.Sequential()
      # Add a densely-connected layer with sequence_length (100) units to the model:
      model.add(layers.Dense(sequence_length, activation='relu', input_shape=(sequence_length,)))
      # Add a 200-unit hidden layer:
      model.add(layers.Dense(200, activation='relu'))
      # Add a linear output layer with sequence_length (100) units:
      model.add(layers.Dense(sequence_length))
      
      loss_func = tf.keras.losses.MeanSquaredError()
      opt_func = tf.keras.optimizers.Adam(0.01)
      model.compile(optimizer= opt_func,
                    loss=loss_func,
                    metrics=['accuracy'])

      We can now fit the model to the generated data:

      full_mat, train_mat, test_mat = generate_train_test(10, 10)
      
      model.fit(train_mat, test_mat, epochs=10, batch_size=2)

      There is noise in the data, so the accuracy is not bang on, but the loss is nice. We can see this better in the plots above, which were created using this function:

      def plot_mats(mat:np.ndarray, cluster_size:int, title:str, fig_num:int):
          plt.figure(fig_num)
      
          i = 0
          for row in mat:
              cstr = "C{}".format(int(i/cluster_size))
              plt.plot(row, color=cstr)
              i += 1
      
          plt.title(title)

      Which is called just before the program completes:

      if show_plots:
          plot_mats(full_mat, 10, "Full Data", 1)
          plot_mats(train_mat, 10, "Input Vector", 2)
          plot_mats(test_mat, 10, "Output Vector", 3)
          plot_mats(predict_mat, 10, "Predict", 4)
          plt.show()
    • That’s it! Full listing below:
import tensorflow as tf
from tensorflow import keras
from tensorflow_core.python.keras import layers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import timeseriesML.generators.float_functions as FF


sequence_length = 100

def generate_train_test(num_functions, rows_per_function, noise=0.1) -> (np.ndarray, np.ndarray, np.ndarray):
    ff = FF.float_functions(rows_per_function, 2*sequence_length)
    npa = None
    for i in range(num_functions):
        mathstr = "math.sin(xx*{})".format(0.005*(i+1))
        #mathstr = "math.sin(xx)"
        df2 = ff.generateDataFrame(mathstr, noise=noise)
        npa2 = df2.to_numpy()
        if npa is None:
            npa = npa2
        else:
            ta = np.append(npa, npa2, axis=0)
            npa = ta

    split = np.hsplit(npa, 2)
    return npa, split[0], split[1]

def plot_mats(mat:np.ndarray, cluster_size:int, title:str, fig_num:int):
    plt.figure(fig_num)

    i = 0
    for row in mat:
        cstr = "C{}".format(int(i/cluster_size))
        plt.plot(row, color=cstr)
        i += 1

    plt.title(title)

model = tf.keras.Sequential()
# Add a densely-connected layer with sequence_length (100) units to the model:
model.add(layers.Dense(sequence_length, activation='relu', input_shape=(sequence_length,)))
# Add a 200-unit hidden layer:
model.add(layers.Dense(200, activation='relu'))
# Add a linear output layer with sequence_length (100) units:
model.add(layers.Dense(sequence_length))

loss_func = tf.keras.losses.MeanSquaredError()
opt_func = tf.keras.optimizers.Adam(0.01)
model.compile(optimizer= opt_func,
              loss=loss_func,
              metrics=['accuracy'])

full_mat, train_mat, test_mat = generate_train_test(10, 10)

model.fit(train_mat, test_mat, epochs=10, batch_size=2)
model.evaluate(train_mat, test_mat)

# test against freshly generated data
full_mat, train_mat, test_mat = generate_train_test(10, 10)
predict_mat = model.predict(train_mat)

show_plots = True
if show_plots:
    plot_mats(full_mat, 10, "Full Data", 1)
    plot_mats(train_mat, 10, "Input Vector", 2)
    plot_mats(test_mat, 10, "Output Vector", 3)
    plot_mats(predict_mat, 10, "Predict", 4)
    plt.show()



Phil 9.11.19

world_trade_11

7:00 – 4:00 ASRC GOES

  • Model:DLG3501W SKU:6181264
  • Maryland Anatomy Board Dept of vital records 410 764 2922
  • arxiv-vanity.com  arXiv Vanity renders academic papers from arXiv as responsive web pages so you don’t have to squint at a PDF.
    • It works ok. Tables and caption alignment are a problem for now, but it sounds great for phones
  • DeepPrivacy: A Generative Adversarial Network for Face Anonymization
    • We propose a novel architecture which is able to automatically anonymize faces in images while retaining the original data distribution. We ensure total anonymization of all faces in an image by generating images exclusively on privacy-safe information. Our model is based on a conditional generative adversarial network, generating images considering the original pose and image background. The conditional information enables us to generate highly realistic faces with a seamless transition between the generated face and the existing background. Furthermore, we introduce a diverse dataset of human faces, including unconventional poses, occluded faces, and a vast variability in backgrounds. Finally, we present experimental results reflecting the capability of our model to anonymize images while preserving the data distribution, making the data suitable for further training of deep learning models. As far as we know, no other solution has been proposed that guarantees the anonymization of faces while generating realistic images.
  • Introducing a Conditional Transformer Language Model for Controllable Generation
    • CTRL is a 1.6 billion-parameter language model with powerful and controllable artificial text generation that can predict which subset of the training data most influenced a generated text sequence. It provides a potential method for analyzing large amounts of generated text by identifying the most influential source of training data in the model. Trained with over 50 different control codes, the CTRL model allows for better human-AI interaction because users can control the generated content and style of the text, as well as train it for multitask language generation. Finally, it can be used to improve other natural language processing (NLP) applications either through fine-tuning for a specific task or through transfer of representations that the model has learned.
  • Dissertation
    • Started to put together my Linux laptop for vacation writing
    • More SIH section
  • Verify that timeseriesML can be used as a library
  • Perceptron curve prediction
  • AI/ML status meetings
  • Helped Vadim with some python issues

Phil 9.10.19

ASRC GOES 7:00 – 5:30

  • Got a mention in an article on Albawaba – When the Only Option is ‘Not to Play’? Autonomous Weapons Systems Debated in Geneva 
  • Dissertation – more SIH
  • Just saw this: On Extractive and Abstractive Neural Document Summarization with Transformer Language Models
    • We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarization. We perform a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information before being tasked with generating a summary. We show that this extractive step significantly improves summarization results. We also show that this approach produces more abstractive summaries compared to prior work that employs a copy mechanism while still achieving higher rouge scores. Note: The abstract above was not written by the authors, it was generated by one of the models presented in this paper.
  • Working on packaging timeseriesML. I think it’s working! (A minimal sketch of the packaging file follows the image below.)

TimeSeriesML
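  • For reference, the bare minimum a setup.py for this needs is something like the sketch below (the version, dependency list, and metadata are placeholders, not the actual package file):

# setup.py -- minimal packaging sketch for installing timeseriesML as a library
from setuptools import setup, find_packages

setup(
    name="timeseriesML",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["numpy", "pandas", "tensorflow"],
)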

  • I’ll try it out when I get back after lunch
  • Meeting with Vadim
    • Showed him around and provided svn access
  • Model:DLG3501W SKU:6181264