
Education

12
Dec

Bachelor Thesis – Executive Summary

by David Aigner | Tags: Bachelor, R, sentiment analysis, statistical computing, Thesis

Bachelor Thesis: “Sentimentanalyse von Jahresberichten S&P 500 gelisteter Unternehmen” (Sentiment Analysis of Annual Reports of S&P 500 Listed Companies)

Executive Summary
This thesis attempts to classify the attitude of the authors of several thousand text
documents as either positive or negative. The sentiments obtained from this evaluation
were compared with the S&P 500 index. The aim was to discover possible correlations
between the stock market and selected terms aggregated into bullish and bearish
indices.

The thesis starts with an overview of the software used. The author then describes how
the annual reports of the companies listed in the S&P 500 were obtained for the analysis,
and points out that every single document was collected manually. It should also be
noted that the vast majority of businesses offer their reports as PDF files on their
websites in an easily accessible and clearly arranged way.

The next chapter takes a closer look at the steps involved in text mining. The first
step was the preparation of the raw data, in particular adapting the document metadata.
Metadata is usually defined as data about data; an annual report, for example, carries
meta information about its author, title and date of creation. In the second step the
documents were imported into the computing environment, the open-source software R for
statistical computing and graphics together with its extension tm. In the author's
experience, this framework can import and process large data sets in a user-friendly
way with only a few lines of simple code. The third step consisted of several
preprocessing operations, such as stopword and whitespace removal, to tidy up the
texts. Finally, the texts were transformed into a so-called document-term matrix holding
the frequencies of distinct terms, so that techniques from statistics and data mining
could be applied.
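
To make the preprocessing and matrix step more concrete, here is a minimal sketch in Python using only the standard library. It is not the R/tm code used in the thesis; the stopword list, the function names `preprocess` and `document_term_matrix`, and the two sample sentences are assumptions made purely for illustration, and stemming is left out to keep the example short.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "our", "was", "were"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords.

    Tokenising with a regex also discards extra whitespace and punctuation.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def document_term_matrix(docs):
    """Return (sorted vocabulary, rows of term counts), one row per document."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


# Made-up sample sentences, not taken from any annual report.
docs = [
    "The company reported   an increase in assets.",
    "Costs of the company rose, and a loss was recorded.",
]
vocab, dtm = document_term_matrix(docs)

print(vocab)
for row in dtm:
    print(row)
```

Each row of `dtm` corresponds to one document and each column to one term in `vocab`, which is the shape that frequency-based statistics operate on.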

As expected, the terms contained in the data set suggest a strong business focus.
Frequent terms include ‘compani’, ‘asset’ and ‘cost’ (terms appear in stemmed form).
Moreover, the time series created confirm the author's assumptions: an increase in
randomly selected bullish terms (e.g. ‘increas’, ‘respect’) is usually followed by a
rise in prices, and an increase in randomly selected bearish terms (e.g. ‘credit’,
‘loss’) is usually followed by a decline in prices of the S&P 500 index.
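
One way such a "followed by" relationship could be examined is sketched below in Python. This is an assumption about a possible approach, not the method documented in the thesis: the term index is taken here as the share of index terms among all terms in a period, and the comparison is a Pearson correlation between index changes and price changes one period later. The function names and all numbers are hypothetical.

```python
from math import sqrt


def term_index(term_counts, total_terms, index_terms):
    """Share of all terms in a period that belong to the given term list."""
    return sum(term_counts.get(t, 0) for t in index_terms) / total_terms


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlation(index_values, prices, lag=1):
    """Correlate index changes with price changes `lag` periods later (lag >= 1)."""
    d_index = [b - a for a, b in zip(index_values, index_values[1:])]
    d_price = [b - a for a, b in zip(prices, prices[1:])]
    return pearson(d_index[:-lag], d_price[lag:])


# Hypothetical counts for one period: 4 of 10 terms are on the bullish list.
share = term_index({"increas": 3, "respect": 1, "cost": 6}, 10, ["increas", "respect"])

# Made-up series for illustration only; these are not results from the thesis.
bullish_index = [0.10, 0.12, 0.11, 0.15, 0.14]
prices = [100, 101, 104, 102, 108]
r = lagged_correlation(bullish_index, prices, lag=1)
print(share, r)
```

A positive `r` would be in line with the pattern described for bullish terms, while a negative value for a bearish index would match the pattern described for bearish terms. Correlation on its own does not establish that one series drives the other.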

Finally, the appendix contains detailed information about the composition of the
bullish-term index and the bearish-term index.

(c) 2010 David Aigner

A downloadable PDF file will be offered soon.

© AI – Invest. All rights reserved.