Generating Big Spatial Data on Firm Innovation Activity from Text-Mined Firm Websites

    Jan Kinne, Bernd Resch

GI_Forum 2018, Volume 6, Issue 1, pp. 82-89, 2018/06/22

Journal for Geographic Information Science

doi: 10.1553/giscience2018_01_s82


Abstract

Innovation is one of the major drivers of economic growth, and spatial processes of knowledge spillover play a vital role in it. Current practices for assessing firms’ innovation activity, including patent analysis and questionnaires, suffer from severe limitations. In this paper, we propose a novel approach to estimating firms’ innovation activity from the texts on their websites. We use an automated web scraper to harvest text from the websites, extract semantic topics using a self-learning, generative topic-modelling approach, and finally analyse these topics with an artificial neural network (ANN) to assess each firm’s level of innovation. This procedure results in a large-scale dataset that will be used for further spatial economic analysis of the distribution of innovative firms and of the processes that drive the development of innovation in firms.

Keywords: firm location, microgeography, innovation, web scraping, Big Spatial Data, text mining, topic modelling, neural networks
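
The abstract does not name the tooling behind the text-harvesting step, so the following is only a minimal sketch of what that stage might look like, not the authors' implementation. It assumes Python's standard-library `html.parser`, extracts the visible text from a page, and reduces it to term counts, the kind of bag-of-words input a topic model typically consumes. The class and function names, the `min_length` threshold, and the sample page are all hypothetical and invented for illustration; the topic-modelling and ANN stages are not shown.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def term_counts(text, min_length=3):
    """Lower-case, split into alphabetic tokens, and count the longer ones."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if len(t) >= min_length)


# Hypothetical page content, invented for illustration only.
SAMPLE_HTML = """
<html><head><title>Example GmbH</title>
<style>body { color: black; }</style></head>
<body><h1>Sensor research</h1>
<p>We develop new sensor prototypes and sensor software.</p>
<script>var tracking = true;</script></body></html>
"""

text = extract_text(SAMPLE_HTML)
counts = term_counts(text)
print(counts.most_common(3))
```

In a real pipeline the HTML would come from a crawler rather than a string literal, and the resulting per-firm term counts would be passed on to the topic model; both of those steps depend on choices the abstract leaves open.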