Python & Stata Workshop – German Stata Conference – Frankfurt | 4-5 June 2020

In the case of natural languages you swear in your mother tongue, write papers in English, and when in Rome it helps to speak a little Italian. Being a polyglot promotes communication, understanding and expression, but it also sometimes increases the probability of confusion. One thing is certain: in a globalized world, for most of us our mother tongue will not suffice.

In the case of programming languages it is very much the same. The workshop is meant for those whose mother tongue is Stata but who want to explore the added value of learning Python, or the reverse.

Besides an introduction to Python, the workshop will demonstrate how to use the Stata SFI API to embed Python code in a Stata program and pass data between Stata and Python. Examples of when such an embedding is advantageous will be discussed and demonstrated. These include: text mining (Python regular expressions), web scraping (programming a web browser in Python), using web APIs to get data (e.g. Google Trends, Yahoo Finance), speeding up code with Python multiprocessing (e.g. parallelizing a for loop), unsupervised learning (e.g. a Python implementation of the Louvain clustering algorithm), etc.
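To give a flavour of the text mining item, here is a minimal sketch of pulling dates out of free text with Python's re module. It is not taken from the course material: the sample notes, the ISO_DATE pattern and the extract_dates helper are all made up for illustration.

```python
import re

# Hypothetical free-text notes; the strings are invented for this illustration.
notes = [
    "Invoice sent 2020-01-15, reminder on 2020-02-01.",
    "No date mentioned here.",
    "Meeting moved from 2019-12-30 to 2020-01-07",
]

# ISO-style dates: four digits, two digits, two digits, separated by hyphens.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(text):
    """Return every YYYY-MM-DD match in text as a (year, month, day) tuple of ints."""
    return [tuple(int(part) for part in m.groups()) for m in ISO_DATE.finditer(text)]


# One list of date tuples per note; notes without a date give an empty list.
dates_per_note = [extract_dates(note) for note in notes]
```

The pattern only checks the shape of the string, not whether the date exists in the calendar, so a real application would probably validate the matches afterwards.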

If you want to join, here is the conference web page with a registration link. If you do register and have any extra wishes, tweet them to me and I will do my best to include them.

The course will be a series of live demonstrations using Jupyter notebooks, and the course material will be shared with all participants. For active participation you will need a laptop (hopefully we will have local wifi) with Stata 16 and Anaconda3 (with Python 3.7 or so). If you want to run the Stata Jupyter notebooks you need to have installed the Stata kernel for Jupyter (alternatively, you can copy and paste the code from the Jupyter notebooks into Stata 16).

PS: Two modules written for the course use the Stata 16 sfi module to bring (some of the) functionality of Python modules into Stata. If you have Stata 16, try:

  • Stata command to get stock prices from Yahoo finance

. ssc install stockquote, replace

and then run it as follows:

. stockquote AAPL, start_date(2020-01-01) end_date(2020-01-30)

to get 30 days' worth of Apple stock price information. The module wraps itself around Python's yfinance module and uses the following Stata/Python classes: sfi.Macro, sfi.Data and sfi.Datetime.

  • Stata command to find communities in weighted networks:

. ssc install louvain

. man louvain

On the help page, follow the example by clicking on the commands. You will cluster a weighted graph of all numbers from 1 to 10, where two numbers are connected iff they are not coprime. When they are connected, the weight is their gcd minus one. The command wraps around the Python module python-louvain and uses Stata frames and the Stata sfi classes sfi.Data, sfi.Macro and sfi.Frame.
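The example graph is small enough to write down by hand, and it can also be built in a few lines of plain Python. The sketch below only constructs the weighted edge list as described above; it is not the code of the louvain command, and the function name coprime_graph_edges is made up for illustration.

```python
from itertools import combinations
from math import gcd


def coprime_graph_edges(n=10):
    """Weighted edges (i, j, weight) on the numbers 1..n.

    Two numbers are linked iff they are not coprime (gcd > 1),
    and the edge weight is their gcd minus one.
    """
    edges = []
    for i, j in combinations(range(1, n + 1), 2):
        g = gcd(i, j)
        if g > 1:
            edges.append((i, j, g - 1))
    return edges


edges = coprime_graph_edges(10)

# Numbers that share no common factor with any other number in 1..10.
linked = {node for i, j, _ in edges for node in (i, j)}
isolated = sorted(set(range(1, 11)) - linked)
```

For n = 10 this gives 14 edges, with 1 and 7 left unconnected. An edge list of this shape could then be loaded into a networkx graph and passed to python-louvain's best_partition, assuming both packages are installed.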

Toll Index January 2020

Annual January-to-January changes in the number of inbound or outbound lorries (after accounting for working day differences) are rarely non-positive. The drop of 2.1% for inbound and 1.8% for outbound traffic in the first month of 2020 should therefore be seen as a rare and hence significant fact.

Starting in July 2018 the BAG (Bundesamt für Güterverkehr) introduced yet another policy change, which affected how lorries pay tolls within the MAUT system as well as the data that come out of this process and are used for computing the Toll Index. The change expanded the network of roads on which toll is due by adding all Bundesstraßen to it.

While in the long run this is bound to make the Toll Index more accurate, over the past twelve months it made the index useless for nowcasting. Moreover, the BAG had difficulty producing the numbers in a timely manner for about a year. After July 2019 we can report year-on-year changes for each month (with a missing value in 2018 for all months from July to December and a missing value in 2019 for all months from January to June).

The Toll Index was first proposed in IZA DP5522 which was published in the Journal of Forecasting. It has been widely covered in national and international media (selection):

The German statistical office, in cooperation with the Bundesamt für Güterverkehr, has taken the MAUT data into its portfolio of data products, and their efforts can be found here. The Destatis document describing the data is here, and here is their publication calendar for 2019.