Python & Stata Workshop – German Stata Conference – Frankfurt | 4-5 June 2020

In the case of natural languages, you swear in your mother tongue, write papers in English, and when in Rome it helps to speak a little Italian. Being a polyglot promotes communication, understanding and expression, but it also sometimes increases the probability of confusion. One thing is certain: in a globalized world, for most of us our mother tongue will not suffice.

In the case of programming languages it is very much the same. The workshop is meant for those whose mother tongue is Stata but who want to explore the added value of learning Python, or the reverse.

Besides an introduction to Python, the workshop will demonstrate how to use the Stata SFI API to embed Python code in a Stata program and pass data between Stata and Python. Examples of when such an embedding is advantageous will be discussed and demonstrated. These include: text mining (Python regular expressions), web scraping (programming a web browser in Python), using web APIs to get data (e.g. Google Trends, Yahoo Finance), speeding up with Python multiprocessing (e.g. parallelizing a for loop) and unsupervised learning (e.g. a Python implementation of the Louvain clustering algorithm).
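As a rough illustration of the text-mining and data-passing ideas, here is a minimal sketch. The function names `extract_percentages` and `send_to_stata`, the variable name `pct` and the local macro `n_found` are hypothetical and chosen only for this example. The `sfi` module is importable only when Python runs inside Stata 16, so the sketch falls back to plain Python elsewhere, and `send_to_stata` assumes the dataset in memory is empty.

```python
import re

try:
    # The sfi module ships with Stata 16 and is importable only inside Stata.
    from sfi import Data, Macro
except ImportError:
    Data = Macro = None  # running in plain Python, outside Stata


def extract_percentages(text):
    """Return every percentage figure found in text as a float."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", text)]


def send_to_stata(values, varname="pct"):
    """Store values in a new double variable of the Stata dataset in memory.

    Assumption for this sketch: the dataset in memory is empty.
    """
    if Data is None:
        raise RuntimeError("sfi is only available when Python runs inside Stata")
    Data.setObsTotal(len(values))
    Data.addVarDouble(varname)
    for obs, value in enumerate(values):
        Data.storeAt(varname, obs, value)
    # Report the number of matches back to Stata as a local macro.
    Macro.setLocal("n_found", str(len(values)))


sample = "The drop of 2.1% for inbound and 1.8% for outbound traffic"
percentages = extract_percentages(sample)
print(percentages)  # [2.1, 1.8]
```

Inside Stata, one would call `send_to_stata(percentages)` from a `python:` block and then work with the new variable using ordinary Stata commands.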

If you want to join, here is the conference web page with a registration link. If you do register and have any extra wishes, tweet them to me and I will do my best to include them.

The course will be a series of live demonstrations using Jupyter notebooks, and the course material will be shared with all participants. For active participation you will need a laptop (hopefully we will have local wifi) with Stata 16 and Anaconda3 (with Python 3.7 or so). If you want to run the Stata Jupyter notebooks you need to have installed the Stata Kernel for Jupyter (alternatively, you can copy and paste the code from the Jupyter notebooks into Stata 16).

PS: Two modules written for the course use the Stata 16 sfi module to import (some of the) functionality of Python modules into Stata. If you have Stata 16, try:

  • Stata command to get stock prices from Yahoo Finance:

. ssc install stockquote, replace

and then run it as follows:

. stockquote AAPL, start_date(2020-01-01) end_date(2020-01-30)

to get 30 days' worth of Apple stock price information. The module wraps Python’s yfinance module and uses the following Stata/Python classes: sfi.Macro, sfi.Data and sfi.Datetime.

  • Stata command to find communities in weighted networks:

. ssc install louvain

. man louvain

On the help page, follow the example by clicking on the commands. You will cluster a weighted graph of all numbers from 1 to 10 where two numbers are connected iff they are not coprime. When they are connected, the weight is their gcd minus one. The command wraps the Python module python-louvain and uses Stata frames and the Stata sfi classes sfi.Data, sfi.Macro and sfi.Frame.
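To see what the example graph looks like before any clustering, the edge list can be built in a few lines of plain Python. This is only a sketch of the graph construction described above, not of the louvain command itself; the function name `build_gcd_graph` is made up for this illustration.

```python
from math import gcd


def build_gcd_graph(n=10):
    """Return {(a, b): weight} for 1 <= a < b <= n with gcd(a, b) > 1.

    Two numbers are connected iff they are not coprime;
    the edge weight is their gcd minus one.
    """
    edges = {}
    for a in range(1, n + 1):
        for b in range(a + 1, n + 1):
            g = gcd(a, b)
            if g > 1:
                edges[(a, b)] = g - 1
    return edges


edges = build_gcd_graph(10)
connected = {node for pair in edges for node in pair}
isolated = [k for k in range(1, 11) if k not in connected]

print(len(edges), isolated)  # 14 [1, 7]
print(edges[(4, 8)], edges[(5, 10)])  # 3 4
```

The numbers 1 and 7 share no common factor with any other number up to 10, so they end up without edges. The heaviest edge joins 5 and 10.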

Toll Index January 2020

Annual January to January changes of inbound or outbound lorries (after accounting for working day differences) are rarely non-positive. The drop of 2.1% for inbound and 1.8% for outbound traffic in the first month of 2020 should therefore be seen as a rare and hence significant fact.
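One simple way to make "accounting for working day differences" concrete is to compare counts per working day rather than raw monthly totals. The sketch below assumes that normalisation, which is not necessarily the exact adjustment used for the Toll Index. The counts and working-day numbers are hypothetical inputs chosen only to show the arithmetic.

```python
def working_day_adjusted_change(current_count, current_days,
                                previous_count, previous_days):
    """Year-on-year change of the count per working day (as a fraction)."""
    current_rate = current_count / current_days
    previous_rate = previous_count / previous_days
    return current_rate / previous_rate - 1.0


# Hypothetical inputs: 1000 crossings over 20 working days a year ago,
# 1078 crossings over 22 working days in the current month.
raw = 1078 / 1000 - 1.0
adjusted = working_day_adjusted_change(1078, 22, 1000, 20)

print(f"raw: {raw:+.1%}, adjusted: {adjusted:+.1%}")  # raw: +7.8%, adjusted: -2.0%
```

With these made-up numbers the raw total rises while activity per working day falls, which is why the adjustment matters when comparing the same month across years.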

Starting in July 2018 the BAG (Bundesamt für Güterverkehr) introduced yet another policy change, which affected how lorries pay tolls within the MAUT system as well as the data that come out of this process and are used for computing the Toll Index. The change expanded the network of roads on which a toll is due by adding all Bundesstraßen to it.

While in the long run this is bound to make the Toll Index more accurate, in these past twelve months it made it useless for nowcasting. Moreover, the BAG had difficulty producing the numbers on time for about a year. After July 2019 we can report year-on-year changes for each month (with a missing value in 2018 for all months from July to December and a missing value in 2019 for all months from January to June).

The Toll Index was first proposed in IZA DP5522, which was published in the Journal of Forecasting. It has been widely covered in national and international media.

The German statistical office, in cooperation with the Bundesamt für Güterverkehr, has added the MAUT data to its portfolio of data products, and their efforts can be found here. The Destatis document describing the data is here and here is their publication calendar for 2019.

Toll Index December 2019

Starting in July 2018 the BAG (Bundesamt für Güterverkehr) introduced yet another policy change, which affected how lorries pay tolls within the MAUT system as well as the data that come out of this process and are used for computing the Toll Index. The change expanded the network of roads on which a toll is due by adding all Bundesstraßen to it.

While in the long run this is bound to make the Toll Index more accurate, in these past twelve months it made it useless for nowcasting. Moreover, the BAG had difficulty producing the numbers on time for about a year. Since July 2019 each month is comparable to the value of the same month in 2018. Of course we have a missing value for 2018, since it is not comparable to 2017 due to the policy change.

Toll Index July-November 2019

Monthly German border-crossing activity by lorries stalled on a year-on-year basis (accounting for working day differences) from July to November.

Starting in July 2018 the BAG (Bundesamt für Güterverkehr) introduced yet another policy change, which affected how lorries pay tolls within the MAUT system as well as the data that come out of this process and are used for computing the Toll Index. The change expanded the network of roads on which a toll is due by adding all Bundesstraßen to it.

While in the long run this is bound to make the Toll Index more accurate, in these past twelve months it made it useless for nowcasting. Moreover, the BAG had difficulty producing the numbers on time for about a year. Since July 2019 each month is comparable to the value of the same month in 2018. Of course we have a missing value for 2018, since it is not comparable to 2017 due to the policy change.

Toll Index November 2019 – stalled

Starting in July 2018 the BAG (Bundesamt für Güterverkehr) introduced yet another policy change, which affected how lorries pay tolls within the MAUT system as well as the data that come out of this process and are used for computing the Toll Index. The change expanded the network of roads on which a toll is due by adding all Bundesstraßen to it.

While in the long run this is bound to make the Toll Index more accurate, in these past twelve months it made it useless for nowcasting. Moreover, the BAG had difficulty producing the numbers on time for about a year. Since July 2019 each month is comparable to the value of the same month in 2018. Of course we have a missing value for 2018, since it is not comparable to 2017 due to the policy change.

UK elections 2019 – Odds, Polls, Google buzz

Based on Google Trends data, the Conservative party will beat the Labour party by about 7 percentage points in the popular vote, while the LibDems will trail the Conservatives by about 35 points.

The pollsters have the Conservatives leading Labour by anywhere between 6 and 15 percentage points across 14 polls in December, with the average prediction at 9.5 percentage points.

The bookies have the odds at 1/33 for a Conservative victory and 2/5 for an overall Conservative majority.

In Google search, the footprints of Labour, the Conservatives and the LibDems over the last seven days average 42, 30 and 20 points respectively.

Labour always leads the Conservatives in Google buzz, most likely due to demographics.

We can still use the elections of 2015 and 2017 to take out party composition fixed effects from the Google data. When we do so we project that Labour will fall 5.4 to 8.5 percentage points behind the Conservatives in the popular vote while the LibDems will trail by 33.3 to 36.4 points.

Toll Index October 2019

Starting in July 2018 the BAG (Bundesamt für Güterverkehr) introduced yet another policy change, which affected how lorries pay tolls within the MAUT system as well as the data that come out of this process and are used for computing the Toll Index. The change expanded the network of roads on which a toll is due by adding all Bundesstraßen to it.

While in the long run this is bound to make the Toll Index more accurate, in these past twelve months it made it useless for nowcasting. Even the BAG had difficulty producing the numbers on time. September 2019 is now comparable to September 2018 values. Of course we have a missing value for September 2018, since it is not comparable to September 2017 due to the policy change.
