Workshop “Stata meets Python”

At this year’s German Stata Conference (June 15-16, Humboldt University, Berlin) I will be teaching a workshop titled “Stata meets Python”.

What is it about?

Attendees will learn how to use the Python integration facilities that Stata provides to embed Python code in Stata (since version 16) or Stata code in Python (since version 17). Stata collectively calls this integration PyStata. Among other ingredients, it includes the pystata Python package written by Stata (which enables embedding Stata code in Python) and the sfi module, the Stata Function Interface (SFI), which can be used to access Stata’s current dataset, frames, macros, scalars, matrices, value labels, global Mata matrices, etc.
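To give a flavour of the sfi module, here is a minimal sketch of Python code that reads one variable from Stata's current dataset and hands a result back as a local macro. The function names (plain_mean, mean_of_stata_variable) and the macro name pymean are made up for illustration, and the sketch ignores Stata missing values. The sfi import only works inside Stata's embedded Python, so it is kept inside the function.

```python
def plain_mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("no observations")
    return sum(values) / len(values)


def mean_of_stata_variable(varname):
    """Read one Stata variable, compute its mean, store it in a local macro.

    Hypothetical helper for illustration; missing values are not handled.
    """
    # sfi ships with Stata and is importable only from Python running inside Stata
    from sfi import Data, Macro

    values = Data.get(varname)            # all observations of one variable, as a list
    mean = plain_mean(values)
    Macro.setLocal("pymean", str(mean))   # assumed macro name, read in Stata as `pymean'
    return mean
```

Inside Stata 16 or later you could define these functions in a python: ... end block and call, for example, mean_of_stata_variable("price") after loading a dataset that contains a numeric variable named price.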

Why should you attend?

Both languages have large, lively communities that create user-written programs with little overlap. The integration doubles the amount of ammunition you can throw at data problems.

The program can be found on Stata’s own website as well as on the official conference page.


Python & Stata Workshop – German Stata Conference – Frankfurt | 4-5 June 2020

In the case of natural languages, you swear in your mother tongue, write papers in English, and when in Rome it helps to speak a little bit of Italian. Being a polyglot promotes communication, understanding and expression, but it also sometimes increases the probability of confusion. One thing is for certain: in a globalized world, for most of us our mother tongue will not suffice.

In the case of programming languages it is very much the same. The workshop is meant for those whose mother tongue is Stata but who want to explore the added value of learning Python, or the reverse.

Besides an introduction to Python, the workshop will demonstrate how to use the Stata SFI API to embed Python code in a Stata program and pass data between Stata and Python. Examples of when such an embedding is advantageous will be discussed and demonstrated. These include: text mining (Python regular expressions), web scraping (programming a web browser in Python), using web APIs to get data (e.g. Google Trends, Yahoo Finance), speeding up code with Python multiprocessing (e.g. parallelizing a for loop), unsupervised learning (e.g. a Python implementation of the Louvain clustering algorithm), etc.
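As a small taste of the text mining use case, here is a minimal sketch that uses Python's built-in re module to pull four-digit years out of free text. The function name extract_years and the sample sentence are invented for illustration and are not part of the course material.

```python
import re

# Four-digit years from 1900 to 2099, matched as whole words only
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years(text):
    """Return all four-digit years found in text, in order of appearance."""
    return YEAR_PATTERN.findall(text)


sample = "Founded in 1985, Stata 16 arrived in 2019 and Stata 17 in 2021."
years = extract_years(sample)
```

Version numbers such as 16 and 17 are skipped because the pattern requires exactly four digits starting with 19 or 20.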

If you want to join, here is the conference web page with a registration link. If you do register and have any extra wishes, tweet them to me and I will do my best to include them.

The course will be a series of live demonstrations using Jupyter notebooks, and the course material will be shared with all participants. For active participation you will need a laptop (hopefully we will have local wifi) with Stata 16 and Anaconda3 (with Python 3.7 or so). If you want to run the Stata Jupyter notebooks, you need to have installed the Stata Kernel for Jupyter (alternatively, you can copy and paste the code from the Jupyter notebooks into Stata 16).

PS: Two modules written for the course use the Stata 16 sfi module to bring (some of the) functionality of Python modules to Stata. If you have Stata 16, try:

  • Stata command to get stock prices from Yahoo Finance:

. ssc install stockquote, replace

and then run it as follows:

. stockquote AAPL, start_date(2020-01-01) end_date(2020-01-30)

to get 30 days’ worth of Apple stock price information. The module wraps Python’s yfinance module and uses the following Stata/Python classes: sfi.Macro, sfi.Data and sfi.Datetime.

  • Stata command to find communities in weighted networks:

. ssc install louvain

. man louvain

On the help page, follow the example by clicking on the commands. You will cluster a weighted graph of all numbers from 1 to 10, where two numbers are connected if and only if they are not coprime. When they are connected, the weight is their gcd minus one. The command wraps the Python module python-louvain and uses Stata frames and the Stata sfi classes sfi.Data, sfi.Macro and sfi.Frame.
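To make the example graph concrete, here is a minimal sketch in plain Python that builds its weighted edge list using only the standard library. It is not the code of the louvain command, and the names coprime_graph_edges, edges and isolated are made up for illustration.

```python
from math import gcd


def coprime_graph_edges(n=10):
    """Edges (i, j, weight) between numbers 1..n that share a common factor.

    Two numbers are connected only if gcd(i, j) > 1; the weight is gcd(i, j) - 1.
    """
    edges = []
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            g = gcd(i, j)
            if g > 1:  # not coprime, so the two numbers are connected
                edges.append((i, j, g - 1))
    return edges


edges = coprime_graph_edges(10)

# Numbers that appear in no edge are isolated nodes of the graph
connected = {node for i, j, _ in edges for node in (i, j)}
isolated = [k for k in range(1, 11) if k not in connected]
```

For the numbers 1 to 10 this gives 14 edges, for instance (4, 8, 3) and (5, 10, 4), while 1 and 7 remain isolated because they are coprime to every other number in the range. An edge list of this kind is what a community detection routine would take as input.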