Programmers: Python pandas and matplotlib or MySQL?

Tuesday, 24 March 2015

Python pandas and matplotlib or MySQL?

Pandas is a great Python library that provides easy-to-use data structures and data analysis tools. I have been using it to aggregate and analyze daily weather observation data, which I get from an FTP site. The data are stored in comma-separated (CSV) files. Here is a fragment of such a file to give you an idea:

[Image: fragment of a weather observation CSV file]
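A minimal sketch of reading one such file into a pandas DataFrame. The column names `STATION`, `DATE` and `TEMP`, the timestamp format, and the sample rows are all hypothetical stand-ins; the real files may use a different layout, and `read_station_file` is an illustrative helper name.

```python
import io

import pandas as pd

# Hypothetical excerpt of one station-year file; column names, formats
# and values are assumptions, not taken from the actual data set.
SAMPLE_CSV = """STATION,DATE,TEMP
010010,2014-01-01 00:00,-3.5
010010,2014-01-01 01:00,-3.9
010010,2014-01-01 02:00,-4.2
"""


def read_station_file(source) -> pd.DataFrame:
    """Read one hourly observation file into a DataFrame indexed by timestamp."""
    df = pd.read_csv(
        source,
        dtype={"STATION": str},  # keep leading zeros in station IDs
        parse_dates=["DATE"],    # turn the timestamp column into datetime64
    )
    return df.set_index("DATE")


df = read_station_file(io.StringIO(SAMPLE_CSV))
print(df)
```

For a real file, pass its path instead of the `StringIO` object.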

Each file contains hourly observations of weather variables for an entire year for a particular weather station. There are thousands of stations and around 100 years of records, so hundreds of thousands of files in total. My workflow is to first batch-download the files from the FTP site with Python, then read them into pandas DataFrames, merge, group, calculate annual means, and pivot them. Afterward, the script visualizes the weather values over time (i.e. by year) for each station with Matplotlib.
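The merge, group, annual-mean, pivot, and plot steps can be sketched roughly as follows. This assumes each file has already been read into a DataFrame with the hypothetical columns `STATION`, `DATE` and `TEMP`; the small frames built in the code are made-up placeholders for real files, and `annual_means` / `plot_annual_means` are illustrative helper names, not part of pandas.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: write the figure to a file
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-ins for frames read from individual station-year files
# (hypothetical columns STATION, DATE, TEMP).
frames = [
    pd.DataFrame({"STATION": "A",
                  "DATE": pd.to_datetime(["2013-01-01 00:00", "2013-07-01 12:00"]),
                  "TEMP": [-2.0, 20.0]}),
    pd.DataFrame({"STATION": "A",
                  "DATE": pd.to_datetime(["2014-01-01 00:00", "2014-07-01 12:00"]),
                  "TEMP": [0.0, 22.0]}),
    pd.DataFrame({"STATION": "B",
                  "DATE": pd.to_datetime(["2013-01-01 00:00", "2013-07-01 12:00"]),
                  "TEMP": [5.0, 25.0]}),
    pd.DataFrame({"STATION": "B",
                  "DATE": pd.to_datetime(["2014-01-01 00:00", "2014-07-01 12:00"]),
                  "TEMP": [7.0, 27.0]}),
]


def annual_means(frames):
    """Stack per-file frames, average TEMP per station and year, pivot to years x stations."""
    combined = pd.concat(frames, ignore_index=True)          # merge
    combined["YEAR"] = combined["DATE"].dt.year
    means = (combined.groupby(["STATION", "YEAR"])["TEMP"]  # group
                     .mean()                                 # annual mean
                     .reset_index())
    return means.pivot(index="YEAR", columns="STATION", values="TEMP")  # pivot


def plot_annual_means(table, path):
    """Draw one line per station (column) against year and save the figure."""
    ax = table.plot(marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Mean temperature")
    ax.set_title("Annual mean temperature per station")
    ax.figure.savefig(path)
    plt.close(ax.figure)


table = annual_means(frames)
out_path = os.path.join(tempfile.gettempdir(), "annual_means.png")
plot_annual_means(table, out_path)
print(table)
```

`pd.concat` stacks the per-file frames, which covers the merge step when every file shares the same columns. `DataFrame.pivot_table(index="YEAR", columns="STATION", values="TEMP", aggfunc="mean")` would fold the grouping and pivoting into a single call.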

I enjoy doing this with pandas and I am happy with the solution. I have little experience with MySQL. When I described what I am doing to a friend who knows MySQL but not pandas, he said I don't have to do the analysis and visualization with Python scripts. He suggested that I store the data in a MySQL database and then do the merging, grouping, averaging, pivoting, and visualization using MySQL tools. So I am unsure which solution to pursue. Are there significant benefits to one or the other, or are they simply two alternative approaches that both do the job well?
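For comparison, here is a sketch of the friend's suggestion, with the grouping and averaging pushed into SQL. It uses Python's built-in `sqlite3` module as a stand-in for a MySQL server so that it runs self-contained; the table name `observations`, its columns, and the rows are hypothetical. In MySQL the year would more naturally be extracted with `YEAR(obs_time)` on a `DATETIME` column instead of SQLite's `strftime`.

```python
import sqlite3

import pandas as pd

# sqlite3 stands in for MySQL so the sketch needs no server; the
# table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (station TEXT, obs_time TEXT, temp REAL)")
rows = [
    ("A", "2013-01-01 00:00", -2.0),
    ("A", "2013-07-01 12:00", 20.0),
    ("A", "2014-01-01 00:00", 0.0),
    ("A", "2014-07-01 12:00", 22.0),
    ("B", "2013-01-01 00:00", 5.0),
    ("B", "2013-07-01 12:00", 25.0),
    ("B", "2014-01-01 00:00", 7.0),
    ("B", "2014-07-01 12:00", 27.0),
]
conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)

# Grouping and averaging happen inside the database
# (SQLite syntax; MySQL would use YEAR(obs_time) instead of strftime).
query = """
    SELECT station,
           CAST(strftime('%Y', obs_time) AS INTEGER) AS year,
           AVG(temp) AS mean_temp
    FROM observations
    GROUP BY station, year
    ORDER BY station, year
"""
annual = pd.read_sql_query(query, conn)
conn.close()

# Only the small aggregated result is pivoted in pandas: years x stations.
table = annual.pivot(index="year", columns="station", values="mean_temp")
print(table)
```

Here the database does the aggregation and pandas only pivots the much smaller result, which is one possible way to split the work between the two tools.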
