Pandas is a great Python library that provides easy-to-use data structures and data analysis tools. I have been using it to aggregate and analyze daily weather observation data, which I get from an FTP site. The data are stored in comma-separated files. Here is a fragment of such a file to give you an idea:
Each file contains hourly observations of weather characteristics for an entire year for a particular weather station. There are thousands of stations and around 100 years of records, so hundreds of thousands of files in total. My workflow is to first batch-download the files from the FTP site with Python, then read them into Pandas dataframes, merge them, group them, calculate annual means, and pivot the result. Afterward, the script visualizes the weather values over time (i.e. by year) for each station with Matplotlib.
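To make the workflow above concrete, here is a minimal sketch of the read, merge, group, average, and pivot steps. The column names (`timestamp`, `temperature`), the station IDs, and the sample values are all assumptions made up for illustration, since the real file layout is not shown; the FTP download step is left out, and two small in-memory strings stand in for the downloaded files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two downloaded station files. The real column
# names and layout are not known, so everything here is an assumption.
STATION_FILES = {
    "ST001": "timestamp,temperature\n"
             "2000-01-01 00:00,1.0\n"
             "2000-07-01 12:00,21.0\n"
             "2001-01-01 00:00,3.0\n"
             "2001-07-01 12:00,23.0\n",
    "ST002": "timestamp,temperature\n"
             "2000-01-01 00:00,-5.0\n"
             "2000-07-01 12:00,15.0\n"
             "2001-01-01 00:00,-3.0\n"
             "2001-07-01 12:00,17.0\n",
}


def load_station(station_id, csv_text):
    """Read one station's CSV text and tag each row with the station ID."""
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])
    df["station"] = station_id
    return df


def annual_means(frames):
    """Stack the station frames, average per station and year, then pivot."""
    combined = pd.concat(frames, ignore_index=True)
    combined["year"] = combined["timestamp"].dt.year
    means = combined.groupby(["station", "year"], as_index=False)["temperature"].mean()
    # Rows become years, columns become stations.
    return means.pivot(index="year", columns="station", values="temperature")


def plot_annual_means(pivot):
    """Draw one line per station over the years (requires Matplotlib)."""
    import matplotlib.pyplot as plt

    ax = pivot.plot(marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Mean temperature")
    plt.show()


frames = [load_station(sid, text) for sid, text in STATION_FILES.items()]
pivot = annual_means(frames)
print(pivot)
```

Calling `plot_annual_means(pivot)` would then draw one line per station. With real data, the in-memory strings would be replaced by paths to the downloaded files.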
I enjoy doing this with Pandas and I am happy with the solution. I have little experience with MySQL. When I described what I am doing to a friend who knows MySQL but not Pandas, he said I don't have to do the analysis and the visualizations with Python scripting. He suggested that I store the data in a MySQL database instead, and then do the merging, grouping, averaging, pivoting, and visualization with MySQL tools. So, I am confused about which solution to pursue. Are there significant benefits to one or the other, or are they simply two alternative approaches that both do the job well?
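For comparison, the grouping and averaging part of the database approach might look roughly like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server; the table name `observations`, its columns, and the sample rows are hypothetical. In MySQL the year would more likely be extracted with `YEAR(observed_at)` rather than SQLite's `strftime`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (station TEXT, observed_at TEXT, temperature REAL)"
)

# Hypothetical sample rows; real data would be loaded from the CSV files.
rows = [
    ("ST001", "2000-01-01 00:00", 1.0),
    ("ST001", "2000-07-01 12:00", 21.0),
    ("ST001", "2001-01-01 00:00", 3.0),
    ("ST001", "2001-07-01 12:00", 23.0),
    ("ST002", "2000-01-01 00:00", -5.0),
    ("ST002", "2000-07-01 12:00", 15.0),
    ("ST002", "2001-01-01 00:00", -3.0),
    ("ST002", "2001-07-01 12:00", 17.0),
]
conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)

# Annual mean per station, computed by the database engine.
query = """
SELECT station,
       CAST(strftime('%Y', observed_at) AS INTEGER) AS year,
       AVG(temperature) AS mean_temperature
FROM observations
GROUP BY station, year
ORDER BY station, year
"""
annual = conn.execute(query).fetchall()
for station, year, mean_temperature in annual:
    print(station, year, mean_temperature)
conn.close()
```

The query returns one row per station and year in long form; pivoting into one column per station and plotting would still need an extra step or a separate tool.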