23 February 2018
I’m currently trying to finish up a little side project I’ve kept putting off that involves data from my car (2015 Chevrolet Volt).
I pull data from voltstats.net with a Selenium script, which downloads the two CSV files and puts them in a directory for the Jupyter Notebook to consume for analysis. Running the Jupyter Notebook creates a dataset, which I then visualize with Reflect.io.
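The "puts them in a directory" step can be as small as moving every finished CSV out of the browser's download folder. Here is a minimal sketch. It assumes the downloads land in one folder and the notebook reads from another; the folder paths and the function name collect_csvs are hypothetical, since the actual file names from voltstats.net aren't given here.

```python
import shutil
from pathlib import Path


def collect_csvs(download_dir, data_dir, pattern="*.csv"):
    """Move finished CSV downloads into the folder the notebook reads from.

    Hypothetical helper: download_dir is wherever the browser saves files,
    data_dir is the notebook's input folder. Returns the new paths.
    """
    src = Path(download_dir)
    dest = Path(data_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the data folder if needed

    moved = []
    # "*.csv" skips Chrome's in-progress ".crdownload" files
    for csv_path in sorted(src.glob(pattern)):
        target = dest / csv_path.name
        # shutil.move also works across filesystems and replaces an older copy
        shutil.move(str(csv_path), str(target))
        moved.append(target)
    return moved
```

Matching only on the .csv suffix means a download that is still in progress is left behind rather than handed to the notebook half-written.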
I ended up breaking the shell script I had previously into a Makefile:
.PHONY: volt-data jupyter

volt-data: # Run script to get data, add CSVs to data folder
	python acquireData.py

jupyter: # Start Jupyter notebook
	docker-compose -f docker-compose.yml -f docker-compose-local-dev.yml run --service-ports volt-metrics jupyter notebook --no-browser --ip=* --allow-root
So now, instead of one shell script that runs both, I can run each step on its own: make volt-data pulls the data, and make jupyter starts the notebook.
With most of the data analysis already done, the next step was to get the Jupyter Notebook's output somewhere online.
I initially wanted to use Google Sheets, but after looking into it, Boto3 (the AWS SDK for Python) did what I needed with much less work than sending a CSV to Google Sheets. I already have an AWS account, so the setup was pretty much done. I wrote the following code to ship the Jupyter Notebook's final output to S3:
import boto3

# Use the region the bucket lives in
boto3.setup_default_session(region_name='us-east-1')
s3 = boto3.resource('s3')

# Upload the notebook's output CSV to S3; "with" closes the file afterwards
with open('output/voltstats.csv', 'rb') as data:
    s3.Bucket('volt-metrics').put_object(Key='voltstats.csv', Body=data)
If you're trying to accomplish the same thing, this isn't a copy-and-paste solution: you'll need your own AWS account, plus credentials that can write to the S3 bucket you choose.
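One way to keep credentials out of the code is to let boto3 find them on its own. It checks environment variables such as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, the shared ~/.aws/credentials file (including named profiles), and an attached IAM role. Below is a minimal sketch, assuming an S3 client is passed in. The function name upload_csv is hypothetical, and this uses the boto3 client's put_object rather than the resource API above.

```python
from pathlib import Path


def upload_csv(s3_client, csv_path, bucket, key=None):
    """Upload one CSV file to S3 and return the object key used.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3");
    only its put_object method is called.
    """
    path = Path(csv_path)
    object_key = key or path.name  # default to the local file name
    with path.open("rb") as body:
        s3_client.put_object(
            Bucket=bucket,
            Key=object_key,
            Body=body,
            ContentType="text/csv",  # so browsers and tools treat it as CSV
        )
    return object_key
```

With boto3 installed, a call might look like upload_csv(boto3.Session(profile_name="volt", region_name="us-east-1").client("s3"), "output/voltstats.csv", "volt-metrics"), where "volt" is a hypothetical profile in ~/.aws/credentials. Passing the client in also makes the function easy to test with a stand-in object.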
This isn't really development, but it's the end product all this work was for: I use Reflect.io's GUI to turn the dataset into visualizations that are displayed on my website.
Here’s what I have so far:
This little bit of code (provided by Reflect.io) is great for embedding.
Before switching to a Makefile and Boto3, I used a single shell script to start the Jupyter Notebook and run the Python script that pulls the data I need. I wanted things a little more organized, rather than rerunning the data acquisition script every time I started the Jupyter Notebook.
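If you ever want the notebook to decide for itself whether a fresh pull is needed, a simple check on the CSVs' age would do it. Here is a minimal sketch. The function name data_is_stale, the 24-hour threshold, and the data folder path are all assumptions; the count of two files comes from the two CSVs voltstats.net provides.

```python
import time
from pathlib import Path


def data_is_stale(data_dir, expected_files=2, max_age_hours=24.0, now=None):
    """Return True if the input CSVs are missing or older than max_age_hours.

    Hypothetical helper; "now" can be passed in (seconds since the epoch)
    to make the check testable.
    """
    data = Path(data_dir)
    if not data.is_dir():
        return True  # nothing downloaded yet
    csvs = list(data.glob("*.csv"))
    if len(csvs) < expected_files:
        return True  # one or both CSVs are missing
    now = time.time() if now is None else now
    oldest = min(p.stat().st_mtime for p in csvs)  # age is set by the oldest file
    return (now - oldest) > max_age_hours * 3600
```

A first notebook cell could then run subprocess.run(["make", "volt-data"], check=True) only when data_is_stale("data") is true, so the acquisition step stays separate but doesn't have to be remembered.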
Ideally, I'd like to move the Python script that pulls data from voltstats.net into Docker, so the process doesn't rely on locally installed Python modules and tools (like Selenium and ChromeDriver).
The only problem is that when I tried this in the past, Selenium/ChromeDriver seemed to need an instance of Google Chrome, and even the headless version didn't work.
If anyone has been able to run Selenium/ChromeDriver in a Docker container, I'd love to see how you did it!
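For anyone attempting this, here is a sketch of the Chrome setup most often suggested for containers. It is not a tested recipe for this project. It assumes Chrome and a matching chromedriver are installed in the image, and selenium 3.8 or later (for the options= argument). The function names and the download folder are hypothetical. --no-sandbox is needed because Chrome refuses to run as root without it, and --disable-dev-shm-usage works around Docker's small default /dev/shm (64 MB). Some older headless Chrome releases also didn't save downloads at all, so the Chrome version is worth checking too.

```python
def chrome_args_for_docker():
    """Chrome command-line flags commonly used inside a container."""
    return [
        "--headless",               # no display server in the container
        "--no-sandbox",             # Chrome won't start as root without this
        "--disable-dev-shm-usage",  # Docker's default /dev/shm is often too small
        "--disable-gpu",            # no GPU available in a typical container
    ]


def make_driver(download_dir):
    """Build a headless Chrome WebDriver that saves downloads to download_dir.

    Needs selenium installed, plus Chrome and chromedriver in the image.
    """
    # Imported here so the flag list above can be used without selenium
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_args_for_docker():
        options.add_argument(arg)
    # Ask Chrome to save files (e.g. the two CSVs) without a prompt
    options.add_experimental_option("prefs", {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
    })
    return webdriver.Chrome(options=options)
```

Another route is the selenium/standalone-chrome image, which ships Chrome and chromedriver together. The scraping container would then connect to it with webdriver.Remote instead of starting Chrome itself.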