I heard about the Cloud challenge some time ago and liked the idea of building a practical project to show your Cloud/AWS knowledge. The challenge just builds a simple online resume, which is fine as the challenge is aimed at complete beginners.
I decided to go one step further and build more complicated projects. So far I have 3 projects, the details of which are below. I've also put all the code on GitHub, so you can follow along if you want.
(And I've got 2-3 more projects planned, keep watching this space!)
- Project 1: Sentiment Duck: Compare Reddit Sentiment with Share Price
- Project 2: Rudeshell: A shell that insults you when you try to run commands
- Project 3: HorrorScope: Astrology for Developers
Project 1: Sentiment Duck: Compare Reddit Sentiment with Share Price
OK, I'll admit this project was a little too ambitious, and I almost quit. I now realise I should have started with something easier. But I'm glad I stuck with it, as this is something I've wanted to build for years, ever since I saw the sentdex.com site.
In this project, I get data from 3 UK subreddits - AskUk, UnitedKingdom and UkPolitics. I chose these because they are the biggest in the UK, having roughly 3.5 million subscribers (in total) while still having a balance of positive/negative views.
I then look at the top 10 posts daily and calculate the average sentiment for each. Sentiment Analysis is a machine learning technique that tries to guess if the emotion behind the text is positive or negative.
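To make the averaging step concrete, here is a minimal sketch. It is not the model the site uses: the tiny word lists and the names `score_text` and `daily_sentiment` are made up for illustration, and a real setup would use a trained model or an established lexicon instead.

```python
# Toy word lists, invented for this example only.
POSITIVE = {"good", "great", "love", "happy", "win"}
NEGATIVE = {"bad", "awful", "hate", "sad", "crisis"}


def score_text(text: str) -> float:
    """Score one text in [-1, 1]: +1 if every matched word is positive, -1 if every one is negative."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no opinion words found, treat as neutral
    return (pos - neg) / (pos + neg)


def daily_sentiment(post_titles: list[str]) -> float:
    """Average the per-post scores for one day's top posts."""
    if not post_titles:
        return 0.0
    return sum(score_text(t) for t in post_titles) / len(post_titles)


titles = ["Great news, I love this", "What an awful crisis", "Train times on Sunday"]
print(daily_sentiment(titles))
```

One positive, one negative and one neutral title cancel out here, which is also a reminder of how much a plain average can hide.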
Finally, we compare this sentiment to the FTSE 100, which is the index of the 100 largest companies listed on the London Stock Exchange. The goal is to check: is public sentiment (or Reddit sentiment) related to short-term stock movement?
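One simple way to ask "are these two things related?" is a correlation coefficient between the daily sentiment and the daily percentage change of the index. The sketch below is only an assumption about how such a check could look, not how the site does it; the helper names and the sample numbers are placeholders, not real Reddit or FTSE data.

```python
from math import sqrt


def daily_changes(closes: list[float]) -> list[float]:
    """Percentage change between consecutive closing values."""
    return [(b - a) / a * 100 for a, b in zip(closes, closes[1:])]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: +1 means the series move together, -1 means they move in opposite directions."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder values, invented for the example.
sentiment = [0.2, -0.1, 0.4, 0.0]
closes = [100.0, 101.0, 100.0, 102.0, 102.0]
print(pearson(sentiment, daily_changes(closes)))
```

A value near zero would suggest no linear relationship, and even a high value over a handful of days says nothing about cause.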
Is it accurate? *shrugs* Like all machine learning, the answer is: it depends. This is just a fun project, so I don't particularly care if it isn't 100% accurate. (And I have a legal notice: Don't take financial advice from people who spend time on Reddit! If we had brains, we would be somewhere posh, like Y-Combinator News).
The Technology behind the site
Since the project is complex, the backend is also complex. The code is about 90-95% Python, with a bit of Go. Funnily enough, I wanted Go for the backend (as it is supposed to be faster). Instead, I ended up with Python doing the backend "resource-intensive" part, while Go is being used for displaying the webpage (an alternative to Flask).
The main steps on the backend are:
- Get data from Reddit (see this Reddit API tutorial)
- Carry out sentiment analysis (see this tutorial)
- Store the data in a database (see below for which one)
- Display the data in a web browser
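The four steps can be sketched as a tiny pipeline, with the first three bundled into one job and the display step only reading what was stored. Everything here is a stand-in: the function names are hypothetical, the "database" is a plain dict, and the fetch and analysis steps return placeholder values rather than calling Reddit or a model.

```python
from datetime import date

FAKE_DB: dict[str, float] = {}  # stand-in for the real database


def fetch_posts() -> list[str]:
    """Step 1 stand-in: the real job would call the Reddit API here."""
    return ["placeholder post title one", "placeholder post title two"]


def analyse(posts: list[str]) -> float:
    """Step 2 stand-in: returns a neutral score instead of running a real model."""
    return 0.0


def store(day: date, score: float) -> None:
    """Step 3 stand-in: keyed by ISO date, so re-running the job overwrites the same day."""
    FAKE_DB[day.isoformat()] = score


def run_background_job(day: date) -> None:
    """Steps 1 to 3 chained together, the part a scheduler would trigger."""
    store(day, analyse(fetch_posts()))


def render_page() -> str:
    """Step 4 stand-in: the web side only reads what the job stored."""
    rows = "".join(f"<li>{d}: {s:+.2f}</li>" for d, s in sorted(FAKE_DB.items()))
    return f"<ul>{rows}</ul>"


run_background_job(date(2022, 1, 3))
print(render_page())
```

Keeping the display step read-only means the web page stays fast even when the job itself is slow.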
The first three steps run as a background job.
- I used Terraform to provision AWS, and Ansible to install the tools. I also looked at Packer, but couldn't get it working with Ansible. I already knew Ansible, but this was a good chance to learn about Terraform
- For the DB I used DynamoDB, just because iT iS a CoOl NoSql dAtaBAsE. Did I like it? Well, I had some reservations, mainly that there isn't enough documentation (or good docs). But that is a blog for another day
- I was planning to use a Lambda function (with CloudWatch) to update the data in the background, but I found that AWS uses a different Linux flavour for Lambdas, and I would have to use libraries compiled for that. Too much hassle, so I just stuck to a cron job (it works and is cheap!)
- There is a GitHub Action to automatically deploy new code on a git push
- I also got a chance to play with Caddy to get an https site, and was amazed at how easy it was!
I added a Lambda function to get the stats from the site daily. On weekdays it emails me the summary; on weekends it also sends a text message.
I found Lambda functions a bit iffy and not easy to use, considering how much AWS is pushing them. But I'm still learning about them, and trying libraries like Serverless/SAM/Zappa to write Lambda code. Still a WIP though.
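The weekday/weekend rule for the daily stats function can be sketched like this. It is only my reading of the rule (email every day, plus a text at weekends); `channels_for` and the returned dictionary are hypothetical, and the actual sending of email or SMS is left out entirely.

```python
from datetime import date


def channels_for(day: date) -> list[str]:
    """Weekdays: email only. Weekends: email plus a text message."""
    channels = ["email"]
    if day.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        channels.append("sms")
    return channels


def handler(event, context):
    """Lambda-style entry point; a real function would fetch the stats and send them."""
    today = date.today()
    return {"date": today.isoformat(), "channels": channels_for(today)}


print(channels_for(date(2022, 7, 21)))  # a Thursday
print(channels_for(date(2022, 7, 23)))  # a Saturday
```

Keeping the date logic in its own function makes it testable without deploying anything.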
Project 2: Rudeshell: A shell that insults you when you try to run commands
Update 21-7-2022: After all the hassle with AWS / containers, I moved this to Google Cloud Run, where it ran out of the box without any issues!
I've had more problems setting up the DNS from AWS to Google Cloud. It looks like there is a subtle bug that won't let me link from Route53 to Google.
The code isn't written in JavaScript but in Go. I found an awesome tool, gotty, that turns your command-line utility into a web app.
For security, I make sure the Go code doesn't execute anything the user enters, and I'm also running it in a Docker container just to be sure. I could do more, like using podman for the containers (so they don't run in elevated mode), but I will stick to Docker for now. (I'm sure I've forgotten something and will be insulted by an online security expert for it *groan*)
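The "never execute what the user types" idea can be sketched in a few lines. The real Rudeshell is written in Go, so this Python version is purely an illustration; the insults and the `respond` function are invented for the example.

```python
import random

# Invented insults, not the ones the site uses.
INSULTS = [
    "Did you really think that would work?",
    "I have seen better commands from a cat on a keyboard.",
    "No.",
]


def respond(user_input: str, rng: random.Random) -> str:
    """Return an insult for the typed command. The input is only echoed, never executed."""
    stripped = user_input.strip()
    command = stripped.split(" ")[0] if stripped else "nothing"
    # Keep only printable characters so control sequences are not echoed back to the terminal.
    safe = "".join(ch for ch in command if ch.isprintable())[:40]
    return f"{safe}: {rng.choice(INSULTS)}"


print(respond("rm -rf /", random.Random()))
```

There is no call to a subprocess or to `eval` anywhere, so the worst the input can do is appear in the reply.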
Note: This is the old way, I've moved to GCP
For this project, I wanted to use Kubernetes, but found that it can become stupidly expensive, at least for hobby sites. I tried the lightweight K3s, but found it unstable. So even though I had a basic Kubernetes cluster running, I killed it and went back to simple docker compose. It works!
Everything else is the same as above: Terraform and Ansible to set up the system, GitHub Actions to deploy the code on changes
Caddy again for HTTPS, this time in Docker.
Project 3: HorrorScope: Astrology for Developers
To learn: Serverless / Lambda
I started this to learn about serverless. It's built using AWS Lambda and the Serverless Framework.
The code is written in Python (Flask), with the static content hosted on S3.
The hardest part was figuring out how to return HTML, as all the serverless examples assume JSON.
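For a plain Lambda handler (no Flask), the usual trick is to set the Content-Type header yourself. This is a minimal sketch, assuming the function sits behind API Gateway's Lambda proxy integration, which expects a `statusCode`, `headers` and a string `body`; it is not the code HorrorScope actually runs.

```python
def handler(event, context):
    """Return a small HTML page in the response shape the proxy integration expects."""
    body = "<!doctype html><html><body><h1>HorrorScope</h1></body></html>"
    return {
        "statusCode": 200,
        # Without this header the response would typically be treated as JSON.
        "headers": {"Content-Type": "text/html; charset=utf-8"},
        "body": body,
    }


response = handler({}, None)
print(response["headers"]["Content-Type"])
```

With Flask behind a WSGI adapter, the framework sets this header itself, so a handler like this is mostly useful for very small pages.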