Breakthrough Bliss


Posted June 6, 2018


This post came to mind when I thought I was on the verge of getting my MongoDB-powered scraper to work. Ten days and infinite lines of debugging later, I finally have a working app.

I had a Grimm's Fairy Tales anthology when I was a kid — I can still remember its gold-leaf lettering and worn, fat spine that made it easy to spot on my bookshelf. I was partial to the gruesome stories. Trolls, evil fairies, tin soldiers falling on the stove and melting in pursuit of true love ... "Hansel and Gretel" was especially enticing. The thought of traipsing through the woods and thinking you'd been smart by leaving a trail only to realize your breadcrumb path was actually a tasty snack for little forest creatures that came along and devoured it behind you ... yikes!

I'm digressing a little, but I had my own trail-of-breadcrumbs moment while I worked on my scraper. I chose a site that was really hard to extract data from, a British newspaper (guardian.co.uk), and stubbornly refused to switch to an easier one. I was looking for headline, summary, and URL classes; The Guardian had "kickers," "subheads," and a nesting structure that made all the .parents and .children look like a (insert huge family here) reunion. I tried so many variations on my Mongo BSON object that I realized I was just going around in circles. Like Hansel and Gretel before me, I was on an ill-fated path.

Finally, through a combination of determination, Googling, trial and error, and possibly some luck, I got an object when I console.logged my three variables. It took another couple of days of tweaking, squinting at code for misplaced plurals, a few rounds with Postman, and some tips from Cameron and Byron (my coding program's stellar TAs) before I got the object to show up on my scraper's home page. I admit it: I squealed when I finally saw those headlines on my own site instead of The Guardian's. That sounds absurd when I write it down, but anyone who's built a scraper understands the sense of accomplishment that comes from doing nothing more than moving headlines from one site to another.

My Guardian Scraper App isn't perfect. But the Mongo database is storing data, the app is running on Heroku, and there's a cute little Bootstrap card with its screen grab, description, and links in my portfolio. That's enough for me to consider this a bona fide breakthrough. The sense of accomplishment, brief though it may be, will propel me at least part of the way into the next assignment: a React matching game. Get the breadcrumbs ready. It's gonna be a long journey.