Big Data Analytics for Healthcare – Experience

One of the cute course slides by Prof. Jimeng Sun

I completed Big Data Analytics for Healthcare (CSE 6250) in Spring 2019 as part of my OMSCS program – my last semester! OMSCentral ranked it as the hardest course in the program, a fitting finale to my 3.5-year-long OMSCS journey.

Contrary to what some might think, I did not choose the course because I am a masochist. Rather, it seemed to cover a lot of ground that no other course in the program explores. In particular, I was interested in learning more about:

  • Hadoop, Pig, Hive, MapReduce
  • Spark (RDDs, Spark SQL, MLlib, GraphX etc.)
  • CNNs, RNNs
  • Scala

For me, the last item was one of the major considerations. I consider myself a programming language geek and wouldn’t miss a chance to play around with a new programming language if I can. I must say I was in for a treat. I got to write a significant amount of code in Scala. The sample code, templates etc. provided for the projects leveraged some of the nice parts of Scala, such as type inference, case classes and pattern matching. All this, along with its functional nature and full-fledged REPL, made Scala a delight to work with.
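To give a flavour of those features, here is a minimal sketch of my own. It is not taken from the course templates: the event types and function names (Event, Diagnosis, Medication, LabResult, describe, eventsPerPatient) are hypothetical and made up purely for illustration.

```scala
// Hypothetical example: these types and names are invented for illustration
// and do not come from the course materials.
object ScalaFeaturesDemo {
  // Case classes provide immutable fields, structural equality and
  // pattern-matching support without any boilerplate.
  sealed trait Event { def patientId: String }
  final case class Diagnosis(patientId: String, code: String) extends Event
  final case class Medication(patientId: String, drug: String) extends Event
  final case class LabResult(patientId: String, name: String, value: Double) extends Event

  // Pattern matching destructures each case class. Because Event is sealed,
  // the compiler can warn when a case is missing.
  def describe(event: Event): String = event match {
    case Diagnosis(id, code)        => s"$id diagnosed with code $code"
    case Medication(id, drug)       => s"$id prescribed $drug"
    case LabResult(id, name, value) => s"$id lab $name = $value"
  }

  // Type inference: `grouped` needs no annotation; the compiler infers
  // Map[String, List[Event]] from the groupBy call.
  def eventsPerPatient(events: List[Event]): Map[String, Int] = {
    val grouped = events.groupBy(_.patientId)
    grouped.map { case (id, es) => id -> es.size }
  }

  def main(args: Array[String]): Unit = {
    val events: List[Event] = List(
      Diagnosis("p1", "E11"),
      Medication("p1", "metformin"),
      LabResult("p2", "HbA1c", 5.4)
    )
    events.map(describe).foreach(println)
    println(eventsPerPatient(events))
  }
}
```

The definitions inside the object can also be pasted straight into the Scala REPL to experiment with them interactively.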

The course was pretty well designed overall. The well-maintained, self-paced labs available for almost all topics were really helpful in getting up to speed. Check out their Scala lab to get a feel – link. The course videos were well planned. The TAs were pretty responsive on Piazza. All the good stuff.

The final project was the highlight of the course. It was a group project, worth 40% of the overall grade. Teams were allowed to pick from a variety of interesting topics. The project milestones mirrored how someone would run a data-science project in a corporate setting: pitch it (say, to your CEO), execute it, and then produce a report and presentation summarizing the work.

I was fortunate to get a really good team of four. We chose the domain NLP for Healthcare. After exploring a variety of suggested ideas in the domain, we settled on the following topic: Hierarchical Ensembles of Heterogeneous Models for Prediction of Medical Codes from Clinical Text. We worked well together and did a good amount of research and implementation. As part of the project, we were able to try out various ensemble ML models combining SVMs, DNNs etc.

While there were homework assignments to keep you busy on most weekends, I felt that the course's notoriety for difficulty was unfounded. If you go in with an open mind, excited about a fast-paced journey across the various big-data tools and technologies used for data science in the industry, you won’t be disappointed.

  • Difficulty: 4/5
  • Rating: 5/5

In related news, I got my Georgia Tech Master's degree in my email a couple of weeks back! It was truly a moment of happiness. I still need to figure out what to do with all the extra time now that the course is over 🙂

My Georgia Tech Master's degree 🙂
