Big Data Analytics for Healthcare – Experience

One of the cute course slides by Prof. Jimeng Sun

I completed Big Data Analytics for Healthcare (CSE 6250) as part of Spring 2019 of my OMSCS program – my last semester! It was ranked as the hardest course in the program as per OMSCentral, a befitting finale to my 3.5 year long OMSCS journey.

As some might think, I did not choose to take the course because I am a masochist. Rather, the course seemed to cover a lot of ground that is not explored by any other courses in the program. In particular, I was interested in learning more about:

  • Hadoop, Pig, Hive, MapReduce
  • Spark (RDDs, Spark SQL, Mlib, GraphX etc.)
  • CNNs, RNNs
  • Scala

For me, the last reason was one of the major considerations. I consider myself a programming language geek and wouldn’t miss a chance to play around with a new programming language if I can. I must say I was in for a treat. I got to write some significant code in Scala. The sample code, template etc. that they provided for the projects were leveraging some of the nice parts of Scala such as type inference, case classes, pattern matching etc. All this along with its functional nature and full-fledged REPL made Scala a delight to work with.

The course was pretty well designed overall. The well maintained self-paced labs available for almost all topics were really helpful in getting up to speed. Check out their Scala lab to get a feel – link. The course videos were well planned. The TAs were pretty responsive in Piazza. All the good stuff.

The final project was the highlight of the course. It was a group project, worth 40% of the overall grade. Teams were allowed to pick from of a variety of interesting topics. The project milestones were structured in a way similar to how someone would go about pitching a data-science project in a corporate setting (say to your CEO), execute it and come up with a report and presentation summarizing that work.

I was fortunate to get a really good team. We were four. We chose the domain NLP for Healthcare. After exploring a variety of suggested ideas in the domain, we finalized on the following topic: Hierarchical Ensembles of Heterogeneous Models for Prediction of Medical Codes from Clinical Text. We worked well together and did a good amount of research and implementation. We were able to try out various ensemble ML models combining SVMs, DNNs etc as part of the project.

While there were homeworks to keep you busy on most weekends, I felt that the notoriety of the course in terms of difficulty was unfounded. If you go in with an open mind, excited about a fast paced journey across the various big-data tools/technologies being used for data-science in the industry, you won’t be disappointed.

  • Difficulty: 4/5
  • Rating: 5/5

On related news, I got my Georgia Tech Masters degree in my mail couple of weeks back! It was truly a moment of happiness. I still need to figure out what to do with all the extra time now that the course is over 🙂

My Georgia Tech Masters degree 🙂

Graduate Algorithms – Experience

I completed Graduate Algorithms course (CS 8803) as part of my OMSCS program during Fall 2018. It was an interesting and really informative course.

The main topics covered in the course include dynamic programming; divide and conquer, including FFT; randomized algorithms, including RSA cryptosystem and hashing using Bloom filters;  graph algorithms; max-flow algorithms; linear programming; and NP-completeness.

Course website

Of these, I had not studied FFTs and max-flow algorithms during my undergrad. Also, though I had studied basic DP (dynamic programming) in my undergrad (KnapSack, Matrix multiplication etc.) and had prepared some more for tech interviews, I had never really had a rigorous formal training on DP before. This course had sufficient course work (home-works and exams) with the focus on building that DP intuition that I really liked.

The course text was Algorithms by Dasgupta, Papadimitriou and Vazirani. It is a really good, concise introduction to most advanced algorithm topics. It is especially good as a textbook for colleges because you can realistically expect students to read from cover to cover unlike say Introductions to Algorithms by Cormen, which is better suited as a long term reference manual. My one gripe with the course was that it did not cover the last chapter from Dasgupta, on Quantum Computing.

The course grading breakdown was as follows:

  • Homeworks: 5%
  • Project: 10%
  • Midterms: (two) 25% each
  • Final exam: 35%

The midterms and final exam were closed book 3-hour exams. The bulk of the grade (85%) was from these. There were 8 home-works, spaced roughly once a week which added up to just 5%. These usually involve 4-5 questions which would require you to write your solution in pseudo-code (only in case of DP) or in plain English. One might think that doing the home-works is just not worth the effort. However, I cannot over emphasize how important and helpful to the exams these were.

Another thing that I am ever grateful to the homeworks is for introducing me to Latex. Till now, though I had heard that Latex is pretty good for writing technical documents, I never really had a good use-case or forcing function to make me learn it. The HWs usually involved formulas, bigO complexities, pseudo-code, matrices etc. This gave me a really good opportunity to learn Latex. Its amazing!

While we are on the topic of Latex, if it is something that interests you, may I suggest the Emacs plugin for Latex? I used the Spacemacs latex layer. Among other things, it provides previewing the rendered doc in Emacs itself (SPC m p p)!

The TAs and the Professor were prompt on answering queries on Piazza and office hours. The grading turn-around times were also really fast. Overall, I really enjoyed the course. My ratings:

  • Difficulty: 4/5
  • Rating: 4.5/5

Machine Learning for Trading – Experience

I completed the Machine Learning for Trading (CS 7647-O01) course during the Summer of 2018. This was a fun and light course.

The course was divided into 3 mini-courses:

The first part of the course was mainly about getting familiar with Numpy and Pandas. The key take-away was how small differences (optimizations) in how you handle large arrays using Numpy can lead to substantial performance gains.

For me, the second part was the most enlightening. It introduced the various concepts within the Stock market. This was something I’ve always wanted to and in-fact have been learning on an ad-hoc need-to-know basis from Wikipedia, Investopedia etc. However, I’ve always wanted a more formal and comprehensive introduction to these concepts. Some of the ones discussed include:

  1. Going long or short with shares
  2. Capital Assets Pricing Model (CAPM)
  3. Efficient Markets Hypothesis
  4. The Fundamental Law of active portfolio management
  5. Arbitrage Pricing Theory
  6. Technical analysis (Bollinger Bands, Sharpe ratio etc.)

The third part was about using different ML techniques (supervised – Regression, Decision Trees, unsupervised – Q-learning etc) to predict future stock prices based on past data. I was familiar with most of these techniques from previous AI/ML courses I had taken. However, using those with time-series data in this setup was something new.

There were five (4 proper and 1 intro) projects in this course and 2 closed-book exams. The professor also recommended some good/interesting external resources for students to understand the stock market better. The movie The Big Short was even part of the syllabus. The course was well-paced and well organized. My ratings:

  • Difficulty: 2.5/5
  • Rating: 4/5

Intro to High Performance Computing – Experience

This Spring, I completed the Intro to High Performance Computing course (CSE 6220) as part of OMSCS. It is one of the hardest (4.5/5) and also the highest rated (4.8/5) course in the program as per OMS Central. Based on my experience,  I concur with both ratings.

At a high level, the course covers the algorithmic aspects of maximizing the performance of your code. This includes things like parallelizing your code across all processors or across multiple machines, exploiting the memory hierarchy to your advantage etc. The other ‘high performance’ course in the program – High Performance Computer Architectures (CS 6290), in contrast, discusses maximizing performance more at a processor architecture level.

Prof. Vuduc requires special mention. He has put a lot of effort in making the course videos easy-to-understand and interesting. His hilarious antics make you laugh even while discussing the most complex topics. He is also very active in Piazza and participates in the office hours regularly.

There were 5 hands-on projects in total, all in C/C++, with one due every two weeks. These were really the time-sinks. Interestingly, these were also the most fun part of the course in my experience. These involved implementing the algorithms taught in the lectures, making everything you learn more ‘real’.

Key concepts

At a broad level, these were the key concepts I learned from the course:

  1. Shared memory model (aka dynamic multithreading model)
    1. Concepts of work and span (link) in analyzing parallel programs.
    2. Introduction to OpenMP library.
  2. Distributed memory models
    1. Parallel computing across network using message passing.
    2. The Alpha-Beta model (aka latency & inverse-bandwidth model) for analyzing distributed parallel programs.
    3. Introduction to OpenMPI library.
  3. Two level memory model
    1. I/O aware algorithms that can exploit the cache and main memory structures.
    2. Cache oblivious algorithms that still achieve optimal performance without being aware of the cache/memory structures.

alpha_beta_model
The Alpha-Beta model for measuring the cost of distributed computing

For a more detailed overview, I would recommend going over the Course Syllabus (PDF).

Overall, I really enjoyed the course and am happy that I decided to take it though it was not directly related to my specialization (Interactive Intelligence).

Artificial Intelligence – Experience

I recently completed the Artificial Intelligence course (CS 6601) as part of OMSCS Fall 2017. The course gives an good overview of the different key areas within AI. Having taken Knowledge Based AI (CS 7637), AI for Robotics (CS 8803-001), Machine Learning (CS 7641) and Reinforcement Learning (CS 8803-003) before, I must say that the AI course syllabus had significant overlap in many areas with these courses (which is expected). However, I felt the course was still worthwhile since Prof. Thad taught these topics in his own perspective, which made me look at these topics in a different light. Prof. Thad also tried his best to make the course content interesting and humorous, which I really appreciated.

Course Outline

  1. Game Playing – Iterative Deepening, MinMax trees, Alpha Beta Pruning etc.
  2. Search – Uniform Cost Search, Bidirectional UCS, A*, Bidirectional A* etc.
  3. Simulated Annealing – Hill Climbing, Random restarts, Simulated Annealing, Genetic Algorithms etc.
  4. Constraint Satisfaction – Node, Arc and Path consistency, Backtracking, forward checking etc.
  5. Probability – Bayes Rule, Bayes Nets basics, Dependence etc.
  6. Bayes Nets – Conditional Independence, Cofounding cause, Explaining Away, D Separation, Gibbs Sampling, Monty Hall Problem etc.
  7. Machine Learning – kNN, Expectation Maximization, Decision Trees, Random forests, Boosting, Neural nets etc.
  8. Pattern Recognition through Time – Dynamic Time Warping, Sakoe Chiba bounds, Hidden Markov Models, Viterbi Trellis etc.
  9. Logic and Planning – Propositional Logic, Classic planning, Situation Calculus etc.
  10. Planning under Uncertainty – Markov Decision Processes (MDPs), Value iteration, Policy iteration, POMDPs etc.

The course used the classic textbook in AI –  Artificial Intelligence – A Modern Approach (3rd Edition) by Peter Norvig and Stuart Russell. Some chapters (such as Logic and Planning) was taught by Peter Norvig himself whereas few others were taught by Sebastian Thrun. There is no arguing that the course was taught by the industry best.

Screen Shot 2018-01-07 at 6.30.34 PM
The iconic cover of Artificial Intelligence: A Modern Approach

There were 6 assignments (almost one every alternate week) which required proper understanding of the course material and decent amount of coding (in Python). There was an open book midterm and final exam as well. Even though these were open book, these involved significant amount of work (researching and rereading the text, on paper calculations etc.). Overall, completing these forces one to really understand the concepts, which I really liked.

Summary Stats

  1. Average time spend per week – approx. 20 hours (including whole weekends on assignment due weeks)
  2. Difficulty (out of 5) – 4.25 (which is what I would rate ML too, and these two would top my list)
  3. Rating – 4/5

Introduction to Information Security – Experience

I did the Introduction to Information Security course (CS6035) as part OMSCS Summer 2017 semester.

The course was a good overview of various aspects of Information Security. It broadly covered topics like system security, network security, web security, cryptography, different types of malware etc. The course was lighter in terms of work load compared to the other subjects I’ve taken so far. I really liked the projects which were thoughtfully designed to give the students hands-on experience in each of these topics.

The four projects that we had to do were:

  1.  Implementing Buffer Overflow in a given vulnerable code. This required brushing up on C basics,  understanding how process memory allocation works internally and some playing around with gdb.
  2.  Analyzing provided malware samples using Cuckoo, an automatic malware analyzer and sandbox to identify behaviors such as registry updates, keyboard and mouse sniffing, remote access, privilege escalation etc.
  3. Understanding and implementing the RSA algorithm in python, identifying the weakness in using smaller length keys (64 bit) and decrypting an RSA encrypted message by exploiting this weakness.
  4. Exploiting vulnerabilities in a target (sample) website using Cross-Site Request Forgery (XSRF), Cross Site Scripting (XSS) and SQL injection.

Apart from the projects, there were 10 Quizzes to be completed, one per week throughout the course. The various exploits discussed in the course are fairly easy to be introduced in a codebase if you are not aware of these. Unfortunately, these are pretty common even now, many years after they were first discovered.

Hence, no matter the type of software development one is into (mobile, web, DB, relatively low level languages like C, embedded device programming, bare metal etc.), these exploits and their counter-measures are a must-know.

 

Reinforcement Learning – Experience

I completed the Reinforcement Learning course (link) as part of OMSCS Spring 2017 semester. It was one of the most rewarding courses I took as part of the program till date.

The course was taught by professors Charles Isbell and Michael Littman, the same Profs who had taken the Machine Learning course previously (blog link). The course was really challenging considering the closely packed and research oriented home works and projects as well as the math/theory heavy course material. We had

  • 6 home works which involved implementing different RL algorithms to solve given problems
  • 3 projects out of which two were the reproduction of experiment results from prominent RL research papers and one was solving an RL problem using OpenAI Gym

Summarizing my key learnings from RL below:

  • Reinforcement learning helps you train an AI agent to maximise some form of reward without prior understanding of the environment  -i.e. model-free.
  • E.g: Pacman. Here the agent (or player) can roam around the space using possible actions (left, right, up, down). When it consumes one of the small orbs, it gets points (+ve reward). When it eats the big orbs and then eats the enemy players, it again gets more points. However, if it’s eaten by one of the enemy players, it loses a life (-ve reward). If you let an RL agent play Pacman for some time, it will start playing randomly, but eventually, figure out the rules of the game and can potentially play better than a human player. All this without we injecting any domain knowledge (rules of the game, winning strategies etc.) beforehand! (crazy right?)
  • Screen Shot 2017-05-07 at 10.23.36 PM
  • Most RL research assumes all processes can be represented using MDPs (Markov Decision Processes). These are processes where the entire past can be represented using the current state of the agent.
  • Learned about different RL algorithms such as:
    • Value Iteration
    • Policy Iteration
    • Q-learning
    • TD-Lambda etc.
  • Generalization using function approximation – This seemed to me to be one of the most promising sections of RL. It can effectively take RL outside the confines of Grid world and into the big and continuous state spaces of the real world.
    • For one of our projects, we used DQN (Deep Q-Networks), one of the latest efforts in generalization using deep neural networks, published by DeepMind – a Google company.
  • Reward Shaping – a mechanism to accelerate the learnings of the agents and help them get to their goals faster.
  • POMDPs (Partially Observable MDPS) – These are closer to the processes which we see in real-life. We don’t get to know fully which state we are in. We have to work with a set of ‘belief states’ or probability distributions of possible states we might be in.
  • Game Theory – I found this to be the most fun part of the course. It deals with stochastic games where multiple agents try to maximise their collective/competing rewards. This is again closer to the situations which we face in real-life. Topics include:
    • Prisoners Dilemma
    • Nash Equilibrium
    • Folk theorem and sub-game perfect equilibrium
    • Tit-for-Tat, Grim trigger, Pavlov etc. game strategies
    • Coordinated equilibria, using side payments (Coco-Q) etc.

The course content was a bit too theoretic in some chapters (e.g.: AAA – Advanced Algorithm Analysis). I found lectures from David Silver, DeepMind to be a good supplementary course to build the required intuition for this course – link.

One of the really exciting moments in this course was when Prof. Richard Sutton, considered by many as the father of Reinforcement Learning, and the author of the primary textbook for RL (of our course and elsewhere) ‘Reinforcement Learning: An Introduction’ (second edition draft available from author’s website – link) appeared for one of our office hours as a special guest.

Screen Shot 2017-04-20 at 4.44.15 AM
Prof. Richard Sutton along with our TAs during an office hour

I found all the TAs for this course really knowledgeble and helpful. All the office hours were really useful and fun-filled at the same time. One of our TAs, Migual Morales has been featured in the OMSCS website recently – link.

In conclusion, this course has been one helluva ride that I enjoyed throughout! 🙂

Machine Learning – Experience

I recently completed CS 7641 – Machine Learning as part of my OMSCS coursework. The course was really enjoyable and informative.

The course was taught by Professors Charles Isbell and Micheal Littman. Both are really awesome. Contrary to most other courses on the topic, they have managed to make the course content easy to understand and interesting, without losing out on any of its essences. All videos are structured as conversations between the Profs where one acts as the teacher and other as the student – very effective.

All the course videos are available publicly on Youtube – link. Also, I would recommend watching this funny Capella on ML based on Thriller by the Profs – link. 🙂

The course was a literature survey and general introduction into the various areas in ML. It was primarily divided into 3 modules:

  • Supervised learning – where we are given a dataset with labels (emails classified as spam or not). You try to predict the labels for future data based on what you’ve already seen or ‘learned’.
    • Techniques include Decision Trees, K-Nearest Neighbours, Support Vector Machines (SVM), Neural Networks etc
  • Unsupervised learning – all about finding patterns in unlabeled data. Eg: Group similar products together (clustering) based on customer interactions. This can be really helpful in recommendations etc.
    • Randomized Optimization, clustering, feature selection and transformation etc.
  • Reinforcement learning – the most exciting one (IMHO). This overlays many concepts we usually consider as part of Artificial Intelligence. RL is about incentivizing machines to learn various tasks (such as playing chess) by providing different rewards.
    • Markov Decision Processes, Game Theory etc.
    • I found the concepts in GT such as the Prisoners Dilemma, Nash Equilibrium etc. and how they tie into RL interesting.

All of these are very vast subjects in themselves. The assignments were designed in such a way that we got to work with all of these techniques at least to some extent. The languages and libraries that we use were left to our choice, though guidance and recommendations were provided. Through that, got the opportunity to work with Weka, scikit-learn and BURLAP.

Overall, enjoyed the course really well. Hoping to take courses like Reinforcement Learning (link) to learn more about the topics in upcoming semesters.

AI for Robotics – Experience

I studied AI for Robotics class as part of the Summer’16, OMSCS program. It was a really interesting and challenging experience. It was taught by Prof. Sebastian Thrun who lead the self-driving car project in Google. It was his team from Stanford which won the DARPA Grand Challenge in 2005 where they drove a car (Stanley) over 212 km of off-road course and came first. Incidentally Prof. Thrun is a co-founder at Udacity and was it’s CEO until recently.

The class consisted of two portions: 

  • a series of lectures combined with small programming tasks
  • two open-ended projects related to self-driving cars

The whole course centers around the use of probabilistic models to predict the various parameters involved such as the location of the robot car, the location of various landmarks, obstacles, moving targets such as other cars, pedestrians etc. The Prof also has an aptly titled text book ‘Probabilistic Robotics’ to go along with the course (though I couldn’t make much use of it).

The lectures covered the following topics:

Localization

Noise is an essential part of robotics.

There will be noise in the robot motion. Eg: If we instruct the robot to move 5 meters, the robot might end-up moving only 4.8 meters due to tire slipping or uneven surface.

There will be noise in sensor measurement. Eg: If the sensor readings tell us we are 3 meters from the car ahead, the actual distance might be 2.7 meters.

How can a robot car navigate the road safely given all these noises? That is exactly what localization addresses. The term refers to various techniques which help us ‘see-through’ the noise and identify the underlying motion model of the robot. The following localization techniques were taught in class:

  • Kalman filters: These work best for linear motions. The predictions are Gaussian distributions here and hence will be uni-modal i.e. the prediction will only tell which is the highest probability location of the robot (no info on 2nd or 3rd highest probability location etc). However, there are extension of the standard KF such as the Unscented KF and Extended KF which address the mentioned limitations.
  • Particle filters: These seem best suited for localization since they work for non-linear motions and support multi-modal distributions.

localizing
Localization in action: Hex bug path in black and localized particle in blue

Search

Self-driving cars need to find the optimal path to their destination as well. The technique used for finding the most optimal path without exploring the entire state space is A* algorithm. Those who have learned AI in under-grad might be familiar with the approach. It involves the use of a heuristic function which gives a score for all possible movements based on how far the new state is from the goal state.

Control Theory

Humans drive cars smoothly. If we ask a robot to move on a particular course, by default it will either over-shoot or under-shoot its goal and then correct itself. This is because of the inherent delay in the move-sense feedback cycle. This keeps repeating leading to a zig-zag motion and overall unpleasant (and potentially dangerous) driving experience. There is a whole domain of control systems on how to smoothen out the robot motion as it approaches it’s desired course.

The technique we learned is the PID controller. This controller adjusts the steering angle of the robot at all points of its motion based on various proportional, differential and integral terms computed in relation to its CTE or cross track error (the lateral distance between the robot and the reference trajectory). 

Screen Shot 2016-08-09 at 9.06.33 PM
Here A represents robot motion without any controller and B represents one with PID controller.

 

Runaway robot

The first project was a set of 4 interesting challenges (plus a bonus challenge for the extra smart ones) where we need to locate a robot (aptly named 404) which ran away from an assembly line and capture it using a hunter bot. This was an individual project. It requires some level of ingenuity to some up with a working solution since the lessons from class were not directly applicable here.

the_chase
Hunter bot (blue) chasing the runaway bot (black). The red dots are future predictions with which the hunter tries to capture the bot.

Hex bug motion prediction

The second project was a team project. Here we were given coordinates of random movements of a hex bug for 2 minutes at 30 fps (frames per second). We need to predict the last 2 seconds i.e. 60 frames of the bug’s motion. This was an open ended problem and we could use any technique from inside the class or outside. We were a team of 4 and explored various techniques including clustering trajectories, creating a markov model and finally ended up using PF to solve the same.

hexbot_predictions
Predictions of hex bug path using various approaches against actual bug path (in black)

Overall, enjoyed the class a lot!

Knowledge based AI – Experience

I enrolled for the Spring’16 batch of OMSCS program offered by Georgia Tech university and Udacity. As my first course as part of the program, I chose Knowledge base AI, taught by Prof. Ashok Goel and David Joyner. The video sessions of the course can be freely accessed through Udacity website here.

The class was very interesting and insightful. I thoroughly enjoyed the 3 main projects we had to do throughout the class. The overall class was focused on systematically studying human-level intelligence/cognition and seeing how we can build that using technology. As a measure of accessing human cognition, the class used Raven’s Progressive Matrices (RPM).

Our class came into the internet limelight after the course when Prof Ashok revealed that one of out TA – Jill Watson was actually a bot. Covered widely by press – Washington Post and WSJ.

With regard to the course content, we were introduced to the broad areas that come under human level AI research such as:

  • Semantic Networks, Frames and Scripts – useful knowledge representations
  • Generate & Test and Means-End Analysis – two popular problem solving techniques
  • Production systems – rule based systems which are useful in AI
  • Learning By Recording Cases and Case base reasoning – techniques to learn based on past examples and to adapt them as per requirement
  • Incremental concept learning – contrary to approaches like ML where we feed millions of examples to train models, human cognition deals with incremental data inputs. Incremental concept learning describes how we can use generalizations and specializations to make inferences from these inputs.
  • Classification – mapping percepts to concepts so that we can take actions
  • Formal Logic – techniques from predicate calculus such as resolution theorem proving are useful in some cases of reasoning. However human cognition is inductive and abductive in nature, whereas logic is deductive.
  • Understanding – Humans always deal with ambiguity. Eg: Same word can have different contextual meanings. Understanding is how we leveraging available constraints to resolve ambiguities.
  • Common sense reasoning – about modeling our world in terms of a set of primitive actions and their combination.
  • Explanation based learning and Analogical reasoning – deals with how we can extract ‘abstractions’ from prior knowledge and transfer these to new situations
  • Diagnosis and learning by correcting mistakes – Determining what is wrong with a malfunctioning device/system. Learning by correcting these mistakes.
  • Meta-reasoning – reasoning about our own reasoning process and applying all the above techniques to the same.

As part of the projects, we built small ‘AI agents’ that process RPM problem images and use various techniques to solve them. These were quite challenging and required fair amount of programming. For me, it was a good learning opportunity for building something non-trivial in Python. We used Python image processing library PIL and various image processing techniques like connected component labelling.

Considering the vastness of the topic – human level intelligence and that I intent to specialize in ‘Interactive Intelligence‘, I found the course really interesting and informative. Enjoyed it!

I’m adding a nice poster created by my classmate Eric on the high level topics we studied in this class.

kbaipostersmall