Machine Learning & Predictive Modelling
January 9, 2017 at 2:19 pm
(This post was last modified: January 9, 2017 at 2:29 pm by Aristocatt.)
Over the past month or so, I have plunged into the field of Machine Learning and Predictive Modelling. Data science, and coding in general, is fascinating to me.
I love that I can think up a crazy question today, scour the internet for relevant data sets, and pull that data together to see if I can falsify my hypothesis. The same goes for programming in general: I can think up a project that I want to do, and watch it come to life without any real startup costs. All the while I get to build my problem-solving skills as I move forward.
I just finished my first semi-serious predictive model (I still have a bit more tweaking to do) and submitted it to Kaggle.
I'm curious if there are any data scientists (professional or hobbyist) or Kagglers out there who might want to share some resources, tips, etc. that helped them improve, or who might want to form a Kaggle team (for fun, I'm not very good at this right now) and work on some projects together.
A little bit about my experience in CS:
I got started with CS in college, and took an internship as a DBA over at Verisign in the summer of 2012. I liked coding a lot, but I hated CS in a corporate environment.
While there I primarily worked with SQL and Java.
Afterwards I got into web development a little bit, and worked with PHP (worst language ever), JS, and MySQL. Around 2014 I essentially stopped coding and picked up some other hobbies.
At the beginning of 2016 I picked up Python, and have been coding in it somewhat consistently since (probably an average of 6+ hours a week over the year).
I'm familiar with the SciPy ecosystem (pandas, NumPy, scikit-learn, seaborn, etc.).
I have all the mathematics you would expect an engineer to have, though I haven't really used any of it since college. Still, I can grasp some of the Machine Learning fundamentals without needing to fill in many mathematical gaps.
Anyway, any data scientists (professional or hobbyist) out there? Want to do some Kaggle competitions with me? Want to share some helpful anecdotes, tips, papers, or books?
I have one tip I think I can give that has helped me a bit, and I haven't seen many people using it to help them organize their data.
Most data sets come with a .txt file that describes what each variable is. Often that documentation file will also explain, in a fairly consistent manner, what type of data you are looking at (ordinal, nominal, discrete, continuous), and you can use a regex to quickly automate splitting your data up into those types. Spend a few minutes looking over the documentation (you should be doing this anyway) and see if you can find a pattern that lets you use regular expressions to tease out important information, like whether a variable is nominal or discrete.
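Here is a minimal sketch of that idea in Python. The documentation excerpt, the variable names, and the "name (Type): description" layout are all made up for illustration; a real description file will have its own layout, so the pattern would need to be adjusted to whatever regularity you actually find.

```python
import re

# Hypothetical documentation excerpt. Real files will be laid out differently,
# so treat this format as an assumption, not a standard.
DOC_TEXT = """\
color (Nominal): Paint color of the item
height_cm (Continuous): Height in centimetres
rating (Ordinal): Quality rating from poor to excellent
num_rooms (Discrete): Number of rooms
city (Nominal): City where the item is located
"""

# Matches "<name> (<type>):" at the start of a line and captures both parts.
PATTERN = re.compile(
    r"^(?P<name>\w+)\s*\((?P<kind>Nominal|Ordinal|Discrete|Continuous)\)\s*:",
    re.MULTILINE | re.IGNORECASE,
)


def columns_by_type(doc_text):
    """Group variable names by the data type stated in the documentation."""
    groups = {}
    for match in PATTERN.finditer(doc_text):
        kind = match.group("kind").lower()
        groups.setdefault(kind, []).append(match.group("name"))
    return groups


groups = columns_by_type(DOC_TEXT)
for kind, names in groups.items():
    print(kind, names)
```

If the data is loaded in a pandas DataFrame called df, something like df[groups["nominal"]] would then pull out just the nominal columns, assuming the names in the documentation match the column names exactly.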
Kaggle Username: aristocatt