1.1.2 Knowing Your Task and Knowing Your Data
Quite possibly the most important part of the machine learning process is understanding the data you are working with and how it relates to the task you want to solve. Randomly choosing an algorithm and throwing your data at it will not be effective. You need to understand what is going on in your dataset before you begin building a model. Algorithms differ in the kinds of data and problem settings they work best for. While you are building a machine learning solution, you should answer, or at least keep in mind, the following questions:

What question(s) am I trying to answer? Do I think the data collected can answer that question?
What is the best way to phrase my question(s) as a machine learning problem?
Have I collected enough data to represent the problem I want to solve?
What features of the data did I extract, and will these enable the right predictions?
How will I measure success in my application?
How will the machine learning solution interact with other parts of my research or business product?

In a larger context, the algorithms and methods in machine learning are only one part of a greater process to solve a particular problem, and it is good to keep the big picture in mind at all times. Many people spend a lot of time building complex machine learning solutions, only to find out they don't solve the right problem.

When going deep into the technical aspects of machine learning (as we will in this book), it is easy to lose sight of the ultimate goals. While we will not discuss the questions listed here in detail, we still encourage you to keep in mind all the assumptions that you might be making, explicitly or implicitly, when you start building machine learning models.