1.1.2 Knowing Your Task and Knowing Your Data

Quite possibly the most important part of the machine learning process is understanding the data you are working with and how it relates to the task you want to solve. It will not be effective to randomly choose an algorithm and throw your data at it. It is necessary to understand what is going on in your dataset before you begin building a model. Each algorithm is different in terms of what kind of data and what problem setting it works best for. While you are building a machine learning solution, you should answer, or at least keep in mind, the following questions:

What question(s) am I trying to answer? Do I think the data collected can answer that question?
What is the best way to phrase my question(s) as a machine learning problem?
Have I collected enough data to represent the problem I want to solve?
What features of the data did I extract, and will these enable the right predictions?
How will I measure success in my application?
How will the machine learning solution interact with other parts of my research or business product?

In a larger context, the algorithms and methods in machine learning are only one part of a greater process to solve a particular problem, and it is good to keep the big picture in mind at all times. Many people spend a lot of time building complex machine learning solutions, only to find out they don’t solve the right problem.

When going deep into the technical aspects of machine learning (as we will in this book), it is easy to lose sight of the ultimate goals. While we will not discuss the questions listed here in detail, we still encourage you to keep in mind all the assumptions that you might be making, explicitly or implicitly, when you start building machine learning models.
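As a rough illustration of what understanding your dataset before building a model can mean in practice, the sketch below counts samples, missing values per feature, and how often each target value occurs. It uses only the Python standard library. The toy rows, the column names (`sqft`, `rooms`, `label`), and the helper `summarize` are all hypothetical and chosen only for illustration; real projects would typically do this with a data analysis library on their own data.

```python
from collections import Counter

# Hypothetical toy dataset: each row is one sample; None marks a missing value.
rows = [
    {"sqft": 850, "rooms": 2, "label": "cheap"},
    {"sqft": 1200, "rooms": 3, "label": "cheap"},
    {"sqft": None, "rooms": 4, "label": "expensive"},
    {"sqft": 2400, "rooms": None, "label": "cheap"},
]


def summarize(rows, target):
    """Return basic facts about a list-of-dicts dataset (illustrative helper)."""
    columns = list(rows[0].keys())
    # Count missing values per column: are the features usable as collected?
    missing = {c: sum(1 for r in rows if r[c] is None) for c in columns}
    # Count how often each target value occurs: is every outcome represented?
    balance = Counter(r[target] for r in rows)
    return {
        "n_samples": len(rows),
        "missing": missing,
        "class_balance": dict(balance),
    }


summary = summarize(rows, target="label")
print(summary)
```

A summary like this does not answer the questions above by itself, but a very small sample count, many missing values, or a target value that almost never occurs are early signs that the collected data may not support the question being asked.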
