Introduction to Data Science

Bill Howe, University of Washington

Join the data revolution. Companies are searching for data scientists. This specialized field demands multiple skills not easy to obtain through conventional curricula. Introduce yourself to the basics of data science and leave armed with practical experience extracting value from big data. #uwdatasci

Commerce and research are being transformed by data-driven discovery and prediction. The skills required for data analytics at massive scale (scalable data management on and off the cloud, parallel algorithms, statistical modeling, and proficiency with a complex ecosystem of tools and platforms) span a variety of disciplines and are not easy to obtain through conventional curricula. Tour the basic techniques of data science, including both SQL and NoSQL solutions for massive data management (e.g., MapReduce and its contemporaries), algorithms for data mining (e.g., clustering and association rule mining), and basic statistical modeling (e.g., linear and non-linear regression).

Syllabus

Part 0: Introduction 
  • Examples, data science articulated, history and context, technology landscape
Part 1: Data Manipulation at Scale
  • Databases and the relational algebra 
  • Parallel databases, parallel query processing, in-database analytics 
  • MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages  
  • Key-value stores and NoSQL; tradeoffs of SQL and NoSQL
Part 2: Analytics
  • Topics in statistical modeling: basic concepts, experiment design, pitfalls
  • Topics in machine learning: supervised learning (rules, trees, forests, nearest neighbor, regression), optimization (gradient descent and variants), unsupervised learning
Part 3: Communicating Results
  • Visualization, data products, visual data analytics 
  • Provenance, privacy, ethics, governance 
Part 4: Special Topics
  • Graph Analytics: structure, traversals, analytics, PageRank, community detection, recursive queries, semantic web
  • Guest Lectures
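The MapReduce topic in Part 1 is often introduced with the word-count example. The sketch below is a single-process simulation written for Python 3 (the course itself lists Python 2.7) using only the standard library. The helper names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical illustrations, not the API of Hadoop or of any course-provided framework. It shows the three phases: map, shuffle (group by key), and reduce.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical helper: run a mapper and reducer over records in one process."""
    # Map phase: each input record emits zero or more (key, value) pairs.
    # Shuffle: pairs are grouped by key, as a real framework would do
    # between the map and reduce phases (here, in one in-memory dict).
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # Reduce phase: each key and its list of values yield one output pair.
    return dict(reducer(key, values) for key, values in grouped.items())


def word_count_mapper(record):
    # A record is assumed to be a (document_id, text) pair; emit (word, 1) per token.
    _doc_id, text = record
    for word in text.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    # Sum the partial counts emitted for one word.
    return word, sum(counts)


documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog"),
    ("doc3", "The fox"),
]
counts = map_reduce(documents, word_count_mapper, word_count_reducer)
print(counts["the"], counts["fox"], counts["dog"])  # prints: 3 2 1
```

Because the mapper sees one record at a time and the reducer sees one key at a time, a distributed framework such as Hadoop can run many copies of each in parallel across machines; this sketch simply runs them sequentially.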

Recommended Background

We expect you to have intermediate programming experience and familiarity with databases, roughly equivalent to two college courses. We will have four programming assignments: two in Python, one in SQL, and one in R. The target audience is undergraduate students across disciplines who want to become proficient at working with large datasets and at using a range of tools to perform predictive analytics.

After taking this course, you may be interested in the three-course Certificate in Data Science offered through the University of Washington Professional and Continuing Education program. This online course provides an overview of, and introduction to, the more extensive material covered in that program, which offers classroom-based instruction by data scientists from Microsoft and other Seattle-area companies, networking opportunities with peers, case studies from the "front lines," and deep dives into selected topics.

Suggested Readings

There will be selected readings each week.  

We recommend, but do not require, that students consult the book Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman.

Course Format

The class will consist of lecture videos, each about 8 to 10 minutes long and containing one or two integrated quizzes. Some of these videos will be given by guest lecturers from the data science community.

There will be no formal exams or standalone quizzes.

There will be eight assignments in total, two of which are optional.

We will provide a virtual machine equipped with all necessary software, but you are permitted (and encouraged) to install software in your own environment as well.

There will be four structured programming assignments: two in Python, one in SQL, and one in R.

There will also be two open-ended assignments graded by peer assessment: one in visualization, and one in which you will participate in a Kaggle competition.

Finally, there will be two optional assignments: one open-ended project drawn from real-world problems submitted by external organizations, and one that involves processing a large dataset on AWS.

FAQ


Will I get a Statement of Accomplishment after completing this class?

Yes. Students who successfully complete the class will receive a Statement of Accomplishment signed by the instructor. 

What resources will I need for this class?

For this course, you will need an Internet connection and either (a) the ability to run a virtual machine locally or (b) the ability and knowledge to install the appropriate software yourself. The software includes Python 2.7 (with various libraries), R, and SQLite (or another database you are comfortable using). You will also have the opportunity to install and work with Hadoop, but for logistical reasons, no assignment will require it. Some assignments will be open-ended.

What level of programming experience should I have?

We expect intermediate programming experience in at least one language and some familiarity with database concepts. The programming assignments are not designed to test knowledge of any particular language and will not require esoteric language features. The languages we will use are Python, R, and SQL.

Dates:
  • 30 June 2014, 8 weeks
  • 1 May 2013, 8 weeks
Course properties:
  • Language: English

Included in selections:
Machine Learning
Machine learning: from the basics to advanced topics. Includes statistics...
NVIDIA
More on this topic:
Analytics For All
Your practical application oriented guide to analyzing Big Data
Intro to Data Science. Learn What It Takes to Become a Data Scientist
What does a data scientist do? In this course, we will survey the main topics...
Practical SQL Reporting with SSRS
Get hands on with Microsoft SQL Server Reporting Services and learn how to create...
Beginner's Guide to PostgreSQL
PostgreSQL Tutorial: What you need to know to get started with relational databases...
Web Intelligence and Big Data
This course is about building 'web-intelligence' applications exploiting big...
More from 'Engineering & Technology':
Exploring Everyday Chemistry
Explore the organic chemistry behind perfume, medicine, brewing and sport from...
Introduction to Psychology: The Psychology of Personality
Explore the complex factors and influences that help shape our personality and...
Beneath the Blue: The Importance of Marine Sediments
Understand the importance of our planet's seafloor and get an introduction to...
Engineering the Future: Creating the Amazing
Learn more about the fascinating world of engineering, and discover where an...
Antimicrobial Stewardship for the Gulf, Middle East and North Africa
Improve your knowledge of the spread of antimicrobial resistance in the Gulf...
More from 'Coursera':
First Year Teaching (Secondary Grades) - Success from the Start
Success with your students starts on Day 1. Learn from NTC's 25 years developing...
Understanding 9/11: Why Did al Qai’da Attack America?
This course will explore the forces that led to the 9/11 attacks and the policies...
Aboriginal Worldviews and Education
This course will explore indigenous ways of knowing and how this knowledge can...
Analytic Combinatorics
Analytic Combinatorics teaches a calculus that enables precise quantitative...
Accountable Talk®: Conversation that Works
Designed for teachers and learners in every setting - in school and out, in...
