Syllabus for Statistics 240: Nonparametric and Robust Statistics
Instructor: Philip B. Stark#
GSI: Jake Spertus#
This version: 20 January 2023#
Six things to do right away if you are taking this class#
sign up for the Slack channel
log into https://github.berkeley.edu to initialize your Berkeley GitHub account (this will be used for assignments)
create a “regular” https://github.com account if you don’t already have one (this will be used for course materials)
clone the course materials repository
turn on GitHub notifications for changes to the course repo (there will be frequent edits, likely several per week)
get an ORCID if you don’t have one already
Overview#
This course focuses on nonparametric/exact/conservative methods for inference in a variety of settings, including survey sampling, auditing, and randomized experiments. The emphasis is on recent theory involving supermartingales, but many “classical” nonparametric topics will also be covered. Part of the class will involve fielding consulting inquiries (Statistics 272 is not being offered this term), because many problems that arise in consulting are amenable to a straightforward nonparametric approach.
Technical topics will include (time permitting):
Nonparametric inference about the treatment effect in randomized, controlled trials
Nonparametric inference about population means for a variety of sampling designs
Nonnegative supermartingales, Ville’s inequality, sequential testing
Combining hypothesis tests, nonparametric combinations of tests, \(E\)-values
Nonparametric methods based on ranks
Nonparametric methods based on projecting the empirical distribution onto the null
Density estimation and inverse problems
Pseudo-random number generation, simulation
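To make the supermartingale topics above concrete: Ville’s inequality says that if \(M_t\) is a nonnegative supermartingale with \(M_0 = 1\) under the null hypothesis, then the chance that \(M_t\) ever reaches \(1/\alpha\) is at most \(\alpha\). The following is a minimal sketch of a sequential test built on that fact. The function name, the fixed bet `lam`, and the product form of the supermartingale are illustrative assumptions, not necessarily the constructions used in the course.

```python
"""Sketch of a sequential test justified by Ville's inequality (illustrative only)."""
from typing import Iterable, Optional


def first_rejection_time(
    xs: Iterable[float], mu0: float, lam: float, alpha: float = 0.05
) -> Optional[int]:
    """Return the first sample size at which the null is rejected, else None.

    The null hypothesis is that each observation lies in [0, 1] and has
    conditional mean at most `mu0` given the past. Under that null, the
    running product M_t = prod_{i <= t} (1 + lam * (x_i - mu0)) is a
    nonnegative supermartingale with M_0 = 1 whenever 0 <= lam <= 1 / mu0.
    """
    if not 0.0 < mu0 < 1.0:
        raise ValueError("mu0 must be in (0, 1)")
    if not 0.0 <= lam <= 1.0 / mu0:
        raise ValueError("lam must be in [0, 1/mu0] to keep the product nonnegative")
    m = 1.0  # hypothetical name for the running value of the supermartingale
    for t, x in enumerate(xs, start=1):
        m *= 1.0 + lam * (x - mu0)
        # By Ville's inequality, this threshold is crossed with
        # probability at most alpha when the null is true.
        if m >= 1.0 / alpha:
            return t
    return None


print(first_rejection_time([1.0] * 10, mu0=0.5, lam=1.0))  # 8
print(first_rejection_time([0.5] * 10, mu0=0.5, lam=1.0))  # None
```

In the first call every observation multiplies the product by 1.5, so it first exceeds 20 at the eighth observation; in the second call the product stays at 1 and the test never rejects.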
Consistent class participation is crucial: we will be discussing subtle substantive, technical, and philosophical issues and reviewing code during class.
Administrivia#
Format and assessment#
three hours of lecture and one two-hour lab per week
approximately five problem sets, two small term projects that involve contributing to an open-source Python library, and a larger term project involving data
Office hours#
Philip: Wednesdays, 11AM-12PM by Google Meet, and by appointment
Jake Spertus: Thursdays, 3PM-4PM, Evans 240
Communication#
Please use the course Slack channels (https://join.slack.com/t/slack-iv99638/shared_invite/zt-1n1ypzr8l-U7Q8YRIkb6gl7xS9xH8xaw) for questions/comments/discussion about course material and logistics. For personal matters (illness, accommodations, etc.) that should remain private, please send email.
During the work week, we expect to be able to reply to Slack messages and email within 24 hours. On weekends, we might need longer.
Grading#
Submitting assignments: Submit written assignments by making a pull request to your private repository within the Berkeley GitHub organization for the class, https://github.berkeley.edu/stat-240-s23
Use your CalNet credentials to access your private repository. Create a directory for each assignment labeled with the assignment number, e.g., “Assignment1” for the first assignment.
Text documents should be written in LaTeX or Markdown (Markdown, processed by pandoc, is preferred: using Markdown can really speed up your writing and the source is easier to read than LaTeX). A pdf and the source file should be submitted. Microsoft Word is not acceptable.
Code and analyses should be in Python. All code should have appropriate docstrings and unit tests. When you submit code as part of an assignment, also submit the output of running pylint (or pycodestyle, for Jupyter notebooks) and pytest on your code (including test coverage reports). Function definitions and class variables should include type hints. Follow PEP-8, PEP-257, PEP-484, and PEP-526. In some cases, Jupyter notebooks will be the appropriate thing to submit; in others (more extensive analyses), a notebook and a collection of .py files might be more appropriate. For term projects, the “deliverable” will include a repository that includes code, data, analyses, and unit tests.
Final projects are due on the first day of final exams, 8 May 2023. Final projects might include an in-person or recorded video presentation (to be determined).
Computing components of assignments will be graded partly on the correctness of the calculations, but also on style, documentation, and the quality and coverage of unit tests.
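As a hedged illustration of the coding conventions described above (type hints, docstrings, and unit tests), a submitted function and its test might look something like the following. The function `sample_mean` and its test are hypothetical examples, not part of any course library, and the docstring layout shown is one common choice rather than a course requirement.

```python
"""Minimal sketch of the conventions: type hints, docstrings, unit tests."""
import math
from typing import Sequence


def sample_mean(x: Sequence[float]) -> float:
    """Return the arithmetic mean of a nonempty sequence.

    Parameters
    ----------
    x : Sequence[float]
        The data; must contain at least one value.

    Returns
    -------
    float
        The arithmetic mean of the values in `x`.

    Raises
    ------
    ValueError
        If `x` is empty.
    """
    if len(x) == 0:
        raise ValueError("x must be nonempty")
    return math.fsum(x) / len(x)


def test_sample_mean() -> None:
    """Check a small example and the error raised for empty input."""
    assert sample_mean([1.0, 2.0, 3.0]) == 2.0
    try:
        sample_mean([])
    except ValueError:
        pass  # expected: empty input is rejected
    else:
        raise AssertionError("expected ValueError for empty input")
```

Saved in a file whose name starts with `test_`, or alongside such a file, the test function would be collected and run by pytest; the same file can be checked with pylint or pycodestyle.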
Code of conduct; attribution of work#
The high academic standard at the University of California, Berkeley, is reflected in each degree awarded. Every student is expected to maintain this high standard by ensuring that all academic work reflects unique ideas or properly attributes the ideas to the original sources.
These are some basic expectations of students with regard to academic integrity:
Any work submitted should be your own individual thoughts, and should not have been submitted for credit in another course unless you have prior written permission to re-use it in this course from this instructor.
All assignments must use “proper attribution,” meaning that you have identified the original source and extent of the words or ideas that you reproduce or use in your assignment. This includes drafts and homework assignments! If you are unclear about expectations, ask your instructor.
Do not collaborate or work with other students on assignments or projects unless the instructor gives you permission or instruction to do so.
Disability accommodations#
If you need an accommodation for a disability, if you have information you wish to share with the instructor about a medical emergency, or if you need special arrangements in case the building must be evacuated, please inform the instructor as soon as possible.
If you are not currently listed with DSP (the Disabled Students’ Program) and believe you might benefit from their support, please apply online at https://dsp.berkeley.edu/.
Resources#
Communication: class Slack channel https://join.slack.com/t/slack-iv99638/shared_invite/zt-1n1ypzr8l-U7Q8YRIkb6gl7xS9xH8xaw
Computing resources
We will use Jupyter notebooks. We will start with hosted notebooks on https://datahub.berkeley.edu/, but you can also install everything on your own device.
We will use the campus GitHub server, https://github.berkeley.edu.
The class notes and most other materials are available at https://github.com/pbstark/StatNotes.
Assignments should be submitted by pull request to your private repository within the class organization https://github.berkeley.edu/stat-240-s23.
Git and git workflows
Continuous integration
Scientific Python, Jupyter
pylint and pycodestyle
Python for scientific computing by Fernando Perez
Elegant SciPy, by Juan Nunez-Iglesias, Stéfan van der Walt, and Harriet Dashnow. The full book and all the notebooks are available.
Getting started with Python for research, a gentle introduction to Python in data-intensive research.
An introduction to “Data Science”, a collection of notebooks by BIDS’ Stéfan van der Walt.
Effective Computation in Physics, by Kathryn D. Huff and Anthony Scopatz. Notebooks to accompany the book.
A Whirlwind Tour of Python, by Jake VanderPlas.
Python for Data Analysis, 2nd Edition, by Wes McKinney, creator of Pandas. Companion Notebooks
Effective Pandas, a book by Tom Augspurger, core Pandas developer.
Docker
LaTeX
Markdown
Pandoc
Miscellaneous computing tutorials