General Setup for Data Science Projects in Python
Before you begin your first data science project using Python, it’s helpful to make sure that your computer is properly set up for the task. I’ve broken this process down into manageable steps to help you get started.
This post is part of a series of tutorials called Data in Day. Follow these tutorials to create your first end-to-end data science project in just one day. This is a fun, easy project that will teach you the basics of setting up your computer for a data science project and introduce you to some of the most popular tools available. It is a great way to get acquainted with the data science workflow.
- General Setup for Data Science Projects with Python
- Virtual Environments I: Installing Pyenv with Homebrew
- Virtual Environments II: Creating a Virtual Environment with Pyenv and Installing Data Science Packages
- Jupyter Notebooks I: Getting Started with Jupyter Notebooks
- GitHub I: Getting Started with GitHub
- Pandas I: read_csv(), head(), tail(), info(), and describe()
- Pandas II: drop(), isna()
- Pandas III: value_counts(), duplicated(), min(), and max()
Note to readers: Since the internet is full of outdated advice, it’s wise to check when directions or solutions were written. It’s also helpful to know what system and specifications the author was working on. I’m using an early-2015 Apple MacBook Air running macOS Catalina version 10.15.7, so these instructions apply to users with the same or a fairly similar setup. If you’re using an older operating system, these steps should still work, but there may be areas where adjustments are needed, which I will try to note.
I. Hello Python
Python will already be installed on your computer, but it is almost certainly outdated. If your Mac is from 2015 like mine, it likely came with a version of Python 2.7 out of the box, and Python 2 reached end of life in January 2020. To get started, you will need a current version of Python 3 installed on your system.
Navigate to python.org > Downloads > Mac OS X and download the current version (as of December 2020, 3.9.0). You can just use the graphical installer if you wish. The installer should launch as soon as the package downloads, and once you’re through all of the red tape, you’ll have a new version of Python available for use.
II. Meet the Terminal
Go to your Launchpad and select Other. There, you will find an app called Terminal. Open it, and when its icon shows up in your Dock, right-click it and choose Options > Keep in Dock. You’re going to need it.
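Once Terminal is open, you can check which Python versions it can already see. This is a minimal sketch, assuming the commands are named python and python3 (typical on macOS, but not guaranteed); the function name report_python_versions is only an illustrative label.

```shell
# Print the version of each Python command found on the PATH.
# Assumption: the commands are called "python" and "python3".
report_python_versions() {
  for candidate in python python3; do
    if command -v "$candidate" >/dev/null 2>&1; then
      # Python 2 prints its version to stderr, so merge stderr into stdout.
      echo "$candidate -> $("$candidate" --version 2>&1)"
    else
      echo "$candidate -> not found"
    fi
  done
}

report_python_versions
```

If python reports a 2.7 release while python3 reports the version you just downloaded, both copies are present and you are ready to move on.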
III. Using Bash in Terminal to Install Homebrew, CLT, and Xcode
You may already have Homebrew on your Mac, but if you do not, open your web browser and head over to the Homebrew website (brew.sh).
To find out if you are using Zsh or Bash, simply look at the prompt on the left side of the terminal, just before your cursor. The difference is that Zsh will end the prompt with % and Bash will end it with $. macOS Catalina uses Zsh by default for new user accounts, so you will most likely see %.
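If your prompt has been customized and the symbol is ambiguous, the shell can also identify itself. Here is a small sketch that relies on Bash setting the BASH_VERSION variable and Zsh setting ZSH_VERSION; the variable name shell_name is just for illustration.

```shell
# Identify the running shell from the version variable it sets.
if [ -n "${ZSH_VERSION:-}" ]; then
  shell_name="zsh"
elif [ -n "${BASH_VERSION:-}" ]; then
  shell_name="bash"
else
  shell_name="another shell"
fi

echo "This session is running $shell_name"
```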
You can switch to Bash from the terminal. When you do, the computer will return instructions for switching back to Zsh later if you wish. From now on, when you see a code box with a $ before the command, you’ll know that the command is Bash.
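One common way to make the switch on macOS is chsh -s /bin/bash, which changes your login shell; treat it as one reasonable option rather than the only one. The sketch below assumes Bash lives at /bin/bash, as it does on a stock macOS install, and only prints the suggested command instead of running it, so it is safe to paste.

```shell
# Assumption: Bash lives at /bin/bash, as on a stock macOS install.
target_shell="/bin/bash"

if [ "${SHELL:-}" = "$target_shell" ]; then
  switch_hint="Your login shell is already $target_shell"
else
  # chsh changes the login shell; it asks for your password and
  # takes effect in newly opened terminal windows.
  switch_hint="To switch, run: chsh -s $target_shell"
fi

echo "$switch_hint"
```

To go back to Zsh later, the same command with /bin/zsh as the target should do it.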
Command Line Tools (CLT) for Xcode
Homebrew requires this package to be installed on your system before it will work. Now that the terminal is set to Bash, we can install it.
To install Command Line Tools, enter:
$ xcode-select --install
It might take a little while for everything to download. Follow the instructions of any prompts if they happen to pop up along the way.
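Once the download finishes, you can confirm that the tools are in place. This is a minimal sketch that uses xcode-select -p, which prints the path of the active developer directory and fails when none is set; the variable names clt_path and clt_status are illustrative.

```shell
# Check whether a developer directory (Command Line Tools or Xcode) is set.
if command -v xcode-select >/dev/null 2>&1 && clt_path="$(xcode-select -p 2>/dev/null)"; then
  clt_status="installed at $clt_path"
else
  clt_status="not installed"
fi

echo "Command Line Tools: $clt_status"
```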
Now you can move on to installing Homebrew by entering the following:
$ mkdir homebrew && curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C homebrew
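This downloads Homebrew and unpacks it into a new folder named homebrew inside whatever directory your terminal is currently in (your home directory, if you have just opened the terminal). Note that this is not /usr/local, which is where Homebrew’s standard installer script puts it on Intel Macs. With this method the brew command lives at homebrew/bin/brew, so that bin folder needs to be on your PATH before the brew commands below will work.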
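Before moving on, it is worth checking that the terminal can actually find the brew command. A minimal sketch, with brew_status as an illustrative variable name:

```shell
# Check whether the brew command is reachable from the current PATH.
if command -v brew >/dev/null 2>&1; then
  brew_status="found at $(command -v brew)"
else
  brew_status="not on PATH"
fi

echo "brew: $brew_status"
```

If it reports "not on PATH", the executable is sitting in the bin folder inside the homebrew directory that the command above created. Assuming you ran that command from your home directory, something like export PATH="$HOME/homebrew/bin:$PATH" would make it available for the current session.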
IV. Using Homebrew to Install Python & Pip
Homebrew has been installed, and now it’s time to make sure that Homebrew has its very own Python. There are going to be a lot of different versions of Python on your computer. This step makes sure that your package manager is connected to its own version.
Python and Pip
To install the latest version of Python with Homebrew, enter the following:
$ brew install python
Homebrew will install its own copy of the newest version of Python. Pip is Python’s package manager, and it is included with this install. Homebrew and other programs, packages, and apps need their own version of Python because the Python that is already on your Mac is there for a reason: macOS and other software may rely on it. Altering or removing it can really mess up your computer, and the damage can be hard to undo without a hard reset.
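With several copies of Python on one machine, it helps to check which one the terminal actually picks up. This is a small sketch, assuming Homebrew exposes its copy as python3 and pip3; the function name report_tool_paths is only an illustrative label.

```shell
# Show where each command resolves on the current PATH.
# Assumption: Homebrew's Python is exposed as "python3" and "pip3".
report_tool_paths() {
  for tool in python3 pip3; do
    tool_path="$(command -v "$tool" 2>/dev/null || echo "not found")"
    echo "$tool -> $tool_path"
  done
}

report_tool_paths
```

A path inside your Homebrew folder suggests you are using Homebrew’s copy, while a path under /usr/bin points to the one that ships with macOS.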
V. What Did We Do?
1. Verified what machine and OS you are using.
2. Installed the newest version of Python to date.
3. Introduced ourselves to the terminal.
4. Learned how to switch from Zsh to Bash, and how to tell which one we are using.
5. Installed Command Line Tools for Xcode.
6. Installed Homebrew.
7. Installed a version of Python especially for Homebrew, using Homebrew.
You now have your foundational package management set up, and we will build on it to set up your data science environments. Next, you can follow this tutorial to create a virtual environment for your project.