reading-notes

Data Analysis

Reading

Reference

Notes

JupyterLab

What is JupyterLab?

JupyterLab is the next-generation user interface for Project Jupyter. It provides a modern, flexible, and powerful platform for data science and scientific computing. JupyterLab offers a more integrated development environment for working with notebooks, code, and data.

It includes the following:

JupyterLab is designed to be extensible, so you can add new capabilities to it with plugins. It is also fully compatible with the classic Jupyter Notebook, so you can use all of your existing notebooks in JupyterLab.

What is a Jupyter Notebook?

Jupyter notebooks are interactive documents that contain a combination of text, code, and code output. They are used to create and share documents that contain live code, equations, visualizations, and narrative text. Jupyter notebooks are often used for data science, machine learning, and scientific computing, but they can be used for a wide range of tasks. They are a popular choice for doing reproducible research, as they allow you to combine code, results, and explanations in one document.

What are the advantages of JupyterLab over an IDE?

There are a few main advantages of Jupyter notebooks and JupyterLab over IDEs like VS Code and PyCharm:

That being said, IDEs like VS Code and PyCharm have their own advantages as well. They generally have more powerful code editing and debugging capabilities, and they may be a better choice for larger, more complex projects.

JupyterLab Shortcuts

NumPy

What is NumPy?

NumPy is a library for Python that is used for scientific computing. It provides a high-performance multidimensional array object, and tools for working with these arrays.

NumPy arrays are used to store large amounts of numerical data, and they can be efficiently processed using specialized functions and libraries written in C and Fortran.

NumPy is a fundamental library for scientific computing with Python, and it provides the foundation for many other libraries in the scientific Python ecosystem, such as SciPy and Pandas.

NumPy can be installed from the command line with pip install numpy.

What is a NumPy Array?

A NumPy array is a multi-dimensional grid of values, all of the same data type, indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array (available as the ndim attribute), and the shape of an array is a tuple of integers giving the size of the array along each dimension.

For numerical work, NumPy arrays are more efficient than Python’s built-in lists or tuples: every element has the same data type, and arithmetic applies to the whole array at once instead of looping over elements in Python. They are an essential part of the scientific Python ecosystem.
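As a rough illustration of the difference from plain lists, the sketch below doubles the same numbers two ways. The variable names and values are made up for this example.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = [1.0, 2.0, 3.0, 4.0]

# Plain Python: visit each element explicitly
doubled_list = [v * 2 for v in values]

# NumPy: the multiplication applies to every element at once
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)           # [2.0, 4.0, 6.0, 8.0]
print(doubled_arr)            # [2. 4. 6. 8.]
print(arr.sum(), arr.mean())  # 10.0 2.5
```

Note that multiplying a Python list by 2 would repeat the list rather than double its elements, which is one reason arrays are the more natural fit for arithmetic.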

Here is an example of how you can create a NumPy array in Python:

import numpy as np

# Create a 1-dimensional array
a = np.array([1, 2, 3])

# Create a 2-dimensional array
b = np.array([[1, 2, 3], [4, 5, 6]])

# Create an array with three dimensions
c = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(a.shape) # (3,)
print(b.shape) # (2, 3)
print(c.shape) # (2, 2, 3)

In the example above, a is a 1-dimensional array with shape (3,), b is a 2-dimensional array with shape (2, 3), and c is a 3-dimensional array with shape (2, 2, 3).

The shape (2, 3) indicates that the array has two dimensions, and the size of the array along each dimension is given by the integers 2 and 3. In this case, the first integer 2 represents the number of rows in the array, and the second integer 3 represents the number of columns in the array.

For a 3-dimensional array, the shape reads as (number of layers, number of rows, number of columns).
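To make the link between shape and indexing concrete, here is a small sketch that reuses the 3-dimensional array from the example above and picks out one element by layer, row, and column.

```python
import numpy as np

# Same 3-dimensional array as in the earlier example
c = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(c.ndim)   # 3
print(c.shape)  # (2, 2, 3)

# Index with a tuple: layer 1, row 0, column 2
print(c[1, 0, 2])  # 9

# Selecting one layer gives a 2-dimensional array
print(c[0].shape)  # (2, 3)
```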

NumPy Arrays and Monte Carlo Simulation

Monte Carlo simulations typically involve generating many random samples and performing statistical analyses on them in order to make predictions or estimate uncertainties. NumPy arrays are a convenient and efficient way to store and manipulate large amounts of numerical data, which makes them well-suited for use in Monte Carlo simulations.

import numpy as np

# Set the number of samples
n_samples = 100000

# Generate random samples from a normal distribution
samples = np.random.normal(size=n_samples)

# Compute the mean of the samples
mean = np.mean(samples)

# Compute the standard deviation of the samples
std = np.std(samples)

print(f"Mean: {mean:.4f}")
print(f"Standard deviation: {std:.4f}")

In this example, we use the np.random.normal function to generate n_samples random samples from a normal distribution. With no other arguments, it draws from the standard normal distribution, which has mean 0 and standard deviation 1. We then use the np.mean and np.std functions to compute the mean and standard deviation of the samples, so the printed values should be close to 0 and 1. These statistics can be used to estimate the mean and standard deviation of the underlying distribution from which the samples were drawn.
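Another common illustration of the Monte Carlo idea, not taken from the notes above, is estimating pi: draw random points in a square and count the fraction that land inside the inscribed circle. The sketch below assumes NumPy's np.random.default_rng generator with a fixed seed so the run is repeatable.

```python
import numpy as np

# Seeded generator so repeated runs give the same estimate
rng = np.random.default_rng(seed=0)
n_samples = 100_000

# Random points in the square [-1, 1] x [-1, 1]
x = rng.uniform(-1.0, 1.0, size=n_samples)
y = rng.uniform(-1.0, 1.0, size=n_samples)

# Boolean array: True where the point falls inside the unit circle
inside = x**2 + y**2 <= 1.0

# Area of circle / area of square = pi / 4
pi_estimate = 4.0 * inside.mean()

print(f"Estimated pi: {pi_estimate:.4f}")
```

The whole computation works on arrays, with no Python loop over samples, which is what makes NumPy a good fit for this kind of simulation.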

Importing CSVs for Use in Python with NumPy

Code Example:

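import csv
import numpy as np

# Read the semicolon-delimited file into a list of rows
with open("winequality-red.csv", "r") as f:
    wines = list(csv.reader(f, delimiter=";"))

# Skip the header row; np.float was removed in NumPy 1.24, so use float
wines = np.array(wines[1:], dtype=float)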
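As an alternative sketch, NumPy can parse delimited text itself with np.genfromtxt, skipping the csv module. The example below reads from an in-memory string so it runs without the wine file; the column names and values are invented for illustration and are not the real dataset.

```python
import io

import numpy as np

# Hypothetical semicolon-delimited data standing in for a CSV file
csv_text = "acidity;sugar;quality\n7.4;1.9;5\n7.8;2.6;6\n"

# skip_header=1 drops the header row; values are parsed as floats
data = np.genfromtxt(io.StringIO(csv_text), delimiter=";", skip_header=1)

print(data.shape)  # (2, 3)
print(data[:, 2])  # last column: [5. 6.]
```

With a real file, the path can be passed in place of the io.StringIO object.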