Pandas is a software library written for the Python programming language for data manipulation and analysis. It offers data structures for efficiently storing large datasets and tools for working with them. The library is designed to make it easy to work with tabular or structured data, such as data stored in a spreadsheet or a database table. Some of the main features of pandas include:
head() and tail()
info()
describe()
value_counts()
groupby()
pivot_table()
plot()
hist()
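As a rough illustration of several of the methods listed above, the following sketch builds a small, made-up DataFrame and calls a few of them. The column names `city`, `product`, and `sales` and all values are invented for the example. plot() and hist() are left out because they need a plotting backend such as matplotlib.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen", "Bergen", "Oslo", "Bergen"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "sales": [10, 20, 30, 40, 50, 60],
})

first_rows = df.head(2)   # first two rows
last_rows = df.tail(2)    # last two rows
summary = df.describe()   # count, mean, std, min, quartiles, max of numeric columns

# Number of rows per city
city_counts = df["city"].value_counts()

# Total sales per city
sales_by_city = df.groupby("city")["sales"].sum()

# Cities as rows, products as columns, summed sales in the cells
pivot = df.pivot_table(index="city", columns="product", values="sales", aggfunc="sum")
```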
import pandas as pd
# Load the time series data into a pandas DataFrame, parsing the date column as datetimes
df = pd.read_csv('time_series_data.csv', parse_dates=['date'])
# Plot the time series data
df.plot(x='date', y='value')
# Select only the data for 2018
df_2018 = df[df['date'].dt.year == 2018]
# Select only the data for January
df_january = df[df['date'].dt.month == 1]
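The year and month filters above rely on the `.dt` accessor, which is only available on datetime columns. Here is a minimal sketch of the same selections, using made-up in-memory data in place of the CSV file and assuming the same `date` and `value` column names:

```python
import pandas as pd

# Made-up data standing in for time_series_data.csv
df = pd.DataFrame({
    "date": ["2017-12-31", "2018-01-15", "2018-06-01", "2019-01-20"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Convert the string column to datetime so that .dt is available
df["date"] = pd.to_datetime(df["date"])

df_2018 = df[df["date"].dt.year == 2018]     # rows dated in 2018
df_january = df[df["date"].dt.month == 1]    # January rows from any year
```

Note that the January filter matches January of every year in the data, so it may need to be combined with the year filter when only one year is wanted.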
Use the resample() function to resample the time series data to a different frequency.
# Resample the data to monthly mean
df_monthly = df.resample('M', on='date').mean()
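To make the effect of resampling concrete, here is a small sketch on synthetic daily data (the dates and values are invented). It sets the date column as the index and uses the "MS" (month start) alias, so each monthly mean is labeled with the first day of its month rather than the last:

```python
import pandas as pd

# Synthetic daily series: 59 days covering January and February 2018
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=59, freq="D"),
    "value": range(59),
})

# Index by date, then group the values by calendar month and average them
monthly = df.set_index("date")["value"].resample("MS").mean()
```

January holds the values 0 through 30 and February the values 31 through 58, so the result has two rows, one per month.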
Use the rolling() function to calculate rolling statistics, such as the rolling mean or rolling standard deviation.
# Calculate the rolling mean with a window size of 30
df['rolling_mean'] = df['value'].rolling(30).mean()
# Calculate the rolling standard deviation with a window size of 30
df['rolling_std'] = df['value'].rolling(30).std()
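One detail worth knowing about rolling windows is that, by default, the first results are NaN until the window is full. The sketch below shows this on a tiny invented series with a window of 3 instead of 30, along with the `min_periods` parameter, which relaxes that requirement:

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3: the first two results are NaN because the window is not yet full
rolling_mean = values.rolling(3).mean()

# min_periods=1 produces a result as soon as one observation is available
rolling_mean_partial = values.rolling(3, min_periods=1).mean()
```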
tqdm
Operations such as .map(), .apply(), and .applymap() can take a long time on large datasets. tqdm is a very useful package that helps predict when these operations will finish executing.
from tqdm.notebook import tqdm
tqdm.pandas()
# Shows a progress bar in Jupyter Notebook when progress_map() or progress_apply() is used in place of map() or apply()
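As a hedged sketch of how the progress bar is actually triggered: after tqdm is registered with pandas, the bar appears when the `progress_apply()` variant is called instead of `apply()`. The example below uses the plain console `tqdm` class so that it also runs outside Jupyter. The data and the squaring function are invented for illustration.

```python
import pandas as pd
from tqdm import tqdm  # console progress bar; tqdm.notebook.tqdm is the Jupyter variant

# Register the progress_* methods on pandas objects
tqdm.pandas(desc="squaring")

values = pd.Series(range(1000))

# progress_apply behaves like apply but updates a progress bar as it goes
squared = values.progress_apply(lambda x: x * x)
```

The result is the same as calling `values.apply(...)` directly. Only the progress reporting differs.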