Timestamp
This tutorial covers dates and times in Pandas.
Pandas provides powerful methods for dealing with dates and time. The data structure for dates and time in Pandas is a Timestamp. A timestamp is a single point in time. Let’s see how to create and deal with timestamps in Pandas.
The timestamp data structure uses the constructor “Timestamp()”. The capital T is required: Pandas has no lowercase “timestamp”, so that spelling raises an AttributeError. The examples below create a timestamp for the current time, for a specific date, and for a specific date and time.
import pandas as pd

# Current date and time
current_timestamp = pd.Timestamp.now()
print(current_timestamp)

# Specific date
timestamp1 = pd.Timestamp('2024-04-06')
print(timestamp1)

# Specific date and time
timestamp2 = pd.Timestamp(2024, 4, 6, 12, 30, 0)
print(timestamp2)
2024-04-06 16:42:59.073490
2024-04-06 00:00:00
2024-04-06 12:30:00
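The capital-T requirement can be checked directly. The following is a minimal sketch, assuming Pandas is installed; the variable names `ts` and `error_name` are illustrative and not part of the original example.

```python
import pandas as pd

# The constructor is pd.Timestamp (capital T).
ts = pd.Timestamp('2024-04-06')
print(ts)

# A lowercase spelling is not an attribute of the pandas module,
# so looking it up fails before anything is constructed.
try:
    pd.timestamp('2024-04-06')
except AttributeError as exc:
    error_name = type(exc).__name__
    print("Lookup failed with:", error_name)
```

Catching the exception here is only for demonstration; in normal code the fix is simply to use the correct spelling.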
Access timestamp components
We can access separate components of a timestamp data structure.
import pandas as pd

timestamp = pd.Timestamp(2024, 4, 6, 12, 30, 0)
print(timestamp)

print("Year:", timestamp.year)
print("Month:", timestamp.month)
print("Day:", timestamp.day)
print("Hour:", timestamp.hour)
print("Minute:", timestamp.minute)
print("Second:", timestamp.second)
2024-04-06 12:30:00
Year: 2024
Month: 4
Day: 6
Hour: 12
Minute: 30
Second: 0
Strings to timestamp
In real projects, data often comes from external sources, and dates may arrive as strings. The Pandas function “to_datetime()” converts string data (text) into a timestamp. Let’s explore it.
import pandas as pd

# Sample date and time stored as a string
timestamp_str = '2024-04-06 12:30:00'

# Convert to Timestamp
parsed_timestamp = pd.to_datetime(timestamp_str)

# Display outcome
print(parsed_timestamp)
# Outcome: 2024-04-06 12:30:00
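Strings from external sources are not always in year-month-day order. As a hedged illustration, “to_datetime()” accepts a `format` argument that states the layout explicitly; the day-first string `raw` below is an invented sample, not data from this tutorial.

```python
import pandas as pd

# Hypothetical day-first string (6 April 2024, 12:30)
raw = '06/04/2024 12:30'

# The format codes tell pandas which part is the day, month, year, etc.
parsed = pd.to_datetime(raw, format='%d/%m/%Y %H:%M')

print(parsed)
# Outcome: 2024-04-06 12:30:00
```

Stating the format avoids ambiguity between day-first and month-first dates such as 06/04 and 04/06.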
Convert DataFrame
Let’s try a slightly more realistic example using a Pandas DataFrame. First, we create a sample DataFrame containing some values and dates, and then convert the date column to a datetime type.
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'income': [100.79, 200.00, 305.95]
})

print(df)
print(df.dtypes)
# Data
         date  income
0  2024-01-01  100.79
1  2024-01-02  200.00
2  2024-01-03  305.95

# Data types
date       object
income    float64
dtype: object
The date column has the data type object, which means the dates are still stored as strings. Now we convert it to the data type datetime. Only the date column is converted.
# Convert the 'date' column to datetime type
df['date'] = pd.to_datetime(df['date'])

print(df.dtypes)
date      datetime64[ns]
income           float64
dtype: object
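External data sometimes contains values that are not valid dates. One possible way to handle this, sketched here with an invented Series `raw_dates`, is the `errors='coerce'` option of “to_datetime()”, which turns unparseable entries into NaT instead of stopping with an error.

```python
import pandas as pd

# Hypothetical column with one invalid entry
raw_dates = pd.Series(['2024-01-01', 'not a date', '2024-01-03'])

# Invalid strings become NaT rather than raising an error
clean_dates = pd.to_datetime(raw_dates, errors='coerce')

print(clean_dates)
print("Unparseable rows:", clean_dates.isna().sum())
```

Counting the NaT values afterwards is a quick check of how many rows failed to convert.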
Extract timestamp components
Within this section, extracting refers to the separation of values into different columns.
We continue the above example. The date column must be converted to datetime before extracting the components. Otherwise, the “.dt” accessor raises an AttributeError. Let’s explore the example.
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'income': [100.79, 200.00, 305.95]
})

# Convert the 'date' column to datetime type
df['date'] = pd.to_datetime(df['date'])

# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

print(df)
        date  income  year  month  day
0 2024-01-01  100.79  2024      1    1
1 2024-01-02  200.00  2024      1    2
2 2024-01-03  305.95  2024      1    3
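To see why the conversion has to come first, the sketch below tries the “.dt” accessor on a column that still holds strings. The DataFrame `df_raw` and the flag `failed` are illustrative names introduced for this example only.

```python
import pandas as pd

# Dates are still plain strings here (no conversion yet)
df_raw = pd.DataFrame({'date': ['2024-01-01', '2024-01-02']})

try:
    df_raw['date'].dt.year
except AttributeError as exc:
    # .dt is only available on datetime-like columns
    print("Error:", exc)
    failed = True
else:
    failed = False

# After converting, the same access works
df_raw['date'] = pd.to_datetime(df_raw['date'])
years = df_raw['date'].dt.year.tolist()
print(years)
```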
Other methods
Once we have a datetime column, we can apply various Pandas methods, such as filtering, resampling and grouping, shifting dates, calculating differences, and many more.
Let’s dive into some of them.
# Filter by day
df_day = df[df['date'].dt.day == 2]
print(df_day)

# Resample data by month
# ('M' means month end; pandas 2.2 and later prefer the alias 'ME')
monthly_data = df.resample('M', on='date').sum()
print(monthly_data)

# Shift dates by 1 period
df['previous_date'] = df['date'].shift(1)
print(df)

# Calculate the difference between dates
df['date_diff'] = df['date'] - df['previous_date']
print(df)
The method shift has several options for flexibility (see the Pandas documentation). Calling shift(1) moves every value down by one row, so the previous date column holds the date from the row above. The first row has no row above it, so its previous date shows as NaT (the Pandas missing value for dates).
        date  income
1 2024-01-02   200.0

            income
date
2024-01-31  606.74

        date  income previous_date
0 2024-01-01  100.79           NaT
1 2024-01-02  200.00    2024-01-01
2 2024-01-03  305.95    2024-01-02

        date  income previous_date date_diff
0 2024-01-01  100.79           NaT       NaT
1 2024-01-02  200.00    2024-01-01    1 days
2 2024-01-03  305.95    2024-01-02    1 days
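Grouping is mentioned above but not demonstrated, and shift moves rows rather than calendar dates. The sketch below is one possible way to cover both, assuming the same sample data; the column names `next_day` and `gap_days` and the variable `monthly_income` are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
    'income': [100.79, 200.00, 305.95]
})

# Move every date forward by one calendar day
# (the values change, the rows stay where they are)
df['next_day'] = df['date'] + pd.Timedelta(days=1)

# Express the gap between the two columns as a whole number of days
df['gap_days'] = (df['next_day'] - df['date']).dt.days

# Group by calendar month and total the income
monthly_income = df.groupby(df['date'].dt.month)['income'].sum()

print(df)
print(monthly_income)
```

Adding a Timedelta changes the dates themselves, whereas shift keeps the dates and moves them to a different row.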
This is an original dates and time educational material created by aicorr.com.