A first look at datetime in pandas

import numpy as np
import pandas as pd

Introduction

Time series data is an important form of data in many different fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series.
Many time series are fixed frequency, which is to say that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month.
Time series can also be irregular, without a fixed unit of time or offset between units. How you mark and refer to time series data depends on the application, and you may have one of the following:

  • Timestamps, specific instants in time
  • Fixed periods, such as the month January 2007 or the full year 2010
  • Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals.
  • Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)
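The first three categories map onto pandas scalar types. This is a minimal sketch, assuming a reasonably recent pandas; the dates and variable names are illustrative only. It shows a Timestamp (an instant), a Period (a whole month) whose start and end make it behave like an interval, and a general Interval bounded by two timestamps.

```python
import pandas as pd

# A timestamp: one specific instant in time
ts = pd.Timestamp("2011-01-03 09:30")

# A fixed period: the whole month of January 2007
jan_2007 = pd.Period("2007-01", freq="M")

# A period is a special case of an interval: it has a start and an end
print(jan_2007.start_time)  # 2007-01-01 00:00:00
print(jan_2007.end_time)    # 2007-01-31 23:59:59.999999999

# A general interval bounded by a start and an end timestamp
# (pandas intervals are closed on the right by default)
week = pd.Interval(pd.Timestamp("2011-01-01"), pd.Timestamp("2011-01-07"))
print(week.length)                         # 6 days 00:00:00
print(pd.Timestamp("2011-01-03") in week)  # True
```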

In this chapter, I am mainly concerned with time series in the first three categories, though many of the techniques can be applied to experimental time series where the index may be an integer or floating-point number indicating elapsed time from the start of the experiment. The simplest and most widely used kind of time series are those indexed by timestamp.
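As a small illustration of a timestamp-indexed series, here is a sketch with made-up values: building the index with pd.date_range gives a DatetimeIndex, which lets you select rows with date strings such as a month.

```python
import pandas as pd

# A time series indexed by timestamps: one value per day (illustrative numbers)
index = pd.date_range("2011-01-30", periods=5, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index)

# The index is a DatetimeIndex, so date strings can be used for selection
print(type(ts.index).__name__)  # DatetimeIndex
print(ts.loc["2011-02"])        # only the rows falling in February 2011
```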

pandas also supports indexes based on timedeltas, which can be a useful way of representing experiment or elapsed time. We do not explore timedelta indexes in this book, but you can learn more in the pandas documentation.
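For the elapsed-time case, a sketch of what a timedelta index looks like: the cookie-diameter readings below are hypothetical numbers, and pd.to_timedelta converts elapsed seconds into a TimedeltaIndex that can be sliced by elapsed time rather than by wall-clock time.

```python
import pandas as pd

# Hypothetical cookie-diameter readings (cm), one per second after entering the oven
elapsed = pd.to_timedelta([0, 1, 2, 3], unit="s")
diameter = pd.Series([5.0, 5.1, 5.3, 5.4], index=elapsed)

print(type(diameter.index).__name__)  # TimedeltaIndex

# Label slicing by elapsed time; both endpoints are included
window = diameter.loc[pd.Timedelta(seconds=1):pd.Timedelta(seconds=2)]
print(window)
```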

pandas provides many built-in time series tools and data algorithms. You can efficiently work with very large time series and easily slice and dice, aggregate, and resample irregular- and fixed-frequency time series. Some of these tools are especially useful for financial and economics applications, but you could certainly use them to analyze server logs, too.
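To make "resample an irregular series" concrete, here is a minimal sketch with hypothetical request counts from a server log: irregularly spaced events are aggregated into a fixed daily frequency with Series.resample, and days with no events come out as 0 when summed.

```python
import pandas as pd

# Irregularly spaced events (hypothetical request counts from a server log)
times = pd.to_datetime([
    "2011-01-01 08:15", "2011-01-01 17:40",
    "2011-01-02 12:00",
    "2011-01-04 09:05",
])
events = pd.Series([3, 5, 2, 4], index=times)

# Resample to a fixed daily frequency; the empty day (Jan 3) sums to 0
daily = events.resample("D").sum()
print(daily)
```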

The Python standard library includes data types for date and time data, as well as calendar-related functionality. The datetime, time, and calendar modules are the main places to start. The datetime.datetime type, or simply datetime, is widely used:

from datetime import datetime
now = datetime.now()

now
datetime.datetime(2019, 4, 27, 15, 3, 14, 103616)
now.year, now.month, now.day, now.hour, now.minute
(2019, 4, 27, 15, 3)

datetime stores both the date and time down to the microsecond. timedelta represents the temporal difference between two datetime objects:

# cj: very convenient for date arithmetic
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

delta
datetime.timedelta(926, 56700)
delta.days, delta.seconds
(926, 56700)
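The pair (926, 56700) is easier to read once you know how timedelta stores its value. A short sketch using only the standard library: the difference is normalized into whole days plus seconds under one day (plus microseconds), and total_seconds() folds everything into one float.

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# A timedelta is normalized into days, seconds (< 1 day) and microseconds
print(delta.days, delta.seconds, delta.microseconds)  # 926 56700 0

# total_seconds() folds everything into a single float
print(delta.total_seconds())  # 80063100.0

# The 56700 leftover seconds are 15 hours 45 minutes
hours, rem = divmod(delta.seconds, 3600)
print(hours, rem // 60)  # 15 45
```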

You can add (or subtract) a timedelta or multiple thereof to a datetime object to yield a new shifted object:

from datetime import timedelta
start = datetime(2011, 1, 7)

# add 12 days
start + timedelta(12)
datetime.datetime(2011, 1, 19, 0, 0)

# subtract 24 days
start - 2 * timedelta(12)
datetime.datetime(2010, 12, 14, 0, 0)
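timedelta(12) means 12 days because days is the first positional argument. A small standard-library sketch shows that keyword arguments make the unit explicit, that larger units are normalized into days and seconds, and that subtracting datetimes gives a timedelta back.

```python
from datetime import datetime, timedelta

start = datetime(2011, 1, 7)

# The first positional argument is days; keywords make the unit explicit
print(timedelta(12) == timedelta(days=12))  # True
print(start + timedelta(days=1, hours=12))  # 2011-01-08 12:00:00

# Larger units are normalized: 36 hours becomes 1 day + 43200 seconds
d = timedelta(hours=36)
print(d.days, d.seconds)  # 1 43200

# Subtracting two datetimes gives back a timedelta
print((start + d) - start == d)  # True
```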

Table 11-1 summarizes the data types in the datetime module. While this chapter is mainly concerned with the data types in pandas and high-level time series manipulation, you may encounter the datetime-based types in many other places in Python in the wild.

Type       Description
date       Stores calendar date (year, month, day) using the Gregorian calendar
time       Stores time of day as hours, minutes, seconds, and microseconds
datetime   Stores both date and time
timedelta  Represents the difference between two datetime values (as days, seconds, and microseconds)
tzinfo     Base type for storing time zone information
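The types in the table fit together as follows. In this sketch (the values and the +08:00 offset are arbitrary choices for illustration), date and time are combined into a datetime, and datetime.timezone, a concrete fixed-offset subclass of the abstract tzinfo, makes it time-zone aware.

```python
from datetime import date, datetime, time, timedelta, timezone

# date and time hold the two halves that datetime combines
d = date(2011, 1, 3)
t = time(9, 30, 15, 500)
dt = datetime.combine(d, t)
print(dt)                    # 2011-01-03 09:30:15.000500
print(dt.date(), dt.time())  # 2011-01-03 09:30:15.000500

# tzinfo is abstract; timezone is a concrete fixed-offset implementation
utc_plus_8 = timezone(timedelta(hours=8))
aware = dt.replace(tzinfo=utc_plus_8)
print(aware.utcoffset())               # 8:00:00
print(aware.astimezone(timezone.utc))  # 2011-01-03 01:30:15.000500+00:00
```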

Converting between string and datetime

You can format datetime objects and pandas Timestamp objects, which I'll introduce later, as strings using str or the strftime method, passing a format specification:

stamp = datetime(2011, 1, 3)

stamp
datetime.datetime(2011, 1, 3, 0, 0)
str(stamp)
'2011-01-03 00:00:00'
stamp.strftime('%Y-%m-%d')  # four-digit year
'2011-01-03'
stamp.strftime('%y-%m-%d')  # two-digit year
'11-01-03'

See Table 11-2 for a complete list of the format codes.

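Code  Description
%Y    Four-digit year
%y    Two-digit year
%m    Two-digit month [01, 12]
%d    Two-digit day [01, 31]
%H    Hour (24-hour clock) [00, 23]
%I    Hour (12-hour clock) [01, 12]
%M    Two-digit minute [00, 59]
%S    Second [00, 61] (seconds 60 and 61 account for leap seconds)
%w    Weekday as an integer [0 (Sunday), 6]
%U    Week number of the year [00, 53]; Sunday is the first day of the week, and days before the year's first Sunday are in week 0
%W    Week number of the year [00, 53]; Monday is the first day of the week, and days before the year's first Monday are in week 0
%z    UTC time zone offset as +HHMM or -HHMM; empty if time zone naive
%F    Shortcut for %Y-%m-%d (e.g., 2012-04-18)
%D    Shortcut for %m/%d/%y (e.g., 04/18/12)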
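A sketch exercising several codes from the table on two arbitrary dates. It sticks to codes documented for Python's strftime, so %F and %D, which are not in that documented list, are left out. Note how %U and %W disagree on a Sunday, because they start the week on different days.

```python
from datetime import datetime, timezone

stamp = datetime(2011, 1, 3, 15, 4, 5)  # a Monday afternoon
sunday = datetime(2011, 1, 2)           # the day before

print(stamp.strftime("%Y-%m-%d %H:%M:%S"))  # 2011-01-03 15:04:05
print(stamp.strftime("%I:%M"))              # 03:04 (12-hour clock)

# %w counts from Sunday = 0; %U and %W start the week on Sunday vs Monday
print(stamp.strftime("%w %U %W"))   # 1 01 01
print(sunday.strftime("%w %U %W"))  # 0 01 00

# %z is empty for a naive datetime and +HHMM for an aware one
print(repr(stamp.strftime("%z")))                         # ''
print(stamp.replace(tzinfo=timezone.utc).strftime("%z"))  # +0000
```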

You can use these same format codes to convert strings to dates using datetime.strptime:

value = "2011-01-03"
datetime.strptime(value, '%Y-%m-%d')
datetime.datetime(2011, 1, 3, 0, 0)
datestrs = ['7/6/2011', '8/6/2011']

[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]
[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
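strptime and strftime undo each other when they share a format, and strptime is strict about the layout. This standard-library sketch shows the round trip and the ValueError raised when a string does not match the format.

```python
from datetime import datetime

value = "2011-01-03"
fmt = "%Y-%m-%d"

# strptime and strftime are inverses when they share a format
parsed = datetime.strptime(value, fmt)
print(parsed.strftime(fmt) == value)  # True

# strptime is strict: a string that does not match the format raises ValueError
error = None
try:
    datetime.strptime("01/03/2011", fmt)
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError
```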

datetime.strptime is a good way to parse a date with a known format. However, it can be a bit annoying to have to write a format spec each time, especially for common date formats. In this case, you can use the parser.parse method in the third-party dateutil package (this is installed automatically when you install pandas):

from dateutil.parser import parse
parse("2011-01-03")
datetime.datetime(2011, 1, 3, 0, 0)
parse("2011/01/03")
datetime.datetime(2011, 1, 3, 0, 0)

dateutil is capable of parsing most human-intelligible date representations:

parse('Jan 31, 1997, 10:45 PM')
datetime.datetime(1997, 1, 31, 22, 45)

In international locales, day appearing before month is very common, so you can pass dayfirst=True to indicate this:

parse('6/12/2011', dayfirst=True)
datetime.datetime(2011, 12, 6, 0, 0)
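To see what dayfirst changes, this sketch parses the same ambiguous string both ways, then passes the same flag to pandas.to_datetime, which also accepts a dayfirst parameter.

```python
import pandas as pd
from dateutil.parser import parse

# The same string reads differently depending on the convention
print(parse("6/12/2011"))                 # 2011-06-12 00:00:00 (month first, the default)
print(parse("6/12/2011", dayfirst=True))  # 2011-12-06 00:00:00 (day first)

# pandas.to_datetime accepts the same dayfirst flag
print(pd.to_datetime("6/12/2011", dayfirst=True))  # 2011-12-06 00:00:00
```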

pandas is generally oriented toward working with arrays of dates, whether used as an axis index or a column in a DataFrame. The to_datetime method parses many different kinds of date representations. Standard date formats like ISO 8601 can be parsed very quickly:

datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

pd.to_datetime(datestrs)
DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)
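For non-ISO strings, passing format= to to_datetime removes the month/day ambiguity. This sketch reuses the earlier '7/6/2011' strings and shows both uses mentioned above: parsing into a DatetimeIndex, and converting a DataFrame column that then becomes the index (the value column is made up).

```python
import pandas as pd

datestrs = ["7/6/2011", "8/6/2011"]

# An explicit format removes the month/day ambiguity and skips format guessing
idx = pd.to_datetime(datestrs, format="%m/%d/%Y")
print(idx)

# The same call works on a DataFrame column, which can then become the index
df = pd.DataFrame({"date": datestrs, "value": [1, 2]})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
df = df.set_index("date")
print(df)
```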

It also handles values that should be considered missing (None, empty string, etc.):

idx = pd.to_datetime(datestrs + [None])

idx
DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)
idx[2]
NaT
pd.isnull(idx)
array([False, False,  True])

NaT (Not a Time) is pandas's null value for timestamp data.
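NaT also appears when parsing fails, if you ask for it. A sketch using to_datetime's errors="coerce" option, which turns an unparseable string into NaT instead of raising; the input strings are made up.

```python
import pandas as pd

raw = ["2011-07-06", "not a date", None]

# errors="coerce" turns anything unparseable into NaT instead of raising
idx = pd.to_datetime(raw, errors="coerce")
print(idx)
print(idx.isna())  # [False  True  True]
```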

dateutil.parser is a useful but imperfect tool. Notably, it will recognize some strings as dates that you might prefer that it didn't; for example, '42' will be parsed as the year 2042 with today's calendar date.
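A sketch of the '42' pitfall. To keep the output deterministic it passes dateutil's default= argument, which supplies the fields missing from the string (without it, today's date is used). The 2042 result relies on dateutil's rule of mapping two-digit years into the century nearest the current year, so it holds for the current era. When the layout is known, strptime rejects such input instead.

```python
from datetime import datetime
from dateutil.parser import parse

# A bare "42" is taken as a two-digit year; missing fields come from `default`
print(parse("42", default=datetime(2000, 1, 1)))  # 2042-01-01 00:00:00

# With a known layout, strptime refuses the string instead of guessing
error = None
try:
    datetime.strptime("42", "%Y-%m-%d")
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError
```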

datetime objects also have a number of locale-specific formatting options for systems in other countries or languages. For example, the abbreviated month names will be different on German or French systems compared with English systems. See Table 11-3 for a listing.

  • %a Abbreviated weekday name
  • %A Full weekday name
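  • %b Abbreviated month name
  • %B Full month name
  • %c Full date and time (e.g., 'Tue 01 May 2012 04:20:57 PM')
  • %p Locale equivalent of AM or PM
  • %x Locale-appropriate formatted date (e.g., '05/01/2012')
  • %X Locale-appropriate time (e.g., '04:24:12 PM')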
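A sketch of the locale-dependent codes. The commented output assumes Python's default "C" time locale, which yields English names; on a system where LC_TIME has been set to another locale, the names differ, which is exactly the point of these codes.

```python
from datetime import datetime

stamp = datetime(2012, 5, 1, 16, 20, 57)  # Tuesday, 1 May 2012, 4:20 PM

# Outputs below assume the default "C" locale (English names)
print(stamp.strftime("%a %A"))                    # Tue Tuesday
print(stamp.strftime("%p"))                       # PM
print(datetime(2012, 9, 1).strftime("%b %B"))     # Sep September

# %c, %x and %X depend on the locale's own layout, so no fixed output is shown
print(stamp.strftime("%c"))
```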

Original post: https://www.cnblogs.com/chenjieyouge/p/12037578.html

Date: 2024-10-08 07:26:33
