Advanced Python Pandas Tutorial: Working with Time

Published: 2021-12-26 16:27 | Source: CSDN

Introduction

Time is one of the most frequently used data types in data processing. In addition to NumPy's datetime64 and timedelta64 dtypes, pandas consolidates functionality from other Python libraries such as scikits.timeseries.

Time categories

pandas has four time-related concepts:

  1. Date times: a specific date and time, optionally with a time zone. Similar to datetime.datetime in the standard library.
  2. Time deltas: an absolute duration. Similar to datetime.timedelta in the standard library.
  3. Time spans: a span of time defined by a point in time and its associated frequency.
  4. Date offsets: a relative duration that follows calendar arithmetic. Similar to dateutil.relativedelta.relativedelta.

The following table summarizes them:

Type          Scalar class  Array class     pandas dtype                          Primary creation method
Date times    Timestamp     DatetimeIndex   datetime64[ns] or datetime64[ns, tz]  to_datetime or date_range
Time deltas   Timedelta     TimedeltaIndex  timedelta64[ns]                       to_timedelta or timedelta_range
Time spans    Period        PeriodIndex     period[freq]                          Period or period_range
Date offsets  DateOffset    None            None                                  DateOffset

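Here is a usage example (the sessions in this article assume import datetime, import numpy as np, and import pandas as pd):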

In [19]: pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
Out[19]: 
2000-01-01    0
2000-01-02    1
2000-01-03    2
Freq: D, dtype: int64

The null value for each of the types above is NaT:

In [24]: pd.Timestamp(pd.NaT)
Out[24]: NaT
In [25]: pd.Timedelta(pd.NaT)
Out[25]: NaT
In [26]: pd.Period(pd.NaT)
Out[26]: NaT
# Equality acts as np.nan would
In [27]: pd.NaT == pd.NaT
Out[27]: False
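
Because NaT never compares equal to itself, equality checks cannot be used to find missing timestamps. The following is a minimal sketch of detecting them with isna(); the variable names and sample dates are illustrative and not part of the original article.

```python
import pandas as pd

# Parse a list that contains a missing entry; None becomes NaT.
stamps = pd.Series(pd.to_datetime(["2021-01-01", None, "2021-01-03"]))

# NaT behaves like np.nan: it is not equal to itself.
nat_equals_itself = pd.NaT == pd.NaT

# isna() is the reliable way to locate missing timestamps.
missing_mask = stamps.isna()

print(nat_equals_itself)
print(missing_mask.tolist())
```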

Timestamp

Timestamp is the most basic time type. It can be created as follows:

In [28]: pd.Timestamp(datetime.datetime(2012, 5, 1))
Out[28]: Timestamp('2012-05-01 00:00:00')
In [29]: pd.Timestamp("2012-05-01")
Out[29]: Timestamp('2012-05-01 00:00:00')
In [30]: pd.Timestamp(2012, 5, 1)
Out[30]: Timestamp('2012-05-01 00:00:00')

DatetimeIndex

When Timestamps are used as an index, they are automatically converted to a DatetimeIndex:

In [33]: dates = [
....:  pd.Timestamp("2012-05-01"),
....:  pd.Timestamp("2012-05-02"),
....:  pd.Timestamp("2012-05-03"),
....: ]
....: 
In [34]: ts = pd.Series(np.random.randn(3), dates)
In [35]: type(ts.index)
Out[35]: pandas.core.indexes.datetimes.DatetimeIndex
In [36]: ts.index
Out[36]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)
In [37]: ts
Out[37]: 
2012-05-01    0.469112
2012-05-02   -0.282863
2012-05-03   -1.509059
dtype: float64

date_range and bdate_range

You can also use date_range to create a DatetimeIndex:

In [74]: start = datetime.datetime(2011, 1, 1)
In [75]: end = datetime.datetime(2012, 1, 1)
In [76]: index = pd.date_range(start, end)
In [77]: index
Out[77]: 
DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
'2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
'2011-01-09', '2011-01-10',
...
'2011-12-23', '2011-12-24', '2011-12-25', '2011-12-26',
'2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30',
'2011-12-31', '2012-01-01'],
  dtype='datetime64[ns]', length=366, freq='D')

date_range produces a calendar-day range by default, while bdate_range produces a business-day range:

In [78]: index = pd.bdate_range(start, end)
In [79]: index
Out[79]: 
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
'2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14',
...
'2011-12-19', '2011-12-20', '2011-12-21', '2011-12-22',
'2011-12-23', '2011-12-26', '2011-12-27', '2011-12-28',
'2011-12-29', '2011-12-30'],
  dtype='datetime64[ns]', length=260, freq='B')

Both functions accept the start, end, and periods parameters:

In [84]: pd.bdate_range(end=end, periods=20)
In [83]: pd.date_range(start, end, freq="W")
In [86]: pd.date_range("2018-01-01", "2018-01-05", periods=5)
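
The three calls above are shown without output. As a rough sketch of what they produce, the snippet below reruns them with the same start and end values and inspects the results; the variable names are illustrative.

```python
import datetime

import pandas as pd

start = datetime.datetime(2011, 1, 1)
end = datetime.datetime(2012, 1, 1)

# The 20 business days ending at (or just before) `end`.
last_20_business_days = pd.bdate_range(end=end, periods=20)

# Weekly dates between start and end; "W" anchors on Sundays.
weekly = pd.date_range(start, end, freq="W")

# Five evenly spaced points between the two dates (no freq given).
evenly_spaced = pd.date_range("2018-01-01", "2018-01-05", periods=5)

print(len(last_20_business_days), len(weekly), len(evenly_spaced))
```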

origin

The origin parameter of to_datetime changes the reference point from which numeric values are counted:

In [67]: pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01"))
Out[67]: DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)

By default origin='unix', so the reference point is 1970-01-01 00:00:00.

In [68]: pd.to_datetime([1, 2, 3], unit="D")
Out[68]: DatetimeIndex(['1970-01-02', '1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None)
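
One way to read the origin parameter is as "reference point plus a number of units". The sketch below, using the same 1960-01-01 origin as the example above, checks that interpretation by adding the offsets by hand; the variable names are illustrative.

```python
import pandas as pd

origin = pd.Timestamp("1960-01-01")

# Let to_datetime count days from the custom origin.
from_origin = pd.to_datetime([1, 2, 3], unit="D", origin=origin)

# The same result built by hand: origin plus 1, 2, and 3 days.
manual = origin + pd.to_timedelta([1, 2, 3], unit="D")

print(list(from_origin) == list(manual))
```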

Format

The format parameter specifies how a date string should be parsed:

In [51]: pd.to_datetime("2010/11/12", format="%Y/%m/%d")
Out[51]: Timestamp('2010-11-12 00:00:00')
In [52]: pd.to_datetime("12-11-2010 00:00", format="%d-%m-%Y %H:%M")
Out[52]: Timestamp('2010-11-12 00:00:00')
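
An explicit format matters most when a string is ambiguous. As a small illustration, the same text below is parsed once as day-month-year and once as month-day-year; the variable names are illustrative.

```python
import pandas as pd

text = "12-11-2010"

# Interpreted as 12 November 2010.
day_first = pd.to_datetime(text, format="%d-%m-%Y")

# Interpreted as 11 December 2010.
month_first = pd.to_datetime(text, format="%m-%d-%Y")

print(day_first, month_first)
```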

Period

A Period represents a time span and is usually used together with freq:

In [31]: pd.Period("2011-01")
Out[31]: Period('2011-01', 'M')
In [32]: pd.Period("2012-05", freq="D")
Out[32]: Period('2012-05-01', 'D')
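
Since a Period covers a span rather than a single instant, it exposes the boundaries of that span. The following sketch, with illustrative variable names, reads them through the start_time and end_time attributes.

```python
import pandas as pd

may = pd.Period("2012-05", freq="M")

# First and last instants covered by the monthly period.
start = may.start_time
end = may.end_time

# Adding an integer moves by whole periods, so this is June 2012.
next_start = (may + 1).start_time

print(start, end, next_start)
```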

Periods support arithmetic directly:

In [345]: p = pd.Period("2012", freq="A-DEC")
In [346]: p + 1
Out[346]: Period('2013', 'A-DEC')
In [347]: p - 3
Out[347]: Period('2009', 'A-DEC')
In [348]: p = pd.Period("2012-01", freq="2M")
In [349]: p + 2
Out[349]: Period('2012-05', '2M')
In [350]: p - 1
Out[350]: Period('2011-11', '2M')

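Note that arithmetic between Periods requires them to have the same freq. Offsets and timedelta-like values can also be added, provided they are compatible with the Period's freq: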

In [352]: p = pd.Period("2014-07-01 09:00", freq="H")
In [353]: p + pd.offsets.Hour(2)
Out[353]: Period('2014-07-01 11:00', 'H')
In [354]: p + datetime.timedelta(minutes=120)
Out[354]: Period('2014-07-01 11:00', 'H')
In [355]: p + np.timedelta64(7200, "s")
Out[355]: Period('2014-07-01 11:00', 'H')

When Periods are used as an index, they are automatically converted to a PeriodIndex:

In [38]: periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")]
In [39]: ts = pd.Series(np.random.randn(3), periods)
In [40]: type(ts.index)
Out[40]: pandas.core.indexes.period.PeriodIndex
In [41]: ts.index
Out[41]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')
In [42]: ts
Out[42]: 
2012-01   -1.135632
2012-02    1.212112
2012-03   -0.173215
Freq: M, dtype: float64

A PeriodIndex can be created with pd.period_range:

In [359]: prng = pd.period_range("1/1/2011", "1/1/2012", freq="M")
In [360]: prng
Out[360]: 
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
 '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
 '2012-01'],
dtype='period[M]', freq='M')

It can also be created directly with the PeriodIndex constructor:

In [361]: pd.PeriodIndex(["2011-1", "2011-2", "2011-3"], freq="M")
Out[361]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')
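
A PeriodIndex and a DatetimeIndex can be converted into one another. The snippet below is a minimal sketch of that round trip using to_timestamp and to_period; the variable names are illustrative.

```python
import pandas as pd

prng = pd.period_range("2011-01", "2011-03", freq="M")

# Each monthly period becomes the timestamp at its start.
stamps = prng.to_timestamp()

# Converting back with the same frequency restores the periods.
roundtrip = stamps.to_period("M")

print(list(stamps))
print(roundtrip.equals(prng))
```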

DateOffset

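A DateOffset is a frequency object. It resembles a Timedelta in that it represents a duration, but it follows calendar rules. For example, a Timedelta of one day is always 24 hours, whereas a DateOffset of one day can be 23, 24, or 25 hours long depending on daylight saving time.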

# This particular day contains a daylight saving time transition
In [144]: ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")
# Respects absolute time
In [145]: ts + pd.Timedelta(days=1)
Out[145]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')
# Respects calendar time
In [146]: ts + pd.DateOffset(days=1)
Out[146]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')
In [147]: friday = pd.Timestamp("2018-01-05")
In [148]: friday.day_name()
Out[148]: 'Friday'
# Add 2 business days (Friday --> Tuesday)
In [149]: two_business_days = 2 * pd.offsets.BDay()
In [150]: two_business_days.apply(friday)
Out[150]: Timestamp('2018-01-09 00:00:00')
In [151]: friday + two_business_days
Out[151]: Timestamp('2018-01-09 00:00:00')
In [152]: (friday + two_business_days).day_name()
Out[152]: 'Tuesday'
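
The difference between absolute and calendar durations also shows up with months, which have no fixed length. The sketch below compares a 30-day Timedelta with a one-month DateOffset; the chosen date and variable names are illustrative.

```python
import pandas as pd

jan_31 = pd.Timestamp("2018-01-31")

# Absolute duration: exactly 30 * 24 hours later.
plus_30_days = jan_31 + pd.Timedelta(days=30)

# Calendar duration: same day next month, clipped to the month's last day.
plus_one_month = jan_31 + pd.DateOffset(months=1)

print(plus_30_days, plus_one_month)
```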

DateOffsets are tied to frequency strings. The available date offsets and their associated frequency strings are:

Date Offset                               Frequency String  Description
DateOffset                                None              Generic offset class
BDay or BusinessDay                       'B'               Business day
CDay or CustomBusinessDay                 'C'               Custom business day
Week                                      'W'               One week
WeekOfMonth                               'WOM'             The x-th day of the y-th week of each month
LastWeekOfMonth                           'LWOM'            The x-th day of the last week of each month
MonthEnd                                  'M'               Calendar month end
MonthBegin                                'MS'              Calendar month begin
BMonthEnd or BusinessMonthEnd             'BM'              Business month end
BMonthBegin or BusinessMonthBegin         'BMS'             Business month begin
CBMonthEnd or CustomBusinessMonthEnd      'CBM'             Custom business month end
CBMonthBegin or CustomBusinessMonthBegin  'CBMS'            Custom business month begin
SemiMonthEnd                              'SM'              15th of the month and calendar month end
SemiMonthBegin                            'SMS'             15th of the month and calendar month begin
QuarterEnd                                'Q'               Calendar quarter end
QuarterBegin                              'QS'              Calendar quarter begin
BQuarterEnd                               'BQ'              Business quarter end
BQuarterBegin                             'BQS'             Business quarter begin
FY5253Quarter                             'REQ'             Retail (52-53 week) quarter
YearEnd                                   'A'               Calendar year end
YearBegin                                 'AS' or 'BYS'     Calendar year begin
BYearEnd                                  'BA'              Business year end
BYearBegin                                'BAS'             Business year begin
FY5253                                    'RE'              Retail (52-53 week) year
Easter                                    None              Easter holiday
BusinessHour                              'BH'              Business hour
CustomBusinessHour                        'CBH'             Custom business hour
Day                                       'D'               One absolute day
Hour                                      'H'               One hour
Minute                                    'T' or 'min'      One minute
Second                                    'S'               One second
Milli                                     'L' or 'ms'       One millisecond
Micro                                     'U' or 'us'       One microsecond
Nano                                      'N'               One nanosecond
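
To see how a frequency string maps to an offset object, the string can be passed to to_offset from pandas.tseries.frequencies. The sketch below sticks to aliases that have stayed stable; note that recent pandas releases have renamed several of the aliases in the table (for example 'M' to 'ME' and 'H' to 'h'), so the exact strings accepted depend on the installed version.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# A plain alias resolves to the matching offset class.
business_day = to_offset("B")
month_begin = to_offset("MS")

# A leading integer becomes the multiple stored in the offset's n attribute.
two_business_days = to_offset("2B")

# "W" is anchored on Sunday (weekday=6) by default.
weekly = to_offset("W")

print(business_day, month_begin, two_business_days, weekly)
```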

DateOffset also provides two methods, rollforward() and rollback(), which move a timestamp forward or backward to the nearest valid offset date:

In [153]: ts = pd.Timestamp("2018-01-06 00:00:00")
In [154]: ts.day_name()
Out[154]: 'Saturday'
# BusinessHour's valid offset dates are Monday through Friday
In [155]: offset = pd.offsets.BusinessHour(start="09:00")
# Bring the date to the closest offset date (Monday)
In [156]: offset.rollforward(ts)
Out[156]: Timestamp('2018-01-08 09:00:00')
# Date is brought to the closest offset date first and then the hour is added
In [157]: ts + offset
Out[157]: Timestamp('2018-01-08 10:00:00')
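
The same two methods work with other offsets. As a minimal sketch, the snippet below rolls a mid-month date to the surrounding month ends with MonthEnd; the dates and variable names are illustrative.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()
mid_january = pd.Timestamp("2018-01-15")

# Move forward to the next month end, or back to the previous one.
rolled_forward = month_end.rollforward(mid_january)
rolled_back = month_end.rollback(mid_january)

# A date that is already a month end is left unchanged.
already_on_offset = month_end.rollforward(pd.Timestamp("2018-01-31"))

print(rolled_forward, rolled_back, already_on_offset)
```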

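The operations above preserve the hour, minute, and other time-of-day information. To reset the time to 00:00:00, call normalize(). (The sessions below use offset.apply(ts), which was deprecated and later removed from pandas; in recent versions write ts + offset instead.)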

In [158]: ts = pd.Timestamp("2014-01-01 09:00")
In [159]: day = pd.offsets.Day()
In [160]: day.apply(ts)
Out[160]: Timestamp('2014-01-02 09:00:00')
In [161]: day.apply(ts).normalize()
Out[161]: Timestamp('2014-01-02 00:00:00')
In [162]: ts = pd.Timestamp("2014-01-01 22:00")
In [163]: hour = pd.offsets.Hour()
In [164]: hour.apply(ts)
Out[164]: Timestamp('2014-01-01 23:00:00')
In [165]: hour.apply(ts).normalize()
Out[165]: Timestamp('2014-01-01 00:00:00')
In [166]: hour.apply(pd.Timestamp("2014-01-01 23:30")).normalize()
Out[166]: Timestamp('2014-01-02 00:00:00')

Time as an index

Times can be used as an index, which enables several convenient features.

You can select data directly with a date:

In [99]: ts["1/31/2011"]
Out[99]: 0.11920871129693428
In [100]: ts[datetime.datetime(2011, 12, 25):]
Out[100]: 
2011-12-30 0.56702
Freq: BM, dtype: float64
In [101]: ts["10/31/2011":"12/31/2011"]
Out[101]: 
2011-10-31    0.271860
2011-11-30   -0.424972
2011-12-30    0.567020
Freq: BM, dtype: float64

Select a whole year of data:

In [102]: ts["2011"]
Out[102]: 
2011-01-31    0.119209
2011-02-28   -1.044236
2011-03-31   -0.861849
2011-04-29   -2.104569
2011-05-31   -0.494929
2011-06-30    1.071804
2011-07-29    0.721555
2011-08-31   -0.706771
2011-09-30   -1.039575
2011-10-31    0.271860
2011-11-30   -0.424972
2011-12-30    0.567020
Freq: BM, dtype: float64

Select the data for a particular month:

In [103]: ts["2011-6"]
Out[103]: 
2011-06-30 1.071804
Freq: BM, dtype: float64
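
The sessions above do not show how ts was built. The following self-contained sketch assumes a series indexed by the business month ends of 2011 with placeholder integer values, and repeats the same kinds of selection through loc.

```python
import pandas as pd

# Assumed setup: one value per business month end in 2011.
rng = pd.date_range("2011-01-01", "2011-12-31", freq=pd.offsets.BMonthEnd())
ts = pd.Series(range(len(rng)), index=rng)

# Partial strings select everything inside that month or year.
june = ts.loc["2011-06"]
whole_year = ts.loc["2011"]

# Slices with date strings include both endpoints.
last_quarter = ts.loc["2011-10-31":"2011-12-31"]

print(june)
print(last_quarter)
```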

A DataFrame accepts time strings as arguments to loc:

In [105]: dft
Out[105]: 
A
2013-01-01 00:00:00  0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00  0.113648
2013-01-01 00:04:00 -1.478427
...  ...
2013-03-11 10:35:00 -0.747967
2013-03-11 10:36:00 -0.034523
2013-03-11 10:37:00 -0.201754
2013-03-11 10:38:00 -1.509067
2013-03-11 10:39:00 -1.693043
[100000 rows x 1 columns]
In [106]: dft.loc["2013"]
Out[106]: 
A
2013-01-01 00:00:00  0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00  0.113648
2013-01-01 00:04:00 -1.478427
...  ...
2013-03-11 10:35:00 -0.747967
2013-03-11 10:36:00 -0.034523
2013-03-11 10:37:00 -0.201754
2013-03-11 10:38:00 -1.509067
2013-03-11 10:39:00 -1.693043
[100000 rows x 1 columns]

Time slicing:

In [107]: dft["2013-1":"2013-2"]
Out[107]: 
A
2013-01-01 00:00:00  0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00  0.113648
2013-01-01 00:04:00 -1.478427
...  ...
2013-02-28 23:55:00  0.850929
2013-02-28 23:56:00  0.976712
2013-02-28 23:57:00 -2.693884
2013-02-28 23:58:00 -1.575535
2013-02-28 23:59:00 -1.573517
[84960 rows x 1 columns]

Slicing vs. exact matching

Consider the following Series with minute resolution:

In [120]: series_minute = pd.Series(
.....:  [1, 2, 3],
.....:  pd.DatetimeIndex(
.....:      ["2011-12-31 23:59:00", "2012-01-01 00:00:00", "2012-01-01 00:02:00"]
.....:  ),
.....: )
.....: 
In [121]: series_minute.index.resolution
Out[121]: 'minute'

A timestamp string that is less precise than a minute returns a Series:

In [122]: series_minute["2011-12-31 23"]
Out[122]: 
2011-12-31 23:59:00 1
dtype: int64

A timestamp string with minute precision or finer returns a scalar:

In [123]: series_minute["2011-12-31 23:59"]
Out[123]: 1
In [124]: series_minute["2011-12-31 23:59:00"]
Out[124]: 1

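Likewise, for an index with second resolution, a string less precise than a second returns a Series, while a string with second precision returns a scalar.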
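
The second-resolution case can be checked in the same way. The sketch below builds a small illustrative Series whose index has second resolution and compares a minute-precision lookup with a second-precision lookup; the data and names are assumptions made for the example.

```python
import pandas as pd

series_second = pd.Series(
    [1, 2],
    index=pd.DatetimeIndex(["2012-01-01 00:00:00", "2012-01-01 00:00:30"]),
)
resolution = series_second.index.resolution

# Less precise than the index resolution: treated as a slice, returns a Series.
by_minute = series_second.loc["2012-01-01 00:00"]

# Same precision as the index resolution: exact match, returns a scalar.
by_second = series_second.loc["2012-01-01 00:00:30"]

print(resolution, type(by_minute).__name__, by_second)
```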

Time series operations

Shifting

The shift method moves a time series forward or backward:

In [275]: ts = pd.Series(range(len(rng)), index=rng)
In [276]: ts = ts[:5]
In [277]: ts.shift(1)
Out[277]: 
2012-01-01 NaN
2012-01-02 0.0
2012-01-03 1.0
Freq: D, dtype: float64

When freq is specified, shift moves the index by that frequency instead of moving the values:

In [278]: ts.shift(5, freq="D")
Out[278]: 
2012-01-06 0
2012-01-07 1
2012-01-08 2
Freq: D, dtype: int64
In [279]: ts.shift(5, freq=pd.offsets.BDay())
Out[279]: 
2012-01-06 0
2012-01-09 1
2012-01-10 2
dtype: int64
In [280]: ts.shift(5, freq="BM")
Out[280]: 
2012-05-31 0
2012-05-31 1
2012-05-31 2
dtype: int64
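
The two modes of shift are easy to confuse, so here is a minimal side-by-side sketch. It assumes a three-day daily index like the one in the sessions above; the variable names are illustrative.

```python
import pandas as pd

rng = pd.date_range("2012-01-01", periods=3, freq="D")
ts = pd.Series([0, 1, 2], index=rng)

# Without freq: values slide along a fixed index, leaving NaN at the start.
values_shifted = ts.shift(1)

# With freq: the index itself moves and every value is kept.
index_shifted = ts.shift(1, freq="D")

print(values_shifted)
print(index_shifted)
```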

Frequency conversion

A time series can be converted to a different frequency by calling asfreq:

In [281]: dr = pd.date_range("1/1/2010", periods=3, freq=3 * pd.offsets.BDay())
In [282]: ts = pd.Series(np.random.randn(3), index=dr)
In [283]: ts
Out[283]: 
2010-01-01    1.494522
2010-01-06   -0.778425
2010-01-11   -0.253355
Freq: 3B, dtype: float64
In [284]: ts.asfreq(pd.offsets.BDay())
Out[284]: 
2010-01-01    1.494522
2010-01-04         NaN
2010-01-05         NaN
2010-01-06   -0.778425
2010-01-07         NaN
2010-01-08         NaN
2010-01-11   -0.253355
Freq: B, dtype: float64

asfreq also accepts a method for filling the gaps introduced by the conversion:

In [285]: ts.asfreq(pd.offsets.BDay(), method="pad")
Out[285]: 
2010-01-01    1.494522
2010-01-04    1.494522
2010-01-05    1.494522
2010-01-06   -0.778425
2010-01-07   -0.778425
2010-01-08   -0.778425
2010-01-11   -0.253355
Freq: B, dtype: float64
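
The effect of the fill method can be verified with fixed values instead of random ones. The sketch below rebuilds the same three-business-day index with placeholder data and counts the gaps before and after forward filling; "ffill" is used here as the forward-fill method name.

```python
import pandas as pd

dr = pd.date_range("2010-01-01", periods=3, freq=3 * pd.offsets.BDay())
ts = pd.Series([1.0, 2.0, 3.0], index=dr)

# Converting to every business day introduces NaN for the new dates.
with_gaps = ts.asfreq(pd.offsets.BDay())

# Forward filling carries the last known value into each gap.
filled = ts.asfreq(pd.offsets.BDay(), method="ffill")

print(int(with_gaps.isna().sum()), int(filled.isna().sum()))
```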

Resampling

A time series can be resampled by calling the resample method:

In [286]: rng = pd.date_range("1/1/2012", periods=100, freq="S")
In [287]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
In [288]: ts.resample("5Min").sum()
Out[288]: 
2012-01-01 25103
Freq: 5T, dtype: int64

resample supports a variety of aggregation methods, such as sum, mean, std, sem, max, min, median, first, last, and ohlc.

In [289]: ts.resample("5Min").mean()
Out[289]: 
2012-01-01 251.03
Freq: 5T, dtype: float64
In [290]: ts.resample("5Min").ohlc()
Out[290]: 
            open  high  low  close
2012-01-01   308   460    9    205
In [291]: ts.resample("5Min").max()
Out[291]: 
2012-01-01 460
Freq: 5T, dtype: int64
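
Because the sessions above use random data, their numbers cannot be reproduced. The following sketch uses a deterministic per-second series (the values 0 to 599, an assumption made for illustration) so that the two 5-minute buckets can be checked by hand.

```python
import pandas as pd

# Ten minutes of per-second data with values 0..599.
rng = pd.date_range("2012-01-01", periods=600, freq=pd.offsets.Second())
ts = pd.Series(range(600), index=rng)

# Each 5-minute bucket aggregates 300 consecutive seconds.
sums = ts.resample("5min").sum()
means = ts.resample("5min").mean()

# ohlc reports the first, highest, lowest, and last value per bucket.
ohlc = ts.resample("5min").ohlc()

print(sums)
print(ohlc)
```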

