pandas resample (resampling)

Below is the definition of the resample method in pandas. The documentation at http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling explains it in more detail.

    def resample(self, rule, how=None, axis=0, fill_method=None, closed=None,
                 label=None, convention='start', kind=None, loffset=None,
                 limit=None, base=0, on=None, level=None):
        """
        Convenience method for frequency conversion and resampling of time
        series.  Object must have a datetime-like index (DatetimeIndex,
        PeriodIndex, or TimedeltaIndex), or pass datetime-like values
        to the on or level keyword. (Resamples data and converts its frequency; the data must have a datetime-like index.)

        Parameters
        ----------
        rule : string
            the offset string or object representing target conversion (i.e. the offset describing the target frequency)
        axis : int, optional, default 0 (the axis to operate on)
        closed : {'right', 'left'}
            Which side of bin interval is closed. The default is 'left'
            for all frequency offsets except for 'M', 'A', 'Q', 'BM',
            'BA', 'BQ', and 'W' which all have a default of 'right'. (That is, which end of each interval is inclusive.)
        label : {'right', 'left'}
            Which bin edge label to label bucket with. The default is 'left'
            for all frequency offsets except for 'M', 'A', 'Q', 'BM',
            'BA', 'BQ', and 'W' which all have a default of 'right'. (That is, which edge of the interval is kept as the label.)
        convention : {'start', 'end', 's', 'e'}
            For PeriodIndex only, controls whether to use the start or end of
            `rule`
        kind: {'timestamp', 'period'}, optional
            Pass 'timestamp' to convert the resulting index to a
            ``DatetimeIndex`` or 'period' to convert it to a ``PeriodIndex``.
            By default the input representation is retained.
        loffset : timedelta
            Adjust the resampled time labels
        base : int, default 0
            For frequencies that evenly subdivide 1 day, the "origin" of the
            aggregated intervals. For example, for '5min' frequency, base could
            range from 0 through 4. Defaults to 0
        on : string, optional
            For a DataFrame, column to use instead of index for resampling.
            Column must be datetime-like.

            .. versionadded:: 0.19.0

        level : string or int, optional
            For a MultiIndex, level (name or number) to use for
            resampling.  Level must be datetime-like.

            .. versionadded:: 0.19.0

        Returns
        -------
        Resampler object

        Notes
        -----
        See the `user guide
        <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling>`_
        for more.

        To learn more about the offset strings, please see `this link
        <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases>`__.

        Examples
        --------

        Start by creating a series with 9 one minute timestamps. (This builds a time series with a 1-minute frequency.)

        >>> index = pd.date_range('1/1/2000', periods=9, freq='T')
        >>> series = pd.Series(range(9), index=index)
        >>> series
        2000-01-01 00:00:00    0
        2000-01-01 00:01:00    1
        2000-01-01 00:02:00    2
        2000-01-01 00:03:00    3
        2000-01-01 00:04:00    4
        2000-01-01 00:05:00    5
        2000-01-01 00:06:00    6
        2000-01-01 00:07:00    7
        2000-01-01 00:08:00    8
        Freq: T, dtype: int64

        Downsample the series into 3 minute bins and sum the values
        of the timestamps falling into a bin. (That is, downsample to 3 minutes.)

        >>> series.resample('3T').sum()
        2000-01-01 00:00:00     3
        2000-01-01 00:03:00    12
        2000-01-01 00:06:00    21
        Freq: 3T, dtype: int64

        Downsample the series into 3 minute bins as above, but label each
        bin using the right edge instead of the left. Please note that the
        value in the bucket used as the label is not included in the bucket,
        which it labels. For example, in the original series the
        bucket ``2000-01-01 00:03:00`` contains the value 3, but the summed
        value in the resampled bucket with the label ``2000-01-01 00:03:00``
        does not include 3 (if it did, the summed value would be 6, not 3).
        To include this value close the right side of the bin interval as
        illustrated in the example below this one.

        >>> series.resample('3T', label='right').sum()  # keeps the right-edge label; the previous result used the left edge
        2000-01-01 00:03:00     3
        2000-01-01 00:06:00    12
        2000-01-01 00:09:00    21
        Freq: 3T, dtype: int64

        Downsample the series into 3 minute bins as above, but close the right
        side of the bin interval.

        >>> series.resample('3T', label='right', closed='right').sum()
        2000-01-01 00:00:00     0
        2000-01-01 00:03:00     6
        2000-01-01 00:06:00    15
        2000-01-01 00:09:00    15
        Freq: 3T, dtype: int64

        Upsample the series into 30 second bins.

        >>> series.resample('30S').asfreq()[0:5]  # select first 5 rows
        2000-01-01 00:00:00   0.0
        2000-01-01 00:00:30   NaN
        2000-01-01 00:01:00   1.0
        2000-01-01 00:01:30   NaN
        2000-01-01 00:02:00   2.0
        Freq: 30S, dtype: float64

        Upsample the series into 30 second bins and fill the ``NaN``
        values using the ``pad`` method.

        >>> series.resample('30S').pad()[0:5]
        2000-01-01 00:00:00    0
        2000-01-01 00:00:30    0
        2000-01-01 00:01:00    1
        2000-01-01 00:01:30    1
        2000-01-01 00:02:00    2
        Freq: 30S, dtype: int64

        Upsample the series into 30 second bins and fill the
        ``NaN`` values using the ``bfill`` method.

        >>> series.resample('30S').bfill()[0:5]
        2000-01-01 00:00:00    0
        2000-01-01 00:00:30    1
        2000-01-01 00:01:00    1
        2000-01-01 00:01:30    2
        2000-01-01 00:02:00    2
        Freq: 30S, dtype: int64

        Pass a custom function via ``apply``

        >>> def custom_resampler(array_like):
        ...     return np.sum(array_like)+5

        >>> series.resample('3T').apply(custom_resampler)
        2000-01-01 00:00:00     8
        2000-01-01 00:03:00    17
        2000-01-01 00:06:00    26
        Freq: 3T, dtype: int64

        For a Series with a PeriodIndex, the keyword `convention` can be
        used to control whether to use the start or end of `rule`.

        >>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
        ...                                             freq='A',
        ...                                             periods=2))
        >>> s
        2012    1
        2013    2
        Freq: A-DEC, dtype: int64

        Resample by month using 'start' `convention`. Values are assigned to
        the first month of the period.

        >>> s.resample('M', convention='start').asfreq().head()
        2012-01    1.0
        2012-02    NaN
        2012-03    NaN
        2012-04    NaN
        2012-05    NaN
        Freq: M, dtype: float64

        Resample by month using 'end' `convention`. Values are assigned to
        the last month of the period.

        >>> s.resample('M', convention='end').asfreq()
        2012-12    1.0
        2013-01    NaN
        2013-02    NaN
        2013-03    NaN
        2013-04    NaN
        2013-05    NaN
        2013-06    NaN
        2013-07    NaN
        2013-08    NaN
        2013-09    NaN
        2013-10    NaN
        2013-11    NaN
        2013-12    2.0
        Freq: M, dtype: float64

        For DataFrame objects, the keyword ``on`` can be used to specify the
        column instead of the index for resampling.

        >>> df = pd.DataFrame(data=9*[range(4)], columns=['a', 'b', 'c', 'd'])
        >>> df['time'] = pd.date_range('1/1/2000', periods=9, freq='T')
        >>> df.resample('3T', on='time').sum()
                             a  b  c  d
        time
        2000-01-01 00:00:00  0  3  6  9
        2000-01-01 00:03:00  0  3  6  9
        2000-01-01 00:06:00  0  3  6  9

        For a DataFrame with MultiIndex, the keyword ``level`` can be used to
        specify on which level the resampling needs to take place.

        >>> time = pd.date_range('1/1/2000', periods=5, freq='T')
        >>> df2 = pd.DataFrame(data=10*[range(4)],
        ...                    columns=['a', 'b', 'c', 'd'],
        ...                    index=pd.MultiIndex.from_product([time, [1, 2]])
        ...                    )
        >>> df2.resample('3T', level=0).sum()
                             a  b   c   d
        2000-01-01 00:00:00  0  6  12  18
        2000-01-01 00:03:00  0  4   8  12
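
The `closed` and `label` examples in the docstring above can be reproduced side by side. This is a minimal sketch and not part of the original docstring. It assumes a pandas version that accepts the `min` alias (the docstring's `T` alias is deprecated in recent releases), and the names `left`, `right_label`, and `right_closed` are only illustrative.

```python
import pandas as pd

# Nine one-minute timestamps, as in the docstring example.
index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Default for minute frequencies: bins are closed and labelled on the left.
left = series.resample("3min").sum()

# Same bins, but each one is labelled with its right edge.
right_label = series.resample("3min", label="right").sum()

# Bins closed on the right: the 00:03 value now falls in the bin labelled 00:03.
right_closed = series.resample("3min", label="right", closed="right").sum()

print(left, right_label, right_closed, sep="\n\n")
```

Changing only `label` moves the index values and leaves the sums untouched, while changing `closed` alters which rows fall into each bin, which is why the last result has an extra bin at 00:00.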

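The upsampling examples from the docstring can be sketched the same way. The docstring fills gaps with `pad()`; the sketch below uses `ffill()`, which performs the same forward fill, on the assumption that `pad()` may not be available in the installed pandas version. The `30s` alias and the variable names are likewise choices made for this illustration.

```python
import pandas as pd

# Nine one-minute timestamps, as in the docstring example.
index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Upsample to 30-second bins; the new in-between timestamps hold NaN.
upsampled = series.resample("30s").asfreq()

# Fill each gap with the previous observed value.
forward = series.resample("30s").ffill()

# Fill each gap with the next observed value.
backward = series.resample("30s").bfill()

print(upsampled.head())
print(forward.head())
print(backward.head())
```

Because `asfreq()` introduces `NaN`, its result is converted to a float dtype, whereas the two filled results have no missing values.
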
Original article: https://www.cnblogs.com/jinqier/p/9280813.html

Time: 2024-11-05 15:00:02
