stepwise

# -*- coding: utf-8 -*-

import statsmodels.formula.api as smf

def forward_selected(data, response, sle=0.05):
    """Linear model designed by forward selection.

    Parameters:
    -----------
    data: pandas DataFrame with all possible predictors and response

    response: string, name of response column in data

    sle: significance level for a variable to enter the model

    Returns:
    --------
    model: an "optimal" fitted statsmodels linear model
           with an intercept selected by forward selection
    """
    remaining = set(data.columns)
    remaining.remove(response)
    selected = []
    while remaining:
        scores_with_candidates = []
        for candidate in remaining:
            formula = "{} ~ {} + 1".format(response, ‘ + ‘.join(selected + [candidate]))
            score = smf.logit(formula, data).fit().pvalues[candidate]
            scores_with_candidates.append((score, candidate))
        scores_with_candidates.sort()
        best_new_score, best_candidate = scores_with_candidates.pop(0)
        if best_new_score <= sle:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
        else:
            break
    formula = "{} ~ {} + 1".format(response, ‘ + ‘.join(selected))
    model = smf.logit(formula, data).fit()
    return model

def backward_selected(data, response, sls=0.01):
    """Linear model designed by backward selection.

    Parameters:
    -----------
    data: pandas DataFrame with all possible predictors and response

    response: string, name of response column in data

    sls: significance level for a variable to stay in the model

    Returns:
    --------
    model: an "optimal" fitted statsmodels linear model
           with an intercept selected by backward selection
    """
    remaining = set(data.columns)
    remaining.remove(response)
    while remaining:
        formula = "{} ~ {} + 1".format(response, ‘ + ‘.join(remaining))
        scores = smf.logit(formula, data).fit().pvalues
        scores = scores.drop('Intercept')  # the intercept is never a candidate for elimination
        worst_new_score = scores.max()
        worst_candidate = scores.idxmax()
        if worst_new_score > sls:
            remaining.remove(worst_candidate)
        else:
            break
    formula = "{} ~ {} + 1".format(response, ‘ + ‘.join(remaining))
    model = smf.logit(formula, data).fit()
    return model

def stepwise_selected(data, response, sle=0.05, sls=0.01):
    """Linear model designed by stepwise selection.

    Parameters:
    -----------
    data: pandas DataFrame with all possible predictors and response

    response: string, name of response column in data

    sle: significance level for a variable to enter the model
    sls: significance level for a variable to stay in the model

    Returns:
    --------
    model: an "optimal" fitted statsmodels linear model
           with an intercept selected by stepwise selection
    """
    remaining = set(data.columns)
    remaining.remove(response)
    selected = []
    while remaining:
        scores_with_candidates = []
        for candidate in remaining:
            formula = "{} ~ {} + 1".format(response, ‘ + ‘.join(selected + [candidate]))
            score = smf.logit(formula, data).fit().pvalues[candidate]
            scores_with_candidates.append((score, candidate))
        scores_with_candidates.sort()
        best_new_score, best_candidate = scores_with_candidates.pop(0)
        if best_new_score <= sle:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
            formula = "{} ~ {} + 1".format(response, ‘ + ‘.join(selected))
            scores = smf.logit(formula, data).fit().pvalues
            worst_new_score = scores.max()
            worst_candidate = scores.idxmax()
            if worst_new_score > sls:
                selected.remove(worst_candidate)
                remaining.add(worst_candidate)
                if best_candidate == worst_candidate:
                    break
        else:
            break
    formula = "{} ~ {} + 1".format(response, ‘ + ‘.join(selected))
    model = smf.logit(formula, data).fit()
    return model
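A minimal usage sketch, illustrated with `forward_selected` (the other two selectors share the same interface). The dataset and its column names (`x1`, `x2`, `noise`, `y`) are synthetic, invented for illustration; the function is repeated here in condensed form, with `disp=0` added to silence optimizer output, so the snippet runs standalone.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# forward_selected, condensed from the module above (docstring omitted).
def forward_selected(data, response, sle=0.05):
    remaining = set(data.columns) - {response}
    selected = []
    while remaining:
        scores = []
        for candidate in remaining:
            formula = "{} ~ {} + 1".format(
                response, ' + '.join(selected + [candidate]))
            pval = smf.logit(formula, data).fit(disp=0).pvalues[candidate]
            scores.append((pval, candidate))
        best_score, best_candidate = min(scores)
        if best_score <= sle:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
        else:
            break
    formula = "{} ~ {} + 1".format(response, ' + '.join(selected))
    return smf.logit(formula, data).fit(disp=0)

# Synthetic data: y depends on x1 and x2; "noise" is an irrelevant column.
rng = np.random.default_rng(0)
n = 1000
x1, x2, noise = rng.normal(size=(3, n))
p = 1.0 / (1.0 + np.exp(-(1.5 * x1 - 1.0 * x2)))
data = pd.DataFrame({"x1": x1, "x2": x2, "noise": noise,
                     "y": (rng.random(n) < p).astype(int)})

model = forward_selected(data, "y")
print(model.model.formula)  # the informative predictors x1 and x2 enter
```

With a fixed seed the run is deterministic; the two informative predictors always enter, while `noise` only enters in the rare case its p-value falls below `sle`.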
Posted: 2024-08-29 23:15:44