[Original] Uncle's Experience Notes (39): Cascading unpersist of Spark caches

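Problem: in Spark, suppose there are two DataFrames (or Datasets), DataFrameA depends on DataFrameB, and both have been cached. After DataFrameB is unpersisted, the cache of DataFrameA is invalidated as well. The official explanation is: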

When invalidating a cache, we invalid other caches dependent on this cache to ensure cached data is up to date. For example, when the underlying table has been modified or the table has been dropped itself, all caches that use this table should be invalidated or refreshed.

However, in other cases, like when user simply want to drop a cache to free up memory, we do not need to invalidate dependent caches since no underlying data has been changed. For this reason, we would like to introduce a new cache invalidation mode: the non-cascading cache invalidation.

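The previous default is the regular mode. To guarantee that cached data is up to date (not stale), this mode cascades an unpersist: it clears every other cache that depends on the unpersisted cache, including indirect dependents.
Spark 2.4 introduces a new mode, the non-cascading mode, in which an unpersist is not cascaded to dependent caches.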
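
To make the difference between the two modes concrete, here is a minimal toy model in plain Python. It is only a sketch of the idea, not Spark's implementation: the class ToyCacheManager, its methods, and the plan names A, B and C are all hypothetical and do not correspond to any Spark API. C depends on A and A depends on B, so C is an indirect dependent of B.

```python
from collections import defaultdict


class ToyCacheManager:
    """Toy model of cache invalidation (hypothetical, NOT Spark's CacheManager)."""

    def __init__(self):
        self.cached = set()                  # names of plans that are currently cached
        self.dependents = defaultdict(set)   # plan name -> plans built directly on it

    def cache(self, name, depends_on=()):
        self.cached.add(name)
        for parent in depends_on:
            self.dependents[parent].add(name)

    def uncache(self, name, cascade):
        self.cached.discard(name)
        if not cascade:
            # Non-cascading mode: only the requested cache is dropped.
            return
        # Regular mode: also drop every direct and indirect dependent.
        stack = list(self.dependents[name])
        seen = set()
        while stack:
            child = stack.pop()
            if child in seen:
                continue
            seen.add(child)
            self.cached.discard(child)
            stack.extend(self.dependents[child])


def demo(cascade):
    mgr = ToyCacheManager()
    mgr.cache("B")
    mgr.cache("A", depends_on=["B"])
    mgr.cache("C", depends_on=["A"])  # indirect dependency on B
    mgr.uncache("B", cascade=cascade)
    return sorted(mgr.cached)


print(demo(cascade=True))   # regular mode: prints []
print(demo(cascade=False))  # non-cascading mode: prints ['A', 'C']
```

In the regular mode, dropping B also drops A and C. In the non-cascading mode, A and C stay cached, which is the behaviour you want when you only drop B to free memory and no underlying data has changed.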

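The cache operation on a DataFrame/Dataset uses the MEMORY_AND_DISK storage level by default, so cached data that does not fit in memory is spilled to disk. Unless you explicitly specify a memory-only level (MEMORY_ONLY) and have confirmed that memory is sufficient, unpersisting an earlier cache therefore looks unnecessary.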

References:
https://issues.apache.org/jira/browse/SPARK-21478
https://issues.apache.org/jira/browse/SPARK-24596
https://issues.apache.org/jira/browse/SPARK-21579

Original article: https://www.cnblogs.com/barneywill/p/10524805.html

Time: 2024-11-05 22:32:12
