Database Exercises with Detailed Solutions, English Edition (Part 5)


Preface

When it rains there is neither sun nor moon, yet most people do not mind.

Perhaps it is because the rainy season is not cold, so the sun can step aside and cool off.

A rainy night has a charm of its own that a moonlit night lacks.

Sometimes it brings to mind the famous lines of Li Shangyin: “When shall we trim the candle together by the west window, and talk of this night of rain on Mount Ba?”

In the rain, threads of music drift over from the sky across the way; here, all is quiet and at ease.


Introduction

This page works through a homework assignment from the database technology course at Carnegie Mellon University (CMU). Every query here was tested in PostgreSQL.

The assignment mainly practices SQL queries, including views and indexes. SQL dialects differ little for queries of this kind, so we expect the same statements to run on MySQL, SQLite, and similar systems with few changes. A detailed Chinese version of this article is also available.

Chinese version


Question Details

In this homework you will have to write SQL queries to answer questions on a movie dataset, about movies and actors. The database contains two tables:

• movies(mid, title, year, num_ratings, rating), Primary key: mid

• play_in(mid, name, cast_position), Primary key: mid, name

The tables contain the obvious information: which actor played in which movie, and at what cast position; for each movie, we have the title (e.g., 'Gone with the Wind'), the year of production, the number of ratings it received, and the average score of those ratings (a float in the range 0 to 10, with 10 meaning excellent).



We will use Postgres, installed on your own machine.

Question 1: Warm-up queries [5 points]

(a) [2 points] Print all actors in the movie Quantum of Solace, sorted by cast position. Print only their names.

(b) [3 points] Print all movie titles that were released in 2002, with rating larger than 8 and with more than one rating (num_ratings > 1).



Question 2: Find the star’s movies [5 points]

(a) [5 points] Print movie titles where Sean Connery was the star (i.e. he had position 1 in the cast). Sort the movie titles alphabetically.



Question 3: Popular actors [15 points]

(a) [8 points] We want to find the actors of the highest quality. We define their quality as the weighted average of the ratings of the movies they have played in (regardless of cast position), using the number of ratings for each movie as the weight. In other words, we define the quality for a particular actor as

$$\text{quality} = \frac{\sum_{m} \text{rating}_m \cdot \text{num\_ratings}_m}{\sum_{m} \text{num\_ratings}_m}$$

where both sums run over all movies m the actor has played in.

Print the names of the top 5 actors, according to the above metric. Break ties alphabetically.

(b) [7 points] Now we want to find the 5 most popular actors, in terms of number of ratings (regardless of positive or negative popularity). That is, if actor ‘Smith’ played in 2 movies, with num_ratings 10 and 15, then Smith’s popularity is 25 (=10+15). Print the top 5 actor names according to popularity. Again, break ties alphabetically.
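As a quick check of the two metrics, the sketch below computes both on a tiny table, using SQLite through Python's sqlite3 module rather than Postgres. Quality is taken as SUM(rating × num_ratings) / SUM(num_ratings) and popularity as SUM(num_ratings); the movie titles and the actor 'Jones' are made up for illustration, while Smith's rating counts (10 and 15) follow the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (mid INTEGER PRIMARY KEY, title TEXT, year INTEGER,
                         num_ratings INTEGER, rating REAL);
    CREATE TABLE play_in (mid INTEGER, name TEXT, cast_position INTEGER,
                          PRIMARY KEY (mid, name));
    -- Hypothetical toy data, not taken from the real data set
    INSERT INTO movies VALUES (1, 'M1', 2000, 10, 8.0),
                              (2, 'M2', 2001, 15, 6.0),
                              (3, 'M3', 2002, 5, 9.0);
    INSERT INTO play_in VALUES (1, 'Smith', 1), (2, 'Smith', 2), (3, 'Jones', 1);
""")

# One row per actor: weighted-average rating and total number of ratings.
rows = conn.execute("""
    SELECT p.name,
           SUM(m.rating * m.num_ratings) / SUM(m.num_ratings) AS quality,
           SUM(m.num_ratings) AS popularity
    FROM play_in p JOIN movies m ON m.mid = p.mid
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

for name, quality, popularity in rows:
    print(name, round(quality, 2), popularity)
```

Smith's quality works out to (8.0 × 10 + 6.0 × 15) / 25 = 6.8, and the popularity is 25, as in the question's example.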



Question 4: Most controversial actor [10 points]

(a) [10 points] We want to find the most controversial actor. We measure controversy as the maximum difference between the ratings of two movies that an actor has played in (regardless of cast position). That is, if actor ‘Smith’ played in a movie that got rating=1.2, and another that got rating=9.5, and all the other movies he played in obtained scores within that range, then Smith’s controversy score is 9.5-1.2=8.3. Print the name of the most controversial actor; again, if there is a tie in first place, break it alphabetically.



Question 5: The minions [20 points]

(a) [20 points] Find the “minions” of Annette Nicole: Print the names of actors who only played in movies with her and never without her. The answer should not contain the name of Annette Nicole. Order the names alphabetically.



Question 6: High productivity [5 points]

(a) [5 points] Find the top 2 most productive years (by number of movies produced). Solve ties by preferring chronologically older years, and print only the years.



Question 7: Movies with similar cast [15 points]

(a) [8 points] Print the count of distinct pairs of movies that have at least one actor in common (ignoring cast position). Exclude self-pairs and mirror-pairs.

(b) [7 points] Print the count of distinct pairs of movies that have at least two actors in common (again, ignoring cast position). Again, exclude self-pairs and mirror-pairs.



Question 8: Skyline query [25 points]

(a) [25 points] We want to find a set of movies that have both high popularity (i.e., high num_ratings) and high quality (rating). No single movie may achieve both, in which case we want the so-called skyline query. More specifically, we want all movies that are not “dominated” by any other movie:

Definition of domination : Movie “A” dominates movie “B” if movie “A” wins over movie “B”, on both criteria, or wins on one, and ties on the rest.

Figure 1 gives a pictorial example: the solid dots (’A’, ’D’, ’F’) are not dominated by any other dot, and thus form the skyline. All other dots are dominated by at least one other dot: e.g., dot ’B’ is dominated by dot ’A’, being inside the shaded rectangle that has ’A’ as the upper-right corner.

Figure 1: Illustration of Skyline and domination : ’A’ dominates all points in the shaded rectangle; ’A’, ’D’ and ’F’ form the skyline of this cloud of points.

Given the above description, print the title of all the movies on the skyline, along with the rating and the number of ratings.
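The domination rule can be tried out on a handful of points before touching the real data. The sketch below is one possible phrasing with NOT EXISTS, run on SQLite through Python's sqlite3 module; the six movies 'A' to 'F' and their numbers are invented so that, as in Figure 1, 'A', 'D' and 'F' end up on the skyline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (mid INTEGER PRIMARY KEY, title TEXT, year INTEGER,
                         num_ratings INTEGER, rating REAL);
    -- Hypothetical points: (title, num_ratings, rating)
    INSERT INTO movies VALUES (1, 'A', 2000, 100, 9.0),
                              (2, 'B', 2000,  80, 7.0),
                              (3, 'C', 2000, 100, 8.0),
                              (4, 'D', 2000, 150, 8.0),
                              (5, 'E', 2000, 150, 5.0),
                              (6, 'F', 2000, 200, 5.0);
""")

# A movie m is on the skyline if no other movie o is at least as good on
# both criteria and strictly better on at least one of them.
skyline = conn.execute("""
    SELECT m.title, m.rating, m.num_ratings
    FROM movies m
    WHERE NOT EXISTS (
        SELECT 1 FROM movies o
        WHERE o.rating >= m.rating
          AND o.num_ratings >= m.num_ratings
          AND (o.rating > m.rating OR o.num_ratings > m.num_ratings))
    ORDER BY m.title
""").fetchall()

print(skyline)
```

Here 'C' ties with 'A' on num_ratings but loses on rating, and 'E' ties with 'F' on rating but loses on num_ratings, so both are dominated under the "wins on one, ties on the rest" clause.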


Answer

We give the Postgres version in detail; as you will see, it can easily be adapted to MySQL or SQLite.

Initialization:

-- drop the tables if they exist
drop table if exists movies cascade;
drop table if exists play_in cascade;

-- create tables movies and play_in
create table movies (
mid integer PRIMARY KEY,
title varchar(200),
year integer,
num_ratings integer,
rating real);

create table play_in (
mid integer references movies,
name varchar(100),
cast_position integer,
PRIMARY KEY(mid, name));

create index mid on movies(mid);

Insert Values

Insert some values into the tables movies and play_in.

The data files are available at the following link on my 360 Yunpan:

https://yunpan.cn/cSfLzxQApRXSi password: f3ab

-- use "\copy" in Postgres (psql)
\copy movies from '~/data/movie_processed.dat';
\copy play_in from '~/data/movie_actor_processed.dat';

-- if you use another database (MySQL, SQLite), you can use the SQL statement "insert into ... values (...)"
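For a database without psql's \copy, the rows can be loaded with ordinary INSERT statements. The sketch below does this for SQLite through Python's sqlite3 module; the schema mirrors the tables above, and the sample rows are invented for illustration rather than read from the real data files.

```python
import sqlite3

# In-memory SQLite database with the same two tables as the Postgres setup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (
        mid INTEGER PRIMARY KEY,
        title VARCHAR(200),
        year INTEGER,
        num_ratings INTEGER,
        rating REAL);
    CREATE TABLE play_in (
        mid INTEGER REFERENCES movies,
        name VARCHAR(100),
        cast_position INTEGER,
        PRIMARY KEY (mid, name));
""")

# Hypothetical sample rows (not from the real data set).
movies = [(1, "Movie A", 2002, 120, 8.5), (2, "Movie B", 2003, 40, 6.0)]
cast = [(1, "Actor X", 1), (1, "Actor Y", 2), (2, "Actor X", 1)]

# Parameterized INSERT ... VALUES, executed once per row.
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?, ?)", movies)
conn.executemany("INSERT INTO play_in VALUES (?, ?, ?)", cast)
conn.commit()

movie_count = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
cast_count = conn.execute("SELECT COUNT(*) FROM play_in").fetchone()[0]
print(movie_count, cast_count)
```

Using placeholders instead of building the VALUES list by hand keeps titles that contain quotes from breaking the statement.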

[Screenshot: test setup on my Ubuntu machine]


Solution 1

-- (a)
SELECT name FROM play_in p, movies m
WHERE p.mid = m.mid and m.title = 'Quantum of Solace'
ORDER BY p.cast_position;

(a) The result looks like this:

name
------------------------------
Daniel Craig
Olga Kurylenko
Mathieu Amalric
Judi Dench
Giancarlo Giannini
Gemma Arterton
Jeffrey Wright
David Harbour
Jesper Christensen
Anatole Taubman
Rory Kinnear
Tim Pigott-Smith
Fernando Guillen-Cuervo
Jesus Ochoa
Glenn Foster
Paul Ritter
Simon Kassianides
Stana Katic
Lucrezia Lante della Rove...
Neil Jackson
Oona Chaplin
(21 rows)

-- (b)
SELECT title FROM movies
WHERE year = 2002 and rating > 8 and num_ratings > 1;

(b) The result looks like this:
title
---------------------------------------
The Lord of the Rings: The Two Towers
Cidade de Deus
Mou gaan dou
(3 rows)

[Screenshot: Solution 1 test run on my Ubuntu machine]


Solution 2

SELECT title FROM movies m, play_in p
WHERE m.mid = p.mid and
name = 'Sean Connery' and
cast_position = 1
ORDER BY title;

The result looks like this:
title
---------------------------------------
Der Name der Rose
Diamonds Are Forever
Dr. No
Entrapment
Finding Forrester
First Knight
From Russia with Love
Goldfinger
Never Say Never Again
The Hunt for Red October
The League of Extraordinary Gentlemen
Thunderball
You Only Live Twice
(13 rows)

[Screenshot: Solution 2 test run on my Ubuntu machine]


Solution 3

-- (a)
DROP VIEW IF EXISTS WeightedRatings;

-- weighted average rating per actor, weighted by num_ratings
CREATE VIEW WeightedRatings AS
SELECT name, SUM(rating * num_ratings) / SUM(num_ratings) AS WeightedRating
FROM movies m, play_in p
WHERE m.mid = p.mid
GROUP BY name;

SELECT name FROM WeightedRatings
ORDER BY WeightedRating DESC, name ASC
LIMIT 5;

(a) The result looks like this:
name
-----------------------
Adam Kalesperis
Aidan Feore
Aleksandr Kajdanovsky
Alexander Kaidanovsky
Alisa Frejndlikh
(5 rows)

-- (b)
DROP VIEW IF EXISTS ActorSumRatings;

CREATE VIEW ActorSumRatings AS
SELECT name, SUM(num_ratings) as popularity
FROM play_in p, movies m
WHERE p.mid = m.mid
GROUP BY name;

SELECT name from ActorSumRatings
ORDER BY popularity DESC, name ASC LIMIT 5;

(b) The result looks like this:
name
----------------------
Johnny Depp
Alan Rickman
Orlando Bloom
Helena Bonham Carter
Matt Damon
(5 rows)

[Screenshots: Solution 3 test runs on my Ubuntu machine]


Solution 4

DROP VIEW IF EXISTS RatingGap;

-- largest rating difference between any two movies of the same actor
CREATE VIEW RatingGap AS
SELECT p1.name, MAX(ABS(m1.rating - m2.rating)) AS Gap
FROM play_in p1, play_in p2, movies m1, movies m2
WHERE p1.mid = m1.mid and
p2.mid = m2.mid and
p1.name = p2.name
GROUP BY p1.name;

-- the question asks for ties to be broken alphabetically
SELECT name
FROM RatingGap
ORDER BY Gap DESC, name ASC LIMIT 1;

The result looks like this:
name
---------------
John Travolta
(1 row)
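Because the largest gap between any two of an actor's ratings equals the highest rating minus the lowest, the same score can also be computed without the four-way join. The sketch below shows that variant on SQLite through Python's sqlite3 module; the three actors and their ratings are invented, chosen so that two actors tie and the alphabetical tie-break matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (mid INTEGER PRIMARY KEY, title TEXT, year INTEGER,
                         num_ratings INTEGER, rating REAL);
    CREATE TABLE play_in (mid INTEGER, name TEXT, cast_position INTEGER,
                          PRIMARY KEY (mid, name));
    -- Hypothetical toy data: Smith and Adams both have a gap of 8.0
    INSERT INTO movies VALUES (1, 'M1', 2000, 10, 1.5),
                              (2, 'M2', 2000, 10, 9.5),
                              (3, 'M3', 2000, 10, 2.0),
                              (4, 'M4', 2000, 10, 10.0),
                              (5, 'M5', 2000, 10, 4.0),
                              (6, 'M6', 2000, 10, 6.0);
    INSERT INTO play_in VALUES (1, 'Smith', 1), (2, 'Smith', 1),
                               (3, 'Adams', 1), (4, 'Adams', 1),
                               (5, 'Jones', 1), (6, 'Jones', 1);
""")

# Controversy = highest rating minus lowest rating per actor;
# ties in first place are broken alphabetically by name.
top = conn.execute("""
    SELECT p.name, MAX(m.rating) - MIN(m.rating) AS gap
    FROM play_in p JOIN movies m ON m.mid = p.mid
    GROUP BY p.name
    ORDER BY gap DESC, p.name ASC
    LIMIT 1
""").fetchone()

print(top)
```

This form joins each table once, which may matter on the full data set, where pairing every two movies of every actor grows quadratically with the number of movies per actor.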

[Screenshot: Solution 4 test run on my Ubuntu machine]


Solution 5

DROP VIEW IF EXISTS MastersMovies CASCADE;

CREATE VIEW MastersMovies AS
SELECT m.mid,m.title FROM movies m, play_in p
WHERE m.mid = p.mid and p.name = 'Annette Nicole';

DROP VIEW IF EXISTS CoActors;

CREATE VIEW CoActors AS
SELECT DISTINCT name FROM MastersMovies m , play_in p
WHERE p.mid = m.mid;

DROP VIEW IF EXISTS Combinations;

CREATE VIEW Combinations AS
SELECT name,mid FROM MastersMovies , CoActors;

DROP VIEW IF EXISTS NonExistent;

CREATE VIEW NonExistent AS
SELECT * FROM Combinations
EXCEPT (SELECT name, mid FROM play_in);

DROP VIEW IF EXISTS PotentialResults;

CREATE VIEW PotentialResults AS
SELECT * from CoActors
EXCEPT (SELECT distinct(name) FROM NonExistent);

DROP VIEW IF EXISTS NotMastersMovies;

CREATE VIEW NotMastersMovies AS
SELECT m.mid FROM movies m
EXCEPT (SELECT mid FROM MastersMovies);

SELECT * from PotentialResults
WHERE name not in
(SELECT name
FROM play_in p, NotMastersMovies m
WHERE m.mid = p.mid
UNION SELECT 'Annette Nicole'
) ORDER BY name;

The result looks like this:
name
-----------------
Christian Perry
(1 row)
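Note that the PotentialResults view above keeps only co-actors who appear in every one of her movies. Under a looser reading of the question, where an actor qualifies as long as none of their movies lacks her, the condition can be written directly with a nested NOT EXISTS. The sketch below illustrates that reading on SQLite through Python's sqlite3 module; the four co-actors and their casting are invented for illustration.

```python
import sqlite3

STAR = "Annette Nicole"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE play_in (mid INTEGER, name TEXT, cast_position INTEGER,
                          PRIMARY KEY (mid, name));
    -- Hypothetical casting: the star plays in movies 1 and 2 only
    INSERT INTO play_in VALUES (1, 'Annette Nicole', 1), (2, 'Annette Nicole', 1),
                               (1, 'Pat', 2),
                               (1, 'Chris', 3), (2, 'Chris', 2),
                               (1, 'Lee', 4), (3, 'Lee', 1),
                               (3, 'Kim', 2);
""")

# Keep an actor if there is no movie of theirs in which the star is missing.
minions = [row[0] for row in conn.execute("""
    SELECT DISTINCT p.name
    FROM play_in p
    WHERE p.name <> :star
      AND NOT EXISTS (
          SELECT 1 FROM play_in q
          WHERE q.name = p.name
            AND NOT EXISTS (
                SELECT 1 FROM play_in a
                WHERE a.mid = q.mid AND a.name = :star))
    ORDER BY p.name
""", {"star": STAR})]

print(minions)
```

In this toy data 'Pat' appears in only one of her two movies, so the looser reading keeps 'Pat' while the view-based solution would not; which reading the homework intends is not settled by the text here.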

[Screenshot: Solution 5 test run on my Ubuntu machine]


Solution 6

DROP VIEW IF EXISTS MoviesPerYear;

CREATE VIEW MoviesPerYear AS
SELECT year, COUNT(title) AS num_movies
FROM movies GROUP BY year;

-- ties go to the chronologically older year, as the question asks
SELECT year FROM MoviesPerYear
ORDER BY num_movies DESC, year ASC LIMIT 2;

The result looks like this:
year
------
2006
2007
(2 rows)

[Screenshot: Solution 6 test run on my Ubuntu machine]


Solution 7

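-- (a)
-- m1.mid > m2.mid excludes self-pairs and mirror-pairs;
-- distinct aliases avoid a duplicate column name in the derived table
SELECT COUNT(*) FROM
(SELECT DISTINCT m1.mid AS mid1, m2.mid AS mid2
 FROM movies m1, movies m2, play_in p1, play_in p2
 WHERE m1.mid > m2.mid and
 m1.mid = p1.mid and
 m2.mid = p2.mid and
 p1.name = p2.name)
AS pairs;

(a) The result looks like this: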
count
--------
104846
(1 row)

-- (b)
-- p1/p2 match one shared actor, p3/p4 a second, different one
SELECT COUNT(*) FROM
(SELECT DISTINCT m1.mid AS mid1, m2.mid AS mid2
FROM movies m1, movies m2, play_in p1,
play_in p2, play_in p3, play_in p4
WHERE m1.mid > m2.mid and
m1.mid = p1.mid and
m2.mid = p2.mid and
m1.mid = p3.mid and
m2.mid = p4.mid and
p2.name <> p4.name and
p1.name = p2.name and
p3.name = p4.name)
AS pairs;

(b) The result looks like this:
count
-------
6845
(1 row)
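Both parts can also be phrased as a single self-join on play_in followed by GROUP BY and HAVING, where the group size is the number of shared actors. The sketch below shows this variant on SQLite through Python's sqlite3 module; the helper count_pairs and the toy casting are hypothetical, and the query assumes every play_in.mid refers to an existing movie, so the movies table is not joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE play_in (mid INTEGER, name TEXT, cast_position INTEGER,
                          PRIMARY KEY (mid, name));
    -- Hypothetical casting: movies 1 and 2 share X and Y, movie 3 shares only X
    INSERT INTO play_in VALUES (1, 'X', 1), (1, 'Y', 2), (1, 'Z', 3),
                               (2, 'X', 1), (2, 'Y', 2),
                               (3, 'X', 1),
                               (4, 'W', 1);
""")


def count_pairs(conn, min_shared):
    """Count movie pairs that share at least min_shared actors."""
    # p1.mid > p2.mid drops self-pairs and mirror-pairs. Since (mid, name)
    # is the primary key, each shared actor adds exactly one row per pair,
    # so COUNT(*) per group is the number of actors the two movies share.
    sql = """
        SELECT COUNT(*) FROM (
            SELECT p1.mid AS mid1, p2.mid AS mid2
            FROM play_in p1 JOIN play_in p2
              ON p1.name = p2.name AND p1.mid > p2.mid
            GROUP BY p1.mid, p2.mid
            HAVING COUNT(*) >= :k
        ) AS pairs
    """
    return conn.execute(sql, {"k": min_shared}).fetchone()[0]


print(count_pairs(conn, 1), count_pairs(conn, 2))
```

With min_shared set to 1 this answers part (a) and with 2 part (b), so the threshold becomes a parameter rather than two extra joins.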

[Screenshot: Solution 7 test run on my Ubuntu machine]


Solution 8

DROP VIEW IF EXISTS Dominated;

CREATE VIEW Dominated AS
SELECT DISTINCT m2.mid, m2.title,m2.num_ratings, m2.rating
FROM movies m1, movies m2
WHERE m2.rating<=m1.rating and m2.num_ratings<=m1.num_ratings and
NOT (m2.rating = m1.rating and m2.num_ratings=m1.num_ratings);

SELECT title,num_ratings,rating
FROM movies
EXCEPT (SELECT title,num_ratings,rating FROM Dominated);

[Screenshot: Solution 8 test run on my Ubuntu machine]



Time: 2024-10-27 06:06:51
