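R Basics: Syntax and Plotting

While sorting through my notes I came across the R beginner notes I made two years ago. I first met R in the winter of 2021, at an R lecture given at Tarim University by Professor Kong Qiusheng of Huazhong Agricultural University. Two years on, some of the material is out of date, so I am tidying it up as a backup and reviewing it along the way; going back over old material is a good way to learn something new ^_^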

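1. What is R

R is a programming language for statistical computing and data analysis. It offers a wide range of statistical and graphical functions along with rich tools for data processing and modelling, and it is widely used in academic research, data science, financial analysis, biomedicine and other fields.

RStudio is an integrated development environment (IDE) for writing, running and debugging R code, with many features and tools aimed at making R development more efficient and convenient. Put simply, we write and run R code inside RStudio.

RStudio is not the only option, of course: you can also run and debug R in, say, VS Code. RStudio simply gives beginners a friendly interface. Once you are comfortable you can do without an IDE altogether, for example by running R from a Linux terminal, but that is a topic for later.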

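2. Setup and some basics

Install R first and RStudio second. Do not reverse the order, or RStudio may complain that it cannot find R.

R website: R: The R Project for Statistical Computing (r-project.org)

RStudio website (the company has since been renamed Posit, which I still have not got used to): Posit | The Open-Source Data Science Company

Once everything is installed, open RStudio and choose Tools > Global Options in the menu bar to change the global settings. The main thing to change is your working directory (this can also be done in code). I also rearranged the four panes (under Pane Layout) as follows: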

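Pane 1, Source: where you write R code. You can also type code in pane 3 (the Console); it is a matter of habit.

Pane 2, Environment, History and so on: shows the variables created while the code runs, your command history, etc.

Pane 3, Console: R's interactive console, where you enter and run R code line by line and see the result immediately. Ctrl+L clears the screen.

Pane 4, Files, Plots and so on: Files shows the files in the current working directory, Plots shows the figures you draw.

Panes 2 and 4 offer several tabs to choose from; these are the ones I use most, for reference only.

As for the two panes where you type code (1 and 3): use the Source pane for larger blocks of code or code you want to save and rerun, pressing Ctrl+Enter to run the current line; use the Console pane for simple tests, quick calculations or interactive exploration, where pressing Enter runs the line.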
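
The working directory mentioned above can also be checked and changed from code rather than through Global Options. Here is a minimal sketch; `tempdir()` is only a stand-in for your own project folder, and the variable name `old_wd` is my own choice.

```r
# Remember and show the current working directory
old_wd <- getwd()
print(old_wd)

# Switch to another folder; tempdir() is a placeholder for your project folder
setwd(tempdir())
list.files()    # files in the new working directory

# Switch back so the rest of the session is unaffected
setwd(old_wd)
```

Relative file paths used when reading or saving files are resolved against this directory, which is why it is worth setting once at the start of a session.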

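The packages that R loads are hosted in the CRAN repository. The CRAN package page below lists the available packages, and each package's page links to its reference manual with arguments and usage.

CRAN - Contributed Packages (r-project.org)

In RStudio you can choose Tools > Install Packages (the first item in the drop-down menu), type the name of the package you want, and click Install.

That covers the bare essentials of R and RStudio. The rest of these notes goes through R syntax and some plotting examples.

3. Basic R syntax

To repeat, these are beginner notes and do not go into much detail; for the full picture see the official manual An Introduction to R (r-project.org)

To make the results easy to spot, every line of output below starts with two hash signs (##).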

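3.1 Data types

The commonly used data types are numeric, character and logical.

a = 123    # = assigns, and so does <-; the R community mostly uses <-, so take your pick. # starts a comment, which is not run
class(a)    # class() reports the class of its argument
## [1] "numeric"

a = "123"    # wrap a value in double quotes to make it character; all code symbols (quotes, commas, brackets) must be ASCII, not full-width characters
class(a)
## [1] "character"

a = TRUE    # logical values are TRUE and FALSE; T and F are predefined shorthands for them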
class(a)
## [1] "logical"
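
A few details about these types are easy to miss, so here is a small sketch of them. It uses `<-` for assignment; the variable name `x` is just for illustration.

```r
x <- 123              # `<-` and `=` both assign at the top level
class(x)              # "numeric"
typeof(x)             # "double": plain numbers are stored as doubles
class(123L)           # "integer": the L suffix asks for an integer

is.numeric(x)         # TRUE
is.character("123")   # TRUE

# T and F are ordinary variables preset to TRUE and FALSE, so they can be overwritten
T <- 0
isTRUE(T)             # FALSE, because T is now the number 0
rm(T)                 # remove our copy so that T means TRUE again
isTRUE(T)             # TRUE
```

Because `T` and `F` can be reassigned like this, spelling out `TRUE` and `FALSE` is the safer habit in scripts.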

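3.2 Data structures

In R the vector is a basic data structure that stores a sequence of elements of the same type.

# Vectors: the c() function creates a vector
# Numeric vector
a = c(1,7,10)      # values are separated by commas
class(a)
## [1] "numeric"

seq(1,5)    # seq() generates a sequence
## [1] 1 2 3 4 5
seq(from = 1, to = 3, by = 0.5)    # by is the step size
## [1] 1.0 1.5 2.0 2.5 3.0
1:10    # from:to also gives a sequence
## [1]  1  2  3  4  5  6  7  8  9 10

# Character vector
a = c("A","B","C")
class(a)
## [1] "character"

letters    # the 26 lowercase letters in alphabetical order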
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
LETTERS    # the 26 uppercase letters in alphabetical order
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

# Logical vector
a = c(T,TRUE,F,FALSE)
class(a)
## [1] "logical"
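
Since a vector holds only one type, mixing types inside `c()` makes R silently convert everything to the most general type present. A small sketch of that rule (the names `mixed1` and `mixed2` are mine):

```r
# logical + numeric -> everything becomes numeric
mixed1 <- c(1, TRUE, FALSE)
mixed1           # 1 1 0
class(mixed1)    # "numeric"

# anything + character -> everything becomes character
mixed2 <- c(1, "A", TRUE)
mixed2           # "1" "A" "TRUE"
class(mixed2)    # "character"
```

This is worth remembering when a column of numbers unexpectedly turns into text after one stray string gets in.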

3.3 Working with vectors

# rep() repeats
rep(1:3, times = 4)    # repeat the whole sequence 1 to 3 four times
## [1] 1 2 3 1 2 3 1 2 3 1 2 3
rep(1:3, each = 4)    # repeat each number from 1 to 3 four times
## [1] 1 1 1 1 2 2 2 2 3 3 3 3

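# paste() joins elements
paste(1:3, 
      c("A","B","C"))    # by default the pieces are joined with a space
## [1] "1 A" "2 B" "3 C"
paste0(1:3,
       c("A","B","C"))    # no separator between the pieces
## [1] "1A" "2B" "3C"
paste(1:3,
      c("A","B","C"),    # when a call is split across lines, do not drop the comma
      sep = "/")    # custom separator
## [1] "1/A" "2/B" "3/C"

## Exercise: 3 treatments (A, B, C) with 3 replicates each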
paste0(rep(c("A","B","C"), each = 3),
       rep(1:3, times = 3))
## [1] "A1" "A2" "A3" "B1" "B2" "B3" "C1" "C2" "C3"

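# [] indexes a vector; note that indexing starts at 1, not 0
a = 2:10
a[5]    # 5th element of a
## [1] 6
a[1:3]    # elements 1 to 3 of a
## [1] 2 3 4
a[c(1,4,5)]    # elements 1, 4 and 5 of a
## [1] 2 5 6

# Logical operators
# & and; | or; ! not
a[a>5]    # elements of a greater than 5
## [1]  6  7  8  9 10
a[a>5 & a<8]    # elements of a greater than 5 and less than 8
## [1] 6 7
a[a>5 | a<3]    # elements of a greater than 5 or less than 3
## [1]  2  6  7  8  9 10
a[a!=8]    # elements of a not equal to 8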
## [1]  2  3  4  5  6  7  9 10

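3.4 Vector arithmetic

# +, -, *, / work element by element
2 * 1:3
## [1] 2 4 6
1:3 * 1:3    # two vectors are multiplied element by element
## [1] 1 4 9
2:5 + 1:3    # the lengths differ, so the shorter vector is recycled: the last value is 5 + 1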
## [1] 3 5 7 6
## Warning message:
## In 2:5 + 1:3 :
##   longer object length is not a multiple of shorter object length
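
The warning above comes from R's recycling rule: the shorter vector is reused from its start until it matches the longer one. A short sketch of when that happens silently and how to make it explicit (the variable names are mine):

```r
# Length 6 is a multiple of length 2, so recycling happens without a warning
r1 <- 1:6 + 1:2
r1               # 2 4 4 6 6 8

# A single number is a vector of length 1 and is recycled the same way
r2 <- 1:6 * 10
r2               # 10 20 30 40 50 60

# rep_len() spells out the recycling, which avoids the warning for 2:5 + 1:3
short <- rep_len(1:3, length.out = 4)
short            # 1 2 3 1
r3 <- 2:5 + short
r3               # 3 5 7 6
```

Writing the recycling out by hand is a reasonable choice when the length mismatch is intended; when it is not, the warning is usually pointing at a real mistake.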

3.5 Converting vector types

# as.<type>() functions
a = 1:3
class(a)
## [1] "integer"
# as.character() converts to character
b = as.character(a)
b
## [1] "1" "2" "3"
# as.numeric() converts to numeric
c = as.numeric(b)  
c
## [1] 1 2 3
# as.logical() converts to logical; any non-zero number becomes TRUE
d = as.logical(c)
d
## [1] TRUE TRUE TRUE

## Exercise: produce 5 uppercase letters
a = LETTERS[1:5]
class(a)
## [1] "character"
a
## [1] "A" "B" "C" "D" "E"

## Exercise: produce 4 pairs of T, F
a = rep(c(T,F), times = 4)
class(a)
## [1] "logical"
as.numeric(a)    # TRUE becomes 1 and FALSE becomes 0
## [1] 1 0 1 0 1 0 1 0
as.character(a)
## [1] "TRUE"  "FALSE" "TRUE"  "FALSE" "TRUE"  "FALSE" "TRUE"  "FALSE"
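
Conversions do not always succeed, so here is a small sketch of what happens when they cannot. The input vector `v` is made up for illustration.

```r
v <- c("1", "2.5", "abc")

# "abc" cannot be read as a number, so it becomes NA (R also warns about it;
# suppressWarnings() is used here only to keep the output clean)
n <- suppressWarnings(as.numeric(v))
n                         # 1.0 2.5  NA
is.na(n)                  # FALSE FALSE  TRUE

as.logical(c(0, 1, 5))    # FALSE  TRUE  TRUE: zero is FALSE, non-zero is TRUE
as.integer(3.9)           # 3: the decimal part is dropped, not rounded
```

Checking `is.na()` after a conversion is a simple way to find the values that did not convert.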

3.6 Common summary functions

mean(1:10)    # mean
## [1] 5.5
sd(1:10)    # standard deviation
## [1] 3.02765
max(1:10)    # maximum
## [1] 10
range(1:10)    # minimum and maximum
## [1]  1 10
length(1:10)    # length
## [1] 10
length(letters)
## [1] 26
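
Real data often contain missing values (`NA`), and the summary functions above return `NA` unless told to skip them. A minimal sketch with a made-up vector `x`:

```r
x <- c(2, 4, NA, 8)

mean(x)                  # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)    # 4.666667: na.rm = TRUE drops NA before computing
sum(x, na.rm = TRUE)     # 14
min(x, na.rm = TRUE)     # 2
sum(is.na(x))            # 1: number of missing values

median(1:10)             # 5.5
```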

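3.7 Matrix

A matrix is a two-dimensional data structure made of rows and columns in which every element has the same data type. It can be seen as an extension of the vector.

# matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
# data: the elements of the matrix; defaults to NA, so a matrix created without values is filled with NA
# nrow: number of rows, default 1; can be abbreviated to nr
# ncol: number of columns, default 1; can be abbreviated to nc
# byrow: whether to fill by row; the default is to fill by column
# dimnames: a list of two character vectors giving the row names and column names
a = 1:12
matrix(a)
##       [,1]
##  [1,]    1
##  [2,]    2
##  [3,]    3
##  [4,]    4
##  [5,]    5
##  [6,]    6
##  [7,]    7
##  [8,]    8
##  [9,]    9
## [10,]   10
## [11,]   11
## [12,]   12

# Numeric matrix
a = matrix(a,
           nr = 3,    # nr is matched to nrow, so the full name need not be typed
           byrow = T)    # fill row by row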
a
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12

# Character matrix
matrix(LETTERS[1:12],
       ncol = 3)
##      [,1] [,2] [,3]
## [1,] "A"  "E"  "I" 
## [2,] "B"  "F"  "J" 
## [3,] "C"  "G"  "K" 
## [4,] "D"  "H"  "L"

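# Row/column names: assign a vector whose length equals the number of rows or columns
# (the Chinese names used below mean "column 1" to "column 4")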
colnames(a) = c("第一列","第二列","第三列","第四列")
a
##      第一列 第二列 第三列 第四列
## [1,]      1      2      3      4
## [2,]      5      6      7      8
## [3,]      9     10     11     12
row.names(a) = LETTERS[1:3]
a
##   第一列 第二列 第三列 第四列
## A      1      2      3      4
## B      5      6      7      8
## C      9     10     11     12

# Filtering (extracting) data
a[1,]    # extract the first row of the matrix
## 第一列 第二列 第三列 第四列 
##     1      2      3      4 
a[1:2,]    
##   第一列 第二列 第三列 第四列
## A      1      2      3      4
## B      5      6      7      8
a[-1,]    # drop the first row of the matrix
##   第一列 第二列 第三列 第四列
## B      5      6      7      8
## C      9     10     11     12
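
Beyond whole rows, a matrix can be indexed by single cell, by name, or by column. The sketch below uses a fresh matrix `m` with made-up column names V1 to V4 so that it runs on its own.

```r
m <- matrix(1:12, nrow = 3, byrow = TRUE,
            dimnames = list(LETTERS[1:3], paste0("V", 1:4)))

dim(m)                  # 3 4: rows, then columns
m[2, 3]                 # 7: row 2, column 3
m["B", "V3"]            # 7: the same cell, addressed by name
m[, 2]                  # column 2 as a named vector: A = 2, B = 6, C = 10

# drop = FALSE keeps the result as a one-column matrix instead of a vector
col2 <- m[, 2, drop = FALSE]
dim(col2)               # 3 1

dim(t(m))               # 4 3: t() transposes rows and columns
```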

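3.8 Data frame

The data frame is another common two-dimensional data structure in R. Each column can hold a different type of data, such as numeric, character or factor, but all columns must have the same length (the same number of rows).

data.frame(1:12,5:8)    # the lengths differ, so the shorter vector is recycled to fill 12 rows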
##    X1.12 X5.8
## 1      1    5
## 2      2    6
## 3      3    7
## 4      4    8
## 5      5    5
## 6      6    6
## 7      7    7
## 8      8    8
## 9      9    5
## 10    10    6
## 11    11    7
## 12    12    8

a = 1:4
b = 5:8
c = letters[1:4]
d = letters[5:8]
e = data.frame(a,b,c,d)
e
##   a b c d
## 1 1 5 a e
## 2 2 6 b f
## 3 3 7 c g
## 4 4 8 d h

str(e)    # inspect the type of each column in the data frame
## 'data.frame':	4 obs. of  4 variables:
##  $ a: int  1 2 3 4
##  $ b: int  5 6 7 8
##  $ c: chr  "a" "b" "c" "d"
##  $ d: chr  "e" "f" "g" "h"

# Basic operations
# Extracting rows/columns
e[1,]
##   a b c d
## 1 1 5 a e
e[,1]
## [1] 1 2 3 4
e$a    # $ extracts a column by name; a, b, c, d are the column names
## [1] 1 2 3 4

# Adding a column
e$e = 9:12    # column e does not exist yet, so it is added with the values 9:12
e
##   a b c d  e
## 1 1 5 a e  9
## 2 2 6 b f 10
## 3 3 7 c g 11
## 4 4 8 d h 12
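
Rows of a data frame can also be picked by a condition, in the same way as the logical indexing of vectors shown earlier. A minimal sketch on a made-up data frame `df` (its column names are my own):

```r
df <- data.frame(id = 1:4,
                 score = c(5, 6, 7, 8),
                 grp = c("a", "b", "a", "b"))

high <- df[df$score > 6, ]                   # rows where score is above 6
high$id                                      # 3 4

a_only <- df[df$grp == "a", c("id", "score")]  # filter rows and pick columns
a_only$score                                 # 5 7

both <- subset(df, score > 6 & grp == "b")   # subset() reads a little cleaner
both$id                                      # 4

nrow(df)                                     # 4
ncol(df)                                     # 3
```

Note the comma inside the brackets: the condition goes before it (rows) and the column selection after it.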

# Naming rows/columns
row.names(e)[3] = "P"
row.names(e)[4] = "l"    # change row names
colnames(e)[1] = "第一列"    # change a column name (第一列 means "column 1")
e
##   第一列 b c d  e
## 1      1 5 a e  9
## 2      2 6 b f 10
## P      3 7 c g 11
## l      4 8 d h 12

# Two ways to combine: rbind and cbind(data, name = values)
f = cbind(e,f = 13:16)    # binding adds one column
f
##   第一列 b c d  e  f
## 1      1 5 a e  9 13
## 2      2 6 b f 10 14
## P      3 7 c g 11 15
## l      4 8 d h 12 16
colnames(f)[6] = "Y"
f
g = rbind(f, 6 )    # binding adds one row; the single value 6 is recycled across all columns
g
##   第一列 b c d  e  Y
## 1      1 5 a e  9 13
## 2      2 6 b f 10 14
## P      3 7 c g 11 15
## l      4 8 d h 12 16
## 5      6 6 6 6  6  6

# Row/column means
# rowMeans: row means; colMeans: column means
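
As a quick illustration of these two functions before the exercise, here is a small sketch on a made-up 2 x 3 matrix `m`; `rowSums()` and `apply()` are added for comparison.

```r
m <- matrix(1:6, nrow = 2)   # filled by column: rows are 1 3 5 and 2 4 6

rowMeans(m)        # 3 4
colMeans(m)        # 1.5 3.5 5.5
rowSums(m)         # 9 12

# apply() generalises this to any function: 1 = over rows, 2 = over columns
apply(m, 1, max)   # 5 6
apply(m, 2, max)   # 2 4 6
```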

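## Exercise: build a matrix holding the values 21-40 and name its rows and columns
## Append a final column of row means; append a final column holding the second column minus the first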
a = matrix(21:40, ncol = 5)
colnames(a) = LETTERS[1:5]
row.names(a) = letters[1:4]
cbind(a,Mean = rowMeans(a))
##    A  B  C  D  E Mean
## a 21 25 29 33 37   29
## b 22 26 30 34 38   30
## c 23 27 31 35 39   31
## d 24 28 32 36 40   32
b = a[,1]
c = a[,2]
d = c-b
cbind(a, H = d)
##    A  B  C  D  E H
## a 21 25 29 33 37 4
## b 22 26 30 34 38 4
## c 23 27 31 35 39 4
## d 24 28 32 36 40 4


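## Exercise: build a data frame of one day's spending: first column amount spent, second column unit price, third column quantity
## (Chinese labels below: 消费金额 = amount spent, 单价 = unit price, 数量 = quantity, 氪金消费 = in-game purchase, 总氪金量 = total spent)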
b = c(648,328,128,60,30,6)
c = c(1,3,4,1,4,6)
a = c(c*b)
d = data.frame(a,b,c)
colnames(d)[1] = "消费金额"
colnames(d)[2] = "单价"
colnames(d)[3] = "数量"
row.names(d)[1] = "648氪金消费"
row.names(d)[2] = "328氪金消费"
row.names(d)[3] = "128氪金消费"
row.names(d)[4] = "60氪金消费"
row.names(d)[5] = "30氪金消费"
row.names(d)[6] = "6氪金消费"
rbind(d,总氪金量 = sum(a))    # the single total is recycled, so it also appears under unit price and quantity
##             消费金额 单价 数量
## 648氪金消费      648  648    1
## 328氪金消费      984  328    3
## 128氪金消费      512  128    4
## 60氪金消费        60   60    1
## 30氪金消费       120   30    4
## 6氪金消费         36    6    6
## 总氪金量        2360 2360 2360
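
In the table above the total 2360 is repeated under unit price and quantity, which is misleading. One possible way around it, sketched here with English column names of my own choosing, is to build the total row as a one-row data frame with `NA` where a total makes no sense.

```r
price    <- c(648, 328, 128, 60, 30, 6)
quantity <- c(1, 3, 4, 1, 4, 6)
spend    <- data.frame(amount = price * quantity, price, quantity)

# One-row data frame for the totals; a total unit price is meaningless, so use NA
total <- data.frame(amount   = sum(spend$amount),
                    price    = NA,
                    quantity = sum(spend$quantity))

result <- rbind(spend, total)
rownames(result)[nrow(result)] <- "Total"   # label the last row
result
```

The last row now shows 2360 for the amount, `NA` for the unit price and 19 for the quantity.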

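4. Plotting with R

What we biology students care about most is how to make figures. If the programming basics above did not sink in, do not worry: the plotting code below can be used as templates.

R can draw figures with many different packages. Here I use the most basic one, ggplot2, as the example. Install and load it as follows; every example below needs ggplot2 loaded:

# Install the ggplot2 package
install.packages("ggplot2")

# Load ggplot2
library(ggplot2)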

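4.1 Linear regression plot

library(ggplot2)
# Using R's built-in cars data set
ggplot(data = cars,
       mapping = aes(x = speed,    # aes() maps variables in the data to visual properties (aesthetics) of the geoms
                     y = dist)) +     
  geom_point() + geom_line(col = "red") +     # each geom is a layer
  geom_smooth(method = "lm") +   # lm: fit a linear regression
  annotate(geom = "text",    # annotate() adds an annotation
           x = 10,    # position of the annotation
           y = 100,
           label = "y = 3.93x - 17.58 \n p = 1.49e-12",    # slope and intercept from summary(lm.cars); \n is a line break
           size = 5)    # text size

lm.cars = lm(formula = dist ~ speed,
             data = cars)
summary(lm.cars)    # detailed summary of the fit; the output is shown below and should not be pasted into your code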
#-------------------------------------------------------------------------------------
#Call:
#lm(formula = dist ~ speed, data = cars)

#Residuals:
#    Min      1Q  Median      3Q     Max 
#-29.069  -9.525  -2.272   9.215  43.201 

#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept) -17.5791     6.7584  -2.601   0.0123 *  
#speed         3.9324     0.4155   9.464 1.49e-12 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 15.38 on 48 degrees of freedom
#Multiple R-squared:  0.6511,	Adjusted R-squared:  0.6438 
#F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
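
Rather than copying the numbers out of the summary by hand, the annotation text can be built from the fitted model. This is only a sketch of one way to do it; the names `fit`, `b`, `p` and `label` are mine.

```r
fit <- lm(dist ~ speed, data = cars)

b <- coef(fit)    # named vector with elements "(Intercept)" and "speed"
p <- summary(fit)$coefficients["speed", "Pr(>|t|)"]   # p-value of the slope

# %+.2f prints the intercept with an explicit sign
label <- sprintf("y = %.2fx %+.2f\np = %.2e",
                 b["speed"], b["(Intercept)"], p)
cat(label, "\n")
```

The resulting string can be passed to `annotate(label = label, ...)`, so the text on the plot stays in step with the model if the data change.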

4.2 Histogram

library(ggplot2)
# Using R's built-in iris data set
ggplot(data = iris,
       mapping = aes(x = Sepal.Length)) +
  geom_histogram(col = "red",     # outline colour
                 fill = "blue",    # fill colour
                 alpha = 0.5,    # transparency
                 bins = 30) +    # number of bins
  labs(x = "Sepal Length(cm)",    # labs() sets the axis titles
       y = "Count") +
  theme_classic()    # classic theme (white background)

## Exercise: histogram of Sepal.Width in the iris data set (figure omitted)
ggplot(data = iris,
       mapping = aes(x = Sepal.Width)) +
  geom_histogram(col = "red",
                 fill = "blue", 
                 alpha = 0.3,
                 bins = 30) +
  labs(x = "Sepal Width(cm)",
       y = "Count") + 
  theme_classic()

4.3 Density plot

library(ggplot2)
ggplot(iris,
       aes(x = Sepal.Width)) +
  geom_density()

# Density plot + histogram
library(ggplot2)
ggplot(data = iris,
       mapping = aes(x = Sepal.Width)) +
  geom_histogram(aes(y = after_stat(density)),    # put density on the y axis; after_stat() replaces the deprecated ..density.. (needs ggplot2 >= 3.3.0)
                 col = "red",
                 fill = "blue", 
                 alpha = 0.3,
                 bins = 30) +
  geom_density(col = "red") +
  theme_dark()

## Exercise: histogram and density plot of Petal.Length (figure omitted)
ggplot(data = iris,
       mapping = aes(x = Petal.Length)) + 
  geom_histogram(aes(y = after_stat(density)),
                 bins = 30,
                 col = "blue",
                 fill = "green")+
  geom_density(col = "red")

4.4 Line plot (frequency polygon)

library(ggplot2)
ggplot(iris,
       aes(x = Sepal.Width)) + 
  geom_freqpoly()    # frequency polygon: the counts of a histogram drawn as a line

4.5 Bar chart

library(ggplot2)
ggplot(data = iris,
       mapping = aes(x = Species,
                     fill = Species)) +   # fill colour mapped for the whole plot
  geom_bar(width = 0.5,    # bar width
           alpha = 0.3,     # note: setting a fixed fill here would override the mapped fill above
           show.legend = F)   # F hides the legend

## Exercise: plot cyl from the mtcars data set (figure omitted)
mtcars
ggplot(mtcars,
       aes(x = factor(cyl),    # factor() turns the numeric variable into a factor
           fill = factor(cyl))) + 
  geom_bar(width = 0.5,
           col = "red",
           alpha = 0.3) + 
  labs(x = "Number of cylinder",
       y = "Count")

# Plotting with 2 variables (figure omitted)
ggplot(iris,
       aes(x = Sepal.Length,
           col = Species)) +     # group by species
  geom_density()

## Exercise: histogram of iris Sepal.Length by species (figure omitted)
ggplot(iris,
       aes(x= Sepal.Length,
           fill = Species)) +
  geom_histogram(bins = 30,
                 alpha = 0.4,
                 col = "black") +
  theme_classic()

# Bar chart with error bars
library(ggplot2)
Mean = tapply(iris$Sepal.Length,    # tapply() applies a function to each group
              iris$Species,
              mean)
Mean = as.data.frame(Mean)    # convert the array to a data frame
Mean
##            Mean
## setosa     5.006
## versicolor 5.936
## virginica  6.588
Mean$Species = row.names(Mean)
Mean
##             Mean    Species
## setosa     5.006     setosa
## versicolor 5.936 versicolor
## virginica  6.588  virginica
SD = tapply(iris$Sepal.Length,    # stored as SD: reusing the name sd would break this call when the code is rerun
            iris$Species,
            sd)
SD = as.data.frame(SD)
SD
##                   SD
## setosa     0.3524897
## versicolor 0.5161711
## virginica  0.6358796
Newiris = cbind(Mean, Sd = SD$SD)
Newiris
##             Mean    Species        Sd
## setosa     5.006     setosa 0.3524897
## versicolor 5.936 versicolor 0.5161711
## virginica  6.588  virginica 0.6358796
ggplot(Newiris,
       aes(x = Species,y = Mean))+
  geom_col(width = 0.5,
           aes(fill = Species))+
  geom_errorbar(aes(ymin = Mean - Sd,
                    ymax = Mean + Sd),
                width = 0.5,
                col = "black")
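
The mean and standard deviation table can also be built with `aggregate()`, which returns a data frame directly and keeps Species as a column. This is an alternative sketch, not the approach used above; the names `mean_df`, `sd_df` and `stats` are mine.

```r
# One row per species, with the group mean and group standard deviation
mean_df <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
sd_df   <- aggregate(Sepal.Length ~ Species, data = iris, FUN = sd)

stats <- data.frame(Species = mean_df$Species,
                    Mean    = mean_df$Sepal.Length,
                    Sd      = sd_df$Sepal.Length)
stats
```

`stats` has the same three columns as `Newiris`, so it can be dropped into the `ggplot()` call above unchanged.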

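4.6 Box plot

library(ggplot2)
# The three lines of the box are the 75%, 50% and 25% quantiles. The whiskers reach at most 1.5 times the box height (the IQR) beyond the box; points further out are outliers
ggplot(iris,
       aes(x = Species,
           y = Sepal.Length,
           fill = Species)) + 
  geom_violin(show.legend = F,
              width = 0.8) +     # violin plot drawn underneath
  geom_boxplot(show.legend = F,
               width = 0.2,
               fill = "white") +
  geom_jitter(width = 0.1,
              size = 0.5,
              show.legend = F)    # show every point; size is the point size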
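
The 1.5 x IQR rule described in the comment above can be worked out by hand, which helps when you need to know which points a box plot treats as outliers. A minimal sketch for one species, with variable names of my own:

```r
x <- iris$Sepal.Length[iris$Species == "virginica"]

q   <- quantile(x, c(0.25, 0.5, 0.75))   # the three lines of the box
iqr <- IQR(x)                            # box height: 75% quantile minus 25% quantile

lower <- q[1] - 1.5 * iqr                # whiskers cannot extend past these fences
upper <- q[3] + 1.5 * iqr

outliers <- x[x < lower | x > upper]     # values drawn as separate points
outliers
```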

4.7 Some simple exercises

## Exercise: regress Petal.Length on Sepal.Length in iris and visualise it
library(ggplot2)
ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length)) + 
  geom_point(size = 0.5) + 
  geom_smooth(method = lm,
              col = "black") + 
  annotate(geom = "text",
           x = 5.5,
           y = 7,
           size = 5,
           label = "y = 1.86x - 7.10 \n p < 2e-16 \n Adjusted r-squared = 0.76" )
lm.iris = lm(Petal.Length ~ Sepal.Length,
             iris)
summary(lm.iris)

## Exercise: draw candied hawthorn skewers (tanghulu)?
library(ggplot2)
x = rep(1:3,5)
y = 1:5
a = data.frame(x,y)
ggplot(data = a,
       aes(x = x, y = y)) +
  geom_vline(xintercept = 1:3,    # vertical lines at x = 1, 2, 3 (the sticks)
             linewidth = 3,    # line width; on ggplot2 older than 3.4.0 use size instead
             col = "yellow") +
  geom_point(size = 18,
             col = "#FF5000") +
  ylim(0,6) +    # set the y-axis limits
  xlim(0,4) +
  theme_void()    # empty theme

## Exercise: draw an alphabet chart?
library(ggplot2)
x = rep(1:6,5)
y = rep(5:1,each = 6)
m = letters
n = LETTERS
l = paste0(n,m)
l
al = c(l,rep(NA,4))    # pad with 4 NA values to reach 30 entries
a = data.frame(x,y,al)
a
ggplot(a,
       aes(x,y)) +
  geom_text(aes(label = al),    # text layer: each letter is drawn at its point's coordinates
            size = 5) +
  theme_void() +
  ylim(-1,6) +
  xlim(0,7)

## A fancier version: a colourful alphabet chart? Uppercase letters on top, lowercase below
library(ggplot2)
x = rep(1:6,5)
y = rep(5:1,each = 6)
al = c(LETTERS,rep(NA,4))
al1 = c(letters,rep(NA,4))
mydata = data.frame(x,y,al)
mydata
ggplot(mydata,
       aes(x,y)) +
  geom_text(aes(label = al,
                col = al),
            size =5,
            show.legend = F) + 
  geom_text(aes(y = y - 0.3,
                label = al1,
                col = al1),
            show.legend = F) +
  theme_void() +
  ylim(-1,6) +
  xlim(0,7)

## Never mind, improvise from here on; the figures are not included
# Exercise: 3 variables
ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           col = Species)) +    # add species as a third variable
  geom_point() +
  geom_smooth(method = lm)

# Exercise
ggplot(mtcars,
       aes(x = factor(cyl),
           fill = factor(carb))) +
  geom_bar()

# Exercise
ToothGrowth	# another slightly odd built-in R data set
str(ToothGrowth)    # str() shows the structure and column types of its argument
ToothGrowth$dose = factor(ToothGrowth$dose)
a = tapply(ToothGrowth$len,
           ToothGrowth$supp:ToothGrowth$dose,
           mean)
a = data.frame(a)
supp = rep(c("OJ","VC"),each = 3)
dose = rep(c(0.5,1,2),2)
b = data.frame(len = a$a,    # build the data frame directly: cbind() would turn len into character
               supp,
               dose = factor(dose))
ggplot(b,
       aes(x = supp,
           y = len,
           fill = dose)) + 
  geom_col(position = position_dodge())
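
The group means for this plot can also be computed in one step with `aggregate()`, which avoids rebuilding the supp and dose columns by hand. This is an alternative sketch; the name `tg_mean` is mine.

```r
# Mean tooth length for every combination of supplement and dose
tg_mean <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
tg_mean$dose <- factor(tg_mean$dose)   # treat dose as a category for the fill colour

str(tg_mean)   # len stays numeric; supp and dose are factors
```

`tg_mean` can then replace `b` in the `ggplot()` call above, since it has the same three columns: len, supp and dose.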

# Exercise
ggplot(ToothGrowth,
       aes(x = supp,
           y = len,
           fill = dose)) +
  geom_boxplot() + 
  geom_jitter()

# Exercise
ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           size = Petal.Length,
           col = Petal.Length,
           alpha = Petal.Length)) +    # map the third variable to whichever aesthetic suits
  geom_point()
