MySQL數(shù)據(jù)庫 Load Data 多種用法
MySQL Load Data 的多種用法
一、LOAD 基本背景
我們在數(shù)據(jù)庫運維過程中難免會涉及到需要對文本數(shù)據(jù)進(jìn)行處理,并導(dǎo)入到數(shù)據(jù)庫中,本文整理了一些導(dǎo)入導(dǎo)出時常見的場景進(jìn)行示例演示。
二、LOAD 基礎(chǔ)參數(shù)
文章后續(xù)示例均使用以下命令導(dǎo)出的 csv 格式樣例數(shù)據(jù)(以 , 逗號做分隔符,以 " 雙引號作為界定符)
-- 導(dǎo)出基礎(chǔ)參數(shù) select * into outfile '/data/mysql/3306/tmp/employees.txt' character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' from employees.employees limit 10; -- 導(dǎo)入基礎(chǔ)參數(shù) load data infile '/data/mysql/3306/tmp/employees.txt' replace into table demo.emp character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' ...
三、LOAD 示例數(shù)據(jù)及示例表結(jié)構(gòu)
以下為示例數(shù)據(jù),表結(jié)構(gòu)及對應(yīng)關(guān)系信息
-- 導(dǎo)出的文件數(shù)據(jù)內(nèi)容 [root@10-186-61-162 tmp]# cat employees.txt "10001","1953-09-02","Georgi","Facello","M","1986-06-26" "10002","1964-06-02","Bezalel","Simmel","F","1985-11-21" "10003","1959-12-03","Parto","Bamford","M","1986-08-28" "10004","1954-05-01","Chirstian","Koblick","M","1986-12-01" "10005","1955-01-21","Kyoichi","Maliniak","M","1989-09-12" "10006","1953-04-20","Anneke","Preusig","F","1989-06-02" "10007","1957-05-23","Tzvetan","Zielinski","F","1989-02-10" "10008","1958-02-19","Saniya","Kalloufi","M","1994-09-15" "10009","1952-04-19","Suma
四、LOAD 場景示例
場景1. LOAD 文件中的字段比數(shù)據(jù)表中的字段多
只需要文本文件中部分?jǐn)?shù)據(jù)導(dǎo)入到數(shù)據(jù)表中
-- 臨時創(chuàng)建2個字段的表結(jié)構(gòu) SQL > create table emp_tmp select emp_no,hire_date from emp; SQL > desc emp_tmp; +-----------+------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+------+------+-----+---------+-------+ | emp_no | int | NO | | NULL | | | hire_date | date | NO | | NULL | | +-----------+------+------+-----+---------+-------+ -- 導(dǎo)入數(shù)據(jù)語句 load data infile '/data/mysql/3306/tmp/employees.txt' replace into table demo.emp_tmp character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' (@C1,@C2,@C3,@C4,@C5,@C6) -- 該部分對應(yīng)employees.txt文件中6列數(shù)據(jù) -- 只對導(dǎo)出數(shù)據(jù)中指定的2個列與表中字段做匹配,mapping關(guān)系指定的順序不影響導(dǎo)入結(jié)果 set hire_date=@C6, emp_no=@C1; -- 導(dǎo)入數(shù)據(jù)結(jié)果示例 SQL > select * from emp_tmp; +--------+------------+ | emp_no | hire_date | +--------+------------+ | 10001 | 1986-06-26 | | 10002 | 1985-11-21 | | 10003 | 1986-08-28 | | 10004 | 1986-12-01 | | 10005 | 1989-09-12 | | 10006 | 1989-06-02 | | 10007 | 1989-02-10 | | 10008 | 1994-09-15 | | 10009 | 1985-02-18 | | 10010 | 1989-08-24 | +--------+------------+ 10 rows in set (0.0016 sec)
場景2. LOAD 文件中的字段比數(shù)據(jù)表中的字段少
表字段不僅包含文本文件中所有數(shù)據(jù),還包含了額外的字段
-- 導(dǎo)入數(shù)據(jù)語句 load data infile '/data/mysql/3306/tmp/employees.txt' replace into table demo.emp character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' (@C1,@C2,@C3,@C4,@C5,@C6) -- 該部分對應(yīng)employees.txt文件中6列數(shù)據(jù) -- 將文件中的字段與表中字段做mapping對應(yīng),表中多出的字段不做處理 set emp_no=@C1, birth_date=@C2, first_name=@C3, last_name=@C4, gender=@C5, hire_date=@C6;
場景3. LOAD 生成自定義字段數(shù)據(jù)
從場景 2 的驗證可以看到,emp 表中新增的字段
fullname,modify_date,delete_flag
字段在導(dǎo)入時并未做處理,被置為了 NULL 值,如果需要對其進(jìn)行處理,可在 LOAD 時通過MySQL支持的函數(shù)
或給定固定值
自行定義數(shù)據(jù),對于文件中存在的字段也可做函數(shù)處理,結(jié)合導(dǎo)入導(dǎo)出,實現(xiàn)簡單的 ETL 功能,如下所示:
-- 導(dǎo)入數(shù)據(jù)語句 load data infile '/data/mysql/3306/tmp/employees.txt' replace into table demo.emp character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' (@C1,@C2,@C3,@C4,@C5,@C6) -- 該部分對應(yīng)employees.txt文件中6列數(shù)據(jù) -- 以下部分明確對表中字段與數(shù)據(jù)文件中的字段做Mapping關(guān)系,不存在的數(shù)據(jù)通過函數(shù)處理生成(也可設(shè)置為固定值) set emp_no=@C1, birth_date=@C2, first_name=upper(@C3), -- 將導(dǎo)入的數(shù)據(jù)轉(zhuǎn)為大寫 last_name=lower(@C4), -- 將導(dǎo)入的數(shù)據(jù)轉(zhuǎn)為小寫 fullname=concat(first_name,' ',last_name), -- 對first_name和last_name做拼接 gender=@C5, hire_date=@C6 , modify_date=now(), -- 生成當(dāng)前時間數(shù)據(jù) delete_flag=if(hire_date<'1988-01-01','Y','N'); -- 對需要生成的值基于某一列做條件運算
場景4. LOAD 定長數(shù)據(jù)
定長數(shù)據(jù)的特點如下所示,可以使用函數(shù)取出字符串中固定長度來生成指定列數(shù)據(jù)
SQL > select c1 as sample_data, substr(c1,1,3) as c1, substr(c1,4,3) as c2, substr(c1,7,2) as c3, substr(c1,9,5) as c4, substr(c1,14,3) as c5, substr(c1,17,3) as c6 from t1 *************************** 1. row *************************** sample_data: ABC余振興CDMySQLEFG數(shù)據(jù)庫 c1: ABC c2: 余振興 c3: CD c4: MySQL c5: EFG c6: 數(shù)據(jù)庫
定長數(shù)據(jù)導(dǎo)入需要明確每列數(shù)據(jù)占用的字符個數(shù),以下直接使用 rpad 對現(xiàn)有的表數(shù)據(jù)填充空格的方式生成定長數(shù)據(jù)用作示例使用
-- 生成定長數(shù)據(jù) SQL > select concat(rpad(emp_no,10,' '), rpad(birth_date,19,' '), rpad(first_name,14,' '), rpad(last_name,16,' '), rpad(gender,2,' '), rpad(hire_date,19,' ')) as fixed_length_data from employees.employees limit 10; +----------------------------------------------------------------------------------+ | fixed_length_data | +----------------------------------------------------------------------------------+ | 10001 1953-09-02 Georgi Facello M 1986-06-26 | | 10002 1964-06-02 Bezalel Simmel F 1985-11-21 | | 10003 1959-12-03 Parto Bamford M 1986-08-28 | | 10004 1954-05-01 Chirstian Koblick M 1986-12-01 | | 10005 1955-01-21 Kyoichi Maliniak M 1989-09-12 | | 10006 1953-04-20 Anneke Preusig F 1989-06-02 | | 10007 1957-05-23 Tzvetan Zielinski F 1989-02-10 | | 10008 1958-02-19 Saniya Kalloufi M 1994-09-15 | | 10009 1952-04-19 Sumant Peac F 1985-02-18 | | 10010 1963-06-01 Duangkaew Piveteau F 1989-08-24 | +----------------------------------------------------------------------------------+ -- 導(dǎo)出定長數(shù)據(jù) select concat(rpad(emp_no,10,' '), rpad(birth_date,19,' '), rpad(first_name,14,' '), rpad(last_name,16,' '), rpad(gender,2,' '), rpad(hire_date,19,' ')) as fixed_length_data into outfile '/data/mysql/3306/tmp/employees_fixed.txt' character set utf8mb4 lines terminated by '\n' from employees.employees limit 10; -- 導(dǎo)出數(shù)據(jù)示例 [root@10-186-61-162 tmp]# cat employees_fixed.txt 10001 1953-09-02 Georgi Facello M 1986-06-26 10002 1964-06-02 Bezalel Simmel F 1985-11-21 10003 1959-12-03 Parto Bamford M 1986-08-28 10004 1954-05-01 Chirstian Koblick M 1986-12-01 10005 1955-01-21 Kyoichi Maliniak M 1989-09-12 10006 1953-04-20 Anneke Preusig F 1989-06-02 10007 1957-05-23 Tzvetan Zielinski F 1989-02-10 10008 1958-02-19 Saniya Kalloufi M 1994-09-15 10009 1952-04-19 Sumant Peac F 1985-02-18 10010 1963-06-01 Duangkaew Piveteau F 1989-08-24 -- 導(dǎo)入定長數(shù)據(jù) load data infile '/data/mysql/3306/tmp/employees_fixed.txt' replace into table demo.emp character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' (@row) -- 對一行數(shù)據(jù)定義為一個整體 set emp_no = trim(substr(@row,1,10)),-- 使用substr取前10個字符,并去除頭尾空格數(shù)據(jù) birth_date = trim(substr(@row,11,19)),-- 后續(xù)字段以此類推 first_name = trim(substr(@row,30,14)), last_name = trim(substr(@row,44,16)), fullname = concat(first_name,' ',last_name), -- 對first_name和last_name做拼接 gender = trim(substr(@row,60,2)), hire_date = trim(substr(@row,62,19)), modify_date = now(), delete_flag = if(hire_date<'1988-01-01','Y','N'); -- 對需要生成的值基于某一列做條件運算
五、LOAD 總結(jié)
1.默認(rèn)情況下導(dǎo)入的順序以文本文件 列-從左到右,行-從上到下
的順序?qū)?/p>
2.如果表結(jié)構(gòu)和文本數(shù)據(jù)不一致,建議將文本文件中的各列依次順序編號并與表中字段建立 mapping 關(guān)系,以防數(shù)據(jù)導(dǎo)入到錯誤的字段
3.對于待導(dǎo)入的文本文件較大的場景,建議將文件 按行拆分
為多個小文件,如用 split 拆分
4.對文件導(dǎo)入后建議執(zhí)行以下語句驗證導(dǎo)入的數(shù)據(jù)是否有 Warning
,ERROR
以及導(dǎo)入的數(shù)據(jù)量
GET DIAGNOSTICS @p1=NUMBER,@p2=ROW_COUNT;
select @p1 AS ERROR_COUNT,@p2 as ROW_COUNT;
5.文本文件數(shù)據(jù)與表結(jié)構(gòu)存在過大的差異或數(shù)據(jù)需要做清洗轉(zhuǎn)換,建議還是用專業(yè)的 ETL 工具或先粗略導(dǎo)入 MySQL 中再進(jìn)行加工轉(zhuǎn)換處理
以上就是 MySQL Load Data 數(shù)據(jù)的多種用法的詳細(xì)內(nèi)容,更多關(guān)于MySQL Load Data 的用法的資料請關(guān)注本站其它相關(guān)文章!,希望大家以后多多支持本站!
版權(quán)聲明:本站文章來源標(biāo)注為YINGSOO的內(nèi)容版權(quán)均為本站所有,歡迎引用、轉(zhuǎn)載,請保持原文完整并注明來源及原文鏈接。禁止復(fù)制或仿造本網(wǎng)站,禁止在非www.sddonglingsh.com所屬的服務(wù)器上建立鏡像,否則將依法追究法律責(zé)任。本站部分內(nèi)容來源于網(wǎng)友推薦、互聯(lián)網(wǎng)收集整理而來,僅供學(xué)習(xí)參考,不代表本站立場,如有內(nèi)容涉嫌侵權(quán),請聯(lián)系alex-e#qq.com處理。