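Hive SQL: Finding Users with Consecutive Active Logins, Step by Step
A consecutively active user is one who has logged in on at least 2 consecutive days.
The same techniques apply to other problems of a similar shape.
Creating the data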
CREATE TABLE test5active (
  dt string,
  user_id string,
  age int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

INSERT INTO TABLE test5active VALUES
  ('2019-02-11','user_1',23),('2019-02-11','user_2',19),
  ('2019-02-11','user_3',39),('2019-02-11','user_1',23),
  ('2019-02-11','user_3',39),('2019-02-11','user_1',23),
  ('2019-02-12','user_2',19),('2019-02-13','user_1',23),
  ('2019-02-15','user_2',19),('2019-02-16','user_2',19);
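Approach 1:
1. A user may log in more than once a day, so first deduplicate each user's login dates.
2. Use row_number() over(partition by _ order by _) to partition the rows by user ID and number each user's login dates in ascending order.
3. Subtract the row number from step 2, as a number of days, from the login date. While a user's logins are consecutive, the date and the row number both advance by one per row, so the difference is the same for every row in the streak.
4. Group by user ID and that difference, count the rows in each group, and keep the groups with a count of at least 2. The users in those groups are the consecutively active ones.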
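The four steps above can be traced outside Hive with a minimal Python sketch. It is only a model of the logic, not the Hive query itself: the function name `consecutive_runs` and its `min_days` parameter are hypothetical, and the rows are the (dt, user_id) pairs from the sample table with the age column left out.

```python
from collections import defaultdict
from datetime import date, timedelta

# (dt, user_id) pairs copied from the sample rows in test5active (age omitted).
ROWS = [
    ("2019-02-11", "user_1"), ("2019-02-11", "user_2"),
    ("2019-02-11", "user_3"), ("2019-02-11", "user_1"),
    ("2019-02-11", "user_3"), ("2019-02-11", "user_1"),
    ("2019-02-12", "user_2"), ("2019-02-13", "user_1"),
    ("2019-02-15", "user_2"), ("2019-02-16", "user_2"),
]


def consecutive_runs(rows, min_days=2):
    """Return (user_id, start, end, days) for each run of >= min_days days."""
    # Step 1: deduplicate (dt, user_id) pairs, like SELECT DISTINCT.
    dates_by_user = defaultdict(set)
    for dt, user_id in rows:
        dates_by_user[user_id].add(date.fromisoformat(dt))

    runs = []
    for user_id in sorted(dates_by_user):
        groups = defaultdict(list)
        # Step 2: number the dates per user in ascending order (row_number).
        for rank, day in enumerate(sorted(dates_by_user[user_id]), start=1):
            # Step 3: date minus rank is constant inside a consecutive run.
            groups[day - timedelta(days=rank)].append(day)
        # Step 4: count the rows per (user, difference) group and filter.
        for days in groups.values():
            if len(days) >= min_days:
                runs.append((user_id, min(days).isoformat(),
                             max(days).isoformat(), len(days)))
    return sorted(runs)


runs = consecutive_runs(ROWS)
for run in runs:
    print(run)
```

On the sample rows this yields two runs, both for user_2: 2019-02-11 to 2019-02-12 and 2019-02-15 to 2019-02-16, each 2 days long.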
Step 1: deduplicate the users' login dates
select distinct dt, user_id from test5active;
Step 2: number each user's login dates with row_number() over()
select t1.user_id, t1.dt,
       row_number() over(partition by t1.user_id order by t1.dt) day_rank
from (
  select distinct dt, user_id from test5active
) t1;
Step 3: subtract the row number from the date
select t2.user_id, t2.dt, date_sub(t2.dt, t2.day_rank) as dis
from (
  select t1.user_id, t1.dt,
         row_number() over(partition by t1.user_id order by t1.dt) day_rank
  from (
    select distinct dt, user_id from test5active
  ) t1
) t2;
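To see why the difference stays constant within a streak, the short Python sketch below recomputes day_rank and dis for user_2's four distinct login dates from the sample data. It mimics date_sub with timedelta and is an illustration only; the variable names are chosen for this example.

```python
from datetime import date, timedelta

# user_2's distinct login dates from the sample data, already sorted.
user_2_dates = [date(2019, 2, 11), date(2019, 2, 12),
                date(2019, 2, 15), date(2019, 2, 16)]

trace = []
for day_rank, dt in enumerate(user_2_dates, start=1):
    # Equivalent of date_sub(dt, day_rank): move dt back by day_rank days.
    dis = dt - timedelta(days=day_rank)
    trace.append((dt.isoformat(), day_rank, dis.isoformat()))

for row in trace:
    print(row)
# ('2019-02-11', 1, '2019-02-10')
# ('2019-02-12', 2, '2019-02-10')
# ('2019-02-15', 3, '2019-02-12')
# ('2019-02-16', 4, '2019-02-12')
```

The two streaks map to two distinct dis values, which is what the grouping in step 4 relies on.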
Step 4: group by user ID and the difference, then count the rows in each group. A count of at least 2 marks a run of consecutive logins. The query returns the user ID, start date, end date, and number of consecutive login days.
select t3.user_id,
       min(t3.dt) as start_dt,
       max(t3.dt) as end_dt,
       count(1) as login_days
from (
  select t2.user_id, t2.dt, date_sub(t2.dt, t2.day_rank) as dis
  from (
    select t1.user_id, t1.dt,
           row_number() over(partition by t1.user_id order by t1.dt) day_rank
    from (
      select distinct dt, user_id from test5active
    ) t1
  ) t2
) t3
group by t3.user_id, t3.dis
having count(1) > 1;
Output columns: user ID, start date, end date, consecutive login days
Finally: the consecutively active users
select distinct t4.user_id
from (
  select t3.user_id,
         min(t3.dt) as start_dt,
         max(t3.dt) as end_dt,
         count(1) as login_days
  from (
    select t2.user_id, t2.dt, date_sub(t2.dt, t2.day_rank) as dis
    from (
      select t1.user_id, t1.dt,
             row_number() over(partition by t1.user_id order by t1.dt) day_rank
      from (
        select distinct dt, user_id from test5active
      ) t1
    ) t2
  ) t3
  group by t3.user_id, t3.dis
  having count(1) > 1
) t4;
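Approach 2: use lag (reads the previous row) or lead (reads the next row)
First, attach each user's next login date to the current row with lead(). The column keeps its original name last_date_id even though it holds the following date:
select user_id, t1.dt,
       lead(t1.dt) over(partition by user_id order by t1.dt) as last_date_id
from (
  select distinct dt, user_id from test5active
) t1;
Then keep the users whose next login date is exactly one day after the current one:
select distinct t2.user_id
from (
  select user_id, t1.dt,
         lead(t1.dt) over(partition by user_id order by t1.dt) as last_date_id
  from (
    select distinct dt, user_id from test5active
  ) t1
) t2
where datediff(last_date_id, t2.dt) = 1;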
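The lead-based comparison can also be modeled in a few lines of Python. This is a sketch under assumptions, not Hive code: `users_with_consecutive_days` is a hypothetical helper, and pairing each sorted date with its successor stands in for lead(dt) over(partition by user_id order by dt).

```python
from collections import defaultdict
from datetime import date

# (dt, user_id) pairs copied from the sample rows in test5active (age omitted).
ROWS = [
    ("2019-02-11", "user_1"), ("2019-02-11", "user_2"),
    ("2019-02-11", "user_3"), ("2019-02-11", "user_1"),
    ("2019-02-11", "user_3"), ("2019-02-11", "user_1"),
    ("2019-02-12", "user_2"), ("2019-02-13", "user_1"),
    ("2019-02-15", "user_2"), ("2019-02-16", "user_2"),
]


def users_with_consecutive_days(rows):
    """Return users having at least one pair of logins on adjacent days."""
    dates_by_user = defaultdict(set)
    for dt, user_id in rows:
        # Deduplicate login dates per user, like SELECT DISTINCT.
        dates_by_user[user_id].add(date.fromisoformat(dt))

    active = set()
    for user_id, days in dates_by_user.items():
        ordered = sorted(days)
        # zip(ordered, ordered[1:]) pairs each date with the next one,
        # which is the value lead(dt) would place on the same row.
        for current, following in zip(ordered, ordered[1:]):
            if (following - current).days == 1:  # datediff(next, dt) = 1
                active.add(user_id)
                break
    return sorted(active)


print(users_with_consecutive_days(ROWS))  # ['user_2']
```

In Hive, lead() returns NULL on each user's last row, and the where clause drops that row because the datediff comparison is not true for NULL. In the sketch the last date simply has no successor to pair with.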