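I. Problem
Treat January 1 of each year as the start of that year's first calendar week.
(Year boundaries are ignored: with Monday as the first day of the week, if January 1 is a Wednesday, then January 1 through January 5 (Sunday) form the first week of the year.)
Given a date, how do we compute its week of the year in Spark SQL?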
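To pin the rule down before writing any SQL, here is a minimal pure-Python sketch that applies the definition literally: start at week 1 on January 1 and open a new week at every later Monday. The function name `week_of_year_by_definition` is hypothetical and not part of the original article. It is meant only as a slow but obvious reference to compare SQL results against.

```python
from datetime import date, timedelta


def week_of_year_by_definition(d: date) -> int:
    """Week 1 starts on January 1; every later Monday opens a new week."""
    week = 1
    day = date(d.year, 1, 1)
    while day < d:
        day += timedelta(days=1)
        if day.isoweekday() == 1:  # isoweekday(): Monday = 1 ... Sunday = 7
            week += 1
    return week


if __name__ == "__main__":
    # 2025-01-01 is a Wednesday, matching the example in the problem statement.
    for d in (date(2025, 1, 1), date(2025, 1, 5), date(2025, 1, 6)):
        print(d, d.strftime("%A"), week_of_year_by_definition(d))
```

Under this rule 2025-01-05 (Sunday) is still in week 1 and 2025-01-06 (Monday) opens week 2.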
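II. Analysis
Difficulties:
- Spark SQL's DAYOFWEEK function treats Sunday as the first day of the week.
- Boundary handling: how to decide which days belong to the first week, and on which day the second week starts.
The corresponding pseudocode: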
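    // Map DAYOFWEEK (Sunday = 1 ... Saturday = 7) to Monday = 1 ... Sunday = 7.
    int day_of_week(int day) { if (day == 1) { return 7; } else { return day - 1; } }

    dayofyear = DAYOFYEAR(your_date_column)
    dow = day_of_week(DAYOFWEEK(your_date_column))
    first_day_of_year_week_number = day_of_week(DAYOFWEEK(first_day_of_year))
    if (dayofyear <= 7 - first_day_of_year_week_number + 1) {
        return 1;
    } else {
        return ceil((dayofyear - dow) / 7.0) + 1;
    }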
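The threshold `7 - first_day_of_year_week_number + 1` in the pseudocode is simply the number of days in the first week. The sketch below, with the hypothetical helper name `first_week_bounds`, computes where week 1 ends and week 2 begins for a given year. It assumes Python's `isoweekday()` numbering (Monday = 1 ... Sunday = 7), which coincides with the Monday-first numbering used here.

```python
from datetime import date, timedelta


def first_week_bounds(year: int):
    """Return (last day of week 1, first day of week 2) for the given year."""
    jan1 = date(year, 1, 1)
    first_day_of_year_week_number = jan1.isoweekday()  # Monday = 1 ... Sunday = 7
    days_in_first_week = 8 - first_day_of_year_week_number
    last_day_week1 = jan1 + timedelta(days=days_in_first_week - 1)
    return last_day_week1, last_day_week1 + timedelta(days=1)


if __name__ == "__main__":
    for year in (2023, 2024, 2025):
        end_w1, start_w2 = first_week_bounds(year)
        print(year, "week 1 ends", end_w1, "week 2 starts", start_w2)
```

For 2023 (January 1 is a Sunday) the first week is a single day, while for 2024 (January 1 is a Monday) it is a full seven days.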
First, the key SQL logic:
    -- inner query
    CASE WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
         ELSE DAYOFWEEK(your_date_column) - 1
    END AS day_of_week,
    CASE WHEN DAYOFWEEK(to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
         ELSE DAYOFWEEK(to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
    END AS first_day_of_year_week_number,
    to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') AS first_day_of_year,

    -- outer query (the expressions above belong to the inner query)
    CASE WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
         ELSE CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0) + 1
    END AS week_number,
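As a sanity check outside Spark, the outer CASE expression can be ported to plain Python. This is only a sketch: the function name `week_number` mirrors the SQL alias, and `isoweekday()` stands in for the two remapped DAYOFWEEK columns because it already uses Monday = 1 ... Sunday = 7.

```python
import math
from datetime import date


def week_number(d: date) -> int:
    """Python port of the week_number CASE expression."""
    # Same numbering as the day_of_week and first_day_of_year_week_number columns.
    day_of_week = d.isoweekday()
    first_day_of_year_week_number = date(d.year, 1, 1).isoweekday()
    day_of_year = d.timetuple().tm_yday  # counterpart of DAYOFYEAR
    if day_of_year <= 8 - first_day_of_year_week_number:
        return 1
    # day_of_year - day_of_week is the day-of-year of the Sunday that closes
    # the previous week; dividing by 7 and rounding up counts completed weeks.
    return math.ceil((day_of_year - day_of_week) / 7.0) + 1


if __name__ == "__main__":
    for d in (date(2023, 1, 1), date(2023, 1, 9), date(2023, 12, 31), date(2024, 1, 8)):
        print(d, week_number(d))
```

The values it produces for these dates (1, 3, 53 and 2) agree with the Spark output listed in the verification section.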
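Test it against a good number of boundary values.
DAYOFWEEK(your_date_column) returns:
    Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
    1       2       3        4          5         6       7
To make Monday the first day of the week, the offset must be adjusted:
    int day_of_week(int day) { if (day == 1) { return 7; } else { return day - 1; } }
After the adjustment the function returns:
    Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
    1       2        3          4         5       6         7
SQL logic:
    CASE WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
         ELSE DAYOFWEEK(your_date_column) - 1
    END AS day_of_week,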
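The remapping can be tried out in Python as well. In the sketch below, `spark_dayofweek` is a hypothetical stand-in that imitates DAYOFWEEK's Sunday-first numbering, `monday_first_case` ports the CASE WHEN literally, and `monday_first_mod` is a branch-free alternative that is not in the original article.

```python
from datetime import date


def spark_dayofweek(d: date) -> int:
    """Imitate DAYOFWEEK: Sunday = 1, Monday = 2, ..., Saturday = 7."""
    return d.isoweekday() % 7 + 1


def monday_first_case(dow: int) -> int:
    """Literal port of: CASE WHEN dow = 1 THEN 7 ELSE dow - 1 END."""
    return 7 if dow == 1 else dow - 1


def monday_first_mod(dow: int) -> int:
    """Same mapping written with modular arithmetic."""
    return (dow + 5) % 7 + 1


if __name__ == "__main__":
    for dow in range(1, 8):
        print(dow, monday_first_case(dow), monday_first_mod(dow))
```

Both versions map 1 (Sunday) to 7 and 2 (Monday) to 1. Which one to use in SQL is a matter of taste, and the CASE WHEN form is arguably easier to read.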
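2023-01-01 is a Sunday,
so DAYOFWEEK(your_date_column) returns 1, i.e. the first day of the week in Spark's Sunday-first numbering.
WEEKOFYEAR(your_date_column) returns 52, i.e. the last week of 2022.
But the result we actually want is the first week of 2023.
2023-01-02 is a Monday,
so DAYOFWEEK(your_date_column) returns 2, i.e. the second day of the week.
WEEKOFYEAR(your_date_column) returns 1, i.e. the first week of 2023.
But the result we actually want is the second week of 2023.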
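The WEEKOFYEAR values above are what ISO 8601 week numbering gives. Spark documents WEEKOFYEAR as starting weeks on Monday, with week 1 being the first week that has more than 3 days. Python's `isocalendar()` follows the same ISO rule, so the short sketch below (with the hypothetical helper `iso_week_of_year`) reproduces both numbers without a Spark session.

```python
from datetime import date


def iso_week_of_year(d: date) -> int:
    """ISO 8601 week number, the rule WEEKOFYEAR is documented to follow."""
    return d.isocalendar()[1]


if __name__ == "__main__":
    for d in (date(2023, 1, 1), date(2023, 1, 2)):
        iso_year, iso_week, _ = d.isocalendar()
        print(d, "ISO year", iso_year, "ISO week", iso_week)
    # 2023-01-01 -> ISO year 2022, ISO week 52
    # 2023-01-02 -> ISO year 2023, ISO week 1
```

Because an ISO week can belong to the previous year, WEEKOFYEAR cannot be used directly when week 1 must always begin on January 1.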
III. Verification
    DROP TABLE IF EXISTS your_table;

    CREATE TABLE your_table (
        id INT,
        your_date_column DATE
    );

    CREATE OR REPLACE TEMPORARY VIEW temp_view AS
    SELECT 1 AS id, to_date('2023-01-01', 'yyyy-MM-dd') AS your_date_column
    UNION ALL SELECT 2, to_date('2023-01-02', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-03', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-04', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-05', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-06', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-07', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-08', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-09', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-10', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-11', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-12', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-13', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-14', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-15', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-16', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-17', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-18', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-19', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-20', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-21', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-22', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-23', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-24', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-25', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-26', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-27', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-28', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-29', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-30', 'yyyy-MM-dd')
    UNION ALL SELECT 2, to_date('2023-01-31', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-01', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-02', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-03', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-04', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-05', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-06', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-07', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-08', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-09', 'yyyy-MM-dd')
    UNION ALL SELECT 3, to_date('2023-02-15', 'yyyy-MM-dd')
    UNION ALL SELECT 4, to_date('2023-12-31', 'yyyy-MM-dd')
    UNION ALL SELECT 5, to_date('2024-01-01', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-02', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-03', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-04', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-05', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-06', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-07', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-08', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-09', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-10', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-11', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-12', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-13', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-14', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-15', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-16', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-17', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-18', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-19', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-20', 'yyyy-MM-dd')
    UNION ALL SELECT 6, to_date('2024-01-21', 'yyyy-MM-dd')
    ;

    INSERT INTO your_table SELECT * FROM temp_view;

    SELECT
        your_date_column,
        DAYOFYEAR(your_date_column),
        8 - first_day_of_year_week_number,
        (DAYOFYEAR(your_date_column) - day_of_week),
        (DAYOFYEAR(your_date_column) - day_of_week) / 7.0,
        CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0),
        CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0) + 1,
        CASE WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
             ELSE CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0) + 1
        END AS week_number,  -- the result we want
        *
    FROM (
        SELECT
            '|',
            your_date_column,
            DAYOFWEEK(your_date_column),
            DAYOFYEAR(your_date_column),
            CASE WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
                 ELSE DAYOFWEEK(your_date_column) - 1
            END AS day_of_week,
            -- weekday of the first day of the year: Monday = 1, Sunday = 7
            CASE WHEN DAYOFWEEK(to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
                 ELSE DAYOFWEEK(to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
            END AS first_day_of_year_week_number,
            -- date of the first day of the year
            to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') AS first_day_of_year,
            date_format(your_date_column, 'EEEE') AS WEEK
        FROM your_table
    );
Query output, with columns in the order of the SELECT list (the eighth column is week_number):
    2023-01-01 1 1 -6 -0.857143 0 1 1 | 2023-01-01 1 1 7 7 2023-01-01 Sunday
    2023-01-02 2 1 1 0.142857 1 2 2 | 2023-01-02 2 2 1 7 2023-01-01 Monday
    2023-01-03 3 1 1 0.142857 1 2 2 | 2023-01-03 3 3 2 7 2023-01-01 Tuesday
    2023-01-04 4 1 1 0.142857 1 2 2 | 2023-01-04 4 4 3 7 2023-01-01 Wednesday
    2023-01-05 5 1 1 0.142857 1 2 2 | 2023-01-05 5 5 4 7 2023-01-01 Thursday
    2023-01-06 6 1 1 0.142857 1 2 2 | 2023-01-06 6 6 5 7 2023-01-01 Friday
    2023-01-07 7 1 1 0.142857 1 2 2 | 2023-01-07 7 7 6 7 2023-01-01 Saturday
    2023-01-08 8 1 1 0.142857 1 2 2 | 2023-01-08 1 8 7 7 2023-01-01 Sunday
    2023-01-09 9 1 8 1.142857 2 3 3 | 2023-01-09 2 9 1 7 2023-01-01 Monday
    2023-01-10 10 1 8 1.142857 2 3 3 | 2023-01-10 3 10 2 7 2023-01-01 Tuesday
    2023-01-11 11 1 8 1.142857 2 3 3 | 2023-01-11 4 11 3 7 2023-01-01 Wednesday
    2023-01-12 12 1 8 1.142857 2 3 3 | 2023-01-12 5 12 4 7 2023-01-01 Thursday
    2023-01-13 13 1 8 1.142857 2 3 3 | 2023-01-13 6 13 5 7 2023-01-01 Friday
    2023-01-14 14 1 8 1.142857 2 3 3 | 2023-01-14 7 14 6 7 2023-01-01 Saturday
    2023-01-15 15 1 8 1.142857 2 3 3 | 2023-01-15 1 15 7 7 2023-01-01 Sunday
    2023-01-16 16 1 15 2.142857 3 4 4 | 2023-01-16 2 16 1 7 2023-01-01 Monday
    2023-01-17 17 1 15 2.142857 3 4 4 | 2023-01-17 3 17 2 7 2023-01-01 Tuesday
    2023-01-18 18 1 15 2.142857 3 4 4 | 2023-01-18 4 18 3 7 2023-01-01 Wednesday
    2023-01-19 19 1 15 2.142857 3 4 4 | 2023-01-19 5 19 4 7 2023-01-01 Thursday
    2023-01-20 20 1 15 2.142857 3 4 4 | 2023-01-20 6 20 5 7 2023-01-01 Friday
    2023-01-21 21 1 15 2.142857 3 4 4 | 2023-01-21 7 21 6 7 2023-01-01 Saturday
    2023-01-22 22 1 15 2.142857 3 4 4 | 2023-01-22 1 22 7 7 2023-01-01 Sunday
    2023-01-23 23 1 22 3.142857 4 5 5 | 2023-01-23 2 23 1 7 2023-01-01 Monday
    2023-01-24 24 1 22 3.142857 4 5 5 | 2023-01-24 3 24 2 7 2023-01-01 Tuesday
    2023-01-25 25 1 22 3.142857 4 5 5 | 2023-01-25 4 25 3 7 2023-01-01 Wednesday
    2023-01-26 26 1 22 3.142857 4 5 5 | 2023-01-26 5 26 4 7 2023-01-01 Thursday
    2023-01-27 27 1 22 3.142857 4 5 5 | 2023-01-27 6 27 5 7 2023-01-01 Friday
    2023-01-28 28 1 22 3.142857 4 5 5 | 2023-01-28 7 28 6 7 2023-01-01 Saturday
    2023-01-29 29 1 22 3.142857 4 5 5 | 2023-01-29 1 29 7 7 2023-01-01 Sunday
    2023-01-30 30 1 29 4.142857 5 6 6 | 2023-01-30 2 30 1 7 2023-01-01 Monday
    2023-01-31 31 1 29 4.142857 5 6 6 | 2023-01-31 3 31 2 7 2023-01-01 Tuesday
    2023-02-01 32 1 29 4.142857 5 6 6 | 2023-02-01 4 32 3 7 2023-01-01 Wednesday
    2023-02-02 33 1 29 4.142857 5 6 6 | 2023-02-02 5 33 4 7 2023-01-01 Thursday
    2023-02-03 34 1 29 4.142857 5 6 6 | 2023-02-03 6 34 5 7 2023-01-01 Friday
    2023-02-04 35 1 29 4.142857 5 6 6 | 2023-02-04 7 35 6 7 2023-01-01 Saturday
    2023-02-05 36 1 29 4.142857 5 6 6 | 2023-02-05 1 36 7 7 2023-01-01 Sunday
    2023-02-06 37 1 36 5.142857 6 7 7 | 2023-02-06 2 37 1 7 2023-01-01 Monday
    2023-02-07 38 1 36 5.142857 6 7 7 | 2023-02-07 3 38 2 7 2023-01-01 Tuesday
    2023-02-08 39 1 36 5.142857 6 7 7 | 2023-02-08 4 39 3 7 2023-01-01 Wednesday
    2023-02-09 40 1 36 5.142857 6 7 7 | 2023-02-09 5 40 4 7 2023-01-01 Thursday
    2023-02-15 46 1 43 6.142857 7 8 8 | 2023-02-15 4 46 3 7 2023-01-01 Wednesday
    2023-12-31 365 1 358 51.142857 52 53 53 | 2023-12-31 1 365 7 7 2023-01-01 Sunday
    2024-01-01 1 7 0 0.000000 0 1 1 | 2024-01-01 2 1 1 1 2024-01-01 Monday
    2024-01-02 2 7 0 0.000000 0 1 1 | 2024-01-02 3 2 2 1 2024-01-01 Tuesday
    2024-01-03 3 7 0 0.000000 0 1 1 | 2024-01-03 4 3 3 1 2024-01-01 Wednesday
    2024-01-04 4 7 0 0.000000 0 1 1 | 2024-01-04 5 4 4 1 2024-01-01 Thursday
    2024-01-05 5 7 0 0.000000 0 1 1 | 2024-01-05 6 5 5 1 2024-01-01 Friday
    2024-01-06 6 7 0 0.000000 0 1 1 | 2024-01-06 7 6 6 1 2024-01-01 Saturday
    2024-01-07 7 7 0 0.000000 0 1 1 | 2024-01-07 1 7 7 1 2024-01-01 Sunday
    2024-01-08 8 7 7 1.000000 1 2 2 | 2024-01-08 2 8 1 1 2024-01-01 Monday
    2024-01-09 9 7 7 1.000000 1 2 2 | 2024-01-09 3 9 2 1 2024-01-01 Tuesday
    2024-01-10 10 7 7 1.000000 1 2 2 | 2024-01-10 4 10 3 1 2024-01-01 Wednesday
    2024-01-11 11 7 7 1.000000 1 2 2 | 2024-01-11 5 11 4 1 2024-01-01 Thursday
    2024-01-12 12 7 7 1.000000 1 2 2 | 2024-01-12 6 12 5 1 2024-01-01 Friday
    2024-01-13 13 7 7 1.000000 1 2 2 | 2024-01-13 7 13 6 1 2024-01-01 Saturday
    2024-01-14 14 7 7 1.000000 1 2 2 | 2024-01-14 1 14 7 1 2024-01-01 Sunday
    2024-01-15 15 7 14 2.000000 2 3 3 | 2024-01-15 2 15 1 1 2024-01-01 Monday
    2024-01-16 16 7 14 2.000000 2 3 3 | 2024-01-16 3 16 2 1 2024-01-01 Tuesday
    2024-01-17 17 7 14 2.000000 2 3 3 | 2024-01-17 4 17 3 1 2024-01-01 Wednesday
    2024-01-18 18 7 14 2.000000 2 3 3 | 2024-01-18 5 18 4 1 2024-01-01 Thursday
    2024-01-19 19 7 14 2.000000 2 3 3 | 2024-01-19 6 19 5 1 2024-01-01 Friday
    2024-01-20 20 7 14 2.000000 2 3 3 | 2024-01-20 7 20 6 1 2024-01-01 Saturday
    2024-01-21 21 7 14 2.000000 2 3 3 | 2024-01-21 1 21 7 1 2024-01-01 Sunday
    Time taken: 8.512 seconds, Fetched 63 row(s)
In this query:
- The second argument of date_format, 'EEEE', asks for the full weekday name (Monday, Tuesday, and so on).
- DAYOFYEAR(your_date_column) returns the day of the year.
- DAYOFWEEK(your_date_column) returns the day of the week, with Sunday as the first day of the week.
    -- Cleaned-up SQL that computes the result directly
    SELECT
        your_date_column,
        CASE WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
             ELSE CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0) + 1
        END AS week_number
    FROM (
        SELECT
            your_date_column,
            CASE WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
                 ELSE DAYOFWEEK(your_date_column) - 1
            END AS day_of_week,
            CASE WHEN DAYOFWEEK(to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
                 ELSE DAYOFWEEK(to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
            END AS first_day_of_year_week_number,
            to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') AS first_day_of_year,
            date_format(your_date_column, 'EEEE') AS WEEK
        FROM your_table
    );
Output:
    2023-01-01 1
    2023-01-02 2
    2023-01-03 2
    2023-01-04 2
    2023-01-05 2
    2023-01-06 2
    2023-01-07 2
    2023-01-08 2
    2023-01-09 3
    2023-01-10 3
    2023-01-11 3
    2023-01-12 3
    2023-01-13 3
    2023-01-14 3
    2023-01-15 3
    2023-01-16 4
    2023-01-17 4
    2023-01-18 4
    2023-01-19 4
    2023-01-20 4
    2023-01-21 4
    2023-01-22 4
    2023-01-23 5
    2023-01-24 5
    2023-01-25 5
    2023-01-26 5
    2023-01-27 5
    2023-01-28 5
    2023-01-29 5
    2023-01-30 6
    2023-01-31 6
    2023-02-01 6
    2023-02-02 6
    2023-02-03 6
    2023-02-04 6
    2023-02-05 6
    2023-02-06 7
    2023-02-07 7
    2023-02-08 7
    2023-02-09 7
    2023-02-15 8
    2023-12-31 53
    2024-01-01 1
    2024-01-02 1
    2024-01-03 1
    2024-01-04 1
    2024-01-05 1
    2024-01-06 1
    2024-01-07 1
    2024-01-08 2
    2024-01-09 2
    2024-01-10 2
    2024-01-11 2
    2024-01-12 2
    2024-01-13 2
    2024-01-14 2
    2024-01-15 3
    2024-01-16 3
    2024-01-17 3
    2024-01-18 3
    2024-01-19 3
    2024-01-20 3
    2024-01-21 3
    Time taken: 0.493 seconds, Fetched 63 row(s)
    23/11/14 14:27:07 INFO SparkSQLCLIDriver: Time taken: 0.493 seconds, Fetched 63 row(s)
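The two-branch CASE can, as far as I can tell, be collapsed into one integer expression: shift the day of year by the weekday offset of January 1 and divide by 7. This closed form is my own suggestion rather than something from the original article, so the sketch below checks it day by day against a Python port of the CASE logic. All function names are hypothetical.

```python
import math
from datetime import date, timedelta


def week_number_case(d: date) -> int:
    """Python port of the two-branch CASE expression."""
    day_of_week = d.isoweekday()  # Monday = 1 ... Sunday = 7
    first = date(d.year, 1, 1).isoweekday()
    day_of_year = d.timetuple().tm_yday
    if day_of_year <= 8 - first:
        return 1
    return math.ceil((day_of_year - day_of_week) / 7.0) + 1


def week_number_closed_form(d: date) -> int:
    """Single-expression alternative (an assumption, verified below)."""
    first = date(d.year, 1, 1).isoweekday()
    day_of_year = d.timetuple().tm_yday
    # day_of_year + first - 2 counts days since the Monday on or before January 1.
    return (day_of_year + first - 2) // 7 + 1


def first_mismatch(start_year: int, end_year: int):
    """Return the first date where the two versions disagree, else None."""
    d = date(start_year, 1, 1)
    while d.year <= end_year:
        if week_number_case(d) != week_number_closed_form(d):
            return d
        d += timedelta(days=1)
    return None


if __name__ == "__main__":
    print("first mismatch 2000-2030:", first_mismatch(2000, 2030))
```

If the equivalence holds, the outer query could presumably be written as `FLOOR((DAYOFYEAR(your_date_column) + first_day_of_year_week_number - 2) / 7) + 1`. That SQL form is untested here and should be validated against the table above before use.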