HIVE 中内连接(JOIN ON)与LEFT SEMI JOIN查询结果不一致的分析
由于HIVE中都是等值连接,在JOIN使用的时候,有两种写法在理论上是可以达到相同的效果的,但是由于实际情况的不一样,子表中数据的差异导致结果也不太一样。
写法一:
select a.bucket_id, a.search_type, a.level1, a.name1, a.level2, a.name2, cast((a.alipay_fee) as double) as zhuliu_alipay, cast(0 as double) as total_alipay from tmall_data_fdi_search_zhuliu_alipay_cocerage_bucket_1 a left semi join tmall_data_fdi_dim_main_auc b on (a.level2 = b.cat_id2 and a.brand_id = b.brand_id and b.cat_id2 > 0 and b.brand_id > 0 and b.max_price = 0 )
select a.bucket_id, a.search_type, a.level1, a.name1, a.level2, a.name2, cast((a.alipay_fee) as double) as zhuliu_alipay, cast(0 as double) as total_alipay from tmall_data_fdi_search_zhuliu_alipay_cocerage_bucket_1 a join tmall_data_fdi_dim_main_auc b on (a.level2 = b.cat_id2 and a.brand_id = b.brand_id) where b.cat_id2 > 0 and b.brand_id > 0 and b.max_price = 0