sql - ETL - Views or persist tables? -
when building data warehouse see 2 main approaches etl-process:
1. view - view of views - view of views of views - ...
approach 1 in database , has advantage don't have redundant data, lead performance issues.
2. stage table (copy of data) - clear table (copy of data) - dwh table (copy of data) - ...
approach 2 done many tools stored procedures , jobs or etl-tool ssis. advantage here it's easy understand process can visualise pretty good. have overall etl-performance , many predefined tasks etc. problem example, change of process more complex persistent tables have changed.
in real world see mix of both, when many people have worked on process. of course depends on situation (size of tables, how similar processes designed in company, how complex etl-process, ...).
i prefer copy tables, keep etl-process simple , if possible in etl-tool (usually ssis in case) designed purpose.
but best practice , why?
views views of views not scale volume of data in dwh. when comes of dwh mean talking huge volumes of data. integration of data multiple sources common usecase dwh. stage->tranform-->fact/dim 1 of common way how dwh built store data. yes change when talk hdfs , other technologies, views not able give desired performance in dwh. have seen many systems , of them have multi step etl process first data dwh sources , clean/process/conform/transform data via etl/others in dimensional/other model.
Comments
Post a Comment