Fundamentals of Data Engineering (English reprint edition)
Publication date: March 2023
Pages: 422
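“The world of data has been evolving for some time. First came the designers, then the database administrators, followed by CIOs and data architects. This book marks the next step in the evolution and maturity of the industry. It is a must-read for anyone devoted to their profession and career.”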
Bill Inmon
Creator of the data warehouse
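“This book is a great introduction to moving, processing, and handling data. I highly recommend it to anyone who wants to get up to speed quickly on data engineering or analytics, and to existing practitioners who want to fill gaps in their understanding.”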
Jordan Tigani
Founder and CEO of MotherDuck,
founding engineer and cocreator of BigQuery
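Data engineering has grown rapidly over the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of the practice. With this practical book, you'll learn how to plan and build systems that serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle.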
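Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to combine a variety of cloud technologies to serve the needs of downstream data consumers. You'll learn how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance, which are critical in any data environment regardless of the underlying technology.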
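This book will help you:
● Get a concise overview of the entire data engineering landscape
● Assess data engineering problems using an end-to-end framework of best practices
● Cut through marketing hype when choosing data technologies, architectures, and processes
● Use the data engineering lifecycle to design and build a robust architecture
● Incorporate data governance and security across the data engineering lifecycle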
- Preface
- Part I. Foundation and Building Blocks
  - 1. Data Engineering Described
    - What Is Data Engineering?
    - Data Engineering Skills and Activities
    - Data Engineers Inside an Organization
    - Conclusion
    - Additional Resources
  - 2. The Data Engineering Lifecycle
    - What Is the Data Engineering Lifecycle?
    - Major Undercurrents Across the Data Engineering Lifecycle
    - Conclusion
    - Additional Resources
  - 3. Designing Good Data Architecture
    - What Is Data Architecture?
    - Principles of Good Data Architecture
    - Major Architecture Concepts
    - Examples and Types of Data Architecture
    - Who’s Involved with Designing a Data Architecture?
    - Conclusion
    - Additional Resources
  - 4. Choosing Technologies Across the Data Engineering Lifecycle
    - Team Size and Capabilities
    - Speed to Market
    - Interoperability
    - Cost Optimization and Business Value
    - Today Versus the Future: Immutable Versus Transitory Technologies
    - Location
    - Build Versus Buy
    - Monolith Versus Modular
    - Serverless Versus Servers
    - Optimization, Performance, and the Benchmark Wars
    - Undercurrents and Their Impacts on Choosing Technologies
    - Conclusion
    - Additional Resources
- Part II. The Data Engineering Lifecycle in Depth
  - 5. Data Generation in Source Systems
    - Sources of Data: How Is Data Created?
    - Source Systems: Main Ideas
    - Source System Practical Details
    - Whom You’ll Work With
    - Undercurrents and Their Impact on Source Systems
    - Conclusion
    - Additional Resources
  - 6. Storage
    - Raw Ingredients of Data Storage
    - Data Storage Systems
    - Data Engineering Storage Abstractions
    - Big Ideas and Trends in Storage
    - Whom You’ll Work With
    - Undercurrents
    - Conclusion
    - Additional Resources
  - 7. Ingestion
    - What Is Data Ingestion?
    - Key Engineering Considerations for the Ingestion Phase
    - Batch Ingestion Considerations
    - Message and Stream Ingestion Considerations
    - Ways to Ingest Data
    - Whom You’ll Work With
    - Undercurrents
    - Conclusion
    - Additional Resources
  - 8. Queries, Modeling, and Transformation
    - Queries
    - Data Modeling
    - Transformations
    - Whom You’ll Work With
    - Undercurrents
    - Conclusion
    - Additional Resources
  - 9. Serving Data for Analytics, Machine Learning, and Reverse ETL
    - General Considerations for Serving Data
    - Analytics
    - Machine Learning
    - What a Data Engineer Should Know About ML
    - Ways to Serve Data for Analytics and ML
    - Reverse ETL
    - Whom You’ll Work With
    - Undercurrents
    - Conclusion
    - Additional Resources
- Part III. Security, Privacy, and the Future of Data Engineering
  - 10. Security and Privacy
    - People
    - Processes
    - Technology
    - Conclusion
    - Additional Resources
  - 11. The Future of Data Engineering
    - The Data Engineering Lifecycle Isn’t Going Away
    - The Decline of Complexity and the Rise of Easy-to-Use Data Tools
    - The Cloud-Scale Data OS and Improved Interoperability
    - “Enterprisey” Data Engineering
    - Titles and Responsibilities Will Morph...
    - Moving Beyond the Modern Data Stack, Toward the Live Data Stack
    - Conclusion
- A. Serialization and Compression Technical Details
- B. Cloud Networking
- Index
Title: Fundamentals of Data Engineering (English reprint edition)
Chinese publisher: Southeast University Press
ISBN: 978-7-5766-0551-8
Original title: Fundamentals of Data Engineering
Original publisher: O'Reilly Media
Joe Reis
Joe Reis is a “recovering data scientist” as well as a data engineer and architect.
Matt Housley
Matt Housley is a data engineering consultant and cloud specialist.
The animal on the cover of Fundamentals of Data Engineering is the white-eared puffbird (Nystalus chacuru).
So named for the conspicuous patch of white at their ears, as well as for their fluffy plumage, these small, rotund birds are found across a wide swath of central South America, where they inhabit forest edges and savanna.
White-eared puffbirds are sit-and-wait hunters, perching in open spaces for long periods and feeding opportunistically on insects, lizards, and even small mammals that happen to come near. They are most often found alone or in pairs and are relatively quiet birds, vocalizing only rarely.
The International Union for Conservation of Nature has listed the white-eared puffbird as being of least concern, due in part to its extensive range and stable population.