Training Data for Machine Learning (Reprint Edition)
Publication date: March 2024
Pages: 329
“This book offers a full 360-degree view of how to generate high-quality training data and get new projects off the ground.”
-- Anirudh Koul
Head of Data Science and Machine Learning, Pinterest
“Doing machine learning well requires people to learn about training data. This book is worth its weight in gold.”
-- Neal Linson
Chief Data and Analytics Officer, InCite Logix and LLM Superstar
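Training data matters as much to the success or failure of a data project as the algorithms themselves, because most failures in AI systems relate to training data. Yet although training data is the foundation of successful AI and machine learning, few comprehensive resources exist to help you master the process.
In this hands-on guide, author Anthony Sarkis, lead engineer for the Diffgram AI training data software, shows technical professionals, managers, and subject matter experts how to work with and scale training data, while illuminating the human side of supervising machines. Engineering leaders, data engineers, and data science professionals alike will gain a solid understanding of the concepts, tools, and processes needed to succeed with training data.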
With this book, you'll learn how to:
● Work effectively with training data, including schemas, raw data, and annotations
● Transform your work, team, or organization to be more AI/ML data-centric
● Clearly explain training data concepts to other staff, team members, and stakeholders
● Design, deploy, and ship training data for production-grade AI applications
● Recognize and correct new training-data-based failure modes, such as data bias
● Confidently use automation to create training data more effectively
● Successfully maintain, operate, and improve training data systems of record
- Preface
- 1. Training Data Introduction
- Training Data Intents
- Training Data Opportunities
- Why Training Data Matters
- Training Data in the Wild
- Generative AI
- Summary
- 2. Getting Up and Running
- Introduction
- Getting Up and Running
- Tools Overview
- Trade-Offs
- History
- Summary
- 3. Schema
- Schema Deep Dive Introduction
- Labels and Attributes—What Is It?
- Spatial Representation—Where Is It?
- Relationships, Sequences, Time Series: When Is It?
- Guides and Instructions
- Relation of Machine Learning Tasks to Training Data
- General Concepts
- Summary
- 4. Data Engineering
- Introduction
- Raw Data Storage
- Formatting and Mapping
- Data Access
- Security
- Pre-Labeling
- Summary
- 5. Workflow
- Introduction
- Glue Between Tech and People
- Getting Started with Human Tasks
- Quality Assurance
- Analytics
- Models
- Dataflow
- Direct Annotation
- Summary
- 6. Theories, Concepts, and Maintenance
- Introduction
- Theories
- General Concepts
- Sample Creation
- Maintenance
- Training Data Management
- Summary
- 7. AI Transformation and Use Cases
- Introduction
- AI Transformation
- Appoint a Leader: The Director of AI Data
- Use Case Discovery
- The New “Crowd Sourcing”: Your Own Experts
- Modern Training Data Tools
- Summary
- 8. Automation
- Introduction
- Getting Started
- Trade-Offs
- Pre-Labeling
- Interactive Annotation Automation
- Quality Assurance Automation
- Data Discovery: What to Label
- Augmentation
- Simulation and Synthetic Data
- Media Specific
- Domain Specific
- Summary
- 9. Case Studies and Stories
- Introduction
- Industry
- An Academic Approach to Training Data
- Summary
- Index
Title: Training Data for Machine Learning (Reprint Edition)
Domestic publisher: Southeast University Press
Publication date: March 2024
Pages: 329
ISBN: 978-1492094524
Original title: Training Data for Machine Learning
Original publisher: O'Reilly Media
Anthony Sarkis
Anthony Sarkis is the lead engineer for the Diffgram AI training data software and the CTO and founder of Diffgram Inc. Before that, he was an R&D software engineer at Skidmore, Owings & Merrill and cofounded DriveCarma.ca.
The animals on the cover of Training Data for Machine Learning are black-tailed prairie dogs (Cynomys ludovicianus). While they are actually a type of ground squirrel, they received the name prairie dog because of the habitats they live in and because the sound of their warning calls is similar to a dog’s bark.
Black-tailed prairie dogs are small rodents that weigh between 2 and 3 pounds and grow between 14 and 17 inches long. Their fur is mostly tan and lighter on their bellies, with the black tail tip that gives them their name. They have short, round ears, and eyes that are relatively large in comparison to the size of their bodies. Their feet have long claws, which are ideal for digging burrows into the ground.
True to their name, black-tailed prairie dogs live in a variety of grasslands and prairies in the Great Plains of North America. Their habitat usually consists of flat, dry, sparsely vegetated land, such as shortgrass prairie, mixed-grass prairie, sagebrush, and desert grasslands. Their expansive range lies east of the Rocky Mountains in the United States and Canada and extends to the border of Mexico.
Black-tailed prairie dogs may not be considered endangered, but they are a keystone species. They impact the diversity of vegetation, vertebrates, and invertebrates because of their foraging habits and presence as potential prey. Grasslands inhabited by them have been shown to have a higher degree of biodiversity than grasslands without them. Before widespread habitat destruction, they were the most abundant species of prairie dog in North America. Many of the animals on O’Reilly covers are endangered; all of them are important to the world.