Contents
- 2. Thinking More Deeply about Data and Computation We’ve seen: semi-structured HTML and unstructured text, represented using
- 3. A First Question: What Are We Trying to Capture? “Structured data should capture the semantics of
- 4. Part of the Goal: Modeling Concepts and Instances The famous example from logic and philosophy, attributed
- 5. Some Starting Points We model knowledge using notions dating back to ancient Greece: Classes, concepts, or
- 6. Modeling Classes, Instances, Properties Using Logical Predicates We can use logical assertions to describe everything. Classes:
- 7. We Can Instead Think of this As Links between Classes + Instances Person Adult Man Aristotle
- 8. We Can Instead Think of this As Links between Classes + Instances Person Adult Man Aristotle
- 9. Entity-Relationship Graphs Model Classes as Named Sets of Linked Instances Person Adult Man Life Stage subclassOf
- 10. Entity-Relationship Graphs: A Syntax for Entities, Properties, Relationships Person Adult Man Life Stage ID Name Birth
- 11. Entities and Relationships Correspond to Relations or Dataframes! Entity set: represents all of the entities of
- 12. The Tables Let Us Encode a Graph within the Data! Person HasTeacher Aristotle Plato Socrates teacher
- 13. The Tables Let Us Encode a Graph within the Data! Person HasTeacher Aristotle Plato Socrates teacher
- 14. ER is a General Model: A Graph of Entities & Relationships Vyas et al, BMC Genomics
- 15. From the Basics of Entity-Relationship Diagrams to General Data(base) Design Deciding on the entities, relationships, and
- 16. Considering Non-“Flat” Data
- 17. A Common Point of Confusion “Relational data can only capture flat relationships” Not true: it represents
- 18. Hierarchy vs Relations (“NoSQL” vs “SQL”) Sometimes it’s convenient to take data we could codify as
- 19. NoSQL “Not-only SQL” Typically store nested objects, or possibly binary objects, by IDs or keys Note
- 20. Recap: Basic Concepts Knowledge is typically represented as concepts or classes, which can be generally thought
- 21. Let’s Work on Data Modeling, Given a Real Dataset! 1. Extracted data from LinkedIn ~3M people,
- 23. Parsing Even Not-So-Big Data Is Painfully Slow!
- 24. Can We Do Better? Maybe save the data in a way that doesn’t require parsing of
- 25. MongoDB NoSQL DBMS Lets Us Store + Fetch Hierarchical Data client = MongoClient('mongodb+srv://cis545:[email protected]/test?retryWrites=true&w=majority') linkedin_db = client['linkedin']
- 26. Data in MongoDB
- 27. Finding Things, in a Dataframe vs in MongoDB def find_skills_in_list(skill): for post in list_for_comparison: if 'skills'
- 28. How Do We Convert Hierarchical Data to Dataframes? Hierarchical data doesn’t work well for visualization or
- 29. The Basic Idea: Nesting Becomes Links (“Key/Foreign Key”) people experience
- 30. Reassembling through (Outer) Joins pd.read_sql_query("select _id, \'[\' + group_concat(org) + \']\'" +\ " from people left
- 31. Views Sometimes we use a query enough that we want to give its results a name,
- 32. Occasional Considerations: Access and Consistency Sometimes we may need to allow for failures and “undo”… We
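Entries 29 to 31 of the outline (nesting becomes key/foreign-key links, reassembly through outer joins, and views) can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and two invented records. The table names `people` and `experience` and the column `org` echo the outline; the `person_id` column, the view name `people_orgs`, and the sample data are hypothetical and not taken from the original slides.

```python
import sqlite3

# Hypothetical nested records, loosely shaped like hierarchical profile data.
records = [
    {"_id": 1, "name": "Ada",
     "experience": [{"org": "Acme"}, {"org": "Initech"}]},
    {"_id": 2, "name": "Grace", "experience": []},
]

conn = sqlite3.connect(":memory:")

# Parent table: one row per person, identified by a key.
conn.execute("CREATE TABLE people (_id INTEGER PRIMARY KEY, name TEXT)")
# Child table: each nested item becomes a row carrying a foreign key
# (person_id, an assumed name) that links back to people._id.
conn.execute(
    "CREATE TABLE experience ("
    "person_id INTEGER REFERENCES people(_id), org TEXT)"
)

for rec in records:
    conn.execute("INSERT INTO people VALUES (?, ?)", (rec["_id"], rec["name"]))
    for job in rec.get("experience", []):
        conn.execute(
            "INSERT INTO experience VALUES (?, ?)", (rec["_id"], job["org"])
        )

# A view names a frequently used query. The LEFT JOIN keeps people who
# have no experience rows; group_concat yields NULL for them.
conn.execute(
    "CREATE VIEW people_orgs AS "
    "SELECT p._id, p.name, group_concat(e.org) AS orgs "
    "FROM people p LEFT JOIN experience e ON p._id = e.person_id "
    "GROUP BY p._id, p.name"
)

rows = conn.execute("SELECT * FROM people_orgs ORDER BY _id").fetchall()
for row in rows:
    print(row)
```

An inner join would silently drop the person with an empty `experience` list, which is why the sketch uses a left outer join. SQLite does not guarantee the order of values inside `group_concat`, so code that depends on that order should sort explicitly.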