Homo: A Principled Approach
Jared Vasil
University of Florida
Undergraduate Senior Thesis, Department of Psychology
Spring 2018
Contents

Acknowledgements
Abstract
Main text
1.0. Introduction
1.1. Shared intentionality
1.2. The present thesis
2.0. Variational biology
2.1. Generative models and the Markov blanket formalism
2.2. Variational free energy, surprisal, and attunement
2.3. Active inference, expected precision, salience, and affordances
2.4. A multiscale ontology for the biosphere: Variational Neuroethology
3.0. Variational Sociogenesis
3.1. Function
3.2. Phylogeny
3.3. Ontogeny and mechanism
4.0. Summary and conclusion
Bibliography
Acknowledgements

This thesis has been a long time coming. Over my undergraduate years I have (undoubtedly just as numerous highly motivated undergraduates before and after me have) come up with my own pet theories and takes on the evidence, and this thesis brings some of my (hopefully, more promising) thoughts together under one (variational) roof. Though I have, for better or worse, developed the account reported herein entirely on my own in my free time from class, I am exceedingly indebted to several folks who have helped me, each in their own unique way, to get certain foundational ideas straight, as well as to present them (and my extension of them) in a clear manner.

I would be remiss not to begin by profusely thanking Dr. M. J. Farrar of the University of Florida Department of Psychology. I cannot begin to describe my thankfulness for the encouraging comments, discussion, and general help throughout my undergraduate years. This is not something that will ever be forgotten. Thank you. Also, I would like to thank my good friend Mr. M. J. D. Ramstead of the McGill University Departments of Philosophy and Social and Transcultural Psychiatry for his useful and encouraging feedback and discussion over the preceding months. Mr. Ramstead has been instrumental in helping me to get clear on aspects of the variational approach (not an easy task, for those interested). This thesis would almost certainly not have materialized without these two folks' comments, discussions, and help along the way.

Moreover, I would like to thank Dr. Samuel Veissière of the McGill University Department of Social and Transcultural Psychiatry, as well as Jonathon St. Onge and Auguste Nahas, also at McGill University, for their comments and discussion over the preceding months. Lastly, I would like to thank the three readers of the present thesis, namely, Dr. M. J. Farrar; Dr. Robert D'Amico of the University of Florida Department of Philosophy; and Dr. Andreas Keil of the University of Florida Department of Psychology. Thank you all, and I hope this thesis meets your respective expectations.
Abstract

This thesis introduces a productive metatheoretical framework for empirical and philosophical investigation into the characteristic (phenotypical) dynamics of Homo, called 'Variational Sociogenesis.' This account is based on the Free Energy Principle (FEP) and its recent extensions. The FEP is a principled information-theoretic formulation of biological systems' existence. The principle states that organisms exist in virtue of minimizing their variational free energy. This means that organisms minimize their informational uncertainty in (or, equivalently, maximize their (Bayesian) generative model evidence for) the hidden causes of their sensory dynamics. Variational free energy is minimized through action and perception, at all spatiotemporal scales, in a self-evidencing process known as 'active inference.' Variational Sociogenesis applies the variational formulation to empirical and theoretical work investigating human uniqueness. Specifically, Variational Sociogenesis provides a novel metatheoretical framework for investigating shared intentionality as it manifests in Homo by accounting for uniquely human forms of communication and collaboration through a variational lens. The most comprehensive empirical account of shared intentionality in humans is the Shared Intentionality Hypothesis (SIH). The SIH posits that Homo is biologically predisposed (motivated) to 'cooperatively' share mental states with conspecifics, and has evolved specialized skills to do so. Variational Sociogenesis casts the predisposed motivation to share suggested by the SIH as an expectation for statistical attunement to conspecifics. Specifically, it is suggested that human action and perception are (phenotypically) geared towards maximizing the likelihood that sensory evidence characteristic of sharing is experienced. I fill out the multiscale implications of this proposal by replying to Tinbergen's four research questions.
Variational Sociogenesis is explanatorily flush with the emerging variational (pragmatic) paradigm in cognitive science.
1.0. Introduction

"Thinking would seem to be a solitary activity. And so it is for other animal species. But for humans, thinking is like a jazz musician improvising a novel riff in the privacy of his own room. It is a solitary activity all right, but on an instrument made by others for that general purpose, after years of playing with it and learning from other practitioners, in a musical genre with a rich history of legendary riffs, for an imagined audience of jazz aficionados. Human thinking is individual improvisation enmeshed in a sociocultural matrix," Tomasello (2014, p. 1).

The present thesis synthesizes a range of evidence to introduce 'Variational Sociogenesis.' This is an account of the phenotypical dynamics of Homo in terms of a spatiotemporally deep information geometry constrained by the known laws of physics (Sengupta et al., 2016). The philosophical and, more recently, empirical (Tomasello et al., 2005) notion of 'shared intentionality' (that is, the idea that mental states can be shared by conspecifics; e.g., Gilbert, 1989; Tuomela, 2013; Bratman, 1992) is presupposed by Variational Sociogenesis as fundamental and definitively basic to Homo. The dynamics characteristic of sharing can be usefully cast into a recursive (hierarchical) organization encompassing a nonlinearly scaled spatiotemporal grain of analysis (Ramstead et al., 2017; 2018). Thus, in line with recent work (e.g., Badcock, 2012; Badcock et al., 2017), the explanatory framework provisioned by Variational Sociogenesis is pitched at the four spatiotemporally nested scales of function, phylogeny, ontogeny, and mechanism (Tinbergen, 1963).
The formal core of Variational Sociogenesis is founded in recent work towards a variational formulation of mind and biological existence (Friston, 2010; 2012a; 2013a; 2015; Badcock, 2012; Friston and Stephan, 2007; Bruineberg and Rietveld, 2014; Kirchhoff and Froese, 2017; Ramstead et al., 2017; 2018; under review; Constant et al., 2018; Carhart-Harris et al., 2014; Sengupta et al., 2016). Specifically, Variational Sociogenesis is an extension of the Free Energy Principle (FEP) of Friston (e.g., 2010; 2013a) to account for shared intentionality in Homo. The FEP has witnessed several recent additions (e.g., those cited above) which, taken together, have been suggested to comprise a 'variational biology' (Ramstead et al., 2018). The FEP is a principled information-theoretic formulation of biological self-organization. This approach to life and mind casts biological systems as embodied hierarchical generative models of their sensory dynamics (Friston, 2012a; 2013a). Biotic systems are definitively and solely in the game of maximizing the likelihood of experiencing their (Bayesian) statistical expectations, embodied in their respective phenotypes and shaped by the nested processes of evolution, learning, and perception (Ramstead et al., 2017; e.g., Badcock et al., 2017). The variational approach thus provisions a physics of sentient systems (Sengupta and Friston, 2017; Sengupta et al., 2016), the central premise being that biological systems are such in virtue of minimizing the quantity 'variational free energy.' Minimization of variational free energy entails that systems of this class 'attune' their phenotype (their dynamics) to the dynamics of their niche (Bruineberg et al., 2016; Constant et al., 2018). Dynamical minimization of variational free energy through situated action and perception increases the likelihood of biotic systems' continued existence (Sengupta et al., 2016).
Therefore, through considering the geometry of information flow in biological systems, the free energy formulation provides a principled account of how statistically bounded (biotic) systems minimize their thermodynamic entropy (Friston, 2012a; 2013a). Variational Sociogenesis extends these first principles of biotic self-organization to provide a novel, productive account of the dynamics characteristic of Homo.
1.1. Shared intentionality

Much theoretical and empirical work in the cognitive sciences has demonstrated the uniquely human centrality of sociality and lived, situated, and embodied interaction and connectedness throughout the lifespan (Reddy, 2003; Over, 2015; Feldman, 2017; Tomasello, 2014a; Baumeister and Leary, 1995; Frith and Frith, 2010; Hari et al., 2015; Jensen et al., 2014; Carhart-Harris et al., 2018). Variational Sociogenesis takes the philosophical notion of 'we-intentionality' (Tuomela, 2013), 'jointness,' or 'sharing' in intentional action (e.g., Gilbert, 1989; Bratman, 1992) as picking out the basic phenomenon of uniquely human experience (Searle, 1995; 2010). Shared intentionality is the interactive sharing of mental states with others such that each other's mental states are mutually known, together; that is, within common ground (Tomasello and Carpenter, 2007). Common ground constrains interactants' intentions and goals and is present in every instance of sharing (Clark, 1996). Several authors (e.g., Clark, 1996; Gärdenfors, 2014; Tomasello, 2014b) have suggested (generally) overlapping ontologies for common ground. Variational Sociogenesis borrows from Tomasello (2014b) and Garrod and Pickering (2004). The former author suggests common ground be dichotomized into a culturally shared type (i.e., norms, conventions) and a personally shared type (i.e., individuals' histories of interaction with specific others; cf. Clark, 1996). This is proffered as a theoretical ontology for what is shared with whom (Bohn and Köymen, 2017), though it is important to note that the content of each respective type develops over the nested timescales of (cultural) phylogeny and ontogeny, respectively (Tomasello, 2014b; Richerson and Boyd, 2005).
This observation necessitates that the online build-up of common ground characteristic of interaction be accounted for (Garrod and Pickering, 2004), for instance, as occurs in interactive alignment (Menenti et al., 2012). However, clear conceptual distinctions of common ground are typically only heuristically useful abstractions, as phenomena at each spatiotemporal scale enter into and constrain the others (Falk and Bassett, 2017; Han and Ma, 2015). For example, during a communicative exchange, (fast-changing) online relevance inferences (Sperber and Wilson, 1986) drawn about the referential intention of one's interlocutor are understandable only within cultural common ground, that is, within the (slow-changing) conventional meaning and illocutionary force of the utterance (Tomasello et al., 2007a; e.g., Liebal et al., 2013). If the interlocutors have a shared history of interaction, however, the relevance inferences drawn will be constrained further by their ('middle-changing') personal common ground built up across interactions throughout ontogeny (e.g., Liszkowski et al., 2007). A spatiotemporally nested common ground increases the likelihood of smooth, fluid exchanges with others by making others predictable, a key aspect of effective joint actions (Sebanz et al., 2006; 2009). This is intuitive: moving a couch (Vesper et al., 2017), playing a team sport (Araújo and Davids, 2016), or doing literally any other activity with another (in the sense of its being shared) is neither efficient nor effective if either individual's actions are unpredictable (Vesper et al., 2010).

The most comprehensive empirical account of sharing as a fundamental human uniqueness is the Shared Intentionality Hypothesis (SIH) and related theory of Tomasello (2008; 2009; 2014b) and Tomasello et al. (2005; 2007a; 2012).
The SIH suggests that humans are characterized by a biologically predisposed motivation to engage in acts of shared intentionality (Tomasello et al., 2005), and, moreover, exercise phylogenetically unique social skills to do so (Herrmann et al., 2007). Thus, citing Tomasello et al. (2005), Carpenter (2011) suggests that shared intentionality encompasses "the skills and motivation to share goals, intentions, and other psychological states with others" (p. 107; emphasis mine). A recently proposed 'transformative' reading of the SIH (Kern and Moll, 2017) provides support for the present claim that the SIH supplies the most comprehensive empirical account of sharing. The transformative reading construes shared intentionality as constitutive of a radically novel form of life for the species that possesses it (cf. Wittgenstein, 1955), rather than merely a novel 'mechanism' or module employed in specific circumstances (see discussion in Tomasello, 2014b). This outlook suggests that, in species engaging in shared intentionality, there should be, e.g., an empirically discernible causal influence on the functioning of even traditionally nonsocial phenomena attributable to the (shared) form of life instantiated by the species (Kern and Moll, 2017; see also Rietveld and Kiverstein, 2014). Evidence for this claim is provisioned in the present thesis. Moreover, both the variational formulation and the SIH find certain theoretical bases in cybernetic thought (Ashby, 1962). This is quite opportune, as it signals that both accounts make use of the same basic 'control systems' scheme for directed or 'intentional' behavior (Tschacher and Haken, 2007). Hence, leveraging the insights of the SIH is attractive in the present context.

A "dual-level cognitive structure of simultaneous jointness and individuality" (Tomasello, 2014b, p. 48) is suggested to arise within and across joint exchanges in ontogeny (Moll and Tomasello, 2008; see also Moll and Meltzoff, 2011). This is a nested representation of individuals' intentions, goals, and attention such that individuals possess a "bird's eye view" (Tomasello et al., 2005, p.
681), that is, a simultaneous view 'from above' (from a shared 'we' perspective with mutually coordinated goals underlain by meshed intentions and attention; Bratman, 1992) and 'from below' (from the point of view of each individual's perspective on the same, shared locus). The dual-level representational format is argued to be basic to all manifestations of shared intentionality in humans and, hence, is intricately bound up with the notion of common ground (Tomasello, 2009; 2014b). Indeed, both the dual-level structure and common ground have been proposed to follow a 'two-step' trajectory (Tomasello, 2014b; Tomasello et al., 2012). In the first step, it is suggested that the dual-level representation and common ground are restricted to interactants' personal histories within joint frames and, moreover, that such a representational structure manifests itself only within joint frames (Tomasello, 2014b). These developments precede a second step where, in phylogeny, cultural evolutionary processes accumulate various 'patterned practices' (Roepstorff, 2013), such as the communicative conventions and collective institutions that shape the space of the (rational) cultural common ground for the individuals making up a culture (Mercier and Sperber, 2012; Rand and Nowak, 2013). The representations that guide interaction among unknown, anonymous individuals then stay within the (expected) space of what 'any reasonable person' does, says, thinks, and so on (Tomasello, 2014b). This likely manifests in ontogeny as a gradual extension of the dual-level format to encompass everything the individual does, both inside and outside joint frames. An individual's action and thought come to be structured by this 'extra layer' of cultural common ground (Rakoczy and Schmidt, 2013). In sum, dual-level cognition constrains action and thought throughout ontogeny.
Diverse experiences of interacting and coordinating with various others' models of the world allow one to 'tune in' to the culture's underlying common ground (Moll and Tomasello, 2008; Moll and Meltzoff, 2011).
It is suggested that Variational Sociogenesis largely agrees with 'usage-based' approaches to studying human psychology and its development (though not exclusively; see below). This rather general class of theory stresses the importance of domain-general learning mechanisms in shaping human cognition, such as pattern recognition, categorization, and schematization. For instance, usage-based construals of human communication stress the importance of domain-general schematization in extracting communicative form-meaning pairings (constructions) in ontogeny (Tomasello, 2003; Lieven, 2016). In phylogeny, constructions are altered during interactions between idiolects (Beckner et al., 2009), resulting in a dynamic set of constructions comprising a communicative system (Dingemanse et al., 2015; Kidd et al., 2017). Thus, the 'cultural ratchet' unique to human (cumulative) cultural evolution makes likely that, e.g., an interactively useful set of referential devices is developed and maintained across generations (Tennie et al., 2009). Moreover, usage-based approaches have been extended to mental state inference (Liszkowski, 2013; see also Moll and Tomasello, 2008). This is notable, as communication and mental state inference are reciprocally impacted by the nested, interwoven trajectories of common ground and the dual-level structure (Malle, 2002; Corballis, 2017; Tomasello, 2008; 2014b). Individuals communicate to coordinate perspectives on the same, shared locus (Tomasello et al., 2007a), which has the effect of enhancing the predictability of one's own or the other's attention, thoughts, or actions (Vesper et al., 2010; Tylén et al., 2010). Mature interlocutors are adept at tuning in to uncertainty in the exchange to guide their usage of communication (Pickering and Garrod, 2014), and this sensitivity to uncertainty within joint settings develops early in life (Tomasello et al., 2007a; Köymen et al., 2017).
Indeed, cooperative communication is the primary substrate for the creation of the (uncertainty-minimizing) common knowledge between individuals that constrains their interactions (Clark, 1996; Tomasello, 2014b). It is thus sensible that such a receptivity to uncertainty within joint exchanges appears to develop so early in life (reviewed in Bohn and Köymen, 2017).

Interestingly, this conception of communication as a tool to manage uncertainty in individual perspectives has conceptual overlap with proposals that social interaction may be, in at least certain cases, 'constitutive' of social cognition (De Jaegher et al., 2010; see also Di Paolo and De Jaegher, 2012). Through situated, interactive communication, interactants gain a sort of 'privileged access' to others' mental states since both are aligned to each other, that is, share largely overlapping dual-level representations of the joint frame (Pezzulo, 2011; Carpenter and Liebal, 2011). Interactants dynamically co-regulate the predictability of themselves and others through communication at the grain deemed necessary to align each one's model of the joint frame to the other's (Fogel and Garvey, 2007; Vesper et al., 2010). For instance, Duguid et al. (2014) provide evidence that children modulate the grain of their communication in accord with the level of uncertainty in a joint task, thereby allowing them to set up and coordinate individual representations within a shared frame (see also Wyman et al., 2012). Indeed, a large literature suggests that, in ontogeny, the increasingly adept usage of communication to gain such privileged access to others' mental states manifests in predictable ways, namely, in increasing the success and flexibility of joint actions (e.g., Ashley and Tomasello, 1998; Brownell et al., 2006).
The importance of the development of each phenomenon to the other can hardly be overstressed (Brownell, 2011; Tomasello and Hamman, 2012), and, indeed, there even exists much evidence suggesting a strong correlation between the development of 'offline' (non-interactive) mental state inference and communicative abilities (Miller, 2006; Milligan et al., 2007).
1.2. The present thesis

Variational Sociogenesis unpacks shared intentionality as it characteristically manifests in Homo. This is done through a discussion of how the shared form of life minimizes its free energy at each of the reciprocally interconnected, nested scales of function (biological adaptation), phylogeny (cultural evolution), ontogeny (development), and mechanism (real time). In brief, a function-level explanation of sharing in Homo suggests that selection pressures favored (and favor) biological structure and function characteristically effective at encoding precise expectations of attending to (sensory data phenotypically produced by) conspecifics (cf. Tomasello et al., 2005). Heuristically, this means that the (sensory) dynamics of conspecifics are highly salient and hence afford attending to for humans. Attention implicates 'attunement' to the (attended) sensory data (Bruineberg et al., 2016), a technical term from the variational literature indicating an embodied statistical recapitulation of experienced sensory dynamics (see below). At the scale of phylogeny, individual-level prior predictions of conspecific attunement entail nonlinearities (cf. 'novel' dynamics) at the spatiotemporal scale of the cultural group per se (Smaldino, 2014; Shuai and Gong, 2014), for instance, the path-dependent dynamics of a communicative system across generations (see below). Dynamics at the scale of ontogeny are characterized by a protracted process of attuning to one's sensorium. In the human niche, this entails that the dynamics (characteristic) of others, and hence the dynamics (characteristic) of a culture per se (Searle, 1995; 2010; Ramstead et al., 2016), are statistically recapitulated in the individual (Falk and Bassett, 2017; Kidd et al., 2017). These considerations inform a mechanistic scale of explanation.
Here, recent work (Friston and Frith, 2015a,b; Pezzulo et al., 2013; Grau-Moya et al., 2013; Friston et al., 2015a; Perfors et al., 2011a) suggests that the variational approach can account for the characteristic behavioral and neural dynamics of cooperative (joint, model-attuning; Grice, 1975) communicative action, where linguistic communication is merely the signal case (Tomasello, 2014b). The degree of free-energy-minimizing attunement between conspecifics tracks the emergence of a single, shared model of sensory dynamics (Friston and Frith, 2015a,b; Bolt and Loehr, 2017). Crucially, it is quite tempting to associate these dynamics with the dual-level structure of the SIH, as key properties of each phenomenon are highly reminiscent of properties of the other (see section 4.0). Variational Sociogenesis thus accounts for individuals' reciprocal entrainment into shared (generative) dynamics instantiated by various types of low-dimensional patterned couplings between interactants (De Jaegher and Di Paolo, 2007; Riley et al., 2011). This treatment is thus in line with proposals for a more ecologically valid 'interactionist' or, more generally, 'pragmatic' consideration of cognitive scientific phenomena and their spatiotemporally nested trajectories (Gallotti and Frith, 2013; Schilbach et al., 2013; Dumas et al., 2014; De Jaegher et al., 2010; see Engels et al., 2015). Indeed, the notion of attunement that is central to the variational approach is a fundamentally multi-body concept, though one that can be recursively decomposed to study whatever level of analysis one's research question requires (Ramstead et al., 2017; 2018).

Finally, it is crucial to note that though Variational Sociogenesis extensively leverages insights from empirical work into shared intentionality, the present explication is largely restricted to the relationship between the motivation and skills for sharing and uniquely human communication and collaboration.
This is to the exclusion of intricately related work investigating, e.g., the development of uniquely human forms of altruism and
egalitarianism in children (Warneken, 2016; McAuliffe et al., 2017), phenomena quite directly implicated by the SIH (see Tomasello, 2009).

With the above considerations in place, the structure of this thesis is as follows. Next, a non-mathematical introduction to the variational approach is provided. Third, Variational Sociogenesis is outlined. This is a spatiotemporally nested (multiscale) account of the human phenotype, explicated through several brief reviews of empirical work investigating how shared intentionality manifests itself at each scale of analysis. (Due to space constraints, a fully comprehensive account at each scale cannot be provided.) Fourth, a concluding summary is given alongside suggested future directions.

2.0. Variational biology

2.1. Generative models and the Markov blanket formalism

The variational formulation of biotic self-organization is founded in the Free Energy Principle (Friston, 2010; 2012a; 2013a; 2015), an information-theoretic account of the dynamics characteristic of biological systems. This principled formulation seeks to explain how biological systems maintain their form while embedded within a dynamic environment, that is, how they maintain their precarious nonequilibrium steady state by actively avoiding thermodynamic erosion (Friston and Stephan, 2007). The formulation suggests that biological systems appear to instantiate a hierarchical generative model of their niche (Friston, 2008; 2010). Organisms are solely in the business of minimizing the information-theoretic quantity 'variational free energy,' a measure fundamental to the generative model they embody (Friston, 2012a; 2013a). Under simplifying assumptions, variational free energy is equivalent to the negative logarithm of the Bayesian model evidence for an embodied model of sensory dynamics (Hinton, 2007). A generative model recapitulates (learns) the underlying statistical structure (hidden causes) of training (sensory) data so as to generate that data itself.
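The relation between a generative model, its model evidence, and variational free energy can be illustrated with a deliberately tiny numerical sketch. Everything in it is an assumption made for illustration and does not come from the cited literature: the two hidden causes ("rain" and "dry"), the two observations, and the probabilities are hypothetical. The sketch only shows the standard result that free energy is never smaller than the negative log model evidence, and equals it when the approximating beliefs match the exact posterior.

```python
import math

# Hypothetical generative model (illustrative numbers only):
# a hidden cause generates a sensory observation.
prior = {"rain": 0.3, "dry": 0.7}                  # p(cause)
likelihood = {                                     # p(observation | cause)
    "rain": {"wet": 0.9, "not_wet": 0.1},
    "dry": {"wet": 0.2, "not_wet": 0.8},
}

def model_evidence(obs):
    """p(obs): probability of the observation under the whole model."""
    return sum(prior[c] * likelihood[c][obs] for c in prior)

def posterior(obs):
    """Exact Bayesian posterior p(cause | obs)."""
    evidence = model_evidence(obs)
    return {c: prior[c] * likelihood[c][obs] / evidence for c in prior}

def free_energy(q, obs):
    """Variational free energy: sum_c q(c) * [ln q(c) - ln p(obs, c)]."""
    return sum(
        q[c] * (math.log(q[c]) - math.log(prior[c] * likelihood[c][obs]))
        for c in q
        if q[c] > 0
    )

obs = "wet"
neg_log_evidence = -math.log(model_evidence(obs))
f_untuned = free_energy(prior, obs)           # beliefs not yet updated by data
f_attuned = free_energy(posterior(obs), obs)  # beliefs equal to the posterior

print(f"negative log evidence:      {neg_log_evidence:.4f}")
print(f"free energy (prior as q):   {f_untuned:.4f}")
print(f"free energy (posterior q):  {f_attuned:.4f}")
```

In this toy case, moving the beliefs from the prior to the posterior lowers free energy until it touches the negative log evidence, which is one way to read the claim that minimizing free energy maximizes model evidence.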
In short, generative models mirror the statistical structure of their sensory data, thereby allowing them to generate predictions of sensory data with the same statistical structure (see Pickering and Clark, 2014). Generating sensory data that recapitulates experienced sensory data implicates a 'contextualization' of the dynamics of lower layers of the generative hierarchy with learned data. Learning in a generative model occurs in functionally hierarchical, 'stacked' layers of increasing abstraction and spatiotemporal depth (Perfors et al., 2011a; Friston et al., 2017a). 'Deep' spatiotemporality in the dynamics of a generative model means that high layers exhibit slow dynamics that constrain the fast dynamics unfolding at lower layers. Low-level predictive dynamics cycle through several iterations within a single predictive cycle at high levels. Heuristically, this means that slow-changing higher layers simply require greater amounts of unexplained sensory data to alter their dynamics than do fast-changing lower layers (Kiebel et al., 2008; 2009). Crucially, however, there exist mechanisms that enable quick learning of higher, contextualizing dynamics, chiefly the expected precision of sensory data (Feldman and Friston, 2010; see also Perfors et al., 2011a). Learned or 'empirical' prior predictions are thus optimized via bottom-up sensory input (an error signal) to impact on high-level predictions (Feldman and Friston, 2010; Friston et al., 2014a). Moreover, high-level predictions oftentimes comprise 'full' priors with dynamics directly modulated by natural selection (Friston, 2013b; Campbell, 2016). Interestingly, under
simplifying assumptions bottom-up error signals in the variational approach are formally equivalent to 'prediction error' in predictive coding (Clark, 2013; den Ouden et al., 2012). The variational approach thus furnishes a principled basis for 'Bayesian brain' accounts of global brain function (Friston, 2010; 2012b). Bayesian model (natural) selection is thus a corollary of the variational approach (Campbell, 2016; Ramstead et al., 2017) and can be used to computationally model natural selection (cf. Bruineberg and Hesp, 2017).

The probability density function encoding the predictions of the generative model is known as a 'variational density' (Friston and Stephan, 2007). A variational density is an 'approximating' probability density insofar as its sufficient statistics (in our case, its mean and variance) are optimized through exposure to sensory data. 'Optimized' here means that prior predictions of the density approximate posterior expectations over the statistical structure of sensory data. Subsequent prior predictions encoded in the variational density thus increasingly (statistically) recapitulate sensory data (note that this does not entail any sort of 'representational accuracy' in the model; see discussion in Bruineberg, 2017). When error signals are minimized, the variational density is formally 'attuned' to the generating (sensory) density, i.e., the organism's dynamics are attuned to external dynamics (Bruineberg et al., 2016). As Clark (2013) writes, "The free energy principle...states that 'all the quantities that can change; i.e., that are part of the system, will change to minimize free energy' (Friston and Stephan, 2007, p. 427). Notice that, thus formulated, this is a claim about all elements of systemic organization (from gross morphology to the entire organization of the brain) and not just about cortical information processing [e.g., Friston, 2005]," (pp. 6-7).

What formally counts as "part of the [generative] system" (Clark, 2013, p.
6), that is, as part of the organism, is defined by the Markov blanket formalism (Pearl, 1988; Friston, 2013a). A Markov blanket statistically bounds a set of nodes (states) within some fluctuating energy map (environment; Figure 1). Specifically, the blanket divides a random chaotic dynamical system into a set of 'internal' states embedded within a set of 'external' states (Friston, 2012a; Clark, 2017). The blanket itself is divided into 'sensory' and 'active' states, which are associated with sensory epithelia and motor effectors, respectively (see Friston, 2013a). The organism per se is composed of the union of sensory, active, and internal states, and the environment is composed of the set of external states. Information flow through the blanket is constrained by conditional dependencies between pairs of states (see Figure 1). Specifically, external states send information to (i.e., influence) sensory states; sensory states influence internal and active states; internal states influence active states; and active states influence sensory states and external states (Friston, 2012a; 2013a; see Figure 1). Since internal states can only 'see' through the Markov blanket, they have only indirect access to the dynamics of external states (Hohwy, 2013). The dynamics of external states are thus technically 'hidden' behind the Markov blanket (Clark, 2017; Gallagher and Allen, 2016). Therefore, internal states (encoding the variational density) must approximate the dynamics of external states through the dynamics of sensory states (Friston, 2012a; 2013a). As described below, internal states' recapitulation of the dynamics of external states means that the generative model (organism) is a control system for (active) inference (Friston, 2012a), that is, a control system for active evidence gathering for its embodied model of its niche (Friston et al., 2017b; cf. Tomasello et al., 2005).
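The pattern of influences just listed can be written down as a small directed graph, which makes the sense in which external states are 'hidden' easy to check. The following is only an illustrative sketch: the dictionary encoding and the helper names (`INFLUENCES`, `influences`, `simple_paths`) are hypothetical conveniences, not part of the formalism in the cited work, and the graph records only which set of states influences which, not any dynamics.

```python
# Directed influences between the four sets of states, as listed in the text:
# external -> sensory; sensory -> internal, active;
# internal -> active; active -> sensory, external.
INFLUENCES = {
    "external": {"sensory"},
    "sensory": {"internal", "active"},
    "internal": {"active"},
    "active": {"sensory", "external"},
}

# The Markov blanket consists of the sensory and active states.
BLANKET = {"sensory", "active"}

def influences(source, target):
    """True if `source` states directly influence `target` states."""
    return target in INFLUENCES[source]

def simple_paths(source, target, path=None):
    """All directed paths from source to target that revisit no state set."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found = []
    for nxt in sorted(INFLUENCES[source]):
        if nxt not in path:
            found.extend(simple_paths(nxt, target, path))
    return found

# Internal and external states never influence each other directly.
print(influences("external", "internal"), influences("internal", "external"))

# Every route between them runs through the blanket.
for start, end in (("external", "internal"), ("internal", "external")):
    for route in simple_paths(start, end):
        print(" -> ".join(route), "| via blanket:", set(route[1:-1]) <= BLANKET)
```

Under this encoding, the world reaches internal states only via sensory states, and internal states reach the world only via active states, which is the structural reason the organism must infer, rather than directly observe, the causes of its sensations.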
This insight makes clear how free energy minimization extends the Good Regulator Theorem of cybernetics, which states that "every good regulator of a system must [entail] a model of that system," (Conant and Ashby, 1970, p. 89; for instance, Pezzulo et al., 2015a).

[Figure 1.]

2.2 Variational free energy, surprisal, and attunement

Information-theoretic variational free energy and its minimization are, respectively, the single most important quantity and phenomenon to consider in the variational approach. Free energy upper bounds the (information-theoretic) entropy of the Markov blanket (see Lemma 1 of Friston, 2012a). The entropy of the blanket is equivalent to the long-term average of 'surprisal.' Surprisal is also known as 'self-information' and is the negative log probability of experiencing some sensory state. Relatively 'surprisal-ing' (Clark, 2013) states are relatively unexpected and therefore undesirable. Minimizing surprisal is imperative to the organism's continued existence (Friston, 2012a; 2013a). Although surprisal is an intractable quantity and hence is not used by the organism (Friston, 2010; Bruineberg et al., 2016), because free energy upper bounds entropy, the proximal imperative to minimize surprisal becomes the distal imperative to minimize the entropy of sensory states, that is, to minimize variational free energy (on average and over time). This relation is clearest in one of the three (Friston, 2010) formal definitions of free energy, namely, the sum of surprisal and the Kullback-Leibler (KL) divergence between the internal and external states (Friston, 2010). By minimizing variational free energy, the organism implicitly minimizes surprisal by minimizing the KL divergence, that is, by embodying the statistics over external states. Bruineberg and Rietveld (2014) and Bruineberg et al.
(2016) describe this dynamic process of minimizing the KL divergence as formally 'attuning' the internal and external states to one another. Technically, attunement is a product of the process of 'generalized synchronization' between coupled chaotic dynamical systems (Friston, 2013a; see Bruineberg et al., 2016). Minimizing the KL divergence through generalized synchronization leads to an increase in the mutual information of the states of two coupled systems, that is, to attunement of the two systems (see Hasson and Frith, 2016). Hence, prediction of the dynamics of one system given the dynamics of another can be quite accurate, as each literally embodies (the statistics describing) the other. Minimizing the KL divergence (i.e., maximizing the mutual information) between the statistics of internal and external states means that the variational approach subsumes KL or infomax formulations of control (Friston et al., 2015b; see Butko and Movellan, 2010). Mutual information is thus the quantity that formally defines attunement and is, by definition, irreducible to any one system (cf. Fogel and Garvey, 2007). Since minimizing the KL divergence entails minimizing free energy (Bruineberg et al., 2016), free energy is a measure of the graded 'dis-attunedness' of two informationally coupled systems (Bruineberg and Rietveld, 2014).

The second definition of free energy is the complexity minus the accuracy of the generative model (Friston, 2010). While the organism is in the game of optimizing the statistics of its variational density so as to maximize the accuracy of the generative model's output, it must also optimize (minimize) the complexity of the parameters of the model encoding those predictions (e.g., synaptic connections and weights). This is intuitive: complex models are thermodynamically costlier to maintain and hence undesirable (Sengupta et al., 2013). We can heuristically cast this in terms of Occam's razor and the simplicity of explanations or hypotheses for sensory input (Gregory, 1980; Friston et al., 2012a).
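The two definitions of free energy given so far can be checked against each other numerically. The following is a minimal sketch rather than anything drawn from the cited literature: it assumes a hypothetical two-state world, and the names `prior`, `likelihood`, and `q` and all probability values are illustrative assumptions. It also assumes the standard reading in which the KL divergence is taken between the variational density and the posterior over external states given the sensory state. Under those assumptions, 'surprisal plus divergence' and 'complexity minus accuracy' should yield the same number, and that number should never fall below surprisal.

```python
import math

# Hypothetical toy model; every number here is an illustrative assumption.
prior = [0.7, 0.3]        # p(psi): prior over two external (hidden) states
likelihood = [0.2, 0.9]   # p(s | psi): probability of the observed sensory state s


def kl(q, p):
    """Kullback-Leibler divergence KL[q || p] for discrete densities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def posterior():
    """Exact posterior p(psi | s) and model evidence p(s) via Bayes' rule."""
    joint = [pr * li for pr, li in zip(prior, likelihood)]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence


def free_energy_surprisal_form(q):
    """F = surprisal + KL[q || posterior]."""
    post, evidence = posterior()
    return -math.log(evidence) + kl(q, post)


def free_energy_complexity_form(q):
    """F = complexity - accuracy = KL[q || prior] - E_q[ln p(s | psi)]."""
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, likelihood))
    return kl(q, prior) - accuracy


q = [0.5, 0.5]                      # an arbitrary variational density
post, evidence = posterior()
surprisal = -math.log(evidence)     # negative log probability of the sensory state

print(free_energy_surprisal_form(q), free_energy_complexity_form(q), surprisal)
```

Setting `q` equal to `post` drives the divergence term to zero, so free energy then coincides with surprisal; this is the sense in which minimizing free energy implicitly minimizes surprisal.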
The simplest explanation possessing a high posterior likelihood tends to be the most useful (adaptive) one, that is, the one that minimizes variational free energy (see Hohwy, 2013). Moreover, this definition provides a principled grounding for observed model-optimizing phenomena (Friston, 2010) such as Hebbian learning (Hebb, 1949) and synaptic renormalization (Wenger et al., 2017). The definition of free energy introduced in this paragraph proves quite useful in section 3.2, where the ratcheted or 'iterative' optimization of human communicative systems across phylogeny is discussed (Tamariz and Kirby, 2016).

In attuning their dynamics through action and perception, biological systems converge towards a low-entropy subset of attracting (phenotypical) states (Friston and Stephan, 2007). Sensory dynamics consistent with the set of attracting states are (statistically) expected; data outside the attracting set are unexpected, hence possessing a high free energy and motivating active avoidance (Sengupta et al., 2016). For instance, humans maintain an internal body temperature of roughly 98 degrees Fahrenheit, and, when outside this set range, interoceptive allostatic mechanisms (e.g., fevers, striated muscle contractions, immune system mobilization, etc.) make it likely that our bodies regain the expected, (statistically) unsurprising sensory experience (Seth, 2013; Pezzulo et al., 2015a). The frequent state visitation and re-visitation necessitated by minimizing free energy is described as 'ergodic.' Ergodicity roughly means that the average value of a random variable over a short time span closely approximates its value when observed over a long time span. This means that organisms tend to maintain their dynamics around homeostatic set points or zones, and engage in allostatic behavior to efficiently regain homeostasis when experiencing sensory dynamics outside this expected range (Sengupta et al., 2016; e.g., Kleckner et al., 2017).
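The body temperature example can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions: a first-order mean-reverting process stands in for allostatic regulation, and the function name `simulate_temperature`, the gain, the noise level, and the feverish starting value are all hypothetical choices (only the set point of roughly 98 degrees comes from the text). It compares a time average taken over a short window with one taken over a long window.

```python
import random


def simulate_temperature(steps, set_point=98.0, gain=0.1,
                         noise_sd=0.1, start=101.0, seed=0):
    """Toy mean-reverting trace; all parameter values are assumptions."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    temp = start
    trace = []
    for _ in range(steps):
        # The correction term pulls the state back toward the set point;
        # Gaussian noise stands in for environmental fluctuations.
        temp += gain * (set_point - temp) + rng.gauss(0.0, noise_sd)
        trace.append(temp)
    return trace


def mean(values):
    return sum(values) / len(values)


trace = simulate_temperature(20000)

# Discard the initial return from the feverish start, then compare windows.
short_avg = mean(trace[1000:3000])   # average over a short time span
long_avg = mean(trace[1000:])        # average over a long time span

print(short_avg, long_avg)
```

In this sketch both averages stay close to the set point and to each other, which is the rough sense of ergodicity used above: the system keeps revisiting the same small set of states.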
Ergodicity is maintained through self-organized intentional behavior (Tschacher and Haken, 2007), that is, through free-energy-minimizing action policies (see below). The characteristic dynamics for a given biological system entailed by its maintaining a low entropy over its experienced sensory states is its phenotype (Friston, 2012a; Ramstead et al., 2017). The phenotype is thus an embodied, dynamical, and transient instantiation of (Bayesian) expectations (selected for through dynamics perched at nested spatiotemporal scales; Badcock, 2012; Ramstead et al., 2017).

2.3 Active inference, expected precision, salience, and affordances

As noted above, since biological systems are embedded within a fluctuating environment, these systems must act (manipulate their active states) to maintain their characteristic probability distribution over (sensory) states, that is, their respective phenotypes. Failing this, biological systems risk deleterious phase transitions, the most extreme case being death (i.e., a return to equilibrium with their environment; Friston and Stephan, 2007). Thus, biotic systems must predict the future states of their active states such that predictions of the dynamics of active states minimize free energy. Since active states influence sensory states, predictions of the dynamics of active states minimize the expected free energy of both active and sensory states. Action and perception work together to minimize free energy in a process called 'active inference' (Adams et al., 2013; Friston et al., 2015b; 2017b). Thus, in active inference, predictions of active states induce a discrepancy (prediction error) between experienced and expected active states that drives the dynamics of active states (Friston, 2011). Free energy is minimized by behaving in the expected way, that is, by acting (Friston, 2011). At an ontogenetic scale, this entails from the very beginning a generative model of (the sensory effects of) action policies (Kelso, 2016).
Active inference entails that organisms preferably orient and attend to external states possessing a high 'expected precision' (Feldman and Friston, 2010; Friston et al., 2012a). The expected precision of sensory data is the expected inverse variance, or expected negative entropy, of the possible causes (statistics) generating sensory data. Since entropy is upper bounded by free energy, maximizing the expected precision of attended sensory samples means that precise, low-entropy (certain) sensory predictions guide attention and action (Friston et al., 2012a; 2015b; 2017b). Possessing a low expected free energy is thus (heuristically) equivalent to possessing a high expected precision. The expected free energy of sense data is that data's 'saliency,' which is defined as the expected free energy of active and sensory states were an action policy selected that causes the system to orient and attend to that sensory data (Parr and Friston, 2017). What this means is that certain (counterfactual) active states are expected to minimize variational free energy by varying amounts. The set of sequential active states which has the most precise (free-energy-minimizing) expectations associated with it is, in control systems terminology, the 'action policy' or 'intention' that guides the system (Friston et al., 2015b; 2017b; cf. Tomasello et al., 2005). The dynamics of attention in active inference are thereby 'quasi-normative': data with the highest expected precision should be attended to, given that the organism minimizes its free energy (Kirchhoff and Froese, 2017). Attention per se is suggested by the variational formulation to be a mechanism allowing for the 'singling out' of precise sense data for gain enhancement (Feldman and Friston, 2010; Sengupta et al., 2016), which means that attended-to sense data are leveraged to provide 'precision-weighted' updates to high levels of the generative model (e.g., Moran et al., 2013).
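The idea of a 'precision-weighted' update can be illustrated with the textbook case of combining a Gaussian prior with a Gaussian observation. This is a minimal sketch under that simplifying assumption, not the neuronal message-passing schemes of the cited work; the function name `precision_weighted_update` and the example numbers are hypothetical.

```python
def precision_weighted_update(prior_mean, prior_precision,
                              observation, sensory_precision):
    """Combine a scalar Gaussian prior with a scalar Gaussian observation.

    Precision is inverse variance. The gain on the prediction error is the
    share of total precision carried by the sensory sample.
    """
    posterior_precision = prior_precision + sensory_precision
    gain = sensory_precision / posterior_precision
    prediction_error = observation - prior_mean
    posterior_mean = prior_mean + gain * prediction_error
    return posterior_mean, posterior_precision


# Same prior and same observation, three levels of sensory precision.
balanced = precision_weighted_update(0.0, 1.0, 10.0, 1.0)
attended = precision_weighted_update(0.0, 1.0, 10.0, 9.0)     # high gain
unattended = precision_weighted_update(0.0, 1.0, 10.0, 0.25)  # low gain

print(balanced, attended, unattended)
```

With high sensory precision the updated expectation moves most of the way to the observation; with low sensory precision the prior dominates. Attention as gain control corresponds, in this toy picture, to raising `sensory_precision` for the attended sample.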
Attention directed towards precise stimuli is thus a process of neuromodulatory 'gating' or 'gain control' of prediction error (Yu and Dayan, 2005; Park and Friston, 2013), likely implemented by midbrain dopaminergic and cholinergic efferents to the cortex (Friston et al., 2014a; 2014b). These processes balance the influence of embodied, top-down prior expectations relative to bottom-up error signals (Clark, 2013; Powers III et al., 2016). In instances where the precision of low-level sensory states is expected to be high relative to high-level predictions encoded in internal states, sensory data ascend the generative hierarchy to impact high-level predictions (Yildiz et al., 2013).

In their variational treatment of affordances, Bruineberg and Rietveld (2014) situate action-perception cycles within 'landscapes' and 'fields' of affordances (see also Rietveld and Kiverstein, 2014). Ramstead et al. (2016) define a 'landscape of affordances' as "the total ensemble of available affordances for a population in a given environment [and] corresponds to what evolutionary theorists call a 'niche'," (p. 3; see Bruineberg and Rietveld, 2014). We can usefully construe affordances shaped through niche construction as constituting a precise set of third-order prior expectations 'offloaded' onto the environment (Constant et al., 2018). Specifically, in addition to the first-order priors encoded in cortex and second-order priors encoded in midbrain (sensory predictions and the expected precision of those predictions, respectively; Friston et al., 2014a), affordances constitute 'third-order' prior expectations offloaded by the organism onto the niche itself (see also Ramstead et al., under review). A landscape of precise affordances is a corollary of active inference (Constant et al., 2018): action minimizes free energy, which entails that organisms shape their respective niche in such a way that it discloses itself at any given time as a precise, low-entropy 'field of affordances' (Bruineberg and Rietveld, 2014).
Affordances themselves are cast as expected free energy gradients that intentions or action policies dissipate (Bruineberg and Rietveld, 2014; Tschacher and Haken, 2007). Indeed, the exact same self-organizing active inference dynamics are at work in chemical signaling gradients in cellular morphogenesis (Friston et al., 2015a) as are at work in the human niche (Ramstead et al., 2016; 2017). Affordances characterized by the most precise (strongest) gradient are 'solicitations,' that is, affordances that solicit the self-organization of intentions to dissipate the gradient (Bruineberg and Rietveld, 2014; see also Cisek, 2007). Formulating intentions to effectively dissipate affordance gradients means that the biological system formally exhibits a gradient descent on, or stepwise minimization of, free energy (Friston, 2012a). A gradient descent on free energy is instantiated by cyclic evidence-gathering transients, or 'rolling cycles of action-perception,' that continually dissipate and reshape the field of affordances (Ramstead et al., 2016). The particular affordances that are soliciting at a given time are a function of the complex, path-dependent history of the organism in conjunction with its form of life and immediate needs, disposition, and bodily states (Bruineberg and Rietveld, 2014; for instance, Pezzulo et al., 2015a).

Clearly, the landscape of affordances for Homo is unique, as it not only instantiates a species-general form of (shared) life (Kern and Moll, 2017; Tomasello et al., 2005), but also numerous, cumulative, culture-specific forms of life (Winch, 1964; Richerson and Boyd, 2005). Ramstead et al. (2016) suggest that cultures constitute local free energy basins, or 'local ontologies,' that can be formally parameterized in terms of the dynamics of phenomena at that scale of analysis (see also Ramstead et al., 2017; 2018).
As suggested below, this furnishes what we may call a (formal, statistical) cultural attractor that exhibits characteristic (culture-specific) dynamics and towards which the members comprising that ontology tend (e.g., Beckner et al., 2009). Within this context, the general functional role of numerous phenotypical traits unique to Homo, e.g., so-called cultural learning (Tomasello, 2016) that manifests in species-unique traits like joint attention (Carpenter et al., 1998), teaching (Burdett et al., 2017), collaborative problem solving (Vygotsky, 1978), and engagement in ritual (Watson-Jones and Legare, 2016), is rather straightforward. Each phenomenon functions to attune individual-level dynamics to the cultural attractor (Veissière, 2017; Ramstead et al., 2016). Since the local ontology is constituted solely by others' expectations (Searle, 1995; Ramstead et al., 2016), attuning to the cultural attractor is (largely) just attuning to others' expectations of how being in time and space should unfold. This suggestion entails that dynamics at the local level shape and constrain the path-dependent trajectory taken by a given cultural attractor (see below).

2.4 A multiscale ontology for the biosphere: Variational Neuroethology

Variational Neuroethology (VNE) extends the variational approach to provide a spatiotemporally deep metatheoretical ontology for the biosphere, as well as a research heuristic for the biological sciences (Ramstead et al., 2017; 2018). VNE casts Markov blankets as fractally recursive, a nesting property following from attunement of the internal states of Markov blankets at a given scale (Ramstead et al., 2017; 2018). The product of this is so-called deep evolution, or the genesis of novel levels or units of selectable structure (Watson and Szathmáry, 2016; see also Szathmáry, 2015).
Indeed, though such structure mechanistically instantiates itself in the human niche through a combination of imitative and innovative copying tendencies (Legare and Nielsen, 2015) that lead to cumulative cultural evolution (Creanza et al., 2017) and consequent cultural group selection (Richerson and Boyd, 2005), the variational approach suggests that the emergence of selectable hierarchy in biological self-organization is a rather ubiquitous phenomenon (see Corominas-Murtra et al., 2013). For instance, hierarchically organized structure and dynamics are instantiated by the nervous system (Huntenburg et al., 2018; Buckner and Krienen, 2013); the evolution of taxa (Bejan and Lorente, 2011); action (Csibra, 2008; Botvinick, 2008); language (Everaert et al., 2015); mental state inference (Koster-Hale and Saxe, 2013; Chambon et al., 2017); and group-scale organization (Anderson and Brown, 2010). Importantly, it is stressed that though hierarchical free energy minimization is the defining feature of biological existence (Friston, 2012a; 2013a; Ramstead et al., 2017), the phenomena minimizing free energy, namely, the structure and dynamics of specific biological phenomena, are exceptionally variable within and between levels of analysis (Allen and Friston, 2016; Sengupta et al., 2016). By attuning states, what were the random fluctuations of the (generative) dynamics at the level above now constitute the dynamics of prediction, that is, a higher, contextualizing level of Markov blanket organization (Ramstead et al., 2017). The biosphere is characterized by a (restricted) power scaling law as one ascends spatiotemporal scales (Avnir et al., 1998). Thus, exactly as with internal dynamics, external dynamics tend to be characterized by several cycles of predictive dynamics (at the scale of, e.g., ontogeny) constituting a single cycle at a higher level (at the scale of, e.g., phylogeny).
This entails that the same generative dynamics employed in attuning organisms and non-biological aspects of the niche (Constant et al., 2018) are leveraged to attune dynamics between organisms (Friston and Frith, 2015a,b; Dindo et al., 2014), namely, the reciprocal interplay of the active states of one system and the sensory states of another (Fogel and Garvey, 2007; Renzi et al., 2017). The importance of this for Variational Sociogenesis cannot be overstated: Variational Sociogenesis suggests how the simple expectation of attuning to others gives rise to the functional hierarchy and dynamics phenotypical of Homo.

3.0 Variational Sociogenesis

"We happen to observe behavior more readily than survival, and that is why we start at what really is an arbitrary point in the flow of events. If we would agree to take survival as the starting point of our inquiry, our problem would just be that of causation; we would ask: 'How does the animal, an unstable, "improbable" system, manage to survive?'" (Tinbergen, 1963, p. 418).

This section presents Variational Sociogenesis (Figure 2). This is a variational treatment of the characteristic dynamics of humans through an investigation of shared intentionality as it manifests in humans. This account is organized by Tinbergen's (1963) four research questions and their associated spatiotemporal domains (Badcock, 2012; Ramstead et al., 2017), though cross-scale interactions are discussed to highlight the circular dynamics characterizing each scale of analysis within the variational approach (Friston, 2013a; Sengupta and Friston, 2017). Indeed, this explication presents phenomena at the spatiotemporal scales of ontogeny and mechanism within one subsection for explanatory clarity. Variational Sociogenesis assumes the same restricted power scaling law typical of the variational formulation (Ramstead et al., 2017). Explanation at the scale of function accounts for dynamics characteristic of biological evolution; at the scale of phylogeny, cultural evolution; at the scale of ontogeny, development; and at the scale of mechanism, online or 'real-time' dynamics.

3.1 Function

This section presents a variational approach to evidence at the explanatory scale of function suggesting that Homo underwent some degree of biological evolution through natural selection for a predisposition to engage in shared intentionality (Tomasello et al., 2005).
Importantly, when discussing neural dynamics, the brain's mentalizing network (Koster-Hale and Saxe, 2013; Mahy et al., 2014) is the primary focus of discussion, and therein primarily paralimbic structures such as (anterior) cingulate cortex (ACC). This is intended merely to aid explanation: cingulate cortex (much less simply its anterior portions) is quite clearly not assumed to be exhaustive of the relevant neural regions and dynamics for the topics in this thesis (e.g., Grossmann, 2015; Mundy and Newell, 2007). For the purposes of the present section, it is simply stressed that the neurobiological architecture put forward here is supported by theoretical (e.g., Friston et al., 2012b; 2014a) and experimental (e.g., Moran et al., 2013; 2014) treatments in the variational approach. Specifically, Friston (2013b) suggests that areas of paralimbic cortex occupy the highest levels of the generative hierarchy. There is much evidence suggesting that, more specifically, cingulate cortex is a high-level 'hub' node of the brain (Bullmore and Sporns, 2009), with widespread efferents throughout cortex (van den Heuvel and Sporns, 2013) and subcortical areas (Etkin et al., 2011; Allman et al., 2001). The dynamics of paralimbic regions are suggested by Friston (2013b) to encode 'full priors,' that is, high-level precise prior expectations selected for in evolution or learned very early in ontogeny (oftentimes making them appear 'innate'; see discussions of 'overhypotheses' in Kemp et al., 2007; Goodman et al., 2009). Generally speaking, the dynamics of cingulate cortex are thus thought to play a 'contextualizing' role in (the neurodynamics underlying) human cognition and action (van den Heuvel and Sporns, 2013; Carhart-Harris et al., 2014). We can illustrate this by considering the neural dynamics characteristic of the declarative (sharing) pointing gesture (Tomasello, 2008). In adults, it was recently found that dorsal ACC activity uniquely positively correlates with cooperative pointing for conspecifics (Brunetti et al., 2014; notably, see also Haroush and Williams, 2015). Interestingly, ACC may integrate predictions in both 'self' and 'other' frames of reference to influence the decision to cooperatively point (Apps et al., 2016; see also Lavin et al., 2013). This suggests that pointing may be a cooperatively motivated 'option,' that is, an extended sequence of behavior that aids in learning (Holroyd and Yeung, 2012) and which is selected on the basis of its (expected) free-energy-minimizing efficacy (Friston et al., 2015b). Indeed, infant pointing has been demonstrated to be associated with enhanced learning of (intended) object labels (Begus et al., 2014). Thus, conspecifics who (are expected to) produce precise sensory data characteristic of sharing (i.e., who embody precise affordance gradients for sharing) are selectively attended to (Begus et al., 2016; Marno et al., 2016), and, indeed, may selectively inhibit or maintain pointing behavior (cf. joint action) across interactions (Liszkowski et al., 2004; see section 3.3).

Interestingly, this neurobiological architecture and its putative place in the generative hierarchy are potentially informative regarding the 'emotional reactivity' or 'self-domestication' hypothesis of Hare and Tomasello (2005). The authors hypothesize that early selection in Homo favored evolutionarily novel prior expectations in (para)limbic regions, likely in response to altered feeding ecologies (niche dynamics). Selection on these regions is hypothesized to have favored models with dynamics encoding increased trust, tolerance, and reduced aggressiveness towards conspecifics in feeding contexts (Hare and Wrangham, 2017).
It is suggested by Variational Sociogenesis that selection for novel high-level paralimbic dynamics consequently "set the tone" (Friston, 2013b, p. 41; i.e., the precision) for internal dynamics contextualized by backwards efferents from, e.g., cingulate cortex. Top-down (anterior) cingulate dynamics have been found to exert context-sensitive control over action policy selection (e.g., Brunetti et al., 2014; see Holroyd and Yeung, 2012) and the deployment of the mentalizing network (Chambon et al., 2017). A prediction of this account is that emotion and temperament should relate to behavioral measures of theory of mind in ontogeny. Indeed, Wellman et al. (2011) describe novel evidence that suggests "temperament aids theory of mind achievement within human development," (Wellman et al., 2011, p. 324). Specifically, the authors found in 3- to 5-year-olds that certain temperament traits (a lack of aggression, a shy-withdrawn personality type, and social-perceptual sensitivity) correlated significantly with theory of mind achievement two years later (see also Lane et al., 2013). Moreover, other behavioral evidence suggests that basic temperamental traits and motivations constrain and guide the human phenotype from remarkably early in life (Reddy, 2003; Over, 2015; Jensen et al., 2014). This, too, is suggestive of deep phylogenetic roots for basic human motivations when in the context of others (Over, 2015; Hare and Wrangham, 2017). When sensory evidence consistent with these evolutionarily deep, high-level priors is not experienced, we should expect to see robust (precise) forms of allostatic control (see also section 3.3). An example here is the increased fidelity in the copying behavior of children in response to experiencing ostracism (Over and Carpenter, 2008).
Children appear to selectively enhance the precision afforded to conspecific-produced sensory data in response to ostracizing cues so as to re-align or re-attune their dynamics with the other (Over, 2015). Moreover, other forms of social sensory disattunement, such as that suggested by Badcock et al. (2017), likely drive allostatic behavior. In this instance, Badcock et al. (2017) suggest that the increased sensory precision attributed to conspecific-produced sense data characteristic of individuals with depression is a response to a perceived inability to adaptively leverage prior predictions in social contexts (see also Carhart-Harris et al., 2014). Since attributing high precision to external states requires the attenuation of internal state precision (Brown et al., 2013; Yildiz et al., 2013), depressed individuals forgo action. This suggests that depression is a maladaptive manifestation of the (typically) adaptive gathering of sensory evidence to alter one's model with respect to sensory data produced by conspecifics (Badcock et al., 2017).

The temperamental evolution discussed above has been suggested (Tomasello, 2014b; Tomasello et al., 2012) to provide something of a cooperative base for subsequent selection favoring the unique 'social skills' characteristic of Homo (e.g., Herrmann et al., 2007). To this end, several authors have proposed (e.g., Skyrms, 2001; McLoone and Smead, 2014; Tomasello et al., 2012) that the game-theoretic Stag Hunt is useful for modeling and empirically examining selection pressures subsequent to those discussed above (i.e., Hare and Tomasello, 2005). The Stag Hunt is a game-theoretic scenario characterized by two evolutionarily stable payoff structures, namely, cooperation or defection (Skyrms, 2001). Two individuals have the choice of individually defecting to pursue a low-payoff, low-risk 'hare' or cooperating to capture a high-payoff, high-risk 'stag.' Crucially, the stag can only be captured with the other, and the other's mental (motivational) state is unknown. Moreover, if both commit to hunting stag, but one individual defects during the hunt, then it is certain that neither individual receives either the stag or the hare (Skyrms, 2001). Obtaining the high-risk, high-reward stag is consequently a problem of effectively managing ambiguity in the goals and intentions underlying the other's actions so as to successfully coordinate with them (Tomasello, 2014b). It is suggested by Tomasello (2014b) and Tomasello et al.
(2012) that humans have evolved unique skills (underlain by the motivation to employ those skills cooperatively) to solve this problem (see Duguid et al., 2014). Specifically, these authors suggest that humans cooperatively communicate to jointly 'mesh' or coordinate, "to some hierarchical depth" (Tomasello et al., 2005, p. 680), individuals' respective intentions and goals outlined in the introduction (see also Pezzulo, 2011). Below is discussed evidence suggesting that the dynamics characteristic of individuals during stag hunt scenarios (e.g., Duguid et al., 2014) are predicted (and computationally modeled) by Variational Sociogenesis.

Due to reciprocal (cooperative) coupling with conspecifics throughout ontogeny (Moll and Tomasello, 2008), Variational Sociogenesis suggests that certain embodied aspects of the generative model of conspecifics (and oneself) should be selected for in evolution to minimize one's own (and others') free energy. Interestingly, Tomasello et al. (2007b) have offered the 'cooperative eye hypothesis' (see also Grossmann, 2017). The authors propose that the uniquely human, large white sclera surrounding the eye's iris was favored in evolution, likely to facilitate gaze following and intention inference in early (and later) Homo. Moreover, in addition to selection on morphological characteristics of the generative model, numerous authors have posited model selection favoring various cognitive biases and stances for the social world (e.g., Csibra and Gergely, 2011; Richerson and Boyd, 2005; Sperber et al., 2010; see Tomasello et al., 2012). There is no obvious reason to suggest that other such priors could not be captured by the present framework (see Dindo et al., 2014, for one interesting example), though space limits consideration of them here.
It is worthwhile to note that Variational Sociogenesis suggests that at least certain proposals (e.g., Haun and Over, 2013; Marno et al., 2016) may be better considered by placing larger emphasis on the circular interaction of dynamics at nested scales (see below).
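The Stag Hunt described earlier in this section can be written down as a small payoff table. The sketch below is illustrative only: the payoff values are hypothetical and follow the common textbook form of the game (in which a lone hare hunter still obtains the hare), and the helper names `is_nash` and `expected_payoff` are assumptions rather than anything taken from the cited studies. It checks which action pairs are pure-strategy Nash equilibria and shows how the value of hunting stag depends on one's belief about the partner's intention.

```python
from itertools import product

STAG, HARE = "stag", "hare"

# Hypothetical payoffs as (first hunter, second hunter); values are assumptions.
PAYOFFS = {
    (STAG, STAG): (4, 4),   # both cooperate and share the stag
    (STAG, HARE): (0, 3),   # the lone stag hunter gets nothing
    (HARE, STAG): (3, 0),
    (HARE, HARE): (3, 3),   # both settle for a hare
}


def is_nash(first, second):
    """True if neither hunter gains by unilaterally switching action."""
    pay_first, pay_second = PAYOFFS[(first, second)]
    first_ok = all(PAYOFFS[(alt, second)][0] <= pay_first for alt in (STAG, HARE))
    second_ok = all(PAYOFFS[(first, alt)][1] <= pay_second for alt in (STAG, HARE))
    return first_ok and second_ok


def expected_payoff(action, p_other_stag):
    """Expected payoff of an action given a belief that the partner hunts stag."""
    return (p_other_stag * PAYOFFS[(action, STAG)][0]
            + (1 - p_other_stag) * PAYOFFS[(action, HARE)][0])


equilibria = [pair for pair in product((STAG, HARE), repeat=2) if is_nash(*pair)]
print(equilibria)
print(expected_payoff(STAG, 0.9), expected_payoff(HARE, 0.9))
print(expected_payoff(STAG, 0.5), expected_payoff(HARE, 0.5))
```

With these assumed payoffs, mutual stag hunting and mutual hare hunting are the two equilibria, and hunting stag only pays when one is fairly confident of the partner's cooperation. Reducing uncertainty about the other's intention is therefore what makes the high-payoff option worth selecting, which is the coordination problem the text describes.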
In sum, Variational Sociogenesis proposes that model selection in Homo favored (and favors) models that characteristically encoded precise expectations of sensory data phenotypically produced by conspecifics. Put another way, selection pressures favoring models characteristically effective at minimizing free energy, in the present setting, entail selection for models (characteristically effective at encoding) dynamics that minimize sensory uncertainty with respect to conspecifics. Thus, evidence-gathering cycles of active inference directed towards conspecifics (i.e., entering into and maintaining shared representational frames) tend to be highly salient (motivating) for humans. It is suggested here that this just is what cooperative communication is, namely, intentionally produced cycles of active evidence gathering with respect to conspecifics (see also Tomasello, 2008; 2014b). Indeed, this holds regardless of the type (prelinguistic or linguistic), degree of surface complexity (e.g., pointing or syntax), or physical form (e.g., spoken or written) that communication takes (Clark, 2006; Fusaroli et al., 2014). Cooperative communication is a (contextually) salient action that is intended to attune (increase the mutual information of) the internal states of interactants, hence minimizing their (individual and shared) free energies (Friston and Frith, 2015a,b; Pezzulo et al., 2013). A consequence of this is a feedforward process such that entering into and maintaining subsequent joint frames becomes an increasingly precise affordance, that is, becomes increasingly soliciting, since mutually known prior predictions (common ground) can be leveraged to make predictable sensory exchanges likely. Moreover, Variational Sociogenesis suggests that uniquely human forms of cooperative communication and collaboration emerge alongside precise expectations for engaging in shared intentionality (Tomasello, 2014b; see below).
This is because, just as individuals attune to the generative model underlying the nonsocial world by acting in it (Friston et al., 2015b; 2017b), individuals attune to the generative model(s) underlying the social world by acting with it (Friston and Frith, 2015a,b; Moll and Tomasello, 2008). These considerations at the scale of function enable a principled account of the phenotypical dynamics of humans at the nested scales below.

3.2. Phylogeny

In this subsection evidence is presented suggesting that the complex dynamics characteristic of human communicative systems per se (Beckner et al., 2009) optimize a variational free energy bound over sensory evidence. Specifically, since one definition of free energy is complexity minus accuracy (Friston, 2010), evidence is presented that communicative systems (i) tend to minimize their complexity with respect to accuracy. Then, evidence is presented that (ii) communicative systems appear to maximize their accuracy with respect to complexity. Finally, these dynamics are (iii) concretely illustrated with an example. In this subsection, the blanket term 'communicative system' (or some obvious derivative) is used in place of 'language,' 'linguistic,' and so on to stress adaptive dynamics in the cultural evolution of human communication. With this in mind, we associate the complexity of a communicative system with its informational (Kolmogorov) complexity (cf. Seoane and Solé, 2018). Formally, this allows us to formulate a solution to the Kolmogorov forward (Fokker-Planck) equation, hence furnishing a free energy minimizing ergodic distribution over sense data (Friston, 2012a; 2013a).
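The definition of free energy as complexity minus accuracy invoked above can be made concrete with a small numerical sketch. The following is only an illustration under stated assumptions: a two-state discrete model whose prior and likelihood values are invented for the example, and a hypothetical helper named free_energy. Complexity is taken to be the Kullback-Leibler divergence between an approximate posterior q and the prior, and accuracy the expected log likelihood of the observed datum under q. The sketch shows that free energy is lowest, and equal to surprisal (negative log evidence), when q is the exact posterior.

```python
import math

def free_energy(q, prior, likelihood):
    """Return (F, complexity, accuracy) for a discrete hidden state.

    q          : approximate posterior over hidden states
    prior      : prior over hidden states
    likelihood : p(observed datum | state), one value per state
    All numbers used below are illustrative assumptions.
    """
    # Complexity: KL divergence between q and the prior.
    complexity = sum(qi * math.log(qi / pi) for qi, pi in zip(q, prior) if qi > 0)
    # Accuracy: expected log likelihood of the datum under q.
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, likelihood) if qi > 0)
    return complexity - accuracy, complexity, accuracy

prior = [0.5, 0.5]
likelihood = [0.9, 0.2]  # hypothetical p(datum | state)

# Exact Bayesian posterior and surprisal of the datum.
evidence = sum(p * l for p, l in zip(prior, likelihood))
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
surprisal = -math.log(evidence)

f_prior, c_prior, a_prior = free_energy(prior, prior, likelihood)
f_post, c_post, a_post = free_energy(posterior, prior, likelihood)

print(f"q = prior:     F = {f_prior:.4f} (complexity {c_prior:.4f}, accuracy {a_prior:.4f})")
print(f"q = posterior: F = {f_post:.4f} (complexity {c_post:.4f}, accuracy {a_post:.4f})")
print(f"surprisal:     {surprisal:.4f}")
```

Leaving q at the prior costs no complexity but is inaccurate; moving q to the posterior pays some complexity to gain accuracy, and the net free energy falls to the surprisal bound.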
This means that as a communicative system grows in internal interconnectivity between nodes, that is, as either (i) more speakers are added to a communicative system (Lupyan and Dale, 2010; Fay and Ellison, 2013) or (ii) a static population increases in internal connectivity (Reali et al., 2018), dynamics such as chaos (Sanders et al., 2018), critical slowing (Gandhi et al., 1998), and
parameter reduction (Riley et al., 2011) should nonlinearly manifest (Shuai and Gong, 2014). Heuristically, an ergodic distribution over the sensory states of a communicative system per se suggests an 'intergenerational decoupling' of the high-level dynamics of communicative systems from the low-level (fast) dynamics that constitute them (see Shuai and Gong, 2014). Intuitively, complexity minimization tends to increase communicative systems' learnability (Kirby et al., 2015), consequently facilitating their intergenerational transmission.

'Iterated learning' studies suggest communicative systems minimize their complexity. Iterated learning is the learning of information from another, who in turn learned that information from another, and so on (Scott-Phillips and Kirby, 2010). This is often investigated within the context of 'transmission' or 'diffusion chain' paradigms (reviewed in Tamariz and Kirby, 2016; Kirby et al., 2014), which seek to experimentally approximate the ratcheted phenomena characteristic of the scale of human cultural evolution (Tennie et al., 2009). Individuals are placed within 'transmission chains' where information, e.g., how to use a tool (Flynn and Whiten, 2008) or some piece of communication (Tamariz and Kirby, 2016), is initially provided to one end of the chain. The information is communicated from one link to the next and the evolution of the variable is examined across the chain. For example, Smith and Wonnacott (2010) leverage this paradigm to provide evidence that intergenerational transmission of communication tends the system towards a minimization of its complexity. Specifically, the authors provided an initial subject with communicative data containing two randomly placed (nominalized) plural markers. Each individual marker thus possessed maximal complexity in its distribution across nominal forms, i.e., the two plural markers were equally likely to attach to a given noun.
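The sense in which equally likely markers are 'maximally complex' can be sketched with a conditional entropy calculation. This is a minimal illustration and not the measure or the stimuli used by Smith and Wonnacott (2010): the noun and marker labels, the counts, and the helper conditional_entropy are all hypothetical. When each noun takes either marker equally often, the uncertainty about the marker given the noun is one bit, the maximum for two markers; when each noun is tied to a single marker, it is zero.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Entropy of the marker given the noun, H(marker | noun), in bits.

    pairs: list of (noun, marker) observations.
    """
    noun_counts = Counter(noun for noun, _ in pairs)
    pair_counts = Counter(pairs)
    total = len(pairs)
    h = 0.0
    for (noun, _marker), n in pair_counts.items():
        p_pair = n / total                       # joint probability of noun and marker
        p_marker_given_noun = n / noun_counts[noun]
        h -= p_pair * math.log2(p_marker_given_noun)
    return h

# Hypothetical input language: every noun takes either marker equally often.
unpredictable = [
    (noun, marker)
    for noun in ("noun_1", "noun_2")
    for marker in ("marker_a", "marker_b")
    for _ in range(5)
]

# Hypothetical regularized language: each noun is tied to one marker.
regularized = [("noun_1", "marker_a")] * 10 + [("noun_2", "marker_b")] * 10

h_unpredictable = conditional_entropy(unpredictable)
h_regularized = conditional_entropy(regularized)
print(h_unpredictable, h_regularized)
```

On this reading, a drop in conditional entropy across a chain is one way of quantifying a marker becoming tied to specific nominal forms.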
It was found that only in chains of learners, and not when presented to an individual learner, did each plural marker minimize its distributional complexity (by becoming grammaticalized to specific nominal forms). This highlights the intergenerational (nonlinear) amplification of individuals' (weak) inductive biases typical of diffusion chains (Reali and Griffiths, 2009; Tamariz et al., 2014). Moreover, the authors concluded that their findings provided evidence of complexity minimization in a communicative system. However, the authors did not investigate accuracy (defined below) and, indeed, did not situate transmission episodes within a joint task. This latter point is important: degenerate communicative systems possessing low complexity but low accuracy have been shown to evolve in the laboratory in the absence of transmission situated within some joint task (e.g., Kirby et al., 2008). Thus, Kirby et al. (2015) provide computational and experimental evidence that by implementing transmission within a joint task (cf. by modulating the motivation for transmitting information; Tomasello et al., 2005) communicative systems optimize both their learnability (complexity) and their expressivity (accuracy). The above findings, as well as much other work, are reviewed in Tamariz and Kirby (2016) and Kirby et al. (2014). Both reviews suggest that macroscopic organization in human communication systems can be understood as a ratcheted, dynamic product of relatively weak (complexity minimizing; Friston, 2010) biases in individual learners. Though space limits me from further considerations, these results are in line with the suggestions of Variational Sociogenesis.

The active state dynamics characteristic of human communication are tightly bound with the set of communicative constructions encoded by those systems (Goldberg, 2003; Tomasello, 2003).
Since constructions form the most basic aspect of the (cultural) common ground of interlocutors (Searle, 1995), in the remainder of the present thesis 'cultural common ground' is heuristically
associated with the set of constructions embodied in a cooperative communication system (cf. Tomasello, 2014b). Cultural common ground is thus the set of affordance gradients the 'high-level' communicative system affords for 'low-level' individuals attuned to its dynamics and is hence a third order prior unique to Homo (Constant et al., 2018). Specifically, (the active state dynamics producing) constructions are just affordance gradients whose precision varies by context and at nested spatiotemporal scales. Cultural common ground just is the dynamical action driving expectations of a (culturally shared) sensorium (Veissière, 2017; Ramstead et al., 2016). These considerations furnish the suggestion that human communicative systems should tend towards a maximization of their communicative accuracy at the spatiotemporal scale of cultural evolution. Specifically, accuracy maximization suggests an increase in a communicative system's expressivity (Tamariz and Kirby, 2016), that is, the likelihood that it enables attuned low-level systems (speakers) to comprehensibly talk about the things in their world. Clearly, one could have an arbitrarily accurate communicative system, but this would likely decrease the learnability of the communicative system owing to a lack of underlying statistical regularities structuring the sensory input (Kirby et al., 2015). Thus, Variational Sociogenesis suggests that accuracy is maximized with respect to complexity (Friston, 2010).

Evidence suggests that communicative systems maximize their accuracy (with respect to complexity). Since arbitrariness greatly increases the design space of a communicative system (Hockett, 1960; Seoane and Solé, 2018), the expressivity of a communicative system can be increased through encoding arbitrary constructions (Dingemanse et al., 2015). Variational Sociogenesis thus suggests an increase in the arbitrariness of the dynamics of communicative systems.
Indeed, other authors have already posited a communicative "drift to the arbitrary" (Tomasello, 2008, p. 219) in the set of constructions comprising human communicative systems. Specifically, it has been suggested by several authors that communication in humans began with cooperatively motivated pointing gestures (Corballis, 2017; Tomasello, 2008; Fay et al., 2013; Tomasello et al., 2012). Pointing as the primary means of cooperatively communicating with others may have been facilitated by iconic gestures such as pantomime (Tomasello, 2008; Perniss and Vigliocco, 2014) that appeared simultaneously, or very nearly so, with the pointing gesture (see Bohn et al., 2016). In line with Variational Sociogenesis, the subsequent trajectory of communication in Homo is suggested by theory, computation, and experiment to have traced a path from iconic constructions through to, in many (perhaps most) instances, fully arbitrary conventions (reviewed in Tomasello, 2008; 2014b; Tamariz and Kirby, 2016; Perniss and Vigliocco, 2014; Dingemanse et al., 2015). Immediately, however, two important nuances should be stressed given this very general trajectory. Firstly, the exact degree of arbitrariness varies within and across communicative systems as a function of several factors (see Dingemanse et al., 2015). Thus, arbitrariness, iconicity, and systematicity (regularities within a given communicative system) likely co-exist with one another within, e.g., the vocabulary structure of a communicative system (Dingemanse et al., 2015). Variational Sociogenesis suggests that the dynamics of even relatively fine-grained phenomena such as word structure exist in virtue of their optimizing the likelihood of reciprocal attunement between low- and high-level dynamics (recall Clark's, 2013, definition above).
This means that word structure is dynamically tuned across phylogeny to optimize its use by adults and its uptake by children (Tamariz and Kirby, 2016; Dingemanse et al., 2015). Secondly, as discussed above, it appears integral to the cultural evolution of (shared) accuracy in communicative systems that communication be situated within some shared problem solving situation (e.g., a Stag Hunt;
Santos et al., 2011). Indeed, Variational Sociogenesis suggests just this, as individuals' communicative (active state) dynamics are modulated by (contextualized) affordance gradients (Friston et al., 2015b; 2017b).

The above review motivates a novel take on what is perhaps one of the most noteworthy features of human communicative systems: recursive or hierarchical structure (Everaert et al., 2015). Though the variational formalism itself already implies pervasive hierarchy in biological systems (Ramstead et al., 2017), we can synthesize independent (corroboratory) results with the suggestions of the variational approach to strengthen the present proposal. Specifically, it is noted that free energy minimization maximizes the thermodynamic work of biological systems (Sengupta et al., 2013; 2016). We thus link Variational Sociogenesis with the insights of Bejan and Lorente (2011), who show that hierarchical structure in flow (i.e., biotic or abiotic) systems is the thermodynamically most efficient means to access the energetic currents that feed their growth (cf. Ramstead et al., 2017; 2018). A maximization of the work of biological systems (manifest in their hierarchical growth) thus suggests a minimization of their free energy.

Group-scale phenomena such as hierarchically structured constructions synergistically self-organize via repeated (fast) informational couplings occurring at the ontogenetic scale (Beckner et al., 2009; Riley et al., 2011). Cumulative growth in the constructions comprising communicative systems likely facilitated (and was facilitated by) collaborative joint actions in ontogeny (Tomasello, 2014b; see also Angus and Newton, 2015). This is because cooperative communicative systems both imply and enable precise (niche constructed and biologically selected) expectations for sharing mental states (Ramstead et al., 2016; Constant et al., 2018).
Indeed, this circularity is the main crux of Variational Sociogenesis (Sengupta and Friston, 2017): expectations of sharing generative models with conspecifics entail (expectations for) the cooperative communication that brings about (sensory evidence for) sharing in the first place (see section 3.3). Because sharing is a contextualizing full prior for Homo (see above), successful joint actions increase the precision of the posterior expectation over sensory data consistent with sharing. This, in turn, drives salience mappings (i.e., the precision of affordance gradients) in subsequent contexts affording sharing with conspecifics, which, in turn, drives the free energy minimizing dynamics characterizing communicative systems per se discussed above. Circularly, this is because intentional communication is the means by which individuals gather sensory evidence for a shared generative model (Clark, 1996; Tomasello, 2014b). A cultural evolutionary dynamics is suggested by this, namely, that successive generations of Homo (or links in a transmission chain) may be characterized by step-wise (nonlinear) increases in attunement at the local scale owing to increasingly sophisticated (communicative) dynamics at the scale above (Angus and Newton, 2015; Shuai and Gong, 2014).

The above considerations suggest two things. Specifically, Variational Sociogenesis suggests both idiosyncrasies and similarities across cultural Markov blankets. That is, there exists the distinct possibility of global free energy minima (attractor dynamics characteristic of all cultural Markov blankets) such as, e.g., some manifestation of hierarchical structure (Corominas-Murtra et al., 2013). We can usefully characterize this dynamics as full priors over the sensory dynamics of communicative systems (see Griffiths and Kalish, 2005; 2007).
Nested within this dynamics is, nonetheless, an exceptionally large (but bounded) design space within which fast dynamics may (idiosyncratically) evolve (Seoane and Solé, 2018). For instance, Perfors and Navarro (2014) leverage an iterated learning approach to experimentally demonstrate that the
contextually situated nature of human communication impacts the meanings encoded by a communicative system. Clearly, semantic change is implicated in the intergenerational increase in the accuracy of communicative systems highlighted above (Tamariz and Kirby, 2016). Hence, cultural common ground (in the present setting, phenomena such as, e.g., a group's concepts and categories; Gelman and Roberts, 2017; and the semantic shifts operating on those categories; Perc, 2012; see Youn et al., 2016) may thus reflect (relatively) fast idiosyncratic dynamics operating within a (relatively) slow global regime characterized by a tendency towards hierarchical organization. This suggests the possibility of an intrascale spatiotemporal partitioning in the cultural evolutionary dynamics of human communicative systems per se. For instance, there may be a graded distinction between the spatiotemporal scale characteristic of semantic shifts (Perc, 2012) and that characteristic of hierarchical growth (Tomasello, 2008). Indeed, such an intrascale decomposition is reminiscent of the deep (neural) dynamics of an individual (see above). This may be just to say that the (spatiotemporally deep) dynamics characteristic of communicative systems maximize their mutual information with the (spatiotemporally deep) brains from which they find their origins (cf. Christiansen and Chater, 2008). In this way, cultural Markov blankets come to be characterized by their own (boundedly) idiosyncratic (cultural) attractor dynamics while nonetheless remaining learnable and usable at the spatiotemporal scales of ontogeny and mechanism (see below).

In summary, these considerations are in line with Variational Sociogenesis: human communicative systems per se appear to minimize their free energy. Their evolution depends on being transmitted within joint, situated settings at the local scale.
Induced by this is a deep, inherently circular causality in the characteristic dynamics of human communicative systems, and human culture more generally (Han and Ma, 2015; Falk and Bassett, 2017). These systems manifest as a novel unit of selectable organism characterized by its own cultural attractor (Markov blanket) dynamics (Szathmáry, 2015; Ramstead et al., 2017). This blanket constitutes the contextualizing cultural common ground shared (constituted) by interactants (Tomasello, 2014b; Searle, 1995). Hence, the cultural blanket is (reciprocally) attuned to throughout ontogeny (Beckner et al., 2009; Kidd et al., 2017). In virtue of this, the blanket constrains (shapes) interaction with others (predicted to be) attuned to that same blanket. In this way, attunement to the cultural attractor tends to minimize the free energy of individuals attuned to those dynamics. Though global optima such as, e.g., hierarchical structure (Corominas-Murtra et al., 2013) likely exist, the design space within which fine-grained dynamics at this scale of analysis may evolve is nonetheless staggering in scope (Seoane and Solé, 2018). This allows for the evolution of (bounded) idiosyncrasies in, e.g., the semantics encoded in communicative systems (Perfors and Navarro, 2014; Youn et al., 2016).

Finally, given that I've focused on information dynamics in the cultural niche, the considerations in this subsection may have wider generality. Indeed, the dynamics of the cultural Markov blanket formalizes, at least in principle, phenomena not presently considered explicitly, such as the norms and rationality of a culture (see also Veissière, 2017; Ramstead et al., 2016). This claim of wider generality to cultural dynamics can be empirically examined through studies in 'cliodynamics' (Turchin, 2008).
This is an emerging field which proposes to quantitatively study complex cultural (historical) phenomena and seeks to supplement existing work in cultural evolutionary accounts of historical data (e.g., van Schaik and Carel, 2016). Studies in this field have already indicated bifurcating (hierarchical growth) dynamics similar to those suggested above, as in the evolution of the properties of popular music genres across decades (Mauch et al., 2015). Moreover, related
work has indicated interesting coupled culture-subculture dynamics (Bunce and McElreath, 2018), a fundamental aspect of the variational approach (Friston, 2013a; Sengupta and Friston, 2017). Indeed, Variational Sociogenesis may be particularly useful in studying these phenomena, as the framework provides an integrative account with dynamics at other scales of analysis (see also Hari et al., 2016).

3.3. Ontogeny and mechanism

This section casts the ontogenetic and mechanistic scales of cooperative communication and collaboration as (individual) active inference recursively situated within a (joint) coupled dynamical systems framework (cf. Tomasello, 2014b; Tomasello et al., 2005). This framework allows for productive consideration of the developmental trajectory of communicatively (informationally) co-regulated couplings (Renzi et al., 2017). Co-regulated couplings (joint interactions) "self-organiz[e] in such a way that [individuals] temporarily lose their "individual" identities, thereby forming cooperative units, or coordinative structures, that have unique properties that transcend the individual components [e.g., Riley et al., 2011]," (Fogel and Garvey, 2007, p. 252). In joint frames, individuals attune to each other through cooperative communication (Grice, 1975; Tomasello, 2014b; Tylén et al., 2010). Attunement, formally defined by the mutual information between two interactants (see above), is a statistical notion. Hence, 'jointness' here is not an 'all or nothing' phenomenon, but is, rather, a graded 'more or less' (Pacherie, 2012; Bolt and Loehr, 2017). Through cooperative communication, individuals transiently lose their individuality to some degree, formally defined in such contexts by the mutual information characterizing the two individuals during a joint exchange (Hasson and Frith, 2016; Friston and Frith, 2015a,b).
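The graded, statistical character of attunement described above can be sketched by computing mutual information for a few toy joint distributions. This is a minimal sketch under strong simplifying assumptions: each interactant's internal state is reduced to a single binary variable, the joint probability tables are invented for illustration, and the helper mutual_information is hypothetical rather than taken from the cited work.

```python
import math

def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint probability table.

    joint[x][y] is the probability that interactant X is in state x
    while interactant Y is in state y.
    """
    px = [sum(row) for row in joint]           # marginal over X
    py = [sum(col) for col in zip(*joint)]     # marginal over Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Illustrative (assumed) joint distributions over two binary states.
uncoupled = [[0.25, 0.25], [0.25, 0.25]]   # states are independent
partly_attuned = [[0.4, 0.1], [0.1, 0.4]]  # states usually match
fully_attuned = [[0.5, 0.0], [0.0, 0.5]]   # states always match

mi_uncoupled = mutual_information(uncoupled)
mi_partial = mutual_information(partly_attuned)
mi_full = mutual_information(fully_attuned)
print(mi_uncoupled, mi_partial, mi_full)
```

The intermediate case falls strictly between zero bits and one bit, which is the sense in which jointness can be 'more or less' rather than all or nothing.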
The key effect of interactive attunement is a greatly facilitated capacity for joint action (Tomasello, 2014b; Vesper et al., 2010), as individuals have jointly attuned their generative models of sensory data (Pezzulo, 2011), that is, their predictive models of cultural affordances (Ramstead et al., 2016; e.g., Bach et al., 2014; Maranesi et al., 2014). Indeed, human cooperative communication is phenotypically defined by its ability to engender this higher level of organization (Tomasello, 2014b; Szathmáry, 2015). Model averaging across diverse joint frames in ontogeny (Moll and Tomasello, 2008) Bayes-optimally attunes individuals to their cultural embedding (Moran et al., 2014; Karmali et al., 2018). In the present context, model averaging entails individuals' attunement to the Markov blanket characterizing their communication system (Kidd et al., 2017). It is interesting to consider this as a developmental trajectory through a 'semantic continuum' (Gärdenfors, 2014) that leads individuals to embody increasingly adept constructions and pragmatic co-regulatory skills for attuning states some way down the hierarchy. The dynamical 'end point' of this continuum is attunement to one's wider communicatory (linguistic) embedding (Kidd et al., 2017), which allows for the skilled creation and maintenance of transient, task-dependent (free energy minimizing) collaborative couplings with the entirety of one's cultural group (Tomasello, 2014b).

With the above in place, I first more concisely outline the underlying variational (neuro)computational dynamics by discussing Variational Sociogenesis in relation to recent computational modeling of development (Tenenbaum et al., 2011; Perfors et al., 2011a). This allows me to situate the following with some form of (computationally tractable, experimentally testable) common ground.
Then, I sketch out an (admittedly) rough developmental trajectory, using existing developmental research to make my case chronologically across the lifespan.
As noted above, attention adaptively serves biotic self-organization by enhancing the precision of sensory data (Feldman and Friston, 2010; Sengupta et al., 2016). Gain enhancement enables sense data to ascend cortical hierarchies to impact on high-level neurodynamics that inform sensory predictions at lower layers of the hierarchy (e.g., Kanai et al., 2016). Since precision is the (inverse) variance of the causes underlying sensory data (Feldman and Friston, 2010), we can link up the variational formalism and, in particular, precision (see Friston, 2009) with work in mainstream developmental science employing the same (hierarchical Bayesian) computational architecture. Perfors et al. (2011a) and Tenenbaum et al. (2011) review the potential utility of such models to computationally replicate empirical observations of a diverse array of social and nonsocial phenomena in ontogeny, in particular the early learning of abstract syntactic and semantic knowledge (e.g., Bannard et al., 2009; Perfors et al., 2011b; Thaker et al., 2017; see appendix A.2 in Perfors et al., 2011a). Moreover, work in the variational approach has shown promise for related models in replicating human performance in online recognition of speech input (Kiebel et al., 2008; 2009; Yildiz et al., 2013), as well as reading comprehension (Friston et al., 2017a) and conversational alignment (Friston and Frith, 2015a,b). In describing their work, Tenenbaum, Perfors, and colleagues note what they term the 'blessing of abstraction' (see Gershman, 2017), namely, the ability for hierarchical Bayesian (generative) models to, in some instances, learn high-level (abstract) features of sensory data faster than low-level (concrete) features.
This is suggested to be due to (i) relatively larger sample spaces for abstract hypotheses enabling a wider array of low-level data to be relevant for creating these (abstract) hypotheses; and (ii) relatively fewer hypotheses as one ascends levels (hence abstraction; see Perfors et al., 2011a). An important neurocomputational observation here is that all backwards connections providing top-down predictions are complemented by bottom-up sensory error signals in the brain (Bastos et al., 2012) whose strength is modulated by attention (Feldman and Friston, 2010). This keeps predictions anchored to sensory data and accommodates (modulates) diverse forms of top-down and bottom-up learning (Friston, 2010; Clark, 2013). These considerations allow us to supplement the considerations of existing developmental work by Tenenbaum, Perfors, and colleagues, as well as related proposals for hierarchical predictive processing approaches to development (e.g., Fabry, 2017; Joiner et al., 2017). Specifically, Variational Sociogenesis complements these accounts by suggesting a principled, neurobiologically plausible active inference (cf. dynamical systems; Smith and Thelen, 2003) framework for considering how human infants and children (and adults) act to optimize their (attention to) sensory input to optimize learning of cultural affordances (Byrge et al., 2014; Pezzulo et al., 2015b). Importantly, I use 'optimize' here with technical reference to boundedly rational decision making schemes (Gershman et al., 2015), where actors leverage embodied computational mechanisms that are bounded by maturity and thermodynamic constraints to do (Bayes-optimal) inference on sensory data (see Sengupta et al., 2013). Indeed, that (boundedly rational) Bayes-optimal inference appears to be performed by biological systems is a corollary of the variational approach (Friston, 2013a).
Hence, infants and children, just as adults, are engaging in (boundedly) optimal action-perception cycles given their (developing) embodied generative architecture (Chiel and Beer, 1997). Variational Sociogenesis thus suggests a principled and "amazing set of necessary cognitive skills, namely, the statistical learning of concrete and abstract auditory [cf. sensorimotor] patterns, that are ready to be put to use in constructing the grammatical dimensions of language" (Tomasello, 2003, p. 30). While there is debate (Reddy, 2003; Mundy and Newell,
2007), the best behavioral evidence currently available suggests that the expectation for sharing emerges in all cultures around nine to twelve months of age with the onset of the cooperative pointing gesture (Lieven and Stoll, 2013; Tomasello et al., 2007a; Matthews et al., 2012). One key example of sensory evidence consistent with sharing is infants' own production, and noticing of adults', so-called 'look backs' between the item of interest and their interactant (Tomasello et al., 2007a; cf. nonhuman great apes; Carpenter and Call, 2013). Variational Sociogenesis suggests that when sensory evidence inconsistent with this prediction is experienced, infants should engage allostatic feedback loops and exhibit subsequent learning when error signals continue despite allostatic control (cf. section 3.1). Interestingly, Liszkowski et al. (2004) found that, within trials, 12-month-old infants pointed more often for uncooperative adults who didn't provide look backs. Moreover, infants in this study pointed less on successive trials when repeatedly interacting with an uncooperative adult. When adults cooperated by providing look backs, however, Liszkowski et al. (2004) found that these had to be accompanied by (attuned) emotive sharing between the infant and adult for the infant to appear satisfied, suggesting another important piece of sensory data which infants may use to gauge the sharedness of an interaction (see also Reddy, 2003). These results suggest that infants were attempting to quickly 'error correct' via a fast increase in pointing behavior when experiencing sensory data inconsistent with their top-down prediction for sharing. Infants then learned across trials which adults (don't) embody precise gradients for eliciting sensory evidence consistent with sharing, hence resulting in minimized action directed towards those adults.
This suggests the presence of nested attunement dynamics operating at two spatiotemporal scales (perceptual inference and learning) that mutually influence each other (Ramstead et al., 2017), something which future developmental work employing complementary methods (e.g., Siegler and Crowley, 1991) may look to explore in greater depth. Moreover, other sensory data likely play (age- and context-dependent) roles in individuals' phenomenal identification of sharing, for instance, the presence of various ostensive cues demonstrating communicative intent (Tomasello et al., 2007a), such as eye contact or infant-directed speech (Csibra, 2010); and spatiotemporally contingent dynamics (Levinson, 2016). Contingency (turn taking) is an ontogenetically early emerging (Gratier et al., 2015) feature of bidirectionally coupled dynamical systems (Friston and Frith, 2015a,b) that appears to have old phylogenetic roots (Levinson, 2016). Butko and Movellan (2010) recently demonstrated that by using precise information per se as the value signal (i.e., infomax control) they could computationally replicate 10-month-old infants' empirically observed behavior during interaction with a contingently acting robot (Movellan and Watson, 1987). Indeed, using similar infomax computational simulations of coupled dynamics, Friston and Frith (2015a,b) suggest that contingent behavior may just be an (optimal) novel phenomenon that emerges when informationally coupled generative models are motivated to (i.e., expect to) maximize their information with respect to one another.

It is pertinent to note that precision is related, but by no means equal, to familiarity (Friston et al., 2012c; Schwartenbeck et al., 2013).
Hence, relatively imprecise sensory data that diminish shared sensory expectations (and hence shared action) are quite diverse in nature: they can be familiar or unfamiliar, e.g., languages or accents (reviewed in Cohen, 2012); incorrect adult informants (Harris and Corriveau, 2011; Harris et al., 2017); or different shirt colors (Dunham et al., 2011; McClung et al., 2017; see also footnote ten). On the other hand, overly familiar sensory data can be dispreferred as well (Friston et al., 2012c), perhaps especially in early development (Kidd et al., 2012). This is because individuals (including infants) preferably dissipate affordance
gradients that are 'just right' in their (model-relative) complexity (Friston et al., 2017c), leading to context-sensitive attunement to niche dynamics across ontogeny (e.g., Frankenhuis et al., 2016).

Since Variational Sociogenesis predicts an a priori high precision for sensory evidence consistent with sharing, phenotypical shared frames beginning around 9 to 12 months should be associated with enhanced learning throughout the lifespan (Hasson et al., 2012). Importantly, however, joint contexts may be particularly important at certain stages of human life history (see Gopnik et al., 2017), particularly in early development (Tomasello et al., 2005). Indeed, Moll et al. (2007) found that engaging in joint interaction appears necessary for 14-month-old infants to perceive some referent as within common ground (see also Moll and Meltzoff, 2011). Specifically, the authors found that shared interactions with an adult directed towards a set of toys, and not simply third-person observation of the adult with the toys, were necessary for 14-month-old infants to disambiguate the adult's request for a specific toy. Variational Sociogenesis suggests that this is due to the high precision of (gain-enhanced) sensory data in such contexts. Moreover, since (expected) precision drives salience mappings and hence action, infant pointing should be associated with enhanced learning of, e.g., the label of an intended referent. Evidence for this comes from Begus et al. (2014), who found that 16-month-old infants exhibit enhanced learning (fast mapping) of the labels provided by their caregivers to the objects they pointed to. Similarly, Lucca and Wilbourn (2016) found that 18-month-olds (but, interestingly, not 12-month-olds) fast mapped the labels provided by caregivers specifically when caregivers labeled the intended (expected, precise, shared) referent. Infants no longer fast mapped when adults labeled an incorrect, unintended object.
For their part, caregivers in at least one culture preferably provide labels to the (intended) referent of infant points over verbalizations (Wu and Gros-Louis, 2015). These findings suggest the importance of triadic joint attention to the other and an outside entity for learning cultural affordances (common ground) such as form-meaning pairings (Tomasello, 2003; Vouloumanos and Curtin, 2014; Renzi et al., 2017). The necessity of jointly coupled dynamics for learning may, nonetheless, alter with age (see Moll et al., 2007; Gopnik et al., 2017). This likely enables the learning of (shared) affordances on a more individualistic (observational) basis later in ontogeny (Joiner et al., 2017). Though the limited repertoire of means for young infants to actively co-regulate interactants (Carpenter, 2009) is clearly present in, e.g., Liszkowski et al. (2004), we can illustrate co-regulatory dynamics more clearly by considering its ontogenetic trajectory. Brownell (2011) highlights that joint action is characterized by a movement from primarily asymmetric, adult-guided exchanges early in the second year to more symmetrically co-regulated exchanges with adults, typically late in the second year (possibly enabling children's interaction with peers early in the third year; Tomasello and Hamann, 2012). For instance, Aureli and Presaghi (2010) found that 'unilateral' mother-infant dynamics prevailed early in the second year, with mothers actively directing and shaping their infant's attention and action towards some shared locus during at-home play sessions (see Smith et al., 2018). Over the course of the second year, 'symmetric' interactions gradually became the predominant form of joint action, with mother and infant taking a more equal role in shaping each other's action and attention. Moreover, those dyads which spent relatively more time in symmetrical interactions also spent relatively more time communicating linguistically to co-regulate perspectives.
These findings are captured by Variational Sociogenesis: the capacity for joint co-regulation is suggested to develop gradually throughout ontogeny as individuals (gradually) attune their generative dynamics to the generative dynamics causing others' active dynamics. Adults, and in particular mothers (Hrdy,
2011; see also Feldman, 2015), actively shape the dynamics of the infant environment, such as through 'unilateral' joint engagement (Aureli and Presaghi, 2010), such that infants are efficiently 'pulled in' to their statistical embedding within and across interactions (Constant et al., 2018). In virtue of this, infants and children gradually embody these dynamics, enabling them to adeptly (jointly) predict the dynamics of their interaction partners with respect to the cultural affordances on offer (Bohn and Köymen, 2017; Rakoczy and Schmidt, 2013). For instance, Ashley and Tomasello (1998; relatedly, see Brownell et al., 2006) found an ontogenetic trend in preschool children's ability to coordinate with another to solve a joint task. Specifically, preschool dyads' ability to reverse complementary roles to complete a complex joint task correlated with their usage of pointed linguistic directives to help the other with their role within the task (though the authors note that simple motor development also likely played a role; see Meyer et al., 2010). Variational Sociogenesis suggests that the gradually developing ability to leverage apt, pointed communication enabled children to successfully form and manage shared representations of the joint task (Pezzulo, 2011; Tomasello, 2014b). Attunement to the other enables the skilled deployment of communication to (co-)regulate the dynamics of the other (Pezzulo and Dindo, 2013; Tylén et al., 2010), since one literally embodies a statistical recapitulation of the other. Hence, implicit in the notion of attuned generative dynamics is a (gradually developing) ability for role reversal (Tomasello et al., 2005) and 'socially recursive' (Tomasello, 2014b) forward modeling of, e.g., the relevance and rationality of one's communicative message (Pickering and Garrod, 2014). For instance, Duguid et al.
(2014) found that children's communication within a Stag Hunt scenario tracked the unpredictability of the other during the interaction. In particular, the authors found that children (but not chimpanzees; see also Bullinger, 2011) preferably formed joint couplings with each other regardless of the risk (uncertainty) involved in capturing a stag. Specifically, children were found to manage varying amounts of uncertainty by leveraging pointed, specific communication, as well as communicating more in general. While the communicative signals necessary for children to establish a joint coupling in Stag Hunt scenarios may be quite minimal (e.g., merely a smile and eye contact; Wyman et al., 2012), individuals' communicative dynamics are nonetheless quite flexible. Specifically, a notable feature of human communication is that it can be intentionally modulated as a function of (perceived) uncertainty so as to minimize that uncertainty (Sebanz et al., 2006; Vesper et al., 2010). First, consider Grau-Moya et al. (2013), who leveraged computational and experimental work to demonstrate that human behavior in Stag Hunt scenarios is sensitive to the amount of information (certainty) individuals have about the motivation of a virtual player. Specifically, the authors found that individuals modified their cooperative behavior (to engage a stag 'together') in accord with their model certainty about the generative dynamics (motivation) causing the (predicted) behavior of the virtual partner. Individuals' decisions whether or not to jointly pursue a stag with the virtual partner in Grau-Moya et al. (2013) could be computationally replicated as minimizing the variational free energy of the individual. However, since the interactions involved a virtual partner, the authors could not investigate whether individuals' communicative dynamics could be modeled as minimizing free energy. Relevantly, in putting forth their 'joint action optimization framework,' Pezzulo et al.
(2013) show that human communication within joint interactions minimizes the free energy of individuals (and hence dyads). Specifically, the authors showed that the decision whether to leverage communication (and, if so, at what grain) optimized a tradeoff between minimizing model complexity (cost of action) and maximizing model accuracy (certainty; see related
discussion in Rabinovich et al., 2012). The authors concluded that this attunement dynamics "emerges naturally from the objective of optimizing a joint goal," (p. 9), and, indeed, Friston and Frith (2015a,b) leverage the variational approach to provide corroboratory results. The authors found that informationally coupling two generative models that are motivated to predict the other appears to entail reciprocal attunement of each other's dynamics. Notably, this dynamics appeared to manifest phenomena reminiscent of interactive alignment in human communication (Menenti et al., 2012). These results are captured by Variational Sociogenesis, which foregrounds the centrality of tight attunement (high mutual information) with conspecifics through cooperative communication. Interestingly, Hasson and Frith (2016) suggest the potentially wide utility of mutual information for quantifying the similarity of inter-brain neural dynamics observed during neuroimaging studies of transient informational coupling. For instance, reliable transformations from a speaker's brain to a listener's brain (high mutual information) have been shown to strongly predict the meaningfulness of the speech stream to the listener (Stephens et al., 2010; Liu et al., 2017; see also Schmälzle et al., 2015). Indeed, reliably co-varying inter-brain dynamics during communication scale with the subsequent capacity for listeners to leverage that information to constrain their common ground with the speaker (Zadbood et al., 2017; see also Dikker et al., 2017). These results speak to related empirical (Bolt and Loehr, 2017) and philosophical (Pacherie, 2012) findings suggesting that the 'phenomenal jointness' of a given interaction gradually scales with the predictability of the other within the joint setting (cf. Sebanz et al., 2009).
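To make the notion of mutual information used above concrete, the following is a minimal sketch of a plug-in estimate for two discrete signals. It is an illustration only, not the estimator used in the neuroimaging work cited above: the function name `mutual_information` and the toy `speaker` and `listener` sequences are hypothetical, and real neural time series would first require discretization (or a continuous estimator) together with appropriate statistical controls.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts for X
    py = Counter(ys)               # marginal counts for Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy signals: one binary 'state' per time step.
speaker = [0, 1, 0, 1, 0, 1, 0, 1]
listener_coupled = list(speaker)                 # listener mirrors the speaker
listener_uncoupled = [0, 0, 1, 1, 0, 0, 1, 1]    # statistically independent of the speaker

mi_coupled = mutual_information(speaker, listener_coupled)
mi_uncoupled = mutual_information(speaker, listener_uncoupled)
```

In this toy case the perfectly coupled pair carries one bit of mutual information per time step, whereas the uncoupled pair carries none; tighter attunement between two systems corresponds to values closer to the upper bound set by the entropy of the signals.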
The results reviewed in this subsection speak directly to the key claim of Variational Sociogenesis: the motivation to share manifests in action geared towards maximizing one's certainty that sensory data consistent with sharing is experienced. Hence, individuals act to minimize their own uncertainty with respect to (the model dynamics underlying) another's action through communicating (Tomasello, 2014b; Carpenter and Liebal, 2011). This greatly enhances the predictability of the other (Liszkowski, 2013; Vesper et al., 2010), since both individuals possess attuned (joint) generative models of sensory data (Pezzulo, 2011; De Jaegher et al., 2010). Consequently, effective (free energy minimizing) joint action is greatly facilitated (Tomasello, 2014b). It is crucial to note that attunement to the shared world can only begin through diverse, recurring experiences of setting up and managing joint frames (Moll and Tomasello, 2008; Moll and Meltzoff, 2011). This shared base later enables individuals to learn a shared world through more observational forms of learning (Joiner et al., 2017), though learning in shared frames likely continues to be a key form of learning throughout life (Hasson et al., 2012). Individuals embody generative models increasingly attuned to those structuring their cultural group, thus enabling increasingly sophisticated, effective joint co-regulation with the entirety of one's cultural group (Bohn and Köymen, 2017; Rakoczy and Schmidt, 2013). Conceptually, it is quite tempting to consider this ontogenetic trajectory as constituting a 'semantic continuum' (Gärdenfors, 2014) that allows for increasingly privileged access to the mental states of interactants (De Jaegher et al., 2010).
In any case, gradual attunement to the cultural Markov blanket allows individuals to leverage their own free energy minimizing, socially sculpted predictions to, e.g., meet a friend at culturally salient locations when no prior meeting point has been specified (Goldvicht-Bacon and Diesendruck, 2016). Indeed, cultural Markov blankets themselves constitute precise, constructed designer environments (landscapes of affordances) that further aid in prediction of the dynamics of others (Clark, 2016; Constant et al., 2018). This suggests a basic circular causality in the dynamics occurring at an ontogenetic
scale of analysis (Witherington, 2007; Han and Ma, 2015). In particular, it is interesting to speculate that reciprocal attunement to the Markov blanket characterizing one's communicative system influences humans' information processing abilities. To be more specific, in virtue of attuning to the communicative system in early development, individuals statistically recapitulate its network structure and dynamics (Falk and Bassett, 2017; e.g., Schmälzle et al., 2017). This has been shown to manifest in statistically significant differences in neural dynamics between cultures (Han, 2015). However, it is possible that these surface differences reflect Bayes-optimal model averaging across interactions in the lifespan (Moran et al., 2014; Karmali et al., 2018). This may influence humans' neural dynamics in universal ways. For instance, coordinating with numerous others' models of the world (Moll and Tomasello, 2008) may influence individuals' development of slightly sub-critical (ordered but flexible) information processing dynamics (Carhart-Harris et al., 2014).

Figure 2. Variational Sociogenesis (cf. Ramstead et al., 2017, Figure 2).

4.0. Summary and conclusion

This thesis has provided a physics for the human phenotype by considering the information dynamics of sharing as manifest in Homo. Figure 2 provides a graphical summary of the spatiotemporally nested explanatory framework furnished by Variational Sociogenesis. The dynamics occurring at each respective scale of analysis circularly feed into, and are fed by, each other level of analysis. At each respective scale, various phenomena (characteristically) manifest to minimize the free energy at that scale.
For instance, we've considered how, in ontogeny, individuals reciprocally attune to the Markov blanket constituted by their communicative system. They do this through embodying an increasingly (statistically) average model of their sensory dynamics. Consequently, this makes likely the predictable, free energy minimizing relations
between self and other (constitutive of a 'we') that biological self-organization in the human niche characteristically manifests. In addition to reviewing much work from a variational viewpoint, throughout this thesis I've explicitly noted how future work may look to expand on the present account through computational, experimental, theoretical, and philosophical investigations. Indeed, part of the intrinsic appeal of the variational approach is its (proposed) capacity to provide a unified theoretical framework for computation and experiment (Allen and Friston, 2016; Ramstead et al., 2018). This provides a scientific base which is likely to be bettered through a fruitful exchange with philosophical inquiry. An immediately obvious avenue for philosophical inquiry is the relation between attunement to nested (cultural) Markov blankets (Ramstead et al., 2017) and the dual-level structure of human thought and action (Tomasello, 2014b). I shall merely suggest that rather stark, clear conceptual parallels between the two exist. An earlier version of the present thesis had included the beginnings of a conceptual analysis of the dual-level structure in relation to Variational Sociogenesis; however, the empirical review quickly took precedence. The intended scope of the present thesis, I believe, required space for sufficient empirical considerations prior to philosophical reflection. Moreover, philosophical work may look to investigate perhaps more proximal (if no less important) themes in, for instance, political philosophy and, in particular, the nature of propaganda (Chomsky, 1997; Stanley, 2015). Variational Sociogenesis provides a principled means for considering the information dynamics of highly connected hub nodes and their influence on network dynamics (relatedly, see Falk and Bassett, 2017).
One last example of potentially fruitful philosophical inquiry can be found in considering whether shared intentionality has in fact evolved several times in the history of life on Earth (Szathmáry, 2015). By this, I mean shared intentionality per se, the phenomenon understood to (functionally) manifest in Homo. A prior to attune generative models with conspecifics may simply be how novel units of selectable structure are instantiated (e.g., Friston et al., 2015a). Such an analysis may fit well with recent work defending a strong life-mind continuity thesis (Kirchhoff and Froese, 2017; see also Kirchhoff, 2016). These are merely three (semi-)randomly selected examples of what, as noted above, should be a fruitful, mutually informative (and mutually constraining) exchange between science and philosophy. Taken together, the present thesis has submitted an admittedly ambitious project which future installments will help to flesh out. I have provided several brief reviews of the empirical (and, particularly in section 1, philosophical) evidence at each scale of analysis, with the aim of showing that humans, as biotic systems, minimize their free energy. They do this by actively gathering sensory evidence for sharing. The best empirical evidence currently available (Tomasello et al., 2005) suggests that this active evidence gathering begins around nine to twelve months of age with the onset of the cooperative pointing gesture. This is likely the product of selection on high-level dynamics in the generative model instantiating Homo (Friston, 2013b; Hare and Tomasello, 2005). Local dynamics in ontogeny self-organize to produce novel phenomena constituting a higher level of Markov blanket organization. Numerous and diverse local interactions at the ontogenetic scale all but ensure (phenotypical) attunement to the cultural attractor dynamics one helps constitute.
Indeed, attunement dynamics means that the whole of the spatiotemporally deep hierarchical scaling of human self-organization (as merely one way in which biology, and hence physics and the universe, realizes itself) is no more than an (exceptionally) improbable, ephemeral mirroring of the dynamics in which it is situated.
The whole thing, the whole of human self-organization, culture, language, and so on, is but a cheap solution settled on by evolution that, at least for the most part and so far, seems to work out well enough. It all begins, Variational Sociogenesis suggests, with the simple expectation to attune dynamics with conspecifics.
Footnotes

"The defining feature of the additive account is the assumption that it is possible to characterize a living individual as engaging in activities that manifest collective intentionality regardless of whether this individual instantiates a collective form of life [i.e., shared intentionality as merely a mechanism]. From this perspective, the question of whether an individual can engage in activities that manifest collective intentionality is considered neutral with respect to the question of what kind of life form this individual instantiates. By contrast, the transformative account takes the main lesson from Wittgenstein's Philosophical Investigations to be that we have to invoke humans' collective form of life to adequately understand any given human activity, be it eating or calculating, walking, or talking [i.e., shared intentionality as an evolutionarily novel form of life]," (Kern and Moll, 2017, p. 324).

For the interested reader, in this footnote I briefly outline a rather technical, abstract introduction to the dynamics suggested to implement the hierarchical generative model specified by the variational approach (Friston, 2010). For the unfamiliar but interested reader, useful introductions to the various themes discussed in this footnote can be found in Barton (1994), Boeing (2016), Rabinovich et al. (2015), and Afraimovich et al. (2012). With this in place, the notion of 'metastability' has been proposed to be useful for conceptualizing the transient dynamics of large-scale neural ensembles (Friston, 1997; Kelso, 2012). Specifically, metastability denotes the transient, task-dependent formation and dissolution ('soft assembly'; Clark, 2008) of neuronal ensembles that appears to be a key feature of adaptive neural dynamics (Tognoli and Kelso, 2014).
Put another way, metastability denotes the phenomenon of fluid maintenance of, and switching between, various (metastable) states or modes that is key to adaptive brain-behavior dynamics (Kelso et al., 2013) and which appears to be disrupted in various psychopathologies (Carhart-Harris and Friston, 2010; Carhart-Harris et al., 2014). For instance, the activity of the default mode network (DMN) during cognitively demanding tasks (Raichle et al., 2001) involves the transient dissolution of the DMN concomitant with transient, task-dependent network formation in other areas of the brain (Bullmore and Sporns, 2009). This dynamics can be modeled using phase space models where a given metastable mode is exhibited as a temporally transient slowing of dynamics within the locality of an attractor (Rabinovich et al., 2015). Switching between modes can then be modeled as a dynamical transient between attractor regions. Specifically, we can model sequential switching between metastable modes using 'stable heteroclinic channels' (SHCs; Rabinovich et al., 2015). SHCs prescribe a sequence of unidirectional transients between attractors along a manifold topology. When neural dynamics are within the locality of a given attractor, the dynamics temporarily tend towards this attractor, and then move (unidirectionally) to the next attractor region in some topological space. Thus, sequential switching between dynamical (metastable) modes corresponds to cyclic transitions between a sequence of metastable states (constituting a limit cycle). This dynamics is robust to noise but flexible in the face of unexpected sensory data (e.g., Rabinovich et al., 2014), hence making it an attractive candidate for modeling neural dynamics (Rabinovich et al., 2015). In particular, the variational approach suggests a hierarchical ordering of SHCs in the brain (e.g., Kiebel et al., 2008).
This is because a spatiotemporal partitioning (abstraction) of the dynamics of SHCs appears to be a necessary phenomenon of feeding information into recurrent neural networks (Hinton, 2007; Perfors et al., 2011a), and, most importantly, there is much neurobiological evidence in favor of hierarchically organized dynamics in the brain (Friston, 2010). As noted in the main text, in spatiotemporally deep
neurodynamics the fast dynamics of a level of the neural hierarchy are prescribed (predicted) by the slow dynamics of the (functional) level above. Heuristically, a high-level SHC goes through a single limit cycle in the time it takes for a low-level SHC to cycle several times (Kiebel et al., 2008; 2009). For instance, Jensen and Colgin (2007) describe cortical 'nestings' of oscillatory dynamics (e.g., gamma rhythms nested within theta rhythms) that may functionally implement a wide range of phenomena, in particular when oscillatory dynamics are synchronized (Fries, 2005; Fell and Axmacher, 2011). Gain enhancement of sensory data expected to be highly precise (Feldman and Friston, 2010) causes ascending prediction error signals to update the attractor topology of higher-level dynamics. Neuromodulatory neurotransmitters such as dopamine (Friston et al., 2014a) and acetylcholine (Moran et al., 2013) likely variably implement gain enhancement of precise sensory data by adjusting the post-synaptic gain of ascending error signals. This leads to the fluid (metastable) switching of large-scale neural ensembles between contextualized modes of behavior that necessarily induces a novel (contextualized) dynamical regime at lower levels of the hierarchy (e.g., Tajima et al., 2017). Further empirical observations that spatiotemporally decomposable neural dynamics can be used to computationally replicate include, e.g., chunking dynamics in working memory (Parr and Friston, 2017; Rabinovich et al., 2014). Indeed, this may lend explanatory power to the underlying neural dynamics of infant working memory, which has recently been demonstrated to be hierarchically organized by as early as 7 months of age (Rosenberg and Feigenson, 2013).
Moreover, Kiebel and Friston (2012) note that hierarchically organized SHCs are in fact the least expensive (neurobiologically plausible) computational implementation of recurrent neural networks, suggesting a neurobiological architecture for implementing (boundedly) optimal Bayesian inference throughout the lifespan (Gershman et al., 2015) and across species (Friston, 2013a; Ramstead et al., 2017).

The third definition is accuracy minus the Kullback-Leibler divergence (Friston, 2010).

I follow Ramstead et al. (2016) who, following Bruineberg and Rietveld (2014), define an 'affordance' as "a relation between a feature or aspect of organisms' material environment [i.e., niche] and an ability available in their form of life," (p. 3; cf. Chemero, 2003).

Though I do not discuss this topic further, the interested reader is directed to work on the 'behavioral traditions' of other animals (reviewed in Whiten, 2011). As has already been well rehearsed, the key difference between humans' and other species' cultural proclivities seems to lie more in the 'ratcheted' memory of cumulative cultural evolution (Tennie et al., 2009), underlain by species-unique motives for sharing states (Tomasello et al., 2005).

This is, of course, not by any means to imply that one is a slave to one's (expectations of) the internal states of others. Indeed, issues such as novelty and exploration (Legare and Nielsen, 2015) are understood to be integral to both constraining and altering the manifold space describing a culture's attractor dynamics. Though this is an issue of fundamental importance and one which lacks any applied research from the variational framework, due to space I leave these issues largely unexplored. The interested reader is encouraged to refer to work in the variational literature, for instance, Friston et al. (2012c; 2015b; 2017b,c) and Schwartenbeck et al. (2013).

This is to say nothing about why the emergence of novel units of biological organization on which natural selection operates is so rare (Szathmáry, 2015). This is an interesting question to pursue within variational biology, and I may suggest that this has something to do with the fact that variational free energy is an 'extensive' quantity, that is, that free energy is additive in coupled systems (Constant et al., 2018; Friston et al., 2015a).

It is worthwhile to note, however, that Tomasello (2014b) highlights that, if necessary, humans can reflect several layers deep into recursive inference on others' mental states (cf. the problem of other minds; see Liebal and Carpenter, 2011). I do not pursue this topic any further in the present thesis, however important it may be.

This gets into the problematic notion of 'innateness.' Just because some trait is adaptive (e.g., Begus et al., 2016; Marno et al., 2016) does not entail that it is selected for over evolutionary or phylogenetic timescales (McLoone and Smead, 2014). If external dynamics are such that there exists a large basin of attraction (i.e., set of initial states) for some phenotypic trait (i.e., attractor), then internal dynamics do not have to be modified as greatly over evolutionary timescales but may be shaped over ontogenetic timescales to minimize free energy, as external dynamics make it likely that the trait emerges in a range of models encoding diverse full priors. An obvious benefit of this is flexibility in the developmental trajectory of the trait(s) over ontogeny (e.g., Frankenhuis et al., 2016). See McLoone and Smead (2014) and Tomasello et al. (2005) for useful discussion of this within the context of human social uniqueness.

Interestingly, this suggests an a priori preference for interacting with others with traits perceived as self-similar, since sensory cues indicating overlapping generative models indicate that one can (reasonably) rely on existing shared predictions (common ground) and hence minimize the communicative effort required to attune representations.
Pre-theoretically, such cues to self-similarity could potentially range from the communicatively explicit (e.g., linguistic communication to ensure model alignment; Pickering and Garrod, 2014) to the communicatively implicit (e.g., accent; Kinzler et al., 2007) and could include cultural or idiosyncratic items (e.g., clothing or religion; van Schaik and Carel, 2016; Dunham et al., 2011) or embodied aspects of the self (Richter et al., 2016). Compellingly, this a priori suggestion is in line with recent empirical reviews suggesting the centrality of self-similarity in guiding and constraining individual social motivations (Meltzoff, 2007; Jensen et al., 2014) and large-scale cultural evolution (Haun and Over, 2013; Jensen et al., 2014).

The circularity here is duly noted: why do communicative systems minimize their free energy? Because they instantiate hierarchical structure and dynamics (Ramstead et al., 2017). Well, why do they instantiate hierarchical structure and dynamics? Because they minimize free energy. Indeed, such circularity is well noted within the variational literature and suggests the problem of experimental unprovability for the entire paradigm (Allen and Friston, 2016). Specifically, the variational approach is (apparently) open to the criticism of 'just so' stories (e.g., Berwick et al., 2013a): any biological system can (apparently) be said to minimize free energy given the starting assumptions of the whole paradigm. To this end, the reader is directed to discussion in, e.g., Bruineberg and Hesp (2017) and Allen and Friston (2016), where this question is pursued more fully. The importance of modeling studies informed by the variational formalism (e.g., Friston et al., 2015a) to complement empirical work is made central in Bruineberg and Hesp (2017).
Additionally, the reader is invited to consider the nature of existing explanation in the biological sciences in relation to the variational formalism: the variational approach (proposes to) account for all of biological self-organization (Friston, 2012a; 2013a; Ramstead et al., 2017; Constant et al., 2018; Ramstead et al., under review). There is thus no phenomenon studied by the biological sciences that is not (proposed to be) subsumed by the variational formalism that one could point to in order to 'ground' one's explanation in some outside phenomenon. (E.g., why is some theoretical point X reasonable to make? Well, unrelated theories say so-and-so, which accords with X. Hence, X appears reasonable. So-and-so phenomena can thus serve as unrelated but nonetheless useful 'groundings' for one's theoretical
pursuit. They are useful insofar as their unrelatedness allows one to relate them to one's theory.) Instead, one can only non-circularly ground the variational formulation (that is, point to something in the world) within (the physics and geometry of) information per se (Sengupta et al., 2016). Depending on one's outlook, this is an interesting development.

Variational Sociogenesis thus suggests that (i) at least certain abstract concepts such as syntax or some language module may not have to be 'innate' (e.g., Berwick et al., 2013b). Rather, this may be more helpfully construed as a product of domain-general ontogenetic learning mechanisms (such as gain enhancement) reciprocally impacted by other scales within a hierarchically nested environment (Ramstead et al., 2017). A weaker, related claim suggests that (ii) humans possess unique expectations of the kind-generalizability of ostensively communicated information received specifically from conspecifics (Csibra and Gergely, 2011). The considerations of precision provided by Variational Sociogenesis speak against this and, indeed, counterevidence already exists (Schmidt et al., 2011; Szufnarowska et al., 2014). For instance, regarding (i), it has been shown that the neurocomputational architecture suggested here (Friston, 2010) can, e.g., parse syntactic (Bannard et al., 2009; Perfors et al., 2011b) and semantic (Thaker et al., 2017) categories from sparse input. Moreover, work in the variational approach has shown the promise of these neurocomputational models for implementing human-like online inference (recognition) of speech input (Kiebel et al., 2008; 2009; Yildiz et al., 2013). Under the active inference formulation of biotic dynamics (Friston, 2013a), learning (recognition) of sensory data (e.g., a noisy speech stream) entails (the capacity for) subsequent production of that same sensory input (cf. comprehension vs. production in child language development).
Indeed, this is just what it means to embody a generative model: generative models statistically recapitulate sensory (e.g., linguistic) input (Kidd et al., 2017). Children learn to talk by attending to that talk (Vouloumanos and Curtin, 2014; Begus et al., 2016; Marno et al., 2016). Thus, though Variational Sociogenesis does not by any means rule out model selection favoring specifically linguistic predispositions at the scale of function (i.e., biological adaptations for language; Berwick et al., 2013b), it does place heavy emphasis on the learnability of the communicative system in (early) ontogeny via a (principled) generative architecture underlain by domain-general processes of gain enhancement, in line with usage-based approaches to language acquisition (Tomasello, 2003; Lieven, 2016).

Though I do not pursue this further in the present thesis, 'cooperative motives' underlying infants' points are suggested to be (i) a 'cooperativized' (Tomasello, 2014b) form of great ape imperatives (see Tomasello, 2008). In humans, the imperative motive is suggested to manifest as a continuum of 'requestive' motives, from more demanding to politer (Tomasello et al., 2007a). Moreover, there are suggested to exist (ii) declarative motives underlying pointing that are unique to Homo (Tomasello, 2008). Declaratives are proposed by Tomasello et al. (2007a) to be decomposable into two unique forms, namely, 'expressive declaratives' and 'informative declaratives' (see Figure 1 of Tomasello et al., 2007a). Other authors have challenged this ontology by proposing that, e.g., there exists an 'interrogative motive' that motivates infants to learn about their world from their caregiver. This motive has been proposed either (a) to take the place of the supposed 'sharing' (declarative) motive proposed by Tomasello et al. (2007a) (Southgate et al., 2007), or (b) to simply exist alongside it as another motive for infants' points (Begus and Southgate, 2012).
For better or worse, I do not pursue these specific motives here, nor do I consider the relevant distinction between top-down and bottom-up joint attention (Carpenter and Liebal, 2011). Rather, I simply note that each motive presupposes jointly shared attention towards some outside (Tomasello, 2008).
• Relatedly, I briefly note that precise sensory data may find their source in unfamiliar places (e.g., nonhuman primates), leading to quite interesting learning effects (e.g., Perszyk and Waxman, 2016; Ferguson and Waxman, 2016; Ferguson and Lew-Williams, 2016; Ferry et al., 2013). It is interesting to consider whether these learning effects are specifically mediated by the presence of similarities in the underlying (precise) auditory features of human speech (Zarate et al., 2015) and the (generally imprecise) auditory features of chimpanzee calls. These effects are most likely at particularly early ages, prior to widespread perceptual tuning (Mauer and Werker, 2013), and may deserve closer inspection in future work.
• A more comprehensive account of how individual differences in sharing develop in ontogeny (e.g., Kidd et al., 2017) will likely complicate matters considerably. Variational Sociogenesis provides a principled framework, however, for considering (hypothesizing) these highly complex dynamics. It suggests that individual (i) mechanistic and (ii) ontogenetic differences in generative model dynamics combine with (iii) wider culture-specific factors, such as ingroup-outgroup dynamics (Kinzler et al., 2012), developing at the scale of (cultural) phylogeny; which themselves are (iv) differentially enmeshed in, and impacted by, phenotypical human life history patterns developing at the scale of function (Gurven and Gomes, 2017; Gopnik et al., 2017). Complicating matters further, this hyper-dense matrix of differentially weighted modulatory phenomena circularly interacts between and within scales of analysis (Ramstead et al., 2017) to shape, constrain, and enable how the relationship between precision and familiarity with regard to shared interaction (and hence attunement to the cultural Markov blanket) manifests itself at any one point throughout the lifespan in an individual human.
Bibliography
Adams, R. A., Shipp, S., and Friston, K. J. (2013). Predictions not commands: active inference in the motor system. Brain Struct. Funct. 218, 611-643. doi: 10.1007/s00429-012-0475-5
Afraimovich, V. S., Rabinovich, M. I., and Varona, P. (2012). Short Guide to Modern Nonlinear Dynamics. In M. I. Rabinovich et al. (Eds.) Principles of Brain Dynamics: Global State Interactions (pp. 313-338). Cambridge, MA: The MIT Press.
Allen, M. and Friston, K. J. (2016). From cognitivism to autopoiesis: towards a computational framework for the embodied mind. Synthese. doi: 10.1007/s11229-016-1288
Allman, J. M., Hakeem, A., Erwin, J. M., Nimchinsky, E., and Hof, P. (2001). The Anterior Cingulate Cortex: The Evolution of an Interface between Emotion and Cognition. Ann. NY Acad. Sci. May:935, 107-117.
Anderson, C. and Brown, C. E. (2010). The functions and dysfunctions of hierarchy. Research in Organizational Behavior 30, 55-89. doi: 10.1016/j.riob.2010.08.002
Apps, M. A. J., Rushworth, M. F. S., and Chang, S. W. C. (2016). The Anterior Cingulate Gyrus and Social Cognition: Tracking the Motivation of Others. Neuron 90, 692-707. doi: 10.1016/j.neuron.2016.04.018
Angus, S. D. and Newton, J. (2015). Emergence of Shared Intentionality Is Coupled to the Advance of Cumulative Culture. PLoS Comput. Biol. 11(10), e1004587. doi: 10.1371/journal.pcbi.1004587
Araújo, D. and Davids, K. (2016). Team Synergies in Sport: Theory and Measures. Front. Psychol. 7:1449. doi: 10.3389/fpsyg.2016.01449
Ashby, W. R. (1962). Principles of the self-organizing system. In H. von Foerster and G. W. Zopf (Eds.) Principles of Self-Organization: Transactions of the University of Illinois Symposium (pp. 255-278). Pergamon Press: London.
Ashley, J. and Tomasello, M. (1998). Cooperative Problem-Solving and Teaching in Preschoolers. Soc. Dev. 7(2), 143-163.
Avnir, D., Biham, O., Lidar, D., and Malcai, O. (1998). Is the Geometry of Nature Fractal? Science 279(5347), 39-40.
Aureli, T. and Presaghi, F. (2010). Developmental Trajectories for Mother-Infant Coregulation in the Second Year of Life. Infancy 15(6), 557-585. doi: 10.1111/j.1532-7078.2010.00034.x
Bach, P., Nicholson, T., and Hudson, M. (2014). The affordance-matching hypothesis: how objects guide action understanding and prediction. Front. Hum. Neurosci. 8:254. doi: 10.3389/fnhum.2014.00254
Badcock, P. (2012). Evolutionary Systems Theory: A Unifying Meta-Theory of Psychological Science. Review of General Psychology 16(1), 10-23. doi: 10.1037/a0026381
Badcock, P. B., Davey, C. G., Whittle, S., Allen, N. B., and Friston, K. J. (2017). The Depressed Brain: An Evolutionary Systems Theory. Trends in Cognitive Sciences 21(3), 182-192. doi: 10.1016/j.tics.2017.01.005
Bannard, C., Lieven, E., and Tomasello, M. (2009). Modeling children's early grammatical knowledge. Proc. Natl. Acad. Sci. USA 106(41), 17284-17289. doi: 10.1073/pnas.0905638106
Barton, S. (1994). Chaos, Self-Organization, and Psychology. Am. Psychol. 49(1), 5-14.
Bastos, A. M., Usrey, M., Adams, R. A., Mangun, G. R., Fries, P., and Friston, K. J. (2012). Canonical microcircuits for Predictive Coding. Neuron 76, 695-711. doi: 10.1016/j.neuron.2012.10.038
Baumeister, R. F. and Leary, M. R. (1995). The need to belong: desire for interpersonal attachments as a fundamental human motivation. Psychol. Bull. 117, 497-529. doi: 10.1037/0033-2909.117.3.497
Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, W., Ellis, N. C., Holland, J., Ke, J., Larsen-Freeman, D., and Schoenemann, T. (2009). Language Is a Complex Adaptive System: Position Paper. Lang. Learn. 59(1), 1-26.
Begus, K. and Southgate, V. (2012). Infant pointing serves an interrogative function. Dev. Sci. doi: 10.1111/j.1467-7687.2012.01160.x
Begus, K., Gliga, T., and Southgate, V. (2014). Infants Learn What They Want to Learn: Responding to Infant Pointing Leads to Superior Learning. PLoS ONE 9(10): e108817. doi: 10.1371/journal.pone.0108817
Begus, K., Gliga, T., and Southgate, V. (2016). Infants' preferences for native speakers are associated with an expectation of information. Proc. Natl. Acad. Sci. USA 113(44), 12397-12402. doi: 10.1073/pnas.1603261113
Bejan, A. and Lorente, S. (2011). The constructal law and the evolution of design in nature. Phys. Life Rev. 8, 209-240. doi: 10.1016/j.plrev.2011.05.010
Berwick, R. C., Hauser, M. D., and Tattersall, I. (2013a). Neanderthal language? Just-so stories take center stage. Front. Psychol. 4:671. doi: 10.3389/fpsyg.2013.00671
Berwick, R. C., Friederici, A. D., Chomsky, N., and Bolhuis, J. J. (2013b). Evolution, brain, and the nature of language. Trends Cog. Sci. 17(2), 89-98. doi: 10.1016/j.tics.2012.12.002
Boeing, G. (2016). Visual Analysis of Nonlinear Dynamical Systems: Chaos, Fractals, Self-Similarity, and the Limits of Prediction. Systems 4(37). doi: 10.3390/systems4040037
Bohn, M. and Köymen, B. (2017). Common Ground and Development. Child Dev. Persp. doi: 10.1111/cdep.12269
Bohn, M., Call, J., and Tomasello, M. (2016). Comprehension of iconic gestures by chimpanzees and human children. J. Exp. Child Psychol. 142, 1-17. doi: 10.1016/j.jecp.2015.09.001
Bolt, N. K. and Loehr, J. D. (2017). The predictability of a partner's actions modulates the sense of joint agency. Cognition 161, 60-65. doi: 10.1016/j.cognition.2017.01.004
Botvinick, M. M. (2008). Hierarchical models of behavior and prefrontal function. Trends Cog. Sci. 12(5), 201-208. doi: 10.1016/j.tics.2008.02.009
Bratman, M. (1992). Shared Cooperative Activity. The Philosophical Review 101(2), 327-341.
Brown, H., Adams, R. A., Parees, I., Edwards, M., and Friston, K. (2013). Active inference, sensory attenuation and illusions. Cogn. Process. 14, 411-427. doi: 10.1007/s10339-013-0571-3
Brownell, C. (2011). Early Developments in Joint Action. Rev. Phil. Psych. doi: 10.1007/s13164-011-0056-1
Brownell, C., Ramani, G. B., and Zerwas, S. (2006). Becoming a Social Partner With Peers: Cooperation and Social Understanding in One- and Two-Year-Olds. Child Development 77(4), 803-821.
Bruineberg, J. (2017). Active Inference and the Primacy of the 'I Can'. In T. Metzinger & W. Wiese (Eds.) Philosophy and Predictive Processing: 5. Frankfurt am Main: MIND Group. doi: 10.15502/9783958573062
Bruineberg, J. and Rietveld, E. (2014). Self-organization, free energy minimization, and optimal grip on a field of affordances. Front. Hum. Neurosci. 8:599. doi: 10.3389/fnhum.2014.00599
Bruineberg, J. and Hesp, C. (2017). Beyond blanket terms: Challenges for the explanatory value of variational (neuro-)ethology. Commentary on "Answering Schrödinger's question: A free-energy formulation" by Maxwell James Désormeau Ramstead et al. Physics of Life Reviews. doi: 10.1016/j.plrev.2017.11.015
Bruineberg, J., Kiverstein, J., and Rietveld, E. (2016). The anticipating brain is not a scientist: the free-energy principle from an ecological-enactivist perspective. Synthese. doi: 10.1007/s11229-016-1239-1
Brunetti, M., Zappasodi, F., Marzetti, L., Perucci, M. G., Cirillo, S., Romani, G. L., Pizzella, V., and Aureli, T. (2014). Do you know what I mean? Brain oscillations and the understanding of communicative intentions. Front. Hum. Neurosci. 8(36). doi: 10.3389/fnhum.2014.00036
Buckner, R. L. and Krienen, F. M. (2013). The evolution of distributed association networks in the human brain. Trends Cog. Sci. 17(12), 648-665. doi: 10.1016/j.tics.2013.09.017
Bullinger, A. F., Melis, A. P., and Tomasello, M. (2011). Chimpanzees, Pan troglodytes, prefer individual over collaborative strategies towards goals. Animal Behaviour 82, 1125-1141. doi: 10.1016/j.anbehav.2011.08.008
Bullmore, E. and Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 186-198. doi: 10.1038/nrn2575
Bunce, J. A. and McElreath, R. (2018). Sustainability of minority culture when inter-ethnic interaction is profitable. Nature Hum. Behav. doi: 10.1038/s41562-018-0306-7
Butko, N. J. and Movellan, J. R. (2010). Detecting contingencies: An infomax approach. Neural Networks 23, 973-984. doi: 10.1016/j.neunet.2010.09.001
Byrge, L., Sporns, O., and Smith, L. B. (2014). Developmental process emerges from extended brain-body-behavior networks. Trends in Cog. Sci. 19(8), 395-403. doi: 10.1016/j.tics.2014.04.010
Campbell, J. O. (2016). Universal Darwinism As a Process of Bayesian Inference. Front. Syst. Neurosci. 10:49. doi: 10.3389/fnsys.2016.00049
Carhart-Harris, R. L. and Friston, K. J. (2010). The default-mode, ego-functions and free-energy: a neurobiological account of Freudian ideas. Brain 133, 1265-1283. doi: 10.1093/brain/awq010
Carhart-Harris, R. L., Leech, R., Hellyer, P. J., Shanahan, M., Feilding, A., Tagliazucchi, E., Chialvo, D. R., and Nutt, D. (2014). The entropic brain: a theory of conscious states informed by neuroimaging research with psychedelic drugs. Front. Hum. Neurosci. 8:20. doi: 10.3389/fnhum.2014.00020
Carhart-Harris, R. L., Erritzoe, D., Haijen, E., Kaelen, M., and Watts, R. (2018). Psychedelics and connectedness. Psychopharmacology 235, 547-550. doi: 10.1007/s00213-017-4701-y
Carpenter, M. (2009). Just How Joint Is Joint Action in Infancy? Topics in Cognitive Science 1, 380-392. doi: 10.1111/j.1756-8765.2009.01026.x
Carpenter, M. (2011). Social Cognition and Social Motivations in Infancy. In U. Goswami (Ed.) The Wiley-Blackwell Handbook of Childhood Cognitive Development (pp. 106-28). Malden, MA: Wiley-Blackwell.
Carpenter, M. and Liebal, K. (2011). Joint attention, communication, and knowing together in infancy. In A. Seemann (Ed.), Joint attention: New developments in psychology, philosophy of mind, and social neuroscience (pp. 159-181). Cambridge, MA: MIT Press.
Carpenter, M. and Call, J. (2013). How joint is the joint attention of apes and human infants? In J. Metcalfe and H. S. Terrace (Eds.), Agency and joint attention (pp. 49-61). New York, NY: Oxford University Press.
Carpenter, M., Nagell, K., and Tomasello, M. (1998). Social Cognition, Joint Attention, and Communicative Competence from 9 to 15 months of age. Monogr. Soc. Res. Child Dev. 63(4).
Chambon, V. et al. (2017). Neural coding of prior expectations in hierarchical intention inference. Sci. Rep. 7:1278. doi: 10.1038/s41598-017-01414-y
Chemero, A. (2003). Outline of a Theory of Affordances. Ecol. Psychol. 15(2), 181-195.
Chiel, H. J. and Beer, R. D. (1997). The brain has a body: adaptive behavior emerges from interactions of nervous system, body, and environment. Trends Neurosci. 20(12), 553-557.
Christiansen, M. H. and Chater, N. (2008). Language as shaped by the brain. Behav. Brain Sci. 31, 489-558. doi: 10.1017/S0140525X08004998
Chomsky, N. (1997). Media Control: The Spectacular Achievements of Propaganda. New York, NY: Seven Stories Press.
Cisek, P. (2007). Cortical mechanisms of action selection: the affordance competition hypothesis. Phil. Trans. R. Soc. B 362, 1585-1599. doi: 10.1098/rstb.2007.2054
Clark, H. H. (1996). Using Language. Cambridge, UK: Cambridge University Press.
Clark, A. (2006). Language, embodiment, and the cognitive niche. Trends in Cognitive Sciences 10(8), 370-4. doi: 10.1016/j.tics.2006.06.012
Clark, A. (2008). Supersizing the Mind: Embodiment, Action, and Cognitive Extension. Oxford, UK: Oxford University Press.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences. doi: 10.1017/S0140525X12000477
Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford, UK: Oxford University Press.
Clark, A. (2017). How to Knit Your Own Markov Blanket: Resisting the Second Law with Metamorphic Minds. In T. Metzinger & W. Wiese (Eds.) Philosophy and Predictive Processing: 3. Frankfurt am Main: MIND Group. doi: 10.15502/9783958573031
Cohen, E. (2012). The Evolution of Tag-Based Cooperation in Humans: The Case for Accent. Curr. Anthro. 53(5), 588-616. doi: 10.1086/667654
Conant, R. C. and Ashby, W. R. (1970). Every Good Regulator of a System Must Be a Model of That System. Int. J. Systems Sci. 1(2), 89-97.
Constant, A., Ramstead, M. J. D., Veissière, S. P. L., Campbell, J. O., and Friston, K. J. (2018). A Variational Approach to Niche Construction. J. R. Soc. Interface 15:20170685. doi: 10.1098/rsif.2017.0685
Corballis, M. (2017). Language Evolution: A Changing Perspective. Trends Cogn. Sci. 21(4), 229-235. doi: 10.1016/j.tics.2017.01.013
Corominas-Murtra, B., Goñi, J., Solé, R. V., and Rodriguez-Caso, C. (2013). On the origins of hierarchy in complex systems. Proc. Natl. Acad. Sci. USA 110(33), 13316-13321. doi: 10.1073/pnas.1300832110
Creanza, N., Kolodny, O., and Feldman, M. W. (2017). Cultural evolutionary theory: How culture evolves and why it matters. Proc. Natl. Acad. Sci. USA 114(30), 7782-7789. doi: 10.1073/pnas.1620732114
Csibra, G. (2008). Action mirroring and action understanding: an alternative account. In P. Haggard et al. (Eds.) Sensorimotor Foundations of Higher Cognition (Attention and Performance Vol. 22) (pp. 435-459). Oxford, UK: Oxford University Press.
Csibra, G. (2010). Recognizing Communicative Intentions in Infancy. Mind and Language 25(2), 141-168.
Csibra, G. and Gergely, G. (2011). Natural pedagogy as evolutionary adaptation. Phil. Trans. R. Soc. B 366, 1149-1157. doi: 10.1098/rstb.2010.0319
De Jaegher, H. and Di Paolo, E. (2007). Participatory sense-making: An enactive approach to social cognition. Phenom. Cogn. Sci. 6, 485-507. doi: 10.1007/s11097-007-9076-9
De Jaegher, H., Di Paolo, E., and Gallagher, S. (2010). Can social interaction constitute social cognition? Trends Cogn. Sci. 14(10), 441-447. doi: 10.1016/j.tics.2010.06.009
den Ouden, H. E. M., Kok, P., and de Lange, F. P. (2012). How prediction errors shape perception, attention, and motivation. Front. Psychol. 3:548. doi: 10.3389/fpsyg.2012.00548
Di Paolo, E. and De Jaegher, H. (2012). The interactive brain hypothesis. Front. Hum. Neurosci. 6:163. doi: 10.3389/fnhum.2012.00163
Dikker, S. et al. (2017). Brain-to-Brain Synchrony Tracks Real-World Dynamic Interactions in the Classroom. Curr. Biol. 27, 1375-1380. doi: 10.1016/j.cub.2017.04.002
Dindo, H., Donnarumma, F., Chersi, F., and Pezzulo, G. (2014). The intentional stance as structure learning: a computational perspective on mindreading. Biol. Cybern. doi: 10.1007/s00422-015-0654-6
Dingemanse, M., Blasi, D. E., Lupyan, G., Christiansen, M. H., and Monaghan, P. (2015). Arbitrariness, Iconicity, and Systematicity in Language. Trends Cog. Sci. 19(10), 603-615. doi: 10.1016/j.tics.2015.07.013
Duguid, S., Wyman, E., Bullinger, A. F., Herfurth-Majstorovic, K., and Tomasello, M. (2014). Coordination strategies of chimpanzees and human children in a Stag Hunt game. Proc. R. Soc. B 281:20141973. doi: 10.1098/rspb.2014.1973
Dumas, G., Kelso, J. A. S., and Nadel, J. (2014). Tackling the social cognition paradox through multi-scale approaches. Front. Psychol. 5:882. doi: 10.3389/fpsyg.2014.00882
Dunham, Y., Baron, A. S., and Carey, S. (2011). Consequences of "Minimal" Group Affiliations in Children. Child Dev. 82(3), 793-811. doi: 10.1111/j.1467-8624.2011.01577.x
Engel, A. K., Friston, K. J., and Kragic, D. (2015). The Pragmatic Turn: Toward Action-Oriented Views in Cognitive Science. Cambridge, MA: The MIT Press.
Etkin, A., Egner, T., and Kalisch, R. (2011). Emotional processing in anterior cingulate and medial prefrontal cortex. Trends Cog. Sci. 15(2), 85-93. doi: 10.1016/j.tics.2010.11.004
Everaert, M. B. H., Huybregts, M. A. C., Chomsky, N., Berwick, R. C., and Bolhuis, J. J. (2015). Structures, Not Strings: Linguistics as Part of the Cognitive Sciences. Trends Cog. Sci. 19(12), 729-743. doi: 10.1016/j.tics.2015.09.008
Fabry, R. (2017). Betwixt and between: the enculturated predictive processing approach to cognition. Synthese. doi: 10.1007/s11229-017-1334-y
Falk, E. B. and Bassett, D. S. (2017). Brain and Social Networks: Fundamental Building Blocks of Human Experience. Trends Cogn. Sci. 21(9), 674-690. doi: 10.1016/j.tics.2017.06.009
Fay, N. and Ellison, M. (2013). The Cultural Evolution of Human Communication Systems in Different Sized Populations: Usability Trumps Learnability. PLoS ONE 8(8): e71781. doi: 10.1371/journal.pone.0071781
Fay, N., Arbib, M., and Garrod, S. (2013). How to Bootstrap a Human Communication System. Cog. Sci. doi: 10.1111/cogs.12048
Feldman, R. (2015). The adaptive human parental brain: implications for children's social development. Trends Neurosci. doi: 10.1016/j.tins.2015.04.004
Feldman, R. (2017). The Neurobiology of Human Attachments. Trends Cog. Sci. 21(2), 80-99. doi: 10.1016/j.tics.2016.11.007
Feldman, H. and Friston, K. J. (2010). Attention, uncertainty, and free-energy. Front. Hum. Neurosci. 4(215). doi: 10.3389/fnhum.2010.00215
Fell, J. and Axmacher, N. (2011). The role of phase synchronization in memory processes. Nat. Rev. Neurosci. 12, 105-118. doi: 10.1038/nrn2979
Ferguson, B. and Lew-Williams, C. (2016). Communicative signals support abstract rule learning by 7-month-old infants. Sci. Rep. 6:25434. doi: 10.1038/srep25434
Ferguson, B. and Waxman, S. R. (2016). What the [beep]? Six-month-olds link novel communicative signals to meaning. Cognition 146, 185-189. doi: 10.1016/j.cognition.2015.09.020
Ferry, A. L., Hespos, S. J., and Waxman, S. R. (2013). Nonhuman primate vocalizations support categorization in very young human infants. Proc. Natl. Acad. Sci. USA 110(38), 15231-15235. doi: 10.1073/pnas.1221166110
Flynn, E. and Whiten, A. (2008). Cultural Transmission of Tool Use in Young Children: A Diffusion Chain Study. Soc. Dev. 17(3), 699-718. doi: 10.1111/j.1467-9507.2007.00453.x
Fogel, A. and Garvey, A. (2007). Alive communication. Infant Behavior and Development 30, 251-257. doi: 10.1016/j.infbeh.2007.02.007
Frankenhuis, W. E., Panchanathan, K., and Nettle, D. (2016). Cognition in harsh and unpredictable environments. Curr. Op. Psychol. 7, 76-80. doi: 10.1016/j.copsyc.2015.08.011
Fries, P. (2005). A mechanism for cognitive dynamics: neuronal communication through neuronal coherence. Trends Cog. Sci. 9(10), 474-480. doi: 10.1016/j.tics.2005.08.011
Friston, K. J. (1997). Transients, Metastability, and Neuronal Dynamics. NeuroImage 5: N1970259, 164-171.
Friston, K. (2005). A theory of cortical responses. Phil. Trans. R. Soc. B 360, 815-836. doi: 10.1098/rstb.2005.1622
Friston, K. (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences 13(7), 293-301. doi: 10.1016/j.tics.2009.04.005
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience. doi: 10.1038/nrn2787
Friston, K. (2011). What Is Optimal About Motor Control? Neuron 72, 488-498. doi: 10.1016/j.neuron.2011.10.018
Friston, K. (2012a). A Free Energy Principle for Biological Systems. Entropy 14, 2100-2121. doi: 10.3390/e14112100
Friston, K. (2012b). The history and future of the Bayesian Brain. NeuroImage 62, 1230-1233. doi: 10.1016/j.neuroimage.2011.10.004
Friston, K. (2013a). Life as we know it. J. R. Soc. Interface 10:20130475. doi: 10.1098/rsif.2013.0475
Friston, K. (2013b). Consciousness and Hierarchical Inference. Neuropsychoanalysis 15(1), 38-42.
Friston, K. J. (2015). The Mindful Filter: Free Energy and Action. In A. K. Engel, K. J. Friston, and D. Kragic (Eds.) The Pragmatic Turn: Toward Action-Oriented Views in Cognitive Science (pp. 97-108). Cambridge, MA: The MIT Press.
Friston, K. and Frith, C. (2015a). A Duet for one. Consciousness and Cognition 36, 390-405. doi: 10.1016/j.concog.2014.12.003
Friston, K. and Frith, C. (2015b). Active inference, communication and hermeneutics. Cortex 68, 129-143. doi: 10.1016/j.cortex.2015.03.025
Friston, K. J. and Stephan, K. E. (2007). Free-energy and the brain. Synthese 159, 417-458. doi: 10.1007/s11229-007-9237-y
Friston, K., Adams, R. A., Perrinet, L., and Breakspear, M. (2012a). Perceptions as hypotheses: saccades as experiments. Front. Psychol. 3:151. doi: 10.3389/fpsyg.2012.00151
Friston, K. J. et al. (2012b). Dopamine, Affordance and Active Inference. PLoS Comput. Biol. 8(1): e1002327. doi: 10.1371/journal.pcbi.1002327
Friston, K., Thornton, C., and Clark, A. (2012c). Free-energy minimization and the dark-room problem. Front. Psychol. 3:130. doi: 10.3389/fpsyg.2012.00130
Friston, K., Schwartenbeck, P., FitzGerald, T., Moutoussis, M., Behrens, T., and Dolan, R. J. (2014a). The anatomy of choice: dopamine and decision-making. Phil. Trans. R. Soc. B 369:20130481. doi: 10.1098/rstb.2013.0481
Friston, K. J., Stephan, K. E., Montague, R., and Dolan, R. J. (2014b). Computational psychiatry: the brain as a phantastic organ. Lancet Psychiatry 1, 148-158.
Friston, K. J., Levin, M., Sengupta, B., and Pezzulo, G. (2015a). Knowing one's place: a free-energy approach to pattern regulation. J. R. Soc. Interface 12:20141383. doi: 10.1098/rsif.2014.1383
Frith, U. and Frith, C. (2010). The social brain: allowing humans to boldly go where no other species has been. Phil. Trans. R. Soc. B 365, 165-175. doi: 10.1098/rstb.2009.0160
Fusaroli, R., Gangopadhyay, N., and Tylén, K. (2014). The dialogically extended mind: Language as skilful intersubjective engagement. Cognitive Systems Research 29-30, 31-39. doi: 10.1016/j.cogsys.2013.06.002
Gallagher, S. and Allen, M. (2016). Active inference, enactivism and the hermeneutics of social cognition. Synthese. doi: 10.1007/s11229-016-1269-8
Gallotti, M. and Frith, C. D. (2013). Social cognition in the we-mode. Trends in Cognitive Sciences 17(4), 165-165. doi: 10.1016/j.tics.2013.02.002
Gärdenfors, P. (2014). The Geometry of Meaning: Semantics Based on Conceptual Spaces. Cambridge, MA: The MIT Press.
Garrod, S. and Pickering, M. J. (2004). Why is conversation so easy? Trends in Cognitive Sciences 8(1), 8-11. doi: 10.1016/j.tics.2003
Gandhi, A., Levin, S., and Orszag, S. (1998). "Critical Slowing Down" in Time-to-extinction: an Example of Critical Phenomena in Ecology. J. Theor. Biol. 192, 363-376.
Gelman, S. A. and Roberts, S. O. (2017). How language shapes the cultural inheritance of categories. Proc. Natl. Acad. Sci. USA 114(30), 7900-7907. doi: 10.1073/pnas.1621073114
Gershman, S. J. (2017). On the blessing of abstraction. The Quarterly Journal of Experimental Psychology 70(3), 361-365. doi: 10.1080/17470218.2016.1159706
Gershman, S. J., Horvitz, E. J., and Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science 349(6245), 273-278. doi: 10.1126/science.aac6076
Gilbert, M. (1989). On Social Facts. Princeton, NJ: Princeton University Press.
Goldberg, A. E. (2003). Constructions: a new theoretical approach to language. Trends Cog. Sci. 7(5), 219-224. doi: 10.1016/S1364-6613(03)00080-9
Goldvicht-Bacon, E. and Diesendruck, G. (2016). Children's capacity to use cultural focal points in coordination problems. Cognition 149, 95-103. doi: 10.1016/j.cognition.2015.12.016
Goodman, N. D., Ullman, T. D., and Tenenbaum, J. B. (2009). Learning a Theory of Causality. Psychol. Rev. 118(1), 110-119. doi: 10.1037/a0021336
Gopnik, A. et al. (2017). Changes in cognitive flexibility and hypothesis search across human life history from childhood to adolescence to adulthood. Proc. Natl. Acad. Sci. USA 114(30), 7892-7899. doi: 10.1073/pnas.1700811114
Gratier, M., Devouche, E., Guellai, B., Infanti, R., Yilmaz, E., and Parlato-Oliveira, E. (2015). Early development of turn-taking in vocal interaction between mothers and infants. Front. Psychol. 6:1167. doi: 10.3389/fpsyg.2015.01167
Grau-Moya, J., Hez, E., Pezzulo, G., and Braun, D. A. (2013). The effect of model uncertainty on cooperation in sensorimotor interactions. J. R. Soc. Interface 10:20130554. doi: 10.1098/rsif.2013.0554
Gregory, R. L. (1980). Perceptions as hypotheses. Phil. Trans. R. Soc. Lond. B 290, 181-197.
Grice, H. P. (1975). Logic and Conversation. In Cole et al. (Eds.) Syntax and Semantics 3: Speech Acts (pp. 41-58). New York, NY: Academic Press.
Griffiths, T. L. and Kalish, M. (2005). A Bayesian view of language evolution by iterated learning. In B. G. Bara et al. (Eds.) Proceedings of the 27th Annual Conference of the Cognitive Science Society (pp. 827-832). Austin, TX: Cognitive Science Society.
Griffiths, T. L. and Kalish, M. (2007). Language evolution by iterated learning with Bayesian agents. Cog. Sci. 31, 441-480.
Grossmann, T. (2015). The Development of Social Brain Functions in Infancy. Psychol. Bull. doi: 10.1037/bul0000002
Grossmann, T. (2017). The Eyes as Windows Into Other Minds: An Integrative Perspective. Persp. Psychol. Sci. 12(1), 107-121. doi: 10.1177/1745691616654457
Gurven, M. D. and Gomes, C. M. (2017). Mortality, Senescence, and the Lifespan. In M. Muller et al. (Eds.) Chimpanzees and Human Evolution (pp. 181-216). Cambridge, MA: Harvard Belknap Press.
Hare, B. and Tomasello, M. (2005). Human-like social skills in dogs? Trends Cogn. Sci. 9(9), 439-444. doi: 10.1016/j.tics.2005.07.003
Hare, B. and Wrangham, R. W. (2017). Equal, Similar, but Different: Convergent Bonobos and Conserved Chimpanzees. In M. Muller et al. (Eds.) Chimpanzees and Human Evolution (pp. 142-173). Cambridge, MA: Harvard Belknap Press.
Hari, R., Henriksson, L., Malinen, S., and Parkkonen, L. (2015). Centrality of Social Interaction in Human Brain Function. Neuron 7(88), 181-193. doi: 10.1016/j.neuron.2015.09.022
Hari, R., Sams, M., and Nummenmaa, L. (2016). Attending to and neglecting people: bridging neuroscience, psychology, and sociology. Phil. Trans. R. Soc. B 371:20150365. doi: 10.1098/rstb.2015.0365
Haroush, K. and Williams, Z. M. (2015). Neuronal Prediction of Opponent's Behavior during Cooperative Social Interchange in Primates. Neuron 160, 1-13. doi: 10.1016/j.neuron.2015.01.045
Harris, P. L. and Corriveau, K. H. (2011). Young children's selective trust in informants. Phil. Trans. R. Soc. B 366, 1179-1190. doi: 10.1098/rstb.2010.0321
Harris, P. L., Bartz, D. T., and Rowe, M. L. (2017). Young children communicate their ignorance and ask questions. Proc. Natl. Acad. Sci. USA 114(30), 7884-7891. doi: 10.1073/pnas.1620745114
Han, S. (2015). Understanding cultural differences in human behavior: a cultural neuroscience approach. Curr. Op. Behav. Sci. 3, 68-72. doi: 10.1016/j.cobeha.2015.01.013
Han, S. and Ma, Y. (2015). A Culture-Behavior-Brain Loop Model of Human Development. Trends Cog. Sci. 19(11), 666-676. doi: 10.1016/j.tics.2015.08.010
Hasson, U. and Frith, C. D. (2016). Mirroring and beyond: coupled dynamics as a generalized framework for modelling social interaction. Phil. Trans. R. Soc. B 371:20150366. doi: 10.1098/rstb.2015.0366
Hasson, U., Ghazanfar, A. A., Galantucci, B., Garrod, S., and Keysers, C. (2012). Brain-to-brain coupling: a mechanism for creating and sharing a social world. Trends Cog. Sci. 16(2), 114-121. doi: 10.1016/j.tics.2011.12.007
Haun, D. B. M. and Over, H. (2013). Like Me: A Homophily-Based Account of Human Culture. In P. J. Richerson and M. H. Christiansen (Eds.) Cultural Evolution (pp. 75-85). Cambridge, MA: The MIT Press.
Hebb, D. O. (1949). The Organization of Behavior. New York, NY: Wiley.
Herrmann, E., Call, J., Hernández-Lloreda, M. V., Hare, B., and Tomasello, M. (2007). Humans Have Evolved Specialized Skills of Social Cognition: The Cultural Intelligence Hypothesis. Science 317, 1360-1366. doi: 10.1126/science.1146282
Hinton, G. E. (2007). Learning multiple layers of representation. Trends Cog. Sci. 11(10), 428-434. doi: 10.1016/j.tics.2007.09.004
Hockett, C. F. (1960). The Origin of Speech. Scientific American 203, 89-97.
Hohwy, J. (2013). The Predictive Mind. Oxford, UK: Oxford University Press.
Holroyd, C. B. and Yeung, N. (2012). Motivation of extended behaviors by anterior cingulate cortex. Trends Cog. Sci. 16(2), 122-128. doi: 10.1016/j.tics.2011.12.008
Hrdy, S. B. (2011). Mothers and Others: The Evolutionary Origins of Mutual Understanding. Cambridge, MA: Harvard University Press.
Huntenburg, J. M., Bazin, P. L., and Margulies, D. S. (2018). Large-Scale Gradients in Human Cortical Organization. Trends Cog. Sci. 22(1), 21-31. doi: 10.1016/j.tics.2017.11.002
Jensen, O. and Colgin, L. L. (2007). Cross-frequency coupling between neuronal oscillations. Trends Cog. Sci. 11(7), 267-269. doi: 10.1016/j.tics.2007.05.003
Jensen, K., Vaish, A., and Schmidt, M. H. F. (2014). The emergence of human prosociality: aligning with others through feelings, concerns, and norms. Front. Psych. 5:822. doi: 10.3389/fpsyg.2014.00822
Joiner, J., Piva, M., Turrin, C., and Chang, S. W. C. (2017). Social learning through prediction error in the brain. npj Science of Learning 2(8). doi: 10.1038/s41539-017-0009-2
Kanai, R., Komura, Y., Shipp, S., and Friston, K. (2016). Cerebral hierarchies: predictive processing, precision, and the pulvinar. Phil. Trans. R. Soc. B 370:20140169. doi: 10.1098/rstb.2014.0169
Karmali, F., Whitman, G. T., and Lewis, R. E. (2018). Bayesian optimal adaptation explains age-related human sensorimotor changes. J. Neurophysiol. 119, 509-520. doi: 10.1152/jn.00710.2017
Kelso, J. A. S. (2012). Multistability and metastability: understanding dynamic coordination in the brain. Phil. Trans. R. Soc. B 367, 906-918. doi: 10.1098/rstb.2011.0351
Kelso, J. A. S. (2016). On the Self-Organizing Origins of Agency. Trends Cog. Sci. 20(7), 490-499. doi: 10.1016/j.tics.2016.04.004
Kelso, J. A. S., Dumas, G., and Tognoli, E. (2013). Outline of a general theory of behavior and brain coordination. Neural Networks 37, 120-131. doi: 10.1016/j.neunet.2012.09.003
Kemp, C., Perfors, A., and Tenenbaum, J. B. (2007). Learning overhypotheses with hierarchical Bayesian models. Dev. Sci. 10(3), 307-321. doi: 10.1111/j.1467-7687.2007.00585.x
Kern, A. and Moll, H. (2017). On the transformational character of collective intentionality and the uniqueness of the human. Philosophical Psychology 30(3), 315-333. doi: 10.1080/09515089.2017.1295648
Kidd, C., Piantadosi, S. T., and Aslin, R. N. (2012). The Goldilocks Effect: Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex. PLoS ONE 7(5): e36399. doi: 10.1371/journal.pone.0036399
Kidd, E., Donnelly, S., and Christiansen, M. H. (2017). Individual Differences in Language Acquisition and Processing. Trends Cog. Sci. 22(2), 154-169. doi: 10.1016/j.tics.2017.11.006
Kiebel, S. J. and Friston, K. J. (2012). Recognition of Sequences of Sequences Using Nonlinear Dynamical Systems. In M. I. Rabinovich et al. (Eds.) Principles of Brain Dynamics: Global State Interactions (pp. 113-140). Cambridge, MA: The MIT Press.
Kiebel, S. J., Daunizeau, J., and Friston, K. J. (2008). A Hierarchy of Time-Scales and the Brain. PLoS Comput. Biol. 4(11): e1000209. doi: 10.1371/journal.pcbi.1000209
Kiebel, S. J., von Kriegstein, K., Daunizeau, J., and Friston, K. J. (2009). Recognizing Sequences of Sequences. PLoS Computational Biology 5(8), e1000464. doi: 10.1371/journal.pcbi.1000464
Kinzler, K. D., Dupoux, E., and Spelke, E. S. (2007). The native language of social cognition. Proc. Natl. Acad. Sci. USA 104(30), 12577-12580. doi: 10.1073/pnas.0705345104
Kinzler, K. D., Shutts, K., and Spelke, E. S. (2012). Language-based Social Preferences among Children in South Africa. Lang. Learn. Dev. 8(3), 215-232. doi: 10.1080/15475441.2011.583611
Kirby, S., Cornish, H., and Smith, K. (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proc. Natl. Acad. Sci. USA 105(31), 10681-10686. doi: 10.1073/pnas.0707835105
Kirby, S., Griffiths, T., and Smith, K. (2014). Iterated learning and the evolution of language. Curr. Op. Neurobiol. 28, 108-114. doi: 10.1016/j.conb.2014.07.014
Kirby, S., Tamariz, M., Cornish, H., and Smith, K. (2015). Compression and communication in the cultural evolution of linguistic structure. Cognition 141, 87-102. doi: 10.1016/j.cognition.2015.03.016
Kirchhoff, M. D. (2016). Autopoiesis, free energy, and the life-mind continuity thesis. Synthese. doi: 10.1007/s11229-016-1100-6
Kirchhoff, M. D. and Froese, T. (2017). Where There Is Life There Is Mind: In Support of a Strong Life-Mind Continuity Thesis. Entropy 19, 169. doi: 10.3390/e19040169
Kleckner, I. R., Zhang, J., Touroutoglou, A., Chanes, L., Xia, C., Simmons, W. K., Quigley, K. S., Dickerson, B. C., and Barrett, L. F. (2017). Evidence for a large-scale brain system supporting allostasis and interoception in humans. Nature Hum. Behav. 1:0069. doi: 10.1038/s41562-017-0069
Koster-Hale, J. and Saxe, R. (2013). Theory of Mind: A Neural Prediction Problem. Neuron 79, 836-848. doi: 10.1016/j.neuron.2013.08.020
Lane, J. D., Wellman, H. M., Olson, S. L., Miller, A. L., Wang, L., and Tardif, T. (2013). Relations Between Temperament and Theory of Mind Development in the United States and China: Biological and Behavioral Correlates of Preschoolers' False-Belief Understanding. Dev. Psychol. 49(5), 825-836. doi: 10.1037/a002825
Lavin, C., Melis, C., Mikulan, E., Gelomini, C., Huepe, D., and Ibañez, A. (2013). The anterior cingulate cortex: an integrative hub for human socially-driven interactions. Front. Neurosci. 7:64. doi: 10.3389/fnins.2013.00064
Legare, C. H. and Nielsen, M. (2015). Imitation and Innovation: The Dual Engines of Cultural Learning. Trends Cog. Sci. 19(11), 688-699. doi: 10.1016/j.tics.2015.08.005
Levinson, S. C. (2016). Turn-taking in Human Communication: Origins and Implications for Language Processing. Trends Cog. Sci. 20(1), 6-14. doi: 10.1016/j.tics.2015.10.010
Liebal, K., Carpenter, M., and Tomasello, M. (2013). Young children's understanding of cultural common ground. Brit. J. Dev. Psychol. 31, 88-96. doi: 10.1111/j.2044-835X.2012.02080.x
Lieven, E. (2016). Usage-based approaches to language development: Where do we go from here? Language and Cognition 8, 346-368. doi: 10.1017/langcog.2016.16
Lieven, E. and Stoll, S. (2013). Early Communicative Development in Two Cultures: A Comparison of the Communicative Environments of Children from Two Cultures. Hum. Dev. 56, 178-206. doi: 10.1159/000351073
Liszkowski, U. (2013). Using Theory of Mind. Child Development Perspectives 7(2), 104-109. doi: 10.1111/cdep.12025
Liszkowski, U., Carpenter, M., Henning, A., Striano, T., and Tomasello, M. (2004).
Twelve month olds point to share attention and interest. Dev. Sci ., 7 (3), 297 307. Liszkowski U., Carpenter, M., and Tomasello, M. (2007). Pointing out new news, old news, and absent referents at 12 months of age. Dev Sci ., 10 (2), F1 F7. doi: 10.1111/j.1467 7687.2006.00552.x Liu, Y., Piazza, E. A., Shewokis, P. A., Onaral, B., Hasson, U., and Ayaz, H. (2017). Measuring speaker listener neural coupling with functional near infrared spectroscopy. Sci. Rep. 7 :43293. doi: 10.1038/srep43293 Lucca, K. and Wilbourn, M. P. (2016). Communicating to Learn: Infants' Pointing Gestures Result in Optimal Learning. Child Dev doi: 10.1111/cdev.12707 Lupyan, G. and Dale, R. (2010). Language Structure is Partly Determined by Social Structure. PLoS ONE 5 (1): e8559. doi: 10.1371/journal .pone.0008559 Mahy, C. E. V., Moses, L. J., and Pfeifer, J. H. (2 014). How and where: Theory of mind in the brain. Dev. Cog. Neurosci ., 9 68 81. doi: 10.1016/j.dcn.2014.01.002 Malle, B. F. (2002). The relation between language and theory of mind in development and evolution. In T. GivÂ—n and B. F. Malle (Eds.) The evo lution of language out of pre language (pp.265 284). Amsterdam, The Netherlands: Benajmins. Maranesi, M., Bonini, L., and Fogassi, L. (2014). Cortical processing of objects affordances for self and others' action. Front. Psychol. 5 :538. doi: 10.3389/fpsyg.2014.00538 Marno, H., Guellai, B., Vidal, Y., Franzoi, J., Nespor, M., and Mehler, J. (2016). Infants Selectively Pay Attention to the Information They Receive from a Native Speaker of Their Language. Front. Psychol ., 7 (1150). doi: 10.3389/fpsyg.2016.01150 Matthews, D., Behne, T., Lieven, E., and Tomasello, M. (2012). Origins of the human pointing gesture: a training study. Dev. Sci ., 15 (6), 817 829. doi: 10.1111/j.1467 7687.2012.01181.x Mauch, M., MacCallum, R. M., Levy, M., and Le roi, A. M. (2015). The evolution of popular music: USA 1960 2010. R. Soc. open sci ., 2 : 150081. doi: 10.1098/rsos.150081 Maurer, D. 
and Werker, J. F. (2013). Perceptual Narrowing During Infancy: A Comparison of Language and Faces. Dev. Psychobiol doi: 1 0.1002/dev.21177 McAuliffe, K., Blake, P. R., Steinbeis, N., and Warneken, F. (2017). The developmental foundations of human fairness. Nature Human Behavior 1 0042. doi: 10.1038/s41562 016 0042 McClung, J., Placi, S., Bangerter, A., ClÂŽment, F., and Bshary, R. (2017). The language of cooperation: shared intentionality drives variation in helping as a function of group membership. Proc. R. Soc. B, 284 : 20171682. doi: 10.1098/rspb.2017.1682 McLoon e, B. and Smead, R. (2014). The ontogeny and evolution of human collaboration. Biol. Philos ., 29 559 576.
47 doi: 10.1007/s10539 014 9435 1 Meltzoff, A. N. (2007). Like me': a foundation for social cognition. Dev. Sci. 10 (1), 126 124. doi: 10.1111/j.1467 7687.2007.00574.x Menenti, L., Pickering, M. J., and Garrod, S. C. (2012). Toward a neural basis of interactive alignment in conversation. Front. Hum. Neurosci ., 6 :185. doi: 10.3389/fnhum.2012.00185 Meyer, M., Bekkering, H., Paulus, M., and Hunnius, S. ( 2010). Joint action coordination in 2.5 and 3 year old children. Front. Hum. Neurosci ., 4 :220 doi: 10.3389/fnhum.2010.00220 Miller, C. A. (2006). Developmental Relationships Between Language and Theory of Mind. American Journal of Speech Language Pathology 15 142 154. Milligan, K., Astington, J. W., and Dack, L. A. (2007). Language and Theory of Mind: Meta Analysis of the Relation Between Language Ability and False belief Understanding. Child Dev ., 78 (2), 622 646. Moll, H. and Tomasello, (2008). Cooperation and human cognition: the Vygotskian intelligence hypothesis. Phil. Trans. R. Soc. B 362 639 648. doi: 10.1098/rstb.2006.2000 Moll, H. and Meltzoff, A. N. (2011). Joint Attention as the Fundamental Basis of Perspective Taking. In A. Seemann (Ed.), Joint attention: New developments in psychology, philosophy of mind, and social neuroscience (pp. 393 413). Cambridge, MA: MIT Press. Moll, H., Carpenter, M., and Tomasello, M. (2007). Fourteen month olds know what others experience only in joint engagement. Dev. Sci ., 10 (6), 826 835. doi: 10.1111/j.1467 7687.2007.00615.x Moran, R. J., Campo, P., Symmonds, M., Stephan, K. E., Dolan, R. J., and Friston, K. J. (2013). Free Energy, Precision, and Learning: The Role of Cholinergic Neuromodulati on. J. Neurosci. 33 (18), 8227 8236. doi: 10.1523/JNEUROSCI.4255 12.2013 Moran, R. J., Symmonds, M., Dolan, R. J., and Friston, K. J. (2014). The Brain Ages Optimally to Model Its Environment: Evidence from Sensory Learning over the Adult Lifespan. PLoS C omp. Biol ., 10 (1), e1003422. Movellan, J. R. and Watson, J. S. (1987). 
Perception of directional attention. In Infant behavior and development: abstracts of the 6 th international conference on infant studies Mundy, P. and Newell, L. (2007). Attention, Joint Attention, and Social Cognition. Curr. Dir. Psychol. Sci ., 16 (5), 269 274. Over, H. (2015). The origins of belonging: social motivation in infants and young children. Phil. Trans. R. Soc. B 371 : 20150072. doi: 10.1098/rstb.2015.0072 Over, H. and Carpenter, M. (2008). Priming third party ostracism increases affiliative imitation in children. Dev. Sci ., 12 (3), F1 F8. doi: 10.1111/j.1467 7687.2008.00820.x Pacherie, E. (2012). The Phenomenology of Joint Action: Self Agency vs. Joint Agency. In A. Seemann (Ed.) Joint Attention New Developments (pp. 343 389). Cambridge, MA: The MIT Press. Park, H. and Friston, K. (2013). Structural and Functional Brain Networ ks: From Connections to Cognition. Science 342 1238411. doi: 10.1126/science.1238411 Parr, T. and Friston, K. J. (2017). Working memory, attention, and salience in active inference. Scientific Reports 7 :14678. doi: 10.1038/s41598 017 15249 0 Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference San Francisco, CA: Morgan Kauffman. Perc, M. (2012). Evolution of the most common English words and phrases over the centuries. J. R. Soc. Interface doi: 10.1098/rsif.2012.0491 Perfors, A. and Navarro, D. J. (2014). Language Evolution Can Be Shaped by the Structure of the World. Cog. Sci ., 38 775 793. doi: 10.1111/cogs.12102 Perfors, A., Tenenbaum, J. B., Griffiths, T. L., and Xu, F. (2011 a ). A tutorial i ntroduction to Bayesian models of cognitive development. Cognition 120 302 321. doi: 10.1016/j.cognition.2010.11.015 Perfors, A., Tenenbaum, J. B., and Regier, T. (2011 b ). The learnability of abstract syntactic principles. Cognition 118 306 338. doi: j.cognition.2010.11.001 Perniss, P. and Vigliocco, G. (2014). 
The bridge of iconicity: from a world of experience to the experience of language. Phil. Trans. R. Soc. B 369 20130300. doi: 10.1098/rstb.2013.0300 Perszyk, D. R. and Waxman, S. R. (2016). L istening to the calls of the wild: The role of experience in linking language and cognition in young infants. Cognition 153 175 181. doi: 10.1016/j.cognition.2016.05.004 Pezzulo, G. (2011). Shared Representations as Coordination Tools for Interaction. R ev. Phil. Psych ., 2 303 333. doi: 10.1007/s13164 011 0060 5 Pezzulo, G. and Dindo, H. (2013). Intentional strategies that make co actors more predictable: The case of signaling. Behav. Brain Sci ., 36 (4). doi: 10.1017/S0140525X12002816
48 Pezzulo, G., Donnarumma F., and Dindo, H. (2013). Human Sensorimotor Communication: A Theory of Signaling in Online Social Interactions. PLoS One 8 (11):e79876. doi: 10.1371/journal.pone.0079876 Pezzulo, G., Rigoli, F., and Friston, K. (2015a). Active inference, home ostatic regulation and adaptive behavioral control. Progress in Neurobiology 134 17 35. doi: 10.1016/j.pneurobio.2015.09.001 Pezzulo, G. et al. (2015b). Acting Up: An Approach to the Study of Cognitive Development. In A. K. Engels et al. (Eds.) The Pra gmatic Turn: Toward Action Oriented Views in Cognitive Science (pp. 49 77). Cambridge, MA: The MIT Press. Pickering, M. J. and Clark, A. (2014). Getting ahead: forward models and their place in cognitive architecture. Trends in Cog. Sci. 18 (9), 451 456. doi: 10.1016/j.tics.2014.05.006 Pickering, M. J. and Garrod, S. (2014). Self other and joint monitoring using forward models. Front. Hum. Neurosci ., 8 (132). doi: 10.3389/fnhum.2014.00132 Powers III, A. R., Kelley, M., and Corlett, P. (2016). Hallucinations as Top Down Effects on Perception. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 1 393 400. doi: 10.1016/j.bpsc.2016.04.003 Rabinovich, M. I., Bick, C., and Varona, P. (2012). The Stability of Information Flows in the Brain. In M. I. Rabinovich et al. (Eds.) Principles of Brain Dynamics: Global State Interactions (pp. 141 164). Cambridge, MA: The MIT Press. Rabinovich, M. I., Simmons, A. N., and Varona, P. (2015). Dynamical bridge between brain and mind. Trends Cog. Sci ., 19 (8), 453 461. doi: 10.1016/j.tics.2015.06.005 Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., and Shulman, G. L. (2001). A d efault mode of brain function. Proc. Natl. Acad. Sci. USA 98 (2), 676 682. Rakoczy, H. and Schmidt, M. F. H. (2013). The Early Ontogeny of Social Norms. Child Dev. Perspect ., 7 (1), 17 21. doi: 10.1111/cdep.12010 Ramstead M. J. D., VeissiÂŽre, S. P. L., a nd Kirmayer, L. J. (2016). 
Cultural Affordances: Scaffolding Local Worlds Through Shared Intentionality and Regimes of Attention. Front. Psychol. 7 :1090. doi: 10.3389/fpsyg.2016.01090 Ramstead, M. J., Badcock, P. B., and Friston, K. J. (2017). Answering SchrÂšdinger's question: A free energy formulation. Phys. Life Rev. doi: 10.1016/j.plrev.2017.09.001 Ramstead, M. J. D., Badcock, P. B., and Friston, K. J. (2018). Variational Neuroethol ogy: Answering Further Questions: Reply to comments on "Answering SchrÂšdinger's question: A fee energy formulation." Physics of Life Reviews doi: 10.1016/j.plrev.2018.01.003 Ramstead, M. J. D., Constant, A., Badcock, P. B., and Friston, K. J. (under revi ew). Variational Ecology and the Physics of Sentient Systems. Phys. Life. Rev. Special Issue Rand, D G. and Nowak, M. A. (2013). Human cooperation. Trends in Cognitive Sciences 17 (8), 413 25. doi: 10.1016/j.tics/2013.06.003 Reali, F., Chater, N., and Christiansen, M. H. (2018). Simpler grammar, larger vocabulary: How population size affects language. Proc. R. Soc. B 285 : 20172586. doi: 10.1098/rspb.2017.2586 Reali, F. and Griffiths, T. L. (2009). The evolution of frequency distributions: Relating reg ularization to inductive biases through iterated learning. Cognition 111 317 238. doi: 10.1016/j.cognition.2009.02.012 Reddy, V. (2003). On being the object of attention: implications for self other consciousness. Trends Cogn. Sci ., 7 (9), 397 402. doi: 10.1016/S1364 6613(03)00191 8 Renzi, D. T., Romberg, A. R., Bolger, D. J., and Newman, R. S. (2017). Two Minds Are Better Than One: Cooperative Communication as a New Framework for Understanding Infant Language Learning. Translational Issues in Psycholog ical Science 3 (1), 19 53. doi: 10.1037/tps0000088 Rietveld, E. and Kiverstein, J. (2014). A Rich Landscape of Affordances. Ecological Psychology 26 323 52. doi: 10.1080/10407413.2014.958035 Richerson, P. J. and Boyd, R. (2005). 
Not by Genes Alone: How Culture Transformed Human Evolution Chicago, Il: The University of Chicago Press. Richter, N., Tiddeman, B., and Haun, D. B. M. (2016). Social Preferences in Preschoolers: Effects of Morphological Self Similarity. PLoS ONE 11 (1): e0145443. doi: 10.1371 /journal.pone.0145443 Riley, M. A., Richardson, M. J., Shockley, K., and Ramenzoni, V. C. (2011). Interpersonal synergies. Front. Psych. 2 :38. doi: 10.3389/fpsyg.2011.00038 Roepstorff A. (2013). Interactively human: Sharing time, constructing materialit y. Behavioral and Brain Sciences 36 :3, 224 5. doi: 10.1017/S0140525X12002427 Rosenberg, R. D. and Feigenson, L. (2013). Infants hierarchically organize memory representations. Dev. Sci., 16 (4), 610 621. doi: 10.1111/desc.12055 Sanders, J. B. T., Farmer, J. D., and Galla, T. (2018). The prevalence of chaotic dynamics in games with many
49 players Sci. Rep ., 8 (4902). doi: 10.1038/s41598 018 22013 5 Santos, F. C., Pacheco, J. M., and Skyrms, B. (2011). Co evolution of pre play signaling and cooperation. J. Theor. Biol., 274 30 35. doi: 10.1016/j.jtbi.2011.01.004 Schillbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T., and Vogeley, K. (2013). Toward a second person neuroscience. Behav. Brain Sci ., 36 (4), 393 414. doi: 10.1017/S01405 25X12000660 SchmÂŠlzle, R., HÂŠcker, F. E. K., Honey, C. J., and Hasson, U. (2015). Engaged listeners: shared neural processing of powerful politics speeches. Soc. Cog. Affect. Neurosci doi: 10.1093/scan/nsul68 SchmÂŠlzle, R., O'Donnell, M. B., Garcia, J. O ., Cascio, C. N., Bayer, J., Bassett, D. S., Vettel, J. M., and Falk, E. B. (2017). Brain connectivity dynamics during social interaction reflect social network structure. Proc. Natl. Acad. Sci. USA. doi: 10.1073/pnas.1616130114 Schmidt, M. F. H., Rakoczy, H., and Tomasello, M. (2011). Young children attribute normativity to novel actions without pedagogy or normative language. Dev. Sci ., 14 (3), 530 539. doi: 10.1111/j.1467 7687.2010.01000.x Schwartenbeck, P., FitzGerald, T., Dol an, R. J., and Friston, K. (2013). Exploration, novelty, surprise, and free energy minimization. Front. Psychol. 4 :710. doi: 10.3389/fpsyg.2013.00710 Scott Phillips, T. C. and Kirby, S. (2010). Language evolution in the laboratory. Trends Cog. Sci ., 14 411 417. doi: 10.1016/j.tics.2010.06.006 Searle, J. R. (1995). The Construction of Social Reality New York, NY: Free Press. Searle, J. R. (2010). Making the Social World: The Structure of Human Civilization Oxford, UK: Oxford University Press. Sebanz, N., Bekkering, H., and Knoblich, G. (2006). Joint action: bodies and minds moving together. Trends Cog. Sci. 10 (2), 70 76. doi: 10.1016/j.2005.12.009 Sebanz, N. and Knoblich, G. (2009). Prediction in Joint Action: What, When, and Where. Topics in Cognitive Science 1 353 367. 
doi: 10.1111/j.1756 8765.2009.01024.x Sengupta, B. and Friston, K. (2017). Sentient Self Organization: Minimal dynamics and circular causality. arXiv:1705.08265v1 Sengupta, B., Stemmler, M. B., and Friston, K. J. (2013). Information and Efficiency in the Nervous System A Synthesis. PLoS Comput. Biol ., 9 (7), e1003157. doi: 10.1371/pcbi.1003157 Sengupta, B., Tozzi, A., Cooray, G. K., Douglas, P. K., and Friston, K. J. (2016). Towards a Neuronal Gauge Theory. PLoS Biol ., 14 (3), e1002400. doi: 10.1371/journal.pbio.1002400 Seoane, L. F. and SolÂŽ, R. (2018). The morphospace of language networks. arXiv: 1803.01934v1 Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends Cog. Sci ., 17 (11), 56 5 573. doi: 10.1016/j.tics .2013.09.007 Shuai, L. and Gong, T. (2014). Language as an emergent group level trait. Behav. Brain Sci. 37 :3, 274 5. doi: 10.1017/S0140525X13003026 Siegler, R. S. and Crowley, K. (1991). The microgenetic method. A direct mean s for studying cognitive development. Am. Psychol ., 46 (6), 606 620. Skyrms, B. (2001). The Stag Hunt. Proceedings and Addresses of the American Philosophical Association 75 31 41. Smaldino, P. E. (2014). The cultural evolution of emergent group level traits. Behav. Brain Sci., 37 243 95. doi: 10.1017/ S0140525X13001544 Smith, K. and Wonnacott, E. (2010). Eliminating unpredictable variation through iterated learning. Cognition 116 444 449. doi: 10.1016/cognition.2010.06.004 Smith, L. B. and Thelen, E. (2003). Development as a dynamic system. Trends in Cog. Sci. 7 (8), 343 8. doi: 10.1016/S1364 6613(03)00156 6 Smith, L. B., Jayaraman, S., Clerkin, E., and Yu, C. (2018). The Developing Infant Creates a Curriculum for Statistical Learning. Trends Cog. Sci ., 22 (4), 325 336. doi: 10.1016/j.tics.2018.02.004 Southgate, V., van Maanen, C., and Csibra, G. (2007). Infant Pointing: Communication to Cooperate or Communication to Learn? Child Dev. 78 (3) 735 740. Sperber, D. and Wilson, D. (1986). 
Relevance: Communication and Cognition Oxford, UK: Blackwell. Sperber, D. and Mercier H. (2012). Reasoning as a Social Competence. In H. Landemore, J. Elster (Eds.) Collective Wisdom: Princi ples and Mechanisms (pp. 368 92). Oxford, UK: Oxford University Press. Sperber, D., ClÂŽment, F., Heintz, C., Mascaro, O., Mercier, H., Origgi, G., and Wilson, D. (2010). Epistemic Vigilance. Mind & Language 25 (4), 359 393. Stanley, J. (2015). How Propaga nda Works Princeton, NJ: Princeton University Press. Stephens, G. J., Silbert, L. J., and Hasson, U. (2010). Speaker listener neural coupling underlies successful
50 communication. Proc. Natl. Acad. Sci. USA 107 (32), 14425 14430. doi: 10.1073/pnas.1008662107 SzathmÂ‡ry, E. (2015 ). Towards major evolutionary transitions theory 2.0. Proc. Natl. Acad. Sci. USA 112 (33), 10104 10111. doi: 10.1073/pnas.1421398112 Szufnarowska, J., Rohlfing, K. J., Fawcett, C., and GredebÂŠck, G. (2014). Is ost ension any more than attention? Sci. Rep ., 4 :5304. doi: 10.1038/srep.05304 Tajima,. S et al. (2017). Task dependent recurrent dynamics in visual cortex. eLife, 6 :e26868. doi: 10.7554/elife.26868 Tamariz, M. and Kirby, S. (2016). The cultural evolution of language. Curr. Op. Psychol. 8 37 43. doi: 10.1016/j.copsyc.2015.09.003 Tamariz, M., Ellison, T. M., Barr, D. J., and Fay, N. (2014). Cultural selection drives the evolution of human communication systems. Phil. Trans. R. Soc. B 281 : 20140488. doi: 10.1098/rspb.2014.0488 Tenenbaum, J. B., Kemp, C., Griffiths, T. L., and Goodman, N. D. (2011). How to Grow a Mind: Statistics, Structure, and Abstraction. Science, 331 1279 1285. doi: 10.1126/science.1192788 Tennie, C., Call, J., and Tomasello, M. (2009 ). Ratcheting up the ratchet: on the evolution of cumulative culture. Phil. Trans. R. Soc. B 364 2405 2415. doi: 10.1098/rstb.2009.0052 Thaker, P., Tenenbaum, J. B., and Gershman, S. J. (2017). Online learning of symbolic concepts. J. Math. Psychol ., 7 7 10 20. doi: 10.1016/j.jmp.2017.01.002 Tinbergen, N. (1963). On aims and methods of ethology. Zeitschrift fÂŸr Tierpsychologie 20 410 433. Tognoli, E. and Kelso, J. A. S. (2014). The Metastable Brain. Neuron 81 35 48. doi: 10.1016/j.neuron.2013.12.022 Tomasello, M. (2003). Constructing a Language: A Usage Based Theory of Language Acquisition Cambridge, MA: Harvard University Press. Tomasello M. (2008). The Origins of Human Communication Cambridge, MA: The MIT Press. Tomasello, M. (2009). Why We Coo perate Cambridge, MA: The MIT Press. Tomasello, M. (2014a). The ultra social animal. Eur. J. Soc. Psychol ., 44 187 194. 
doi: 10.1002/ejsp.2015 Tomasello, M. (2014b). A Natural History of Human Thinking Cambridge, MA: Harvard University Press Tomasello, M. and Carpenter, C. (2007). Shared intentionality. Dev. Sci. 10 (1), 121 5. doi: 10.1111/j.1467 7687.2007.00573.x Tomasello, M. (2016). Cultural Learning Redux. Child Dev. 87 (3), 643 53. doi: 10.1111/cdev.12499 Tomasello and Hamann (2012). Collaboration in young children. Q. J. Exp. Psychol ., 65 (1), 1 12. doi: 10.1080/17470218.2011.608853 Tomasello, M., Carpenter, M., Call, J., Behne, T., and Moll, H. M. (2005). Understanding and sharing intentions: The origin s of cultural cognition. Behav. Brain Sci. 28 675 735. Tomasello, M., Carpenter, M., and Liszkowski U. (2007a). A New Look at Infant Pointing. Child Dev. 78 (3), 705 22. Tomasello, M., Hare, B., Lehmann, H., and Call, J. (2007b). Reliance on head versu s eyes in the gaze following of great apes and human infants: The cooperative eye hypothesis. Journal of Human Evolution 52, 314 320. Tomasello, M., Melis, A. P., Tennie, C., Wyman, E., and Herrmann, E. (2012). Two Key Steps in the Evolution of Human Co operation: The Interdependence Hypothesis. Curr. Anthro. 53 (6), 673 92. doi: 10.1086/668207 Tschacher, W. and Haken, H. (2007). Intentionality in non equilibrium systems? The functional aspects of self organized pattern formation. New Ideas in Psychology 25 1 15. doi: 10.1016/j.newideapsych.2006.09.002 Tuomela, R. (2013). Who Is Afraid of Group Agents and Group Minds? In M. Schmitz et al. (Eds.) The Background of Social Reality. Studies in the Philosophy of Sociality, vol 1 (pp. 13 35). Springer, Dordrecht: Germany. Turchin, P. (2008). Arise cliodynamics'. Nature 454, 34 35. TylÂŽn, K., Weed, E., Wallentin, M., Roepstorff, A., and Frith, C. D. (2010). Language as a Tool for Interacting Minds. Mind & Language 25 (1), 3 29. van den Heuvel, M. P. an d Sporns, O. (2013). Network hubs in the human brain. Trends Cog. Sci ., 17 (12), 683 696. 
doi: 10.1016/j.tics.2013.09.012 van Schaik, C. and Michel, K. (2016) The Good Book of Human Nature: An Evolutionary Reading of the Bible New York, NY: Basic Books. VeissiÂre, S. P. L. (2017). Cultural Markov Blankets? Mind the Other Minds Gap! Comment on "Answering SchrÂšdinger's question: A free energy formulation." Physics of Life Reviews doi: 10.1016/j.plrev.2017.11.001 Vesper, C., Butterfill, S., Knoblich, G., a nd Sebanz, N. (2010). A minimal architecture for joint action. Neural Networks 23 998 1003. doi: 10.1016/j.neunet.2010.06.002 Vesper, C. et al. (2017). Joint Action: Mental Representations, Shared Information and General Mechanisms for Coordinating with Others. Front. Psychol. 7 :2039. doi: 10.3389/fpsyg.2016.02039
51 Vouloumanos, A. and Curtin, S. (2013). Foundational Tuning: How Infants' Attention to Speech Predicts Language Development. Cog. Sci ., 38 1675 1686. doi: 10.1111/cogs.12128 Vygotsky, L. S. (1978). Mind in Society Cambridge, Mass: Harvard University Press. Warneken, F. (2016). Insights into the biological foundation of human altruistic sentiments. Current Opinion in Psychology 7 51 56. doi: 10.1016/j.copsyc.2015.07.013 Wat son, R. E. and Legare, C. H. (2016). The Social Functions of Group Rituals. Curr. Dir. Psychol. Sci ., 25 (1), 42 46. doi: 10.1177/0963721415618486 Watson, R. A. and SzathmÂ‡ry, E. (2016). How Can Evolution Learn? Trends Ecol. and Evol. 31 (2), 147 157. doi: 10.1016/j.tree.2015.11.009 Wellman, H. M., Lane, J. D., LaBounty, J., and Olson, S. L. (2011). Observant, nonagressive temperament predicts theory of mind development. Dev. Sci ., 14 (2), 319 326. doi: 10.1111/j.1467 7687.2010.00977.x Wenger, E., Brozzol i, C., Lindenberger, U., and LÂšvden, M. (2017). Expansion and Renormalization of Human Brain Structure During Skill Acquisition. Trends Cog. Sci ., 21 (12), 930 939. doi: 10.1016/j.tics.2017.09.008 Whiten, A. (2011). The scope of culture in chimpanzees, humans and ancestral apes. Phil. Trans. R. Soc. B 366 997 1007. doi: 10.1098/rstb.2010.0334 Winch, P. (1964). Understanding a Primitive Society. American Philosophical Quarterly 1 (4), 307 324. Wit herington, D. C. (2007). The Dynamic Systems Approach as Metatheory for Developmental Psychology. Hum. Dev ., 50 127 153. doi: 10.1159/000100943 Wu, Z. and Gros Louis, J. (2014). Caregivers provide more labeling responses to infants' pointing than to inf ants' object directed vocalizations. J. Child Lang ., 42 538 561. doi: 10.1017/S0305000914000221 Wyman, E., Rakoczy, H., and Tomasello, M. (2012). Non verbal communication enables children's coordination in a "Stag Hunt" game. European Journal of Develop mental Psychology 10 (5), 597 610. doi: 10.1080/17405629.2012.726469 Yildiz, I. 
B., von Kriegstein, K., and Kiebel, S. (2013). From Birdsong to Human Speech Recognition: Bayesian Inference on a Hierarchy of Nonlinear Dynamical Systems. PLoS Comput. Biol ., 9 (9): e1003219. doi: 10.1371/journal.pcbi.1003219 Youn, H., Sutton, L., Smith, E., Moore, C., Wilkins, J. F., Maddieson, I., Croft, W., and Bhattacharya, T. (2016). On the universal structure of human lexical semantics. Proc. Natl. Acad. Sci. USA ., 11 3 (7), 1766 1771. doi: 10.1073/pnas.1520752113 Yu, A. J. and Dayan, P. (2005). Uncertainty, Neuromodulation, and Attention. Neuron 46 681 692. doi: 10.1016/j.neuron.2005.04.026 Zadbood, A., Chen, J., Leong, Y. C., Norman, K. A., and Hasson, U. (2017). How We Transmit Memories to Other Brains: Constructing Shared Neural Representations. Cerebral Cortex doi: 10.1093/cercor/bhx202 Zarate, M., Tian, X., Woods, K. J. P., and Poeppel D. (2015). Multiple levels of linguistic and paralinguistic features contribute to voice recognition. Sci. Rep ., 5 :11475. doi: 10.1038/srep11475