Not necessarily (but the pure state case is often what we study), and not exactly (but kind of).
WITH REGARD TO THE FIRST QUESTION:
I don’t think it is necessary to restrict the concept of entanglement to pure states. One could do so, but it would be odd to describe a state with a 50% chance of being in an entangled pure state as “not entangled”.
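I would therefore say that a composition of two systems is in an entangled state whenever its density operator cannot be written as a classical statistical mixture of pure tensor products of state vectors for the corresponding component systems. (The qualification matters because the same density operator can usually be decomposed into pure states in more than one way, so it is not enough that one particular decomposition includes an entangled term.)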
WITH REGARD TO THE SECOND QUESTION:
In order to make sense of this question we need to decide what is meant by the state of a subsystem of a composite system. The standard approach is to use the “relative state” (more often called the reduced state), which is implemented mathematically by a “partial trace” and which gives the “marginal probability” distributions of quantities observed on that subsystem alone. If one does that, then yes, the relative state of a subsystem may indeed be a mixed state even for a pure state of the combined system (and it is mixed whenever that pure state is entangled).
Before going on I should point out that it is important to understand both the distinction between a statistical mixture and a linear superposition, and the fact that for a composite system the states that are “pure tensors” are just a subset of all the pure states.
First, with regard to mixtures vs superpositions:
For any system, mixed states (which are classical statistical mixtures of pure states) can be represented by so-called density matrices, which are operators of the form [math]\rho=\sum_{i} p_{i}|\psi_{i}\rangle\langle \psi_{i}|[/math], where [math]p_{i}[/math] is the probability of being in pure state [math]\psi_{i}[/math] and the operator [math]|\psi_{i}\rangle\langle \psi_{i}|[/math] is just the projector onto the subspace spanned by the state vector [math]|\psi_{i}\rangle[/math].
The expectation of observable [math]O[/math] in state [math]\rho[/math] is then given by the trace [math]\langle O\rangle=\mathrm{Tr}(\rho O)=\sum_{i} p_{i}\langle\psi_{i}|O \psi_{i}\rangle[/math], which is just the overall expectation from a process which gives the expected value for state [math]|\psi_{i}\rangle[/math] with probability [math]p_{i}[/math]. A pure state [math]\psi[/math] corresponding to a single vector [math]|\psi\rangle[/math] can also be represented by a density matrix. In that case the sum has only one term, so the density matrix is a one dimensional projector (i.e. of rank one), and the trace formula gives [math]\langle O\rangle=\mathrm{Tr}(|\psi\rangle\langle \psi| O)=\langle \psi| O\psi\rangle[/math], which is just the usual form for the expectation value.
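To make the trace formula concrete, here is a minimal NumPy sketch. It is only an illustration: the particular qubit states, probabilities and variable names are my own assumptions, not anything taken from the question. It builds a density matrix from a two-term mixture and checks that the trace agrees with the probability-weighted average of the pure-state expectations.

```python
import numpy as np

# Assumed example: a qubit that is |0> with probability 0.25
# and |+> = (|0> + |1>)/sqrt(2) with probability 0.75.
ket0 = np.array([1.0, 0.0], dtype=complex)
ket_plus = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
probs = [0.25, 0.75]
states = [ket0, ket_plus]

# rho = sum_i p_i |psi_i><psi_i|
rho = sum(p * np.outer(psi, psi.conj()) for p, psi in zip(probs, states))

# Observable O: the Pauli Z matrix (chosen only for illustration).
O = np.array([[1, 0], [0, -1]], dtype=complex)

# Expectation via the trace formula Tr(rho O).
trace_value = np.trace(rho @ O).real

# Expectation as the weighted average of <psi_i| O psi_i>.
weighted_value = sum(
    p * np.vdot(psi, O @ psi).real for p, psi in zip(probs, states)
)

# Tr(rho^2) equals 1 only for a pure state (rank-one projector).
purity = np.trace(rho @ rho).real

print(trace_value, weighted_value, purity)
```

The two expectation values agree, and the purity comes out below 1, which is the numerical signature of a genuinely mixed state.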
Another, quite different, way of combining states is to take a linear combination of state vectors, creating what is called a superposition. This differs from the classical mixture because the expectation value of an observable [math]O[/math] in the superposition state [math]|\psi\rangle=c_{1}|\psi_{1}\rangle+c_{2}|\psi_{2}\rangle[/math] is given by [math]|c_{1}|^2\langle\psi_{1}|O\psi_{1}\rangle+c_{1}^*c_{2}\langle\psi_{1}|O\psi_{2}\rangle+c_{2}^*c_{1}\langle\psi_{2}|O\psi_{1}\rangle+|c_{2}|^2\langle\psi_{2}|O\psi_{2}\rangle[/math], and the cross terms represent the fact that, if the [math]|\psi_{i}\rangle[/math] are not both eigenstates of [math]O[/math], then observation of [math]O[/math] has a mixing effect which produces interference between them.
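The difference between a superposition and a mixture with the same weights can be checked numerically. The sketch below is my own illustration, with the basis states, the equal coefficients and the choice of Pauli matrices all assumed for the example. It compares the two for an observable that is diagonal in the basis and one that is not.

```python
import numpy as np

ket0 = np.array([1.0, 0.0], dtype=complex)
ket1 = np.array([0.0, 1.0], dtype=complex)
c1 = c2 = 1 / np.sqrt(2)  # assumed equal amplitudes

# Superposition: one state vector, hence a rank-one density matrix.
psi = c1 * ket0 + c2 * ket1
rho_super = np.outer(psi, psi.conj())

# Classical mixture with the same weights |c_i|^2.
rho_mix = (abs(c1) ** 2 * np.outer(ket0, ket0.conj())
           + abs(c2) ** 2 * np.outer(ket1, ket1.conj()))

Z = np.array([[1, 0], [0, -1]], dtype=complex)  # |0>, |1> are eigenstates
X = np.array([[0, 1], [1, 0]], dtype=complex)   # |0>, |1> are not eigenstates


def expect(rho, observable):
    """Expectation value Tr(rho O)."""
    return np.trace(rho @ observable).real


# For Z the cross terms vanish, so both states give the same answer.
z_super, z_mix = expect(rho_super, Z), expect(rho_mix, Z)

# For X the cross terms survive in the superposition only.
x_super, x_mix = expect(rho_super, X), expect(rho_mix, X)

print(z_super, z_mix, x_super, x_mix)
```

With these choices the Z expectations coincide, while the X expectation is close to 1 for the superposition and 0 for the mixture; the whole difference comes from the cross terms.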
Second, with regard to composite systems:
If the systems A and B have pure state vectors of the form [math]|a_{\alpha}\rangle[/math] and [math]|b_{\beta}\rangle[/math] in Hilbert spaces [math]\mathcal{H}_{A}[/math] and [math]\mathcal{H}_{B}[/math], then the products [math]|a_{\alpha}\rangle\otimes|b_{\beta}\rangle[/math] (the “pure tensors”) are state vectors in [math]\mathcal{H}_{A}\otimes \mathcal{H}_{B}[/math], and any (normalized) linear combination of pure tensors corresponds to a pure state of the combined system.
The special feature of pure tensors is that they represent states in which the properties of the two subsystems are statistically independent; these are sometimes called separable (or product) states. But in most pure states (which are formed by taking linear combinations of pure tensors) the properties are correlated, and we say that in such states, which cannot be represented by a single pure tensor, the two systems are “entangled”.
In this setting any pure tensor corresponding to a state vector of the form [math]|a_{\alpha}\rangle\otimes|b_{\beta}\rangle[/math] has a rank one density matrix [math]\rho=(|a_{\alpha}\rangle\otimes|b_{\beta}\rangle)(\langle a_{\alpha}|\otimes\langle b_{\beta}|)=|a_{\alpha}b_{\beta}\rangle\langle a_{\alpha}b_{\beta}|[/math] and is not entangled.
But neither is any classical statistical mixture of such states, with density matrix [math]\rho=\sum_{i} p_{i}|\psi_{i}\rangle\langle \psi_{i}|[/math] in which each contributing [math]|\psi_{i}\rangle[/math] is a pure tensor of the form [math]|\psi_{i}\rangle=|a_{i}\rangle\otimes|b_{i}\rangle[/math].
On the other hand, an entangled state might be a pure state represented by a vector of the form [math]c_{1}|a_{1}\rangle\otimes|b_{1}\rangle+c_{2}|a_{2}\rangle\otimes|b_{2}\rangle[/math] (a linear superposition of pure tensors which, with both coefficients nonzero, [math]|a_{1}\rangle, |a_{2}\rangle[/math] linearly independent and [math]|b_{1}\rangle, |b_{2}\rangle[/math] linearly independent, cannot be rewritten as a single pure tensor), but it might also be a classical mixture in which one or more of the possible [math]|\psi_{i}\rangle[/math] is of that form, provided the mixture cannot be re-expressed as a mixture of pure tensors alone.
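For pure states there is a simple numerical test for whether a vector is a pure tensor. This is a standard linear-algebra technique (the Schmidt decomposition) that I am bringing in for illustration, not something used above: arrange the coefficients of the state as a matrix indexed by the basis labels of A and B, and count its nonzero singular values. One nonzero value means a pure tensor; more than one means entangled. The function name, tolerance and example states below are my own assumptions.

```python
import numpy as np


def schmidt_rank(psi, dim_a, dim_b, tol=1e-12):
    """Number of nonzero singular values of the coefficient matrix of psi.

    Assumes psi is ordered as in np.kron(a, b), i.e. index = alpha * dim_b + beta.
    """
    coeffs = np.asarray(psi, dtype=complex).reshape(dim_a, dim_b)
    singular_values = np.linalg.svd(coeffs, compute_uv=False)
    return int(np.sum(singular_values > tol))


ket0 = np.array([1.0, 0.0], dtype=complex)
ket1 = np.array([0.0, 1.0], dtype=complex)

# A sum of two pure tensors that still factorizes:
# (|0>|0> + |0>|1>)/sqrt(2) = |0> (|0> + |1>)/sqrt(2)
factorizable = (np.kron(ket0, ket0) + np.kron(ket0, ket1)) / np.sqrt(2)

# A sum of two pure tensors that does not factorize (a Bell state).
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

rank_factorizable = schmidt_rank(factorizable, 2, 2)
rank_bell = schmidt_rank(bell, 2, 2)

print(rank_factorizable, rank_bell)
```

The first state has rank 1 despite being written as a sum of two pure tensors, which is why the linear-independence condition matters; the Bell state has rank 2. Note that this test applies only to pure states; deciding whether a general mixed state is entangled is a harder problem.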
Returning to the second question, we need to decide what is meant by the state of a subsystem of a composite system.
For an unentangled (pure tensor product) pure state it is natural to take the corresponding factor in the tensor product. For an entangled state it is less obvious what to do, but it makes sense to think of the relative state of system A as giving any observable [math]O_{A}[/math] the expectation value that results from observing the combined state but ignoring the state of system B. This amounts to observing the identity operator in system B, so the corresponding observable on the combined system is the operator [math]O_{A}\otimes I_{B}[/math].
For the pure entangled state [math]\psi = c_{1}|a_{1}\rangle\otimes|b_{1}\rangle+c_{2}|a_{2}\rangle\otimes|b_{2}\rangle[/math], if [math]\langle b_{1}|b_{2}\rangle = 0[/math], this gives the expectation [math]\begin{align}&\langle\psi|(O_{A}\otimes I_{B})\psi\rangle \\&= |c_{1}|^2\langle a_{1}|O_{A}a_{1}\rangle\langle b_{1}|I_{B}b_{1}\rangle + c_{1}^*c_{2}\langle a_{1}|O_{A}a_{2}\rangle\langle b_{1}|I_{B}b_{2}\rangle \\&+ c_{2}^*c_{1}\langle a_{2}|O_{A}a_{1}\rangle\langle b_{2}|I_{B}b_{1}\rangle + |c_{2}|^2\langle a_{2}|O_{A}a_{2}\rangle\langle b_{2}|I_{B}b_{2}\rangle \\& = |c_{1}|^2\langle a_{1}|O_{A}a_{1}\rangle + |c_{2}|^2\langle a_{2}|O_{A}a_{2}\rangle\end{align}[/math]
which corresponds to the mixed state for system A with probability [math]|c_{i}|^2[/math] of being in state [math]|a_{i}\rangle[/math] (and density matrix [math]\rho_{A}=|c_{1}|^2|a_{1}\rangle \langle a_{1}| + |c_{2}|^2|a_{2}\rangle \langle a_{2}|[/math]).
This procedure of mapping the density matrix [math]\rho=|\psi\rangle \langle \psi|[/math], which is an operator on [math]\mathcal{H}_{A}\otimes \mathcal{H}_{B}[/math], to an operator on just [math]\mathcal{H}_{A}[/math] is often referred to as taking the partial trace.
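The partial trace itself is easy to carry out numerically. The following is a minimal sketch under my own assumptions: the helper name `partial_trace_b`, the specific vectors (with the two B states orthogonal, and the two A states deliberately not orthogonal) and the coefficients are chosen only for illustration. It checks that the reduced operator matches the mixed-state formula for system A and reproduces the expectation of the operator that acts as an observable on A and as the identity on B.

```python
import numpy as np


def partial_trace_b(rho, dim_a, dim_b):
    """Trace out system B from an operator on H_A (x) H_B.

    Assumes the ordering of np.kron(a, b), so that after reshaping the
    indices are (alpha, beta, alpha', beta'); we sum over beta = beta'.
    """
    rho4 = np.asarray(rho).reshape(dim_a, dim_b, dim_a, dim_b)
    return np.trace(rho4, axis1=1, axis2=3)


ket0 = np.array([1.0, 0.0], dtype=complex)
ket1 = np.array([0.0, 1.0], dtype=complex)
ket_plus = (ket0 + ket1) / np.sqrt(2)

# Assumed example: a1 = |0>, a2 = |+>, b1 = |0>, b2 = |1> (so <b1|b2> = 0).
a1, a2, b1, b2 = ket0, ket_plus, ket0, ket1
c1, c2 = np.sqrt(0.3), np.sqrt(0.7)

# Pure entangled state of the combined system and its density matrix.
psi = c1 * np.kron(a1, b1) + c2 * np.kron(a2, b2)
rho = np.outer(psi, psi.conj())

# Reduced state of system A via the partial trace.
rho_a = partial_trace_b(rho, 2, 2)

# The mixture |c1|^2 |a1><a1| + |c2|^2 |a2><a2|.
expected_rho_a = (abs(c1) ** 2 * np.outer(a1, a1.conj())
                  + abs(c2) ** 2 * np.outer(a2, a2.conj()))

# Expectation of O_A (x) I_B in the full state vs. Tr(rho_A O_A).
O_a = np.array([[0, 1], [1, 0]], dtype=complex)  # Pauli X as a sample O_A
full_expectation = np.vdot(psi, np.kron(O_a, np.eye(2)) @ psi).real
reduced_expectation = np.trace(rho_a @ O_a).real

purity_total = np.trace(rho @ rho).real   # pure combined state
purity_a = np.trace(rho_a @ rho_a).real   # mixed subsystem state

print(np.allclose(rho_a, expected_rho_a), purity_total, purity_a)
```

The combined state has purity 1 while the reduced state of A has purity below 1, which is the point of the whole discussion: a pure entangled state of the pair leaves each part in a mixed state.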
Source: Is a system of two entangled particles a pure state, and are its subsystems in a mixed state? – Quora