Statistics partner project


By Sean H and Jimmy W

1.) A lot of things can influence the collection of data. Bias can influence data collection because the data may be taken only from a specific age group, race, etc., and not from everyone in the population. The use of language can also affect it, because a claim can be worded in a creative way that misleads the reader. For example, “9/10 dentists recommend Colgate” is misleading because a dentist can recommend more than one brand, so the same dentists may recommend other toothpastes too. Ethics can affect the data, because someone planning an unethical ad might collect data in a misleading way to help mislead the reader. Cost can affect data collection because it decides how much data can be collected. For example, with a low budget less data will be collected, as there won’t be enough money to cover the expense of surveying a larger audience, while with a high budget more data can be collected from a larger audience. Researchers also look at whether the benefits outweigh the overall cost of the survey. Time is another factor that influences data, because if the data is collected at an inappropriate time, like when the audience is busy or in a rush, the quality of the data can suffer. Privacy can be a factor, as fewer people would answer truthfully if their information was not going to be kept private, like if it was going to be posted on YouTube for the whole world to see. Finally, cultural sensitivity also affects data collection, as a survey could offend a certain culture, which could provoke its members to give more biased responses.
A population is every member of a specific group. For example, if someone wants to ask the school whether they prefer pencil or pen, then the population would be everyone inside the school, including staff and students. A sample is a part of a population used to describe the whole group. For example, instead of asking the whole school, they would ask only the students in the Math Department.

2.) Convenience sampling is when subjects are selected based on the convenience of the researcher, like distance and accessibility. An example can be: 6/10 of my neighbors prefer chocolate over candy. The benefits of convenience sampling are that it is low cost, fast and easy.
Random sampling is when subjects are selected completely by chance to represent a whole population. The benefits are that it reduces selection bias and tends to represent the whole population more accurately. For example, to estimate the average weight of students, 60 students in a school were selected randomly by a computer to take part in the research.
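As a rough sketch of how a computer could pick the 60 students, the Python below draws them by chance from a list. The school size of 500, the student numbers and the function name `simple_random_sample` are assumptions made up for illustration.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k different members of the population completely by chance."""
    rng = random.Random(seed)          # a seed only makes the run repeatable
    return rng.sample(population, k)   # sampling without replacement

# Hypothetical school of 500 students, identified by number.
students = list(range(1, 501))
chosen = simple_random_sample(students, 60, seed=1)
print(len(chosen), "students selected")
```

Every student has the same chance of being picked, which is what separates this from convenience sampling.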
Stratified sampling is when you divide a population into subgroups based on an attribute, like gender, age or ethnicity, and then sample from each subgroup. For example, if a teacher wants to conduct an experiment on math students to see if they understand the material taught, the population would be divided into subgroups based on grade level. The results represent the average understanding of each grade level, which can then be put together to form an average for the school’s math department. The benefits are that it can be more accurate than simple random sampling, and it will often require fewer subjects, which lowers the cost.
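The grade-level example could be sketched like this in Python. It assumes the same fraction is taken at random from every grade, which is one common way to do it; the grade sizes, the 10% fraction and the name `stratified_sample` are invented for illustration.

```python
import random

def stratified_sample(groups, fraction, seed=None):
    """Draw the same fraction at random from every subgroup."""
    rng = random.Random(seed)
    sample = {}
    for name, members in groups.items():
        k = max(1, round(len(members) * fraction))  # at least one per subgroup
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical math department, divided by grade level.
groups = {
    "grade 10": [f"g10-{i}" for i in range(120)],
    "grade 11": [f"g11-{i}" for i in range(100)],
    "grade 12": [f"g12-{i}" for i in range(80)],
}
picked = stratified_sample(groups, 0.10, seed=1)
for grade, members in picked.items():
    print(grade, len(members))
```

Because every grade is sampled separately, no grade level can be left out by chance.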

3.) Systematic sampling is when subjects are chosen at fixed intervals, like every ninth subject out of a population of 100. The benefits of this are that it can be done manually and it is less biased than letting the researcher pick. For example, an employer wants to conduct research on his employees to see if they feel stressed in their working environment. The employer will take a list of employees and choose from it based on an interval.
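A minimal sketch of the interval idea, assuming a list of 100 employees and an interval of nine as in the example above. The employee names and the function `systematic_sample` are made up for illustration.

```python
def systematic_sample(population, interval, start=0):
    """Take every `interval`-th member, beginning at index `start`."""
    return population[start::interval]

# Hypothetical list of 100 employees.
employees = [f"employee {n}" for n in range(1, 101)]

# start=8 is the 9th person on the list, then the 18th, the 27th, and so on.
chosen = systematic_sample(employees, 9, start=8)
print(len(chosen), chosen[:3])
```

In practice the starting point is often picked at random so that the first person on the list is not always favored.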
Voluntary response sampling is when subjects volunteer to take part in the survey because they are interested. The benefit is that it is very easy, as the researcher does not need to select the subjects. For example, a Twitter account runs a poll for its followers to decide which drink is better, Coca-Cola or Pepsi.
Choosing an inappropriate sampling method can bias data because the sample may not represent the whole population targeted. For example, if you only choose athletes for research on which matters more, sports or fine arts, the answer will be biased because the researcher chose a sample that does not represent the whole audience.

4.) What is the difference between theoretical probability and experimental probability? Theoretical probability is what we expect to happen before we run an experiment, and experimental probability is what we actually get after we run it. For example, if we flip a coin 10 times, the theoretical probability of heads is 1/2, so we expect it to land on heads 5 times and tails 5 times. If the coin actually lands on heads 6 times and tails 4 times, the experimental probability of heads is 6/10, or 60%, which is 10 percentage points higher than the theoretical 50%.
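The coin example can be tried out with a short simulation. This is only a sketch: the function name `experimental_probability` and the numbers of flips are assumptions, and the exact results depend on the random seed.

```python
import random

def experimental_probability(flips, seed=None):
    """Flip a fair coin `flips` times and return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

theoretical = 1 / 2  # what we expect for a fair coin

for n in (10, 100, 10_000):
    observed = experimental_probability(n, seed=n)
    print(n, "flips:", observed, "vs theoretical", theoretical)
```

With only 10 flips the experimental result can easily be off by 10 percentage points, like the 6 heads and 4 tails above. With many more flips it usually lands much closer to 50%.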

5.) 3 misleading charts and why

This chart is very misleading because, if you compare last year to last week just by looking at it, the prices appear to have roughly doubled, which isn’t true. The reason is that the vertical axis doesn’t start at $0, so the increase looks much bigger than it really was.


This chart is extremely misleading because it looks like there are almost 4 times more people on welfare than people with a full time job. If you look at the numbers, there aren’t that many more people on welfare than with a full time job. The chart seems designed to make people think far more people are on welfare, because the block for people on welfare is drawn so much bigger than the actual numbers justify.


Just by looking at the chart, it looks like tax rates will go 4 times higher if the tax cuts expire, which is completely false if you actually look at the numbers. The chart is misleading because the axis doesn’t start at 0%, so judging by the blocks alone it looks like a huge rise in tax rates if the cuts expire.
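To see why an axis that doesn’t start at zero exaggerates a change, the sketch below compares how tall two bars look with how big the numbers really are. The values 35 and 40 and the axis start of 34 are made-up numbers, not the ones from the charts, and `visual_ratio` is a hypothetical helper.

```python
def visual_ratio(old, new, axis_start):
    """How many times taller the new bar is drawn than the old bar."""
    return (new - axis_start) / (old - axis_start)

old_value, new_value = 35, 40   # hypothetical values, e.g. percentages

# Axis cut off at 34: the bars are drawn 1 and 6 units tall.
print("looks like:", visual_ratio(old_value, new_value, axis_start=34))

# Axis starting at 0: the bars show the real ratio.
print("really is:", round(visual_ratio(old_value, new_value, axis_start=0), 2))
```

The closer the axis starts to the smaller value, the bigger the difference looks, even though the numbers themselves have not changed.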