Different Sampling Methods

The main types of sampling methods are:

1. Random sampling selects a random subset of a statistical population so that everyone in the population has an equal chance of being chosen.

2. Stratified sampling divides the entire population into subgroups (strata) and randomly chooses people from each subgroup, either an equal number per subgroup or a number proportional to each subgroup’s size.

3. Systematic random sampling is a type of probability sampling method where sample members from a larger population are selected according to a random starting point and a fixed periodic interval.
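The three definitions above can be sketched in a few lines of Python. This is only a rough illustration, not a standard library recipe: the function names, the 90-person population, and the three group labels "A", "B" and "C" are all made up for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being chosen."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group members by stratum, then draw randomly within each group."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def systematic_sample(population, n, rng):
    """Pick a random starting point, then take every k-th member."""
    k = len(population) // n   # fixed periodic interval
    start = rng.randrange(k)   # random starting point in [0, k)
    return [population[start + i * k] for i in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 90 people, each tagged with a made-up group label.
population = [(person_id, "ABC"[person_id % 3]) for person_id in range(90)]

random_picks = simple_random_sample(population, 9, rng)
stratified_picks = stratified_sample(population, lambda p: p[1], 3, rng)
systematic_picks = systematic_sample(population, 9, rng)
```

All three calls return 9 people, but only the stratified version guarantees 3 from each group, and only the systematic version guarantees the picks are spaced exactly 10 apart in the list.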

Benefits of the sampling methods

1. Random sampling is easy to use and, with a large enough sample, tends to give an accurate representation of a large population.

2. Stratified sampling provides greater precision than a simple random sample of the same size. Because of this, it can reach the same precision with a smaller sample, which saves money.

3. Systematic sampling is simpler to carry out than random sampling, and it ensures that the sample is spread evenly across the population.

For example, if you want to find out how America as a whole feels about its gun control laws, random sampling would be the way to go because it represents a large population well. If you wanted to find opinions on undocumented immigrants, stratified sampling would be better because it gives more precise results for the opinions of different ethnic groups. You could also use systematic sampling for the first of these examples, because picking every nth person from a list spreads the sample evenly across the population.

How inappropriate sampling may cause biased results

1. Coming back to the earlier example: if you used systematic sampling to gather thoughts on undocumented immigrants but limited yourself to a single state like Texas, you would end up with an extremely biased result, because one state cannot stand in for the whole country.

2. If you were to use stratified sampling for something like an opinion on gun control (again going back to the same example because I’m lazy), you wouldn’t really accomplish any more than if you used random sampling. People don’t need to know what certain groups think about an issue like this; they want to know how the population feels as a whole. If you sort subjects into subgroups, you may paint those subgroups in a negative light depending on who sees your results, and that may lead to bias against certain subgroups.
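The first point can be illustrated with a small simulation. Every number here is invented purely for illustration: a hypothetical "region A" where 70% of people agree with some statement and a larger "region B" where 40% agree. The sketch compares a sample drawn from everyone with a sample drawn from region A alone.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = agrees, 0 = disagrees. All figures are invented.
region_a = [1] * 700 + [0] * 300     # 1,000 people, 70% agree
region_b = [1] * 1200 + [0] * 1800   # 3,000 people, 40% agree
population = region_a + region_b

def share_agreeing(sample):
    """Fraction of the sample that agrees."""
    return sum(sample) / len(sample)

true_share = share_agreeing(population)          # share across everyone
whole_sample = rng.sample(population, 500)       # drawn from the whole population
one_region_sample = rng.sample(region_a, 500)    # drawn from region A only

print(f"Whole population:      {true_share:.3f}")
print(f"Sample from everyone:  {share_agreeing(whole_sample):.3f}")
print(f"Sample from region A:  {share_agreeing(one_region_sample):.3f}")
```

The sample from everyone should land near the true share, while the region A sample should land near 70% no matter how carefully people are picked inside that region. The bias comes from who was left out, not from the picking method.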

3 Examples of Misleading Statistics

Wilt Chamberlain’s 50.4 PPG season from 1961-1962

This stat line posted by Wilt Chamberlain in the 1961-1962 season is misleading because if you just looked at averages of 50-25 (points and rebounds per game), you would think that this was the greatest statistical season of all time, which it isn’t. The numbers on paper are simply staggering. It is difficult to comprehend anything like this, and hard to imagine anyone in the NBA ever coming close to such a ridiculous season statistically. But once you know that this season was in the 60s, you will probably see some holes in the achievement. During the 1950s and 60s nobody had ever seen someone of Wilt’s stature and strength, and no one had ever witnessed the way he seemed to simply obliterate anyone who was in his way. While most centers at the time were around 6’7” to 6’9”, Wilt was 7’1” and far stronger than anyone else.

But back to statistics. The pace of games was much faster back then too. Teams in the NBA that season averaged 107.7 shots per game, while teams today average 84.4 shots per game. If you adjust Wilt’s stats to today’s pace, you get 39-20, which is still unreal but far less impressive than 50-25. Wilt also averaged 48.5 minutes per game, meaning that he averaged MORE minutes per game than there are in a regulation game (48), thanks to overtime. James Harden, this year’s and last year’s season leader in minutes, averaged only 38 per game. If we normalize Wilt’s minutes to James Harden’s, he gets a 31-16. THESE stats look more comparable to a young Kareem, which makes more sense and makes the season look even less impressive. If Curry’s stats were adjusted to Wilt’s era, he would have averaged a 41-9-11, which is comparable in incredibility to Wilt’s original stat line. Thus, a misleading stat line from Wilt.
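The two adjustments above are just scaling by ratios, which can be checked with a short script. The points, minutes and shot figures come from the text. The 25.7 rebounds per game is an assumption (the text only gives the rounded "25"), and scaling stats in a straight line with pace and minutes is a simplification.

```python
# Figures taken from the text above.
wilt_points = 50.4      # points per game, 1961-1962
wilt_minutes = 48.5     # minutes per game
shots_1962 = 107.7      # team shots per game that season
shots_today = 84.4      # team shots per game today
harden_minutes = 38.0   # minutes per game for the current leader

# Assumption: 25.7 rebounds per game (the text rounds this to 25).
wilt_rebounds = 25.7

# Step 1: scale down to today's slower pace.
pace_factor = shots_today / shots_1962
pace_points = wilt_points * pace_factor
pace_rebounds = wilt_rebounds * pace_factor

# Step 2: scale down again to a modern workload of minutes.
minutes_factor = harden_minutes / wilt_minutes
final_points = pace_points * minutes_factor
final_rebounds = pace_rebounds * minutes_factor

print(f"Pace-adjusted:          {pace_points:.0f}-{pace_rebounds:.0f}")
print(f"Pace + minutes adjusted: {final_points:.0f}-{final_rebounds:.0f}")
```

Under these assumptions the rounded results match the 39-20 and 31-16 lines quoted in the text.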

Female vs Male wages

Statistics say women earn $0.77 for every $1 that men earn, but women on average work 14% fewer hours than men, so part of the gap comes from hours worked rather than from pay for the same work.
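As a rough check on how much the hours difference matters, the two figures can be combined. This sketch assumes the $0.77 figure compares total earnings and that the 14% applies to total hours worked. Both are assumptions, since the text does not say how either number was measured.

```python
women_per_dollar = 0.77   # women's earnings per $1 earned by men (from the text)
hours_gap = 0.14          # women work 14% fewer hours (from the text)

# Assumption: earnings are totals, so dividing by relative hours gives a per-hour ratio.
women_hours_share = 1 - hours_gap
hourly_ratio = women_per_dollar / women_hours_share

print(f"Implied earnings per hour: ${hourly_ratio:.2f} per $1")
```

Under those assumptions the per-hour figure is noticeably closer to $1 than the headline $0.77, which is why quoting the headline number alone can mislead.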

Sanitizer kills 99% of germs

When tested in labs under real-world conditions, sanitizers kill as little as 46% of germs.

1. Bias in a question may influence the collection of data by encouraging one outcome or answer over others. For example: Asking horseback riders if they would support building a public park on the site of their stable. The bias in this question is that it only asks horseback riders and includes “on the site of their stable.” A better way to ask the question would be: Where should a public park be built?

Use of language may influence the collection of data, as people may not fully understand what is being asked and thus not answer the questions properly. For example: Free samples of sunscreen are sent to every home in fall and winter. A mail reply card asks people if they would use the product again. The language in this is unclear, as people who haven’t tested the product wouldn’t understand how they could use it “again” if they’ve never used it before.

Ethics may influence the collection of data, as the question might refer to inappropriate behavior. For example: Everybody who completes an online survey gets a free MP3 file of a song that the company has not bought the rights to use. The ethics here are wrong: it is questionable to reward participants in a survey, and giving away music without buying the rights is also ethically wrong because the musicians are not being paid for it.

Cost may influence the collection of data, as the cost of the study may outweigh the benefits. For example: A company sends samples of summertime outdoor slippers to people during the wintertime and asks them to complete a mail reply card containing a survey. The cost of this study is extremely high, as sending these products during the wintertime is expensive.

Time and timing may influence the collection of data, as the timing of the survey may not be good. For example: A survey on the environmental health of the school takes place the same week that the school is under construction and renovation. This is not a good time to survey people, who may have disturbed routines and frustrations with the temporary changes, as this would be reflected negatively in the survey.

Privacy may affect the collection of data, as people might have the right to refuse to answer. For example: A teacher asks his/her students about their parents’ total income. This affects the privacy of the students’ parents, and some students might not know what their parents’ total income is.

Finally, cultural sensitivity may affect the collection of data, as the question may offend people from different cultural groups. For example: A grocery store employee conducts a survey of people living within 10 km of the store and asks them what type of red meat they prefer. First of all, there is bias in the question: it assumes that everybody eats red meat. Furthermore, this is also a culturally sensitive question, as red meat may be unacceptable to some people.

2. In a population, all of the individuals in a group are studied. An example of a population is all of the eligible voters in a federal election. A sample is any group of individuals selected from a population. An example of a sample would be 100 eligible voters selected from each province or territory in a federal election.

4. Theoretical probability is calculated using the formula: number of favorable outcomes divided by the total number of possible outcomes. Theoretical probability could be used if I were trying to find the chance that a coin will land on a black square on an 8 by 8 checkerboard. Experimental probability is calculated from the results when the actual situation or problem is performed as an experiment. Experimental probability can be used if you wanted to see how often each outcome comes up when a six-sided die is rolled 10 times.
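The difference between the two can be sketched in Python. The checkerboard part assumes the coin is equally likely to land on any of the 64 squares and ignores coins that straddle two squares. The die part uses a simulated die instead of a real one, with a fixed seed so the run is repeatable.

```python
import random
from collections import Counter
from fractions import Fraction

# Theoretical: favorable outcomes divided by total possible outcomes.
total_squares = 8 * 8
black_squares = total_squares // 2   # half of a checkerboard's squares are black
p_black = Fraction(black_squares, total_squares)

# Experimental: perform the experiment and count what happened.
rng = random.Random(1)
rolls = [rng.randint(1, 6) for _ in range(10)]   # 10 simulated rolls of a die
counts = Counter(rolls)
experimental = {face: Fraction(counts[face], len(rolls)) for face in range(1, 7)}

print("Theoretical P(black square):", p_black)
for face, share in experimental.items():
    print(f"Experimental P({face}) over 10 rolls: {share}")
```

With only 10 rolls the experimental values usually differ from the theoretical 1/6 for each face. Rolling many more times would tend to bring them closer.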

By Alerik and Andrew