
Weekly Record

December 13, 2011

1. Ran into the problem “Task attempt failed to report status for 6003 seconds. Killing!”

Figured out that it is due to emitting nothing (no output) for records with missing features. Some parts of the data can have a huge proportion of missing features, which causes the map-reduce status not to update for a long time. Basically, the error means that the task stayed in the map or reduce phase for longer than the allowed time without any stdin/stdout activity.

Changed the code, but there are also other ways to solve this, such as increasing the timeout parameter. Here’s a link to this

Another way is to use the Reporter.

Reporter is a facility for MapReduce applications to report progress, set application-level status messages and update Counters.

Mapper and Reducer implementations can use the Reporter to report progress or just indicate that they are alive. In scenarios where the application takes a significant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task. Another way to avoid this is to set the configuration parameter mapred.task.timeout to a high-enough value (or even set it to zero for no time-outs).
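For reference, a sketch of raising that timeout in mapred-site.xml (the value is in milliseconds; the 30-minute figure below is just an illustrative choice, and 0 disables the timeout entirely):

```xml
<!-- mapred-site.xml: raise the per-task timeout; value is in milliseconds -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
</property>
```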

Java code to report the status:



if ((++count % 1000) == 0) {
    context.setStatus((100 * count / len) + "% done!");
}

2. Boto connection error:

The requested instance type’s architecture (i386) does not match the architecture in the manifest for ami-c9c70da0 (x86_64) (RequestID: b71b1ee4-5a98-45e2-af1d-7da0db114afb)

It is saying that my created image is 64-bit, while the instance about to launch is 32-bit. Tried a test launching the instance using a 32-bit image; it went through fine.

Finally, found out that it was because the instance_type was not correct.
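A minimal sketch of the fix with the boto EC2 API of that era (connect_ec2 / run_instances): pick an instance_type whose architecture matches the AMI manifest. The architecture table below is an illustrative subset, not exhaustive, and the AMI ID in the usage note is a placeholder.

```python
# Illustrative subset of 2011-era EC2 instance-type architectures
# (m1.small / c1.medium were 32-bit; m1.large and up were 64-bit).
INSTANCE_ARCH = {
    "m1.small": "i386",
    "c1.medium": "i386",
    "m1.large": "x86_64",
    "m1.xlarge": "x86_64",
}

def check_arch(instance_type, image_arch):
    """Fail early if the instance type cannot run an image of this architecture."""
    arch = INSTANCE_ARCH.get(instance_type)
    if arch is not None and arch != image_arch:
        raise ValueError("instance type %s is %s but the AMI manifest is %s"
                         % (instance_type, arch, image_arch))

def launch(ami_id, instance_type, image_arch):
    # boto import kept local so check_arch is usable without boto installed
    import boto
    check_arch(instance_type, image_arch)
    conn = boto.connect_ec2()  # reads AWS credentials from the environment
    return conn.run_instances(ami_id, instance_type=instance_type)
```

Usage would be e.g. `launch("ami-c9c70da0", "m1.large", "x86_64")`; with `"m1.small"` the check raises before any API call is made.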


3. Other notes collected on AWS

Is it possible to use a customized AMI for Elastic MapReduce on AWS? Elastic MapReduce doesn’t support custom AMIs at this time. The service instead has a feature called “Bootstrap Actions” that allows you to pass a reference to a script stored in Amazon S3, and related arguments, to Elastic MapReduce when creating a job flow. This script is executed on each job flow instance before the actual job flow runs. This post describes how to create bootstrap actions:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=3938&categoryID=265 (section “Creating a Job Flow with Bootstrap Actions”)

Processing images is one of the typical Elastic MapReduce use cases.

Use S3 or HDFS:

I would like to be able to access and use HDFS directly instead of having to worry about using the S3 bucket for initial or intermediate IO. I am worried about the IO performance of the S3 bucket compared to HDFS. I have seen multiple posts that say it doesn’t matter and others that say it can matter.

HDFS and S3 provide different benefits: HDFS has lower latency, but S3 has higher durability. For long-term storage (without compute), S3 is the cheaper option.

Would people recommend using EMR or EC2 with a Hadoop 0.20 image for doing something like this?

EMR is highly tuned to offer the best performance possible with S3.

Does the EMR setup support using the HDFS like this with custom JARs?

Definitely. Intermediate data is stored in HDFS unless you configure things otherwise. You are able to choose whether to use HDFS or S3 for your initial data.

Common Problems Running Job flows. <https://forums.aws.amazon.com/thread.jspa?threadID=30925>
Using s3:// instead of s3n://
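On stock Hadoop of this vintage, s3:// selected the S3 block filesystem while s3n:// was the native filesystem for ordinary S3 objects, so feeding a job s3:// paths by mistake is a common failure. A small hedged helper to normalize input paths might look like:

```python
def normalize_s3_path(path):
    """Rewrite s3:// URIs to s3n:// so Hadoop uses the native S3
    filesystem (on stock Hadoop, s3:// selects the block filesystem,
    which cannot read ordinary objects uploaded to a bucket)."""
    if path.startswith("s3://"):
        return "s3n://" + path[len("s3://"):]
    return path
```

Non-S3 paths (hdfs://, local files) pass through unchanged.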

The Amazon Elastic MapReduce instances run on a pre-defined AMI. To use a customized instance for MapReduce, one way is to run a bootstrap action:

[2]< http://aws.typepad.com/aws/2010/04/new-elastic-mapreduce-feature-bootstrap-actions.html>

[3]< http://atbrox.com/2010/10/01/programmatic-deployment-to-elastic-mapreduce-with-boto-and-bootstrap-action/>

[4]< https://github.com/atbrox/atbroxexamples>

Q: What are Amazon Elastic MapReduce Bootstrap Actions?

Bootstrap Actions is a feature in Amazon Elastic MapReduce that provides users a way to run custom set-up prior to the execution of their job flow. Bootstrap Actions can be used to install software or configure instances before running your job flow.

Q: How can I use Bootstrap Actions?

You can write a Bootstrap Action script in any language already installed on the job flow instance, including Bash, Perl, Python, Ruby, C++, or Java. There are several pre-defined Bootstrap Actions available. Once the script is written, you need to upload it to Amazon S3 and reference its location when you start a job flow. Please refer to the “Developer’s Guide”: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/ for details on how to use Bootstrap Actions.
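Tying this back to the boto links above, a sketch of starting a job flow with a bootstrap action programmatically via the boto.emr module of that era; the bucket names and script path below are placeholders, and a real job flow would also need steps:

```python
def make_bootstrap(script_path, args=()):
    """Build a boto BootstrapAction for a set-up script stored in S3."""
    if not script_path.startswith("s3://"):
        raise ValueError("bootstrap script must live in S3: %r" % script_path)
    # local import so the S3-path validation works without boto installed
    from boto.emr.bootstrap_action import BootstrapAction
    return BootstrapAction("custom-setup", script_path, list(args))

def run_flow():
    from boto.emr.connection import EmrConnection
    conn = EmrConnection()  # AWS credentials from the environment
    return conn.run_jobflow(
        name="weekly-record-test",
        log_uri="s3://my-bucket/logs",  # placeholder bucket
        bootstrap_actions=[make_bootstrap("s3://my-bucket/setup.sh")],
        steps=[],  # add JarStep / StreamingStep objects here
    )
```

The bootstrap script itself runs on every instance before the job flow starts, so it is the place to install packages or tweak Hadoop settings.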

Q: How do I configure Hadoop settings for my job flow?

The Elastic MapReduce default Hadoop configuration is appropriate for most workloads. However, based on your job flow’s specific memory and processing requirements, it may be appropriate to tune these settings. For example, if your job flow tasks are memory-intensive, you may choose to use fewer tasks per core and reduce your job tracker heap size. For this situation, a pre-defined Bootstrap Action is available to configure your job flow on startup. See the Configure Memory Intensive Bootstrap Action in the Developer’s Guide for configuration details and usage instructions. An additional predefined bootstrap action is available that allows you to customize your cluster settings to any value of your choice. See the Configure Hadoop Bootstrap Action in the Developer’s Guide for usage instructions.



Using Amazon EC2 to create the image with Hadoop <http://wiki.apache.org/hadoop/AmazonEC2>

Categories: Amazon Web Services