In this post, I will show you how to work with MongoDB Cloud (Atlas). The good news is that MongoDB offers a free tier limited to 512 MB of storage, which is a good starting point for testing.
An organization helps you group your projects and databases, for instance by region, department, or any other way you want to separate your information. Click the green “Create New Organization” button at the top left and give your new organization a name.
Select your new organization and create a new project. Click the green “New Project” button at the top left and give your new project a name.
Select your new project and create a new database by clicking the green “+ Create” button at the top left.
During the creation process, keep the cluster tier at 512 MB of storage and change the cluster name. The cluster will be deployed as a three-node MongoDB replica set.
Wait until the cluster creation process completes; you will end up with a cluster up and running.
If you click the “Connect” button and choose “Connect your application”, you will see code snippets for many programming languages such as Java, Python, Go, Perl, C, C++, C#, …
In my demo I will choose Python; a future post will come with a Git repo containing Python code examples to connect to MongoDB, create a collection, and apply CRUD operations to JSON documents.
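In the meantime, here is a minimal sketch using the pymongo driver (pip install pymongo). The connection string, database name (demo) and collection name (customers) are placeholders; the real URI is the one shown in the “Connect your application” dialog.

from pymongo import MongoClient

# Placeholder connection string; copy the real one from the Atlas "Connect your application" dialog
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.abcde.mongodb.net/?retryWrites=true&w=majority")

db = client["demo"]            # the database is created lazily on first write
collection = db["customers"]   # same for the collection

# Create: insert a JSON document
inserted_id = collection.insert_one({"name": "Alice", "city": "Brussels"}).inserted_id

# Read: fetch it back
print(collection.find_one({"_id": inserted_id}))

# Update: change one field
collection.update_one({"_id": inserted_id}, {"$set": {"city": "Ghent"}})

# Delete: remove the document
collection.delete_one({"_id": inserted_id})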
Now, if you click on the cluster, you will land on the cluster overview page.
Select the “Metrics” tab and you will get an overview of the operation counters for the 3 servers.
Select the “Collections” tab to browse data documents.
Select the “Search” tab to run DSL queries.
The “Profiler”, “Performance Advisor” and “Online Archive” tabs are only available with a paid plan, so let’s skip those three options.
The last tab, “Cmd Line Tools”, lists all the command-line options:
Connect instructions: helps you connect from the shell, from a programming language, or with MongoDB Compass (a GUI to browse and manage your databases).
Atlas CLI: a command-line tool to manage your databases, installable via brew on macOS, via yum, apt, tar.gz, deb or rpm on Linux, and via msi or zip on Windows (a few example commands are sketched after this list).
MongoDB Database Tools: a suite of useful command-line utilities available for macOS, Linux and Windows.
mongorestore (binary executable): restores data from a dump produced by mongodump.
mongodump (binary executable): creates a binary dump of a database.
mongoimport (binary executable): imports data from a CSV, JSON or TSV file.
mongoexport (binary executable): exports data to a CSV or JSON file.
mongostat (binary executable): returns status counters of a running MongoDB instance.
mongotop (binary executable): reports how much time a MongoDB instance spends on reads and writes.
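To make this more concrete, here are a few illustrative invocations; the connection string, database, collection and file names are placeholders, so adapt them to your own cluster.

# Atlas CLI: authenticate and list your clusters
atlas auth login
atlas clusters list

# Dump a database and restore it
mongodump --uri="mongodb+srv://<user>:<password>@cluster0.abcde.mongodb.net/demo" --out=./dump
mongorestore --uri="mongodb+srv://<user>:<password>@cluster0.abcde.mongodb.net" ./dump

# Import and export a collection as JSON
mongoimport --uri="mongodb+srv://<user>:<password>@cluster0.abcde.mongodb.net/demo" --collection=customers --file=customers.json --jsonArray
mongoexport --uri="mongodb+srv://<user>:<password>@cluster0.abcde.mongodb.net/demo" --collection=customers --out=customers.json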
In the DATA SERVICES section, you will see two different options:
Triggers: let you run code in reaction to CRUD operations that occur in a collection; you can also use a scheduled trigger to run code on a cron schedule.
Data API: lets you run operations on one or multiple collections over HTTPS (see the sketch below).
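As a rough sketch only: a Data API call boils down to an HTTPS POST with an API key. The base URL, app id, API key, data source, database and collection names below are all placeholders; the exact endpoint URL is displayed on the Data API page of your Atlas project.

import requests

# Hypothetical endpoint copied from the Atlas Data API page
url = "https://data.mongodb-api.com/app/<app-id>/endpoint/data/v1/action/findOne"

payload = {
    "dataSource": "Cluster0",      # cluster name
    "database": "demo",
    "collection": "customers",
    "filter": {"name": "Alice"},
}
headers = {
    "Content-Type": "application/json",
    "api-key": "<your-data-api-key>",
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())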
In the SECURITY section, you will see three different options:
Database Access: this option is very important; it is where you manage identity and access. You can create users with password authentication or certificate authentication, or link your AWS IAM to manage your users. Each user can be assigned a set of built-in roles.
Network Access: this option lets you create an IP access list to control which addresses are allowed to reach the database.
Advanced: this option lets you enable LDAP authentication, data encryption, and auditing.
One last point I would like to mention: if you want a free GUI to access any MongoDB instance, I suggest downloading the free tool MongoDB Compass, which you can find here ==> https://www.mongodb.com/products/compass
This concludes this small topic about MongoDB Atlas.
What is an ingest pipeline? It is a processing step that intercepts documents on their way into an index, so they can be transformed before being saved.
Possible actions: the available transformations include removing a field, adding a field, enriching the value of a field, and converting a field's type.
The ingest pipeline option is located in the Stack Management section of Kibana.
Use case:
If you have Logstash sitting between an agent or application and Elasticsearch, you can use its filter and/or grok plugins to perform the same actions as an ingest pipeline.
But if agents or applications feed data directly into Elasticsearch and you want to manipulate the data before it is indexed, you can use an ingest pipeline to do the transformation.
It is also a good fit when you are not allowed to change the agent or the application that feeds the data.
How to use it:
In the image above, you see the home page of the ingest pipeline menu.
Click the blue “Create pipeline” button and choose “New pipeline”.
Give your new pipeline a relevant name and a short description.
Click the “Add a processor” button. You can add several processors to the same pipeline.
In my example, I convert the type of a field from integer to string.
I will use the JSON field “response”.
Next, click the “Add” button.
Next to the text “Test Pipeline:”, click the “Add documents” link.
Insert a JSON sample you would like to test and run the test with the “Run the pipeline” button.
Check the result to see whether the transformation worked.
When your pipeline is complete, you can save its configuration as an HTTP PUT request, which lets you deploy it on other ELK environments or clusters.
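To give a rough idea, the exported request for the convert example above could look like the following (the pipeline name is illustrative), and the same pipeline can be tested with the _simulate endpoint:

PUT _ingest/pipeline/convert-response-pipeline
{
  "description": "Convert the response field from integer to string",
  "processors": [
    {
      "convert": {
        "field": "response",
        "type": "string"
      }
    }
  ]
}

POST _ingest/pipeline/convert-response-pipeline/_simulate
{
  "docs": [
    { "_source": { "response": 404 } }
  ]
}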
Here is the JSON sample I used (see the field shown in red below):
To create an alert from the Dev Tools console, we send an HTTP PUT request to the Elasticsearch Watcher API.
In this example, the alert is triggered by a cron schedule, targets all the logstash indexes, searches for 404 responses in the JSON body field over a certain time range, and sends an email if the condition matches.
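The original request is not reproduced here, but such a watch could look roughly like the sketch below; the index pattern, field name, schedule and recipient address are assumptions, and the email action requires an email account to be configured in elasticsearch.yml.

PUT _watcher/watch/404_email_alert
{
  "trigger": {
    "schedule": { "cron": "0 0/5 * * * ?" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["logstash-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "match": { "response": 404 } },
                { "range": { "@timestamp": { "gte": "now-5m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "email_admin": {
      "email": {
        "to": "admin@example.com",
        "subject": "404 responses detected",
        "body": "{{ctx.payload.hits.total}} 404 responses were found in the last 5 minutes."
      }
    }
  }
}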
Alerting can really help to monitor messages passing through the logs, but there are some limitations.
To be able to use most connectors, you need at least a Gold subscription.
With the free and Basic subscriptions, the only connectors available are Server log (writes your alert message into a log file) and Index (writes your alert message into an index).
So without a Gold subscription, I suggest not focusing too much on alerting, since the only available connector types would require another monitoring system to actually notify you.
If you look at the sample dashboard called “[Logs] Total Requests and Bytes” and at the underlying data, there is a link between the world map and this part of the data:
“agent”: “Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24”,
In this post, I will provide some topics, examples and code to play with MongoDB.
Before going deeper into detail, I suggest you read this post explaining the difference between RDBMS and NoSQL big data, as well as a comparison of technical terms, to give you a clear view. See the post.
In this post, I will explain the basics of Logstash. This tool is a powerful gateway that can apply transformations while a message is being processed. It can listen on a port and wait for messages, or connect to a service and extract data like an ETL.
I will show you how to create a small Logstash port listener and forward the data to Elasticsearch.
Either download the .deb or .rpm file for a quick and easy install, or the compressed archive for Windows, Linux or macOS.
The extracted folder contains the “logstash” startup script (“logstash.bat” on Windows) in the bin directory, and a config directory containing the “pipelines.yml” and “logstash.yml” configuration files.
Create a Logstash pipeline config file in the config directory and name it example.conf.
With this example configuration, Logstash listens for the same data over HTTP on port 5891 and over the Beats protocol on port 5947, and forwards it to Elasticsearch at http://localhost:9200. Logstash will create an index every day following the naming convention tomcat-local-yyyy-MM-dd.
#Logstash configuration file
#Log messages can be received using http on port 5891
# or
#Log messages can be received using beats on port 5947
input {
  http {
    port => 5891
    codec => json
  }
  beats {
    port => 5947
    codec => json
  }
}
# Data is sent to Elasticsearch on port 9200
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "tomcat-local-%{+yyyy-MM-dd}"
  }
}
Tell Logstash to take the config file example.conf into consideration:
Add the config file to the pipelines.yml file.
Give this listener worker group a unique pipeline id.
Point it to the configuration file example.conf.
Specify how many concurrent worker threads will process the incoming data at the same time (if pipeline.workers is not specified, it defaults to the number of CPU cores).
- pipeline.id: example
  path.config: 'C:\logstash-8.1\config\example.conf'
  pipeline.workers: 3
Start Logstash, ensure that the ports are listening, and send a JSON example to check whether Logstash forwards it to Elasticsearch.
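For example, a quick test from the command line could look like this; the payload is just an illustrative JSON document:

curl -X POST "http://localhost:5891" -H "Content-Type: application/json" -d '{"host": "tomcat-local", "message": "hello from curl"}'

# Check that the daily index was created in Elasticsearch
curl "http://localhost:9200/_cat/indices/tomcat-local-*?v"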
Now I will show you how to connect to a MySQL database and send table rows to Elasticsearch.
The input section uses the MySQL connector (JDBC) library, connects to MySQL, and runs the SELECT statement every 5 minutes; the filter part creates an id field and removes three other fields before the data is sent to Elasticsearch. The example also keeps track of the latest value processed to make sure rows won't be processed twice.
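The original configuration is not reproduced here, but a pipeline along those lines could look roughly like this; the connection string, credentials, table, column and index names are placeholders:

input {
  jdbc {
    # MySQL JDBC driver (mysql-connector) and its driver class
    jdbc_driver_library => "/opt/logstash/drivers/mysql-connector-java-8.0.28.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/shop"
    jdbc_user => "logstash"
    jdbc_password => "changeme"
    # Run the SELECT statement every 5 minutes
    schedule => "*/5 * * * *"
    # Only fetch rows newer than the last processed value, and remember that value between runs
    statement => "SELECT * FROM orders WHERE order_id > :sql_last_value"
    use_column_value => true
    tracking_column => "order_id"
    last_run_metadata_path => "/opt/logstash/.jdbc_last_run"
  }
}
filter {
  mutate {
    # Create the id field from the primary key and drop columns we do not need
    add_field => { "id" => "%{order_id}" }
    remove_field => ["internal_notes", "created_by", "updated_by"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "orders-%{+yyyy-MM-dd}"
    document_id => "%{id}"
  }
}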
We can see in this log that each line begins with [TIMESTAMP].
So the goal is to create a regex that recognizes the beginning of each line.
Here is the regex that matches the beginning of each line: \[\d{2}\/[^0-9]{3}\/\d{4}:\d{2}:\d{2}:\d{2} \+\d{4}\]
Configure filebeat:
Enable the Filebeat input section, point it to the file path, and add the regex to the filebeat.yml configuration file (note: you can have multiple input streams, meaning you can ship data from multiple logs or sources):
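A minimal sketch of such an input section, assuming a Tomcat access log at a hypothetical path and the multiline pattern shown above:

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/tomcat/access.log
    # Lines that do NOT start with the timestamp are appended to the previous event
    multiline.pattern: '\[\d{2}\/[^0-9]{3}\/\d{4}:\d{2}:\d{2}:\d{2} \+\d{4}\]'
    multiline.negate: true
    multiline.match: after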
Now you have seen how to ingest logs with the main options that Filebeat offers.
Forward data:
To forward data, Filebeat offers two options (note: you can have only ONE output, meaning only one destination); both outputs are sketched after this list.
Send to Logstash:
You send to Logstash over the Beats protocol, optionally secured with TLS.
With authentication or anonymously.
Send to Elasticsearch:
You send to Elasticsearch over HTTP or HTTPS.
With authentication or anonymously.
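For illustration, here is roughly what each output looks like in filebeat.yml; enable only one of them, and treat the hosts and credentials as placeholders:

# Option 1: send to Logstash over the Beats protocol (port 5947 matches the Logstash example above)
output.logstash:
  hosts: ["localhost:5947"]

# Option 2: send directly to Elasticsearch
#output.elasticsearch:
#  hosts: ["http://localhost:9200"]
#  username: "elastic"
#  password: "changeme"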
Here is a diagram proposing one point of view on Filebeat usage: