reger
39dd244693
fix ConcurrentScoreMap.set() calculation of totalCount()
+ test case
9 years ago
reger
3b47a07dd1
change unused servletProperties entry CONNECTION_PROP_CLIENT_REQUEST_HEADER to
use directly HttpServletRequest. This is used to get the http protocol version
in HTTPDProxyHandler.fulfillRequestFromWeb() for error response to client.
- adjust YaCyProxyServlet and UrlProxyServlet accordingly
- use more http_version constants in headerframework and httpdeamon
- equalize servlets (3) use of HeaderFramework.CONNECTION_PROP_HOST to HeaderFramework.HOST
9 years ago
reger
036c1dc6ef
fix CookieTest_p formatting (output of <br> as text),
change to dataoutput only by servlet, leave formatting to html.
+ removed link to obsolete env/grafics gif
9 years ago
luccioman
744c9a2615
Opensearch desc : handle https protocol url with default port (443)
This completes modifications made for mantis 669
(http://mantis.tokeek.de/view.php?id=669 )
9 years ago
reger
226f81cfcf
declare poison pill url MultiProtocolURL() as protected to make sure not
used from outside.
After double checking use of poison url revert path init from commit
f8632ad292
9 years ago
reger
f8632ad292
prevent string index out of bounds MultiProtocolURL.getPaths
as path may be an empty string
+ init path to "" also in init for poison url (to guarantee success for
all existing uses of path w/o check for null)
9 years ago
reger
9b07bbf955
deprecate newurl(), not used and already replaced
instead of making it handle all the supported protocols
9 years ago
reger
774b3906a9
fix GenericFormatter.parse ("time","timeoffset")
change: UTC offset internally expected in minutes
9 years ago
reger
27163af0e1
improve detection of referenced links by taking http and https link protocol
into account
+ correct query start detection of commit f89d4eb51d
9 years ago
reger
f89d4eb51d
fix MultiProtocolURL init (assign of host) for urls with '/' in query part
+ add to test case
9 years ago
reger
87fcfc6d78
Adjusted hash computation and toNormalform for file:// protocol to deliver
the same hash for the same file on a Windows filesystem path with forward slashes or backslashes in the path.
Background see http://mantis.tokeek.de/view.php?id=671
+Test case
9 years ago
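A minimal sketch of the idea behind 87fcfc6d78, assuming hypothetical helpers `normalizeSeparators` and `hashHex`. MD5 stands in for the hash function here; it is not YaCy's url hash scheme. Normalizing the separator before hashing makes both spellings of a Windows path map to the same value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FilePathHashSketch {

    // Hypothetical helper: treat '\' and '/' as the same path separator.
    static String normalizeSeparators(String fileUrl) {
        return fileUrl.replace('\\', '/');
    }

    // Stand-in hash (MD5 as hex), computed over the normalized form only.
    static String hashHex(String fileUrl) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(normalizeSeparators(fileUrl).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = hashHex("file:///C:\\data\\report.txt");
        String b = hashHex("file:///C:/data/report.txt");
        System.out.println(a.equals(b));
    }
}
```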
luccioman
a73c9327a5
JavaScript License fixes for LibreJS compatibility
9 years ago
reger
b3c9041f79
remove publicIPv4HostNames and publicIPv6HostNames, which are redundant with localHostNames and unused,
to free unused resources
9 years ago
reger
9e94989237
upd to PDFBox 2.0.1
9 years ago
reger
24b0fa2a38
extend snapshot Html2Image.pdf2image to use PDFBox image export capability
if no external tool is installed (and for Win)
Resulting jpgs are not always perfect (if graphics are included) but imho sufficient.
9 years ago
reger
3adb670f44
remove never used Domains.myHostNames set
9 years ago
reger
ec24a0c85a
add test case for optimized toTokens()
9 years ago
reger
258cd41577
reduce logging (EmbeddedSolrConnector.query)
mainly to reduce the frequent metadata checks like
> EmbeddedSolrConnector.query QUERY: q={!cache=false raw f=id}xXxXxX&rows=1&start=0&fl=id,load_date_dt
(p.s. direct servlet queries logged via AccessTracker.addToDump)
9 years ago
reger
6d56beaed8
fix assertion exception in toString of MultiProtocolURL
toString of AnchorURL and MultiProtocolURL are identical code
(no need to override or to protect call to parent)
as reported in https://github.com/yacy/yacy_search_server/issues/43
9 years ago
reger
937fbb0b9f
correct isHidden() for smb from last commit
9 years ago
reger
535d4bf75f
respect hidden attribute for file and smb directory listing
(hidden directories are not listed, affects crawling of local file system)
9 years ago
reger
a6617ad887
expand initRemoteCrawler() to terminate worker threads if called to deactivate
remote crawl.
On startup we save the resources for the remote crawler if it is disabled. Once started,
threads kept running idle after remote crawl was disabled. Now threads are terminated
to save the resources also when disabling during runtime.
+ remove empty class Channels
9 years ago
reger
c91e712178
further refactor using standard java / (one) utf-8 charset variable
extending initiative of commit 9a25751850
9 years ago
luc
571bc55937
Refactoring : use StandardCharsets constants instead of hard-coded
charset names.
9 years ago
sixcooler
5a35f9383a
bump to solr/lucene 5.4.0
9 years ago
reger
a5faf73afa
remove obsolete yacy.init entries interaction.*
(related to removed triplestore)
9 years ago
reger
45b9bd8403
adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
and feeding hyperlinks to webgraph processing.
9 years ago
reger
b7e8358645
make use of header.getContentType where possible (mime is normalized afterwards)
otherwise use header.mime() differentiated in prev. commit.
9 years ago
reger
7a8c077838
fix HeaderFramework.mime() to strip charset parameter.
Differentiate mime() and getContentType() which gives the raw header field.
This improves parser detection if charsets are included in http content-type field.
9 years ago
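A minimal sketch of the distinction described in 7a8c077838, assuming a hypothetical helper `stripParameters` (this is not the actual HeaderFramework.mime() code): the raw Content-Type field is kept as received, while the mime type is only the part before the first ';'.

```java
import java.util.Locale;

final class MimeSketch {

    // Hypothetical helper: returns the media type without parameters such as charset.
    // Falls back to the given default if the header is missing or empty.
    static String stripParameters(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        int semicolon = contentType.indexOf(';');
        String mime = (semicolon < 0 ? contentType : contentType.substring(0, semicolon)).trim();
        return mime.isEmpty() ? fallback : mime.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String raw = "text/html; charset=UTF-8";   // raw header field, as getContentType() would give it
        System.out.println(stripParameters(raw, "application/octet-stream")); // prints "text/html"
    }
}
```

A parser lookup keyed on the stripped value no longer misses because of an attached charset parameter.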
reger
dec3e6ad96
fix: adjust urlstub for mailto links
(skip protocol)
9 years ago
reger
71c416f383
show mailto links in ViewFile.html linklist
9 years ago
reger
4d2b934487
prevent mailto links getting into parser result document's in/outbound link collection
by checking mailto scheme early.
- fix upper case mailto protocol assignment
- add test case for getProtocol
9 years ago
sixcooler
1be67d9ab6
CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years
ago - time to let it go
Commented out unused table of cache-objects
9 years ago
reger
28b8bc290a
fix use of NETWORK_SEARCHVERIFY for rwi verification
was not used to set the searchevent parameter (done in SearchEventCache.getEvent)
- remove unused corresponding QueryParams.filterfailurls param.
9 years ago
reger
020630efd8
remove unused network scanner parameter from queryparameter
Search event is not using networkscanner
(removed filterscannerfail param always init to false)
9 years ago
luc
f01d49c37a
Process large or local file images dealing directly with content
InputStream.
9 years ago
luc
3c4c77099d
If available, check content length before downloading. Check also
content length is not over Integer.MAX_VALUE.
9 years ago
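A minimal sketch of the check described in 3c4c77099d, assuming a hypothetical method `mayLoadIntoMemory` and a caller-supplied size limit (neither taken from the commit): a declared length is rejected if it exceeds the limit or cannot be held in a single Java byte array.

```java
final class ContentLengthSketch {

    // Hypothetical check. contentLength < 0 means the server sent no usable
    // Content-Length; the caller then still has to enforce the limit while reading.
    static boolean mayLoadIntoMemory(long contentLength, long maxBytes) {
        if (contentLength < 0) {
            return true;
        }
        // A byte[] cannot be longer than Integer.MAX_VALUE elements.
        return contentLength <= Integer.MAX_VALUE && contentLength <= maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(mayLoadIntoMemory(1024L, 10L * 1024 * 1024));
        System.out.println(mayLoadIntoMemory(Integer.MAX_VALUE + 1L, Long.MAX_VALUE));
    }
}
```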
reger
2985baaa01
Exclude repetitive protocol part in tokenized url
used as description if none is avail. from parser.
9 years ago
Michael Peter Christen
d1ae999ef9
replaced HashMap with LinkedHashMap to preserve the object order
10 years ago
reger
c9937973e3
unescape MultiProtocolURL getAttributes() return values.
use getAttributes() to get query parameters as clear text (w/o url encoding)
use getSearchpartMap() to get in internal format (url encoded)
fix for http://mantis.tokeek.de/view.php?id=606
10 years ago
reger
43c27aa550
upd to solr/lucene 5.3.1
10 years ago
reger
688f7b2a5c
allow/display svg images in image results previews
svg is not supported by awt but is supported by most browsers. Image content is delivered as received (without size adjustment)
10 years ago
Michael Peter Christen
8e555d79a3
add also 1-character tokens to the token list because those could also be
searched for. A full-string search for a filename may fail if those
1-char tokens are omitted
10 years ago
reger
bad34804fe
optimize parseInt for <img> tag attribute parsing
Performs better than using NumberFormat.parse or parseInt(substring())
10 years ago
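A minimal sketch of what a substring-free integer parse for attribute values such as width="120px" could look like. The method `parseLeadingInt` is hypothetical and not the code from bad34804fe; it only illustrates reading leading digits in place instead of creating a substring first.

```java
final class ParseIntSketch {

    // Hypothetical helper: parses the run of decimal digits starting at 'from'
    // (after optional whitespace) and stops at the first non-digit.
    // Returns 'fallback' if there is no digit or the value exceeds Integer.MAX_VALUE.
    static int parseLeadingInt(CharSequence s, int from, int fallback) {
        int i = from;
        final int n = s.length();
        while (i < n && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        long value = 0;
        int digits = 0;
        while (i < n) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                break;
            }
            value = value * 10 + (c - '0');
            if (value > Integer.MAX_VALUE) {
                return fallback;
            }
            digits++;
            i++;
        }
        return digits == 0 ? fallback : (int) value;
    }

    public static void main(String[] args) {
        System.out.println(parseLeadingInt("120px", 0, -1)); // prints 120
        System.out.println(parseLeadingInt("auto", 0, -1));  // prints -1
    }
}
```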
reger
52e3eb4ce8
harmonize/correct assignment to Ymarkmeta.mime
replace use of deprecated
10 years ago
Michael Peter Christen
87f358058e
Fix for index entries which have id's not computed as hash from the url.
This makes it possible to operate with outside-computed url hashes in
enterprise environments not using the built-in crawler from YaCy.
10 years ago
Michael Peter Christen
5f706797cb
patch for a bug inside of solr since solr 5.0 when using a boost
function with a numeric date field:
"unexpected docvalues type NUMERIC for field 'last_modified' (expected
one of [SORTED, SORTED_SET]). Use UninvertingReader or index with
docvalues."
This is a well-known bug inside solr which currently prevents the 'sort
by date' option in the YaCy search interface from being used. Without this patch no
results at all are displayed (since the exception prevents that). Now
there is at least a result, but it is not ordered properly.
10 years ago
reger
b4cbdea1e7
adapt SolrServerConnector.add to handle error on partial update input document.
In case of error we deleted the original document and added the new doc to the index.
This is not valid for partial update documents (which contain only a subset of the fields).
Remove the "delete" error handling step.
10 years ago
reger
e37a4f0b3d
prevent metadata records in index w/o valid url
by throwing MalformedURL exception on URIMetadataNode creation
10 years ago
reger
4cf875336c
complete TODO: getFileExtension handle dot in query part
+ testcase
10 years ago
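A minimal sketch of the case named in 4cf875336c, assuming a hypothetical helper `fileExtension` that takes a path with optional query and fragment (not the actual getFileExtension code): the query part is cut off first, so a dot inside it is not mistaken for the extension separator.

```java
import java.util.Locale;

final class FileExtensionSketch {

    // Hypothetical helper: input is a path plus optional "?query" and "#fragment",
    // e.g. "/dir/doc.pdf?file=a.b". Returns "" if the last path segment has no extension.
    static String fileExtension(String pathAndQuery) {
        int end = pathAndQuery.length();
        int query = pathAndQuery.indexOf('?');
        if (query >= 0) {
            end = query;
        }
        int fragment = pathAndQuery.indexOf('#');
        if (fragment >= 0 && fragment < end) {
            end = fragment;
        }
        String path = pathAndQuery.substring(0, end);
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // The dot must be in the last segment and must not be the final character.
        if (dot < 0 || dot < slash || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(fileExtension("/dir/doc.pdf?file=a.b")); // prints "pdf"
    }
}
```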